
One recurring issue I see is accountability (who did what on which server) on Linux and other operating systems that are accessed using ssh.

The traditional solution for this problem is personal accounts, which provide logon and logoff registration.
Personal accounts usually have the least amount of rights, which means typical administration tasks cannot be done with them. To actually do something, you need to sudo to an application account or root. This is logged by sudo, on enterprise Linux distributions to /var/log/secure. So some important facts are logged (logon, logoff, switching users), but a lot of important things are not logged at all, or can at best be derived indirectly by combining operating system audit data with application-specific logging.

With cloud, everything has become a lot more dynamic, and I commonly see client cloud environments where there are no personal accounts at all: everybody just uses the common cloud user (ec2-user, opc, etc.). Needless to say, this is a mistake if you want to build a professional setup. Maybe I am just unlucky, but I have a sneaking suspicion this is actually quite common.

First things first: if you are serious about security and want to actually track what is happening, you MUST have central logging. Otherwise, a user who can switch to root can stop the logging and overwrite or change the evidence. Of course, with central logging a user who can switch to root can still stop the logging, but then at least you have proof of who did it, because the logs on another machine cannot be changed. Without central logging, you might as well not take any action at all, because you can’t be sure that what you see is true.

Personal accounts: I think that in almost any situation that is even a tiny bit dynamic, creating and removing users locally on each server is not a sustainable practice; even if you manage to get it up to date, you need to maintain it continuously. Some cloud and some on-premises environments have LDAP. Centralising the user administration keeps user management feasible. However, the amount of audit information you get from it is limited (the aforementioned logon, logoff and switch user registration). It does allow you to grant access to individual servers per user. One downside is that you need to add operating system packages to provide the LDAP authentication functionality, and obviously you need to explicitly configure each server to authenticate users against an LDAP server.

There is another, far less intrusive way, which uses native secure shell daemon functionality (so no extra installs): authenticating users with CA (certificate authority) signed user keys. To use this, you need an additional key pair that serves as the CA keys; you put the CA public key on each server and point the secure shell daemon to it as the CA key. This is an additional authentication method, so all existing secure shell daemon functionality, like username/password authentication and public key authentication using the authorized_keys file, remains valid. This authentication mechanism is also self-contained: no communication with an external authentication service (like an LDAP server) is needed, because it uses a key that is placed on the server. The downside is that it’s an all-or-nothing setting: you get access to all hosts that have a certain CA key set, and you can’t exclude a certain host for a certain user.

With CA key based authentication, the identity is based on the user’s signed key instead of a named account. This is a huge difference from LDAP/named accounts, where authentication is based on the actual account name. If you let users sign their keys with the CA key themselves, you need to keep track of who signed which key, because that declares the actual identity, not the account that logged on to a server.

Signing keys with a CA key can be done with the native ssh key management software (ssh-keygen). However, if you want to authenticate users based on CA signed keys this way, you need to manually store and protect the CA private key, and keep an impeccable registration of who signed which key, because that declares the actual user identity. For the user there isn’t a lot of difference: their private key is still the ultimate declaration of their true identity (as you will see, this mechanism requires a user to specify both their private key and their signed key).
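For comparison, this is roughly what the manual route looks like with plain OpenSSH. It is a minimal sketch, assuming ssh-keygen is installed; the function name sign_user_key and all of its arguments are hypothetical. The key ID (-I) and serial (-z) are exactly the values you would have to register yourself, because sshd logs both when the signed key is used.

```bash
#!/usr/bin/env bash
# Sketch: sign a user's ssh public key with a CA private key using only ssh-keygen.
# Hypothetical helper; the caller must record key_id and serial somewhere safe,
# because that registration is what ties a login back to a person.
sign_user_key() {
  local ca_key=$1      # CA private key (this file must be protected!)
  local user_pub=$2    # the user's public key, e.g. id_rsa.pub
  local key_id=$3      # identity string, logged by sshd as "ID ..."
  local principal=$4   # account the certificate is valid for, e.g. vagrant
  local serial=$5      # serial number, logged by sshd as "(serial N)"
  # -s signs with the CA key, -I sets the key ID, -n the principal,
  # -V the validity window and -z the serial. The certificate is written
  # next to the public key as <name>-cert.pub.
  ssh-keygen -q -s "$ca_key" -I "$key_id" -n "$principal" -V +24h -z "$serial" "$user_pub"
}
```

For example, `sign_user_key ca_key ~/.ssh/id_rsa.pub frits.hoogland vagrant 1001` would create ~/.ssh/id_rsa-cert.pub, and `ssh-keygen -L -f` on that file shows the key ID, serial, principal and validity that were set.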

Luckily, there is a tool that provides the key signing functionality together with auditing (the registration) and secure authentication of a user: HashiCorp Vault.

My advice is to let users authenticate to vault with certificates rather than passwords, in addition to their ssh key pair. You could probably write an entire blog post about it, but passwords are cumbersome.

1. User side initial setup.
A user needs two key pairs. The first is a PEM encoded certificate and key pair for authenticating to vault:

$ openssl req -newkey rsa:2048 -new -nodes -x509 -days 365 -keyout fritshoogland_key.pem -out fritshoogland_pub.pem

This will ask some questions, like country, state, locality, etc.
It’s important that the user fills in their true name as the ‘Common Name’ (and a password at the ‘password’ prompt, if asked).
As an administrator, you can check the contents of the public key with ‘openssl x509 -in PUBKEY.pem -text’.

The other one is the regular ssh keypair:

$ ssh-keygen -t rsa

It will ask for a passphrase and save the files in ~/.ssh: ‘id_rsa’ for the private key and ‘id_rsa.pub’ for the public key.

It is important to realise that the ultimate identity depends on the private keys of both key pairs, which should be kept securely by the user and never handed over to anyone else. This setup, and any normal usage, only ever requires the public keys to be shared, which is exactly what a public key is for, hence the name.

2. vault side initial setup.
Two mechanisms are used inside vault: an authentication method based on certificates, and a secrets engine for ssh.
The certificate authentication method needs to be set up using a vault root/admin account. The first thing is to enable the certificate authentication method:

$ vault auth enable cert

This only needs to be done once for a given path. The authentication method can optionally be mounted at a different path by adding ‘-path=’, which allows the same authentication method to be enabled more than once, for example to keep separate groups of certificates apart.

The next thing that needs to be set up is the ssh secrets engine (this too can be mounted at a different path by adding ‘-path=’):

$ vault secrets enable ssh

Now we can generate the CA key pair inside vault and extract its public key, which is used on the servers to validate the user keys signed with it:

$ vault write -field=public_key ssh/config/ca generate_signing_key=true > trusted-user-ca-key.pem

The next thing we need is a signing role that will be used to sign a user key with the CA. This allows you to specify any properties you want the user CA signed key to hold, like ttl (time to live) and allowed_users:

echo '{
  "allow_user_certificates": true,
  "allowed_users": "vagrant",
  "default_extensions": [
    {
      "permit-pty": ""
    }
  ],
  "key_type": "ca",
  "default_user": "vagrant",
  "ttl": "24h"
}' | vault write ssh/roles/sign-role -

There are a couple of limitations set here:
Line 3: this specifies that only a user named ‘vagrant’ is allowed (which is the default user in my lab, like ec2-user, opc, etc.).
Line 6: permit-pty is needed to allow the session to allocate a terminal.
Line 11: the TTL (time to live) of the generated signed key is 24 hours. This means a new signed key needs to be obtained every day. Because users can do this themselves, it doesn’t create an administrative burden. It also means that once you take away a user’s ability to sign keys, their access through this signed key authentication method ends within a day.

The last thing in this step is to create a policy that allows access to the sign role. This policy must be granted to a user (certificate) so it can actually perform the CA key signing. In order to deny access, you can simply remove this policy from a certificate, which then disables the ability to perform the CA key signing.

echo "path \"ssh/sign/sign-role\" {
  capabilities = [\"create\", \"update\"]
} " | vault policy write policy_ssh-client -

3. ssh daemon setup to allow CA signed key authentication
The changes to the servers that must allow CA signed authentication are really modest.

First, the CA public key obtained in step 2 must be transferred to the server. I put it in the /etc/ssh directory, where the other ssh keys and settings are stored too.

Second, the ssh daemon configuration file /etc/ssh/sshd_config must be changed to include the TrustedUserCAKeys setting, pointing to the trusted-user-ca-key.pem file:

TrustedUserCAKeys /etc/ssh/trusted-user-ca-key.pem

After this change, the ssh daemon must be restarted to pick up the configuration change (systemctl restart sshd). This should not interrupt current sessions.
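When there are more than a handful of servers, these two changes can be scripted. The sketch below assumes the key from step 2 was saved locally as trusted-user-ca-key.pem; the function name install_user_ca is hypothetical, and the paths are parameters so it can be tried on a scratch copy of sshd_config first.

```bash
#!/usr/bin/env bash
# Sketch: install the vault-generated user CA public key and reference it in sshd_config.
install_user_ca() {
  local ca_pub=$1       # CA public key obtained from vault
  local dest=$2         # e.g. /etc/ssh/trusted-user-ca-key.pem
  local sshd_config=$3  # e.g. /etc/ssh/sshd_config
  # Refuse anything ssh-keygen cannot parse as a public key. This prints the
  # fingerprint that sshd later logs as "CA RSA SHA256:..." for certificate logins.
  ssh-keygen -l -f "$ca_pub" || return 1
  install -m 0644 "$ca_pub" "$dest" || return 1
  # Add the directive only when no TrustedUserCAKeys line exists yet, so that
  # running the function twice does not duplicate it.
  if ! grep -qE '^[[:space:]]*TrustedUserCAKeys[[:space:]]' "$sshd_config"; then
    printf 'TrustedUserCAKeys %s\n' "$dest" >> "$sshd_config"
  fi
}
```

On a real server this would run as root, followed by `sshd -t && systemctl restart sshd`, so that a configuration error is caught before the daemon is restarted.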

4. Enable a user by uploading its key into vault and assign the ssh policy
The next step is to upload the certificate public key of the user who must be given access into vault, and bind it to the policy. This, of course, must be done by the vault administrator. The command does two things at once: a) upload the certificate (certificate=) and b) attach the policy policy_ssh-client (policies=):

$ vault write auth/cert/certs/frits.hoogland display_name="Frits Hoogland" certificate=@fritshoogland_pub.pem policies="policy_ssh-client"

Note that only the public key is needed.
Also note that at this point the user has no access yet: vault has been configured to allow the user to authenticate with the key pair whose public key fritshoogland_pub.pem was uploaded, but no user public key has been signed yet.

5. Let the user sign its key with the CA key using vault
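To sign their key, a user first logs in to vault with their certificate key pair, and then lets vault sign their ssh public key:

$ vault login -method=cert -client-cert=fritshoogland_pub.pem -client-key=fritshoogland_key.pem name=frits.hoogland
$ vault write -field=signed_key ssh/sign/sign-role public_key=@$HOME/.ssh/id_rsa.pub > ~/.ssh/signed-cert.pub

Mind that vault command-line flags must be placed before the path; flags placed after it are sent along as request data, which is visible in the audit log excerpt below.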

Now that a signed key has been created, it can be used to log on to any server that has the signing CA key set. To do this, simply specify ‘-i’ (for identity) a second time, after the private key:

$ ssh -i ~/.ssh/id_rsa -i ~/.ssh/signed-cert.pub vagrant@192.168.66.51
Last login: Sun Oct 13 13:09:03 2019 from 192.168.66.50

As you can see, the amount of work for a user is really modest: it’s a single command to obtain or refresh the signed key. With signed keys that last 24 hours, a new signed key needs to be obtained every day.
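The serial of a signed key can also be read on the client side: ssh-keygen -L lists the fields of a certificate, including the serial in decimal (the notation sshd logs), the key ID, the principals and the validity period, which is also handy for checking whether the 24 hour key has expired. A minimal sketch; cert_serial is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print the serial number of an OpenSSH certificate in decimal, the same
# notation sshd uses in "(serial N)" log lines. Requires ssh-keygen.
cert_serial() {
  local cert=$1
  # ssh-keygen -L lists the certificate fields; the "Serial:" line holds the number.
  ssh-keygen -L -f "$cert" | awk '$1 == "Serial:" { print $2 }'
}
```

Usage: `cert_serial ~/.ssh/signed-cert.pub`.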

The vault audit log shows which certificate authenticated, the public ssh key that was signed, and the response, which contains the signed key and its serial number (use ‘jq’ to format the JSON audit log: jq . /var/log/vault/audit.log):

{
  "time": "2019-10-13T13:05:36.524628543Z",
  "type": "response",
  "auth": {
    "client_token": "s.yr6lGhRxwK8ySc4cgEC6HBIi",
    "accessor": "w4Q5bVyabJ9c3hfgeG1gdyuS",
    "display_name": "cert-Test User",
    "policies": [
      "default",
      "policy_ssh-client"
    ],
    "token_policies": [
      "default",
      "policy_ssh-client"
    ],
    "metadata": {
      "authority_key_id": "f5:02:c8:54:6b:bd:36:66:1f:55:d2:4d:60:a8:0c:d0:19:32:e0:bb",
      "cert_name": "test_user",
      "common_name": "testuser",
      "serial_number": "12954687727334453172",
      "subject_key_id": "f5:02:c8:54:6b:bd:36:66:1f:55:d2:4d:60:a8:0c:d0:19:32:e0:bb"
    },
    "entity_id": "64cd1dd1-f94e-6370-8f8d-bc9ae68babf3",
    "token_type": "service"
  },
  "request": {
    "id": "c28e810a-8a52-1760-b346-1c4bd44e3800",
    "operation": "update",
    "client_token": "s.yr6lGhRxwK8ySc4cgEC6HBIi",
    "client_token_accessor": "w4Q5bVyabJ9c3hfgeG1gdyuS",
    "namespace": {
      "id": "root"
    },
    "path": "ssh-client/sign/sign-role",
    "data": {
      "-client-cert": "test_pub.pem",
      "-client-key": "test_key.pem",
      "public_key": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCue3QBya7lZLt2JxOwMTPQQF0NrV/ahCNXr/NX0iFkC6PDtSQZ00YN34JXviR8IL4NHuvW/MMXGJFMHk9Y6sXgY6coIkhan/DVhJlt+fSzUEdUXAWygR4Nsq8Bmk8a3YEU5lBjOfdHGHLM42lG3FmZpTdNDLMGaLkAvFjsYqklsT4mEfxjkBHeY3fbt5zKoPkNnLS3m/O4oFO0uwT6Qo8CjlN6lFibiFNpUC2t+2b0knfrFZn0Jc5u+4JdJoFMvh8xGckCL85r2hFSS85ENHEwLq3kKMb2V7AOW06RaneFC5LDp93q31aUWi8nc2xtKMTQDzf/qUcpgB+xhKW0Bejl test@localhost.localdomain\n"
    },
    "remote_address": "127.0.0.1"
  },
  "response": {
    "data": {
      "serial_number": "f11b7664d9c1ed4b",
      "signed_key": "ssh-rsa-cert-v01@openssh.com AAAAHHNzaC1yc2EtY2VydC12MDFAb3BlbnNzaC5jb20AAAAgmqm82e96fHybofd18OK9X42OSX9Y8sBReFxNtU0gSdoAAAADAQABAAABAQCue3QBya7lZLt2JxOwMTPQQF0NrV/ahCNXr/NX0iFkC6PDtSQZ00YN34JXviR8IL4NHuvW/MMXGJFMHk9Y6sXgY6coIkhan/DVhJlt+fSzUEdUXAWygR4Nsq8Bmk8a3YEU5lBjOfdHGHLM42lG3FmZpTdNDLMGaLkAvFjsYqklsT4mEfxjkBHeY3fbt5zKoPkNnLS3m/O4oFO0uwT6Qo8CjlN6lFibiFNpUC2t+2b0knfrFZn0Jc5u+4JdJoFMvh8xGckCL85r2hFSS85ENHEwLq3kKMb2V7AOW06RaneFC5LDp93q31aUWi8nc2xtKMTQDzf/qUcpgB+xhKW0Bejl8Rt2ZNnB7UsAAAABAAAAVXZhdWx0LWNlcnQtVGVzdCBVc2VyLWFlYTkwNGFiOTE3YjNlZGM1MzJjMDBhNDU5NDc1NDFmYTE5NmIwNjA5ZjJkZDdmNmY1MDI4YWJhZTBmODgwMWQAAAALAAAAB3ZhZ3JhbnQAAAAAXaMhAgAAAABdpHKgAAAAAAAAABIAAAAKcGVybWl0LXB0eQAAAAAAAAAAAAACFwAAAAdzc2gtcnNhAAAAAwEAAQAAAgEArPomMYoG/HajnbzfLVdFOjGP64lXS1+wdnG97C0glHHyvP7E8kcK8Iqt7PbCTY7hbpajF2Z/PTqAgp2DNtdvEftD4HKxUF7Qpa40fToBcj0SVcA/Ht4qfJ1dH8fOdnOCOL9/wUgUmOQlprwYbvvPkjX9Rg4kYkoxkBrT1ZLg5+0QTmly/44ZVphrsqyXk3vOydcnyK8MTd5IVZ0hLNTNx/cDBeCnLwBkg2cs1us5b6uRqbUchqjNP61eHyPCEJykhsFpRSFCdqVHU9gkynj00/6JWuNRBtP0DFiJPJ8oUI9BLUBeBQn5jEgfhh8obnZ2ih/M7LOHF1cYggAVSgtG+XQ+jOzYr39pvLbABYncXCATSW2M62F6bnFFMcCixx5vBwvhAXMiOJpENjmfmmaCpa17t4SmSs284taNmPa5Upq17zyy/QBofeCrz35qBuGfAlO7G9jGP+/tkTOv2lbMw+BJGRobaH/1uypkX7NpuG8rEDdht5xm9pr1Xwdb1iD632nKEkLDtOjrH2X/PR9k9EWPbBhF3HtAPzv/esKYVa7Cm6heLEVPRn5ZsBXJA2+4j3Kq2cwDVp5DFpEttAf4trMx3S4AP9rHwS02Y5zWdRT16HdYAwjfpY0+m0kAdNdKwStdxTTG4wgzjYN8dZSOZP2UIe1LA5bFm1HbBieCjysAAAIPAAAAB3NzaC1yc2EAAAIAqrEuTPFL9ULr/46Qx7kPCY292yxgqilLqlcqEmJ8fwFiukRk/w4wk+qgnLLAc72aPD4tUqjpw0xB3QaCmdh0YK6TAjQ+RC8hXYKz6OHZMlbFBSXRwa7poKWSs4YVu+M+WtQ9oibYpHrZcEj5N3mO+XHP/mbrEa/jZi9yvqwI4bJ4OY1ktH6bihLd7q79pejWq5c1+1ppswSyO2tcyJShGYb8V/UPmIRgqo2OvMjnrgTtF7VnPZHh+H/kSlB+6PiTgdQQDVSf72cUi9hGcgXCas71bAFeamq/fvoeB2dKfl7ZrhjGKE5Mx90G8gVGulz8a2kbMOgP3bjNvKlc6DfKiJuHbpfxyNn/9P/cvYFYONMFxup3H/3/rDVs4a4M+Qp9nDmalGe+4muwdlLdt6Y/dkG3WAbJvKvVPpXjFca234Y2gSAv3lJVqURHbaxkE1fus3gCmNtjRNcHA/rGQW/vnEvVjXLfRBHRyAdT+TY38iewG1tk7iEx6EKx77PgFtgMO01pQLYe94VfG2ynuBOlUfIDms/gm6jwVfo/PUR1hP/Q5vTMCNBt1RwgEwa0EWI7LhNApVE66FXDAJ6a4aUvulXc8KdWODytJzaHMhM5mpn88xNFeH7SK0aeEs4C4Fu4XQVrobm8eE0Xz9K7faRXCpdNtrtRvh8joU0H+GfVsHE=\n"
    }
  }
}
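To pull just the relevant fields out of the audit log, a jq filter can select the signing responses. This is a sketch assuming jq is installed and the file audit device writes one JSON object per line; signed_serials is a hypothetical name. By default vault’s audit devices HMAC string values such as these, so this only yields readable serials if they are logged unhashed, as in the excerpt above (for example with the file audit device’s log_raw option, or by listing serial_number in the mount’s audit_non_hmac_response_keys).

```bash
#!/usr/bin/env bash
# List time, authenticated identity and issued serial for every ssh key signing.
# Matching on the path suffix works for the ssh mount as well as other mount paths.
signed_serials() {
  local log=${1:-/var/log/vault/audit.log}
  jq -r '
    select(.type == "response"
           and ((.request.path // "") | endswith("/sign/sign-role")))
    | [.time, .auth.display_name, .response.data.serial_number]
    | @tsv' "$log"
}
```

Each output line is tab-separated, so it can be fed straight into the hexadecimal to decimal conversion used for searching /var/log/secure.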

The important part for linking the information in /var/log/secure to the signed certificate is the serial number.
In this case, the serial is f11b7664d9c1ed4b. You need to convert that hexadecimal number to decimal:

echo "ibase=16; F11B7664D9C1ED4B" | bc
17373610163033992523

(note that I needed to install bc on CentOS 7, and that bc requires the hexadecimal digits in uppercase)
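As an alternative to bc, the shell’s own printf can do the conversion, and it accepts the lowercase hex straight from the audit log. A minimal sketch; hex_serial_to_dec is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Convert a hexadecimal certificate serial (as logged by vault) to decimal (as logged by sshd).
hex_serial_to_dec() {
  # With a 0x prefix, printf %u parses the argument as an unsigned integer,
  # so both upper- and lowercase hex digits work and bc is not needed.
  printf '%u\n' "0x$1"
}
```

`hex_serial_to_dec f11b7664d9c1ed4b` prints 17373610163033992523, the same value bc produced above.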

Now if we move over to the server where this key was used, we can simply search for ‘17373610163033992523’ in the /var/log/secure file:

# grep 17373610163033992523 /var/log/secure
Oct 13 13:09:03 localhost sshd[4959]: Accepted publickey for vagrant from 192.168.66.50 port 50936 ssh2: RSA-CERT ID vault-cert-Test User-aea904ab917b3edc532c00a45947541fa196b0609f2dd7f6f5028abae0f8801d (serial 17373610163033992523) CA RSA SHA256:Jcv3wpnbWWRlHDCRNqm6jfhB9qKnvCByBRIR4wr7CLI
Oct 13 13:09:09 localhost sshd[4986]: Accepted publickey for vagrant from 192.168.66.50 port 50938 ssh2: RSA-CERT ID vault-cert-Test User-aea904ab917b3edc532c00a45947541fa196b0609f2dd7f6f5028abae0f8801d (serial 17373610163033992523) CA RSA SHA256:Jcv3wpnbWWRlHDCRNqm6jfhB9qKnvCByBRIR4wr7CLI
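The sshd process ids in these lines are what the Linux audit facility needs next, so the search and the extraction can be combined. A minimal sketch; sshd_pids_for_serial is a hypothetical helper, and it assumes the log line format shown above.

```bash
#!/usr/bin/env bash
# Print the sshd process id of every login that used the certificate with this serial.
sshd_pids_for_serial() {
  local serial=$1
  local log=${2:-/var/log/secure}
  # Match the complete "(serial N)" token, so one serial cannot match
  # as a substring of a longer one, then keep only the number in sshd[N].
  grep -F "(serial ${serial})" "$log" | sed -n 's/.*sshd\[\([0-9][0-9]*\)\]:.*/\1/p'
}
```

Each printed pid can then be passed to `ausearch -p PID -i`.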

The second session (at 13:09:09) used sshd process 4986. We can use the Linux audit facility to display all records for that process id:

# ausearch -p 4986 -i
type=USER_AUTH msg=audit(10/13/2019 13:09:09.152:905) : pid=4986 uid=root auid=unset ses=unset subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=key algo=ssh-rsa-cert-v01@openssh.com size=2048 fp=none rport=50938 acct=vagrant exe=/usr/sbin/sshd hostname=? addr=192.168.66.50 terminal=? res=success'
----
type=USER_ACCT msg=audit(10/13/2019 13:09:09.154:906) : pid=4986 uid=root auid=unset ses=unset subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct=vagrant exe=/usr/sbin/sshd hostname=192.168.66.50 addr=192.168.66.50 terminal=ssh res=success'
----
type=CRYPTO_KEY_USER msg=audit(10/13/2019 13:09:09.160:907) : pid=4986 uid=root auid=unset ses=unset subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=session fp=? direction=both spid=4987 suid=sshd rport=50938 laddr=192.168.66.51 lport=22  exe=/usr/sbin/sshd hostname=? addr=192.168.66.50 terminal=? res=success'
----
type=USER_AUTH msg=audit(10/13/2019 13:09:09.160:908) : pid=4986 uid=root auid=unset ses=unset subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=success acct=vagrant exe=/usr/sbin/sshd hostname=? addr=192.168.66.50 terminal=ssh res=success'
----
type=CRED_ACQ msg=audit(10/13/2019 13:09:09.160:909) : pid=4986 uid=root auid=unset ses=unset subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct=vagrant exe=/usr/sbin/sshd hostname=192.168.66.50 addr=192.168.66.50 terminal=ssh res=success'
----
type=LOGIN msg=audit(10/13/2019 13:09:09.160:910) : pid=4986 uid=root subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 old-auid=unset auid=vagrant tty=(none) old-ses=4294967295 ses=29 res=yes
----
type=USER_ROLE_CHANGE msg=audit(10/13/2019 13:09:09.276:911) : pid=4986 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='pam: default-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 selected-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 exe=/usr/sbin/sshd hostname=192.168.66.50 addr=192.168.66.50 terminal=ssh res=success'
----
type=USER_START msg=audit(10/13/2019 13:09:09.296:912) : pid=4986 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=PAM:session_open grantors=pam_selinux,pam_loginuid,pam_selinux,pam_namespace,pam_keyinit,pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_lastlog acct=vagrant exe=/usr/sbin/sshd hostname=192.168.66.50 addr=192.168.66.50 terminal=ssh res=success'
----
type=USER_LOGIN msg=audit(10/13/2019 13:09:09.348:917) : pid=4986 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login id=vagrant exe=/usr/sbin/sshd hostname=192.168.66.50 addr=192.168.66.50 terminal=/dev/pts/1 res=success'

What you see here first is the secure shell daemon handling the authentication, before it actually provides shell access. This is noticeable from the session id (ses), which is unset. Once authenticated, the session is given a real audit session number, which is 29. Now we can look at everything done in this session, including switching users, by searching on this session id:

# ausearch --session 29 -i
----
type=LOGIN msg=audit(10/13/2019 13:09:09.160:910) : pid=4986 uid=root subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 old-auid=unset auid=vagrant tty=(none) old-ses=4294967295 ses=29 res=yes
----
type=USER_ROLE_CHANGE msg=audit(10/13/2019 13:09:09.276:911) : pid=4986 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='pam: default-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 selected-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 exe=/usr/sbin/sshd hostname=192.168.66.50 addr=192.168.66.50 terminal=ssh res=success'
----
type=USER_START msg=audit(10/13/2019 13:09:09.296:912) : pid=4986 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=PAM:session_open grantors=pam_selinux,pam_loginuid,pam_selinux,pam_namespace,pam_keyinit,pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_lastlog acct=vagrant exe=/usr/sbin/sshd hostname=192.168.66.50 addr=192.168.66.50 terminal=ssh res=success'
----
type=CRYPTO_KEY_USER msg=audit(10/13/2019 13:09:09.296:913) : pid=4989 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:22:dd:a8:71:eb:4d:44:5f:61:6a:4e:eb:55:9b:b5:f1:3c:bb:44:d2:3f:56:9d:a5:f8:3a:74:69:e4:02:4b:01 direction=? spid=4989 suid=root  exe=/usr/sbin/sshd hostname=? addr=? terminal=? res=success'
----
type=CRYPTO_KEY_USER msg=audit(10/13/2019 13:09:09.296:914) : pid=4989 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:77:fc:eb:26:0c:37:76:9d:b6:89:98:ca:5a:25:ad:d2:b9:c0:0b:01:4f:fb:e1:0d:a8:b8:45:41:56:68:ee:49 direction=? spid=4989 suid=root  exe=/usr/sbin/sshd hostname=? addr=? terminal=? res=success'
----
type=CRYPTO_KEY_USER msg=audit(10/13/2019 13:09:09.296:915) : pid=4989 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:9e:58:bc:ae:c9:68:4c:91:cc:a9:65:0a:a2:cd:e5:a8:62:40:14:22:79:80:52:da:0f:cd:78:87:f1:6c:d6:7f direction=? spid=4989 suid=root  exe=/usr/sbin/sshd hostname=? addr=? terminal=? res=success'
----
type=CRED_ACQ msg=audit(10/13/2019 13:09:09.296:916) : pid=4989 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct=vagrant exe=/usr/sbin/sshd hostname=192.168.66.50 addr=192.168.66.50 terminal=ssh res=success'
----
type=USER_LOGIN msg=audit(10/13/2019 13:09:09.348:917) : pid=4986 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login id=vagrant exe=/usr/sbin/sshd hostname=192.168.66.50 addr=192.168.66.50 terminal=/dev/pts/1 res=success'
----
type=USER_START msg=audit(10/13/2019 13:09:09.348:918) : pid=4986 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login id=vagrant exe=/usr/sbin/sshd hostname=192.168.66.50 addr=192.168.66.50 terminal=/dev/pts/1 res=success'
----
type=CRYPTO_KEY_USER msg=audit(10/13/2019 13:09:09.355:919) : pid=4986 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:9e:58:bc:ae:c9:68:4c:91:cc:a9:65:0a:a2:cd:e5:a8:62:40:14:22:79:80:52:da:0f:cd:78:87:f1:6c:d6:7f direction=? spid=4990 suid=vagrant  exe=/usr/sbin/sshd hostname=? addr=? terminal=? res=success'
----
type=USER_END msg=audit(10/13/2019 13:58:38.973:922) : pid=4986 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login id=vagrant exe=/usr/sbin/sshd hostname=? addr=? terminal=/dev/pts/1 res=success'
----
type=USER_LOGOUT msg=audit(10/13/2019 13:58:38.973:923) : pid=4986 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=login id=vagrant exe=/usr/sbin/sshd hostname=? addr=? terminal=/dev/pts/1 res=success'
----
type=CRYPTO_KEY_USER msg=audit(10/13/2019 13:58:38.982:924) : pid=4986 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:9e:58:bc:ae:c9:68:4c:91:cc:a9:65:0a:a2:cd:e5:a8:62:40:14:22:79:80:52:da:0f:cd:78:87:f1:6c:d6:7f direction=? spid=4989 suid=vagrant  exe=/usr/sbin/sshd hostname=? addr=? terminal=? res=success'
----
type=CRYPTO_KEY_USER msg=audit(10/13/2019 13:58:38.983:925) : pid=4986 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=session fp=? direction=both spid=4989 suid=vagrant rport=50938 laddr=192.168.66.51 lport=22  exe=/usr/sbin/sshd hostname=? addr=192.168.66.50 terminal=? res=success'
----
type=USER_END msg=audit(10/13/2019 13:58:38.992:926) : pid=4986 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=PAM:session_close grantors=pam_selinux,pam_loginuid,pam_selinux,pam_namespace,pam_keyinit,pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_lastlog acct=vagrant exe=/usr/sbin/sshd hostname=192.168.66.50 addr=192.168.66.50 terminal=ssh res=success'
----
type=CRED_DISP msg=audit(10/13/2019 13:58:38.992:927) : pid=4986 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct=vagrant exe=/usr/sbin/sshd hostname=192.168.66.50 addr=192.168.66.50 terminal=ssh res=success'
----
type=CRYPTO_KEY_USER msg=audit(10/13/2019 13:58:38.992:928) : pid=4986 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:22:dd:a8:71:eb:4d:44:5f:61:6a:4e:eb:55:9b:b5:f1:3c:bb:44:d2:3f:56:9d:a5:f8:3a:74:69:e4:02:4b:01 direction=? spid=4986 suid=root  exe=/usr/sbin/sshd hostname=? addr=? terminal=? res=success'
----
type=CRYPTO_KEY_USER msg=audit(10/13/2019 13:58:38.992:929) : pid=4986 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:77:fc:eb:26:0c:37:76:9d:b6:89:98:ca:5a:25:ad:d2:b9:c0:0b:01:4f:fb:e1:0d:a8:b8:45:41:56:68:ee:49 direction=? spid=4986 suid=root  exe=/usr/sbin/sshd hostname=? addr=? terminal=? res=success'
----
type=CRYPTO_KEY_USER msg=audit(10/13/2019 13:58:38.992:930) : pid=4986 uid=root auid=vagrant ses=29 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:9e:58:bc:ae:c9:68:4c:91:cc:a9:65:0a:a2:cd:e5:a8:62:40:14:22:79:80:52:da:0f:cd:78:87:f1:6c:d6:7f direction=? spid=4986 suid=root  exe=/usr/sbin/sshd hostname=? addr=? terminal=? res=success'

This shows the information that the Linux audit facility logged for this session.
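The two ausearch steps can be chained by extracting the numeric session id from the per-process output. A minimal sketch; audit_sessions is a hypothetical helper that reads ausearch output on standard input.

```bash
#!/usr/bin/env bash
# Print each numeric audit session id found in ausearch output, once.
audit_sessions() {
  # " ses=<digits>" skips the pre-authentication "ses=unset" records, and the
  # leading space keeps "old-ses=4294967295" from matching.
  grep -o ' ses=[0-9][0-9]*' | cut -d= -f2 | sort -un
}
```

For example: `for s in $(ausearch -p 4986 -i | audit_sessions); do ausearch --session "$s" -i; done`.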

Conclusion
This is a solution for accountability in dynamic environments, where traditional solutions like LDAP fit in less easily.
It uses native secure shell daemon functionality: an additional key pair acts as the “authority”, hence the name “certificate authority” (CA), and its public key is configured in the secure shell daemon as the trusted CA key.
The essence is that the CA key is used to produce an additional key, called a signed key. Because it is signed by the CA, the server can validate it with the CA public key. Each signed key should belong to a single person only, because the signed key is what determines the identity.

Vault acts as a server where the signing, and the administration of the signing, is audited and automated, so users can generate CA signed keys themselves. The only things an administrator has to do are upload a certificate and grant the ssh policy, and remove the policy from the certificate of a person who should no longer have access.

Because authentication and ssh functionality can be mounted multiple times in vault, you can create multiple groups of certificates which can use multiple CAs (using multiple mounts of the ssh functionality).

CA signed keys offer more settings than shown here, such as which extensions, options and users are allowed or disallowed.

This post is about using HashiCorp Vault with Ansible.

Everyone who has used ansible knows that sometimes you can’t avoid storing secrets (mostly passwords) in an ansible playbook, for example because an installer requires them, or, even simpler, because authentication must be done with a username and password.

The solution built into ansible is ansible vault. To me, ansible vault solves the problem of storing plain secrets in an ansible playbook by obfuscating them. However, these secrets are static and still require the actual decryption key at runtime, which in a lot of cases is delivered by putting the password in a file.

This is where HashiCorp Vault comes in. Vault is a standalone server for authentication and storing secrets. With vault as a server, the playbook can request information at runtime, so that information is stored and maintained entirely outside of, and independent from, the ansible playbook.

To set up vault on CentOS 7, I created a playbook: https://gitlab.com/FritsHoogland/hashicorp_vault/blob/master/install_vault.yml

To use vault from ansible, the lookup plugin ‘hashi_vault’ can be used; however, it has some python dependencies that must be installed first, for which I created a CentOS 7 playbook too: https://gitlab.com/FritsHoogland/hashicorp_vault/blob/master/install_hashi_vault_plugin.yml

For the sake of testing, I assume this is installed on the same server. Of course, in a real deployment you don’t want anything other than vault running on the vault server, to keep potential attacks as far away from the credentials as possible.

After installation, the vault server is “unsealed”, which means “usable”. However, it will be sealed after every stop and start, which means the server is not usable until you provide unseal keys. The default (production) initialisation creates 5 unseal keys, of which a minimum of 3 are needed to unseal vault. This installation is done with a single unseal key, and only that 1 key is needed to unseal vault.

At this point, the vault is empty (it contains no secrets) and there is a root token (which does not expire) to access the vault in root (superuser) mode.

Both the unseal key (unseal_key_1.txt) and the root token (root_token.txt) are left on the filesystem after the installation. Obviously, in a real deployment you don’t want them there, but for the sake of a proof-of-concept setup I stored them on the filesystem. I also created a file that sets some environment variables needed by the ‘vault’ command-line executable, and a script that sets the root token:

$ . ./vault.env
$ . ./set_root_token.sh

The next thing to do is enable the username and password authentication method (userpass) and create a user with a password:

$ vault auth enable userpass
$ vault write auth/userpass/users/test_read_user password=mypass

Next up, enable key-value store version 1 (‘kv’) and store dummy secrets:

$ vault secrets enable kv
$ vault kv put kv/test/demo bar=foo pong=ping

Additionally, we need something that defines the rights ‘test_read_user’ has on these secrets. This is done using a policy (file policy_test_read_kv.hcl):

path "kv/test/demo" {
   capabilities = [ "list", "read" ]
}

This can be loaded as a policy in vault using:

$ vault policy write test_read_kv policy_test_read_kv.hcl

And then assign this policy to test_read_user:

$ vault write auth/userpass/users/test_read_user policies="test_read_kv"

First, let’s test whether this works on the CLI:

$ unset VAULT_TOKEN
$ vault login -method=userpass username=test_read_user
Password (will be hidden):
Success! You are now authenticated. The token information displayed below
is already stored in the token helper. You do NOT need to run "vault login"
again. Future Vault requests will automatically use this token.

Key                    Value
---                    -----
token                  s.OHNC9AFjnMC824pvjNPZ5aZ6
token_accessor         5AG7c00IPmqLofpwocp9yhHc
token_duration         768h
token_renewable        true
token_policies         ["default" "test_read_kv"]
identity_policies      []
policies               ["default" "test_read_kv"]
token_meta_username    test_read_user
$ export VAULT_TOKEN=s.OHNC9AFjnMC824pvjNPZ5aZ6
$ vault kv get kv/test/demo
==== Data ====
Key     Value
---     -----
bar     foo
pong    ping

Okay, now let’s do this in an ansible playbook (https://gitlab.com/FritsHoogland/hashicorp_vault/blob/master/1_kv_with_obtained_token.yml):

$ ansible-playbook 1_kv_with_obtained_token.yml
 [WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'


PLAY [localhost] *********************************************************************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************************************************************************
ok: [localhost]

TASK [show foo] **********************************************************************************************************************************************************
/home/vagrant/.local/lib/python2.7/site-packages/urllib3/connectionpool.py:1004: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
ok: [localhost] => {}

MSG:

{u'pong': u'ping', u'bar': u'foo'}


PLAY RECAP ***************************************************************************************************************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

This shows all the key-values/secrets as a dict. You can do several things here, like specify the key explicitly:

lookup('hashi_vault', 'secret=kv/test/demo:bar token=s.OHNC9AFjnMC824pvjNPZ5aZ6 url=https://localhost:8200 validate_certs=false')
foo

Or fetch the whole secret into a variable (named demo here) and select the key when you use the variable:

lookup('hashi_vault', 'secret=kv/test/demo token=s.OHNC9AFjnMC824pvjNPZ5aZ6 url=https://localhost:8200 validate_certs=false')
msg: "{{ demo.bar }}"
foo

I like the idea of handing out a token: that way we don’t even have to think about usernames and passwords that need to be changed. A playbook gets a token, which holds all the access it needs and expires automatically. If you watched closely, you saw that the token duration is rather long (768 hours; 32 days), but the duration can be configured, for example per userpass user with the token_ttl parameter. 24 hours looks like a reasonable duration.
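A small sketch can check the arithmetic behind ‘768 hours; 32 days’. It assumes a simple duration format with a single h, m or s unit, like the token_duration value printed by ‘vault login’ above. Vault can also print compound durations such as 1h30m, which this hypothetical parse_vault_duration helper deliberately rejects:

```python
import re
from datetime import timedelta

UNITS = {"h": "hours", "m": "minutes", "s": "seconds"}


def parse_vault_duration(text: str) -> timedelta:
    """Parse a single-unit duration such as '768h', '30m' or '45s' (sketch only)."""
    match = re.fullmatch(r"(\d+)([hms])", text.strip())
    if match is None:
        # Compound values such as '1h30m' are intentionally not handled here.
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{UNITS[match.group(2)]: int(match.group(1))})


token_duration = parse_vault_duration("768h")  # as reported by 'vault login'
suggested = parse_vault_duration("24h")
print(token_duration.days)         # 32
print(token_duration / suggested)  # 32.0
```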

Alternatively, you can use the vault username and password in the lookup:

lookup('hashi_vault', 'secret=kv/test/demo auth_method=userpass username=test_read_user password=mypass url=https://localhost:8200 validate_certs=false')

Now, there is a second version of the key-value store, dubbed kv-v2. As the name suggests, this version is a bit more advanced: it keeps more data about the key-value combinations, like versions and the dates of versions. However, how to use it is not clearly documented, especially the ansible part.

This is how to set up kv-v2, insert some dummy secrets, create a policy and then retrieve them:

$ . ./vault.env
$ . ./set_root_token.sh
$ vault secrets enable kv-v2
$ vault kv put kv-v2/test/demo foo=bar ping=pong
$ vault policy write test_read_kv-v2 policy_test_read_kv-v2.hcl
$ vault write auth/userpass/users/test_read_user password="mypass" policies="test_read_kv,test_read_kv-v2"

So far it looks rather straightforward. However, if you look at the policy, you’ll see what is less obvious:

$ cat policy_test_read_kv-v2.hcl
path "kv-v2/data/test/demo" {
   capabilities = [ "list", "read" ]
}

The data and metadata have been split, and the policy must explicitly grant access to the data part of the secret: the path is kv-v2/data/test/demo, not kv-v2/test/demo.
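A minimal sketch of that path mapping, assuming kv-v2 is mounted at kv-v2/ as above. The helper names are hypothetical and only build strings. The sketch reproduces the data path used in policy_test_read_kv-v2.hcl and, for comparison, the separate metadata path:

```python
def kv2_paths(mount: str, secret_path: str) -> dict:
    """Return the kv-v2 API paths behind a logical path such as 'test/demo'."""
    mount, secret_path = mount.strip("/"), secret_path.strip("/")
    return {
        "data": f"{mount}/data/{secret_path}",          # secret values (what the policy grants)
        "metadata": f"{mount}/metadata/{secret_path}",  # versions, creation times, etc.
    }


def kv2_read_policy(mount: str, secret_path: str) -> str:
    """Render an HCL policy in the same shape as policy_test_read_kv-v2.hcl."""
    data_path = kv2_paths(mount, secret_path)["data"]
    return f'path "{data_path}" {{\n   capabilities = [ "list", "read" ]\n}}\n'


print(kv2_read_policy("kv-v2", "test/demo"))
```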

This also causes the dict that is returned to be a bit different (https://gitlab.com/FritsHoogland/hashicorp_vault/blob/master/1_kv-v2_with_obtained_token.yml):

$ ansible-playbook 1_kv-v2_with_obtained_token.yml
 [WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'


PLAY [localhost] *********************************************************************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************************************************************************
ok: [localhost]

TASK [show demo] *********************************************************************************************************************************************************
/home/vagrant/.local/lib/python2.7/site-packages/urllib3/connectionpool.py:1004: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
ok: [localhost] => {}

MSG:

{u'data': {u'foo': u'bar', u'ping': u'pong'}, u'metadata': {u'created_time': u'2019-10-06T13:48:04.378215987Z', u'destroyed': False, u'version': 1, u'deletion_time': u''}}


PLAY RECAP ***************************************************************************************************************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

As you can see, some extra data is provided in the dict that is returned. In order to just list the value for the key ‘foo’, use:

msg: "{{ demo.data.foo }}"

Yes, this is another ‘data’ level. So the secret path in the lookup needs an added ‘data’, and when you want the value of a specific key, you need to add another ‘data’ when referencing the variable.
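The following Python sketch makes the nesting concrete, using the dict from the playbook output above. The api_body part illustrates the shape of the raw HTTP API response, with its other top-level fields left out. The lookup returned only the part under that outer ‘data’ key:

```python
# The dict returned by the hashi_vault lookup for kv-v2/data/test/demo,
# copied from the playbook output above.
demo = {
    "data": {"foo": "bar", "ping": "pong"},
    "metadata": {
        "created_time": "2019-10-06T13:48:04.378215987Z",
        "destroyed": False,
        "version": 1,
        "deletion_time": "",
    },
}

# "{{ demo.data.foo }}" in the playbook selects the same value as:
print(demo["data"]["foo"])  # bar

# The raw API response for GET /v1/kv-v2/data/test/demo nests this once more
# under a top-level "data" key (other response fields omitted here).
api_body = {"data": demo}
print(api_body["data"]["data"]["foo"])  # bar
```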

Actually, this is a follow-up post to my performance deep dive into tablespace encryption. After having investigated how tablespace encryption works, this blogpost looks at the other encryption option, column encryption. A conclusion that can be shared upfront is that although both basically perform the same function, the implementation and performance consequences are quite different.

Column encryption gives you the ability to encrypt individual columns, which is kind of obvious. However, having to choose which columns to encrypt is what I see as the biggest downside of this encryption option. In most cases, especially with boxed applications, it is quite hard to figure out exactly which columns you want to encrypt in order to protect your sensitive data. Which columns exactly contain your primary sensitive data, and which columns contain secondary sensitive data (data derived from sensitive data)? When you have to apply encryption, do you know EXACTLY what is defined as sensitive data, and what isn’t? I bet there isn’t a clear technical description.

A logical reaction then would be: ‘couldn’t I just encrypt all columns?’ Well, that is what tablespace encryption is for, isn’t it? To summarise: I think the correct use of column encryption is hard to implement in reality, and thus very limited in usefulness; in most cases tablespace encryption should be used.

Okay…for this test I created a table with two columns, of which one is encrypted:

SQL> create table column_encryption (id number, a varchar2(10) encrypt);
SQL> insert into column_encryption values (1, 'AAAAAAAAAA');
SQL> commit;

The same table, but without encryption:

SQL> create table no_column_encryption (id number, a varchar2(10) );
SQL> insert into no_column_encryption values (1, 'AAAAAAAAAA');
SQL> commit;

And the same table with a lot of rows:

SQL> create table column_encryption_large (id number, a varchar2(10) encrypt);
SQL> begin
  for counter in 1..32000000 loop
    insert into column_encryption_large values ( counter, dbms_random.string('l',10) );
  end loop;
  commit;
end;
/

Let’s follow the path of the previous TDE post, and profile the execution of a SQL on the big table to see the impact of column encryption. The first test is a ‘select count(*) from column_encryption_large’ in one session, and ‘perf record -g -p PID’ in another. If you need more explanation on how to run it, please look at the previous blogpost. This is the output of ‘perf report --sort comm --max-stack 2’:

# perf report --sort comm --max-stack 2
# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 1K of event 'cycles'
# Event count (approx.): 1418165467
#
# Children      Self  Command
# ........  ........  ...............
#
   100.00%   100.00%  oracle_6919_aob
            |--29.21%-- kdstf00000010000100kmP
            |--12.58%-- kdbulk
            |--3.32%-- gup_pte_range
            |--2.58%-- kdst_fetch0
            |--2.54%-- kcbgtcr
            |--2.25%-- __blk_bios_map_sg
            |--2.21%-- kcbhvbo
            |--2.18%-- unlock_page
            |--1.98%-- ktrgcm
            |--1.93%-- do_direct_IO
            |--1.86%-- kcbldrget
            |--1.52%-- kcoapl

This shows IO related functions at both the Oracle and the operating system level. For example, kdstf is kernel data scan table full, while gup_pte_range, do_direct_IO, unlock_page and __blk_bios_map_sg are Linux kernel functions. Most notably, there are no encryption related functions, which is a big difference with tablespace encryption!
This is actually very logical if you understand the differences between column encryption and tablespace encryption. First, let’s look at a dump of a data block of a segment in an encrypted tablespace:

Block dump from cache:
Dump of buffer cache at level 4 for pdb=0 tsn=5 rdba=907
Block dump from disk:
Encrypted block <5, 907> content will not be dumped. Dumping header only.
buffer tsn: 5 rdba: 0x0000038b (1024/907)
scn: 0x0.4e9af4 seq: 0x01 flg: 0x16 tail: 0x9af40601
frmt: 0x02 chkval: 0xf23a type: 0x06=trans data

Yes…you read that right: the block is encrypted, so it will not be dumped. Luckily, you can set the undocumented parameter “_sga_clear_dump” to true to make Oracle dump the block:

SQL> alter session set "_sga_clear_dump"=true;
SQL> alter system dump datafile 5 block 907;

This will make Oracle dump the block. The dump will show the decrypted version of the tablespace level encrypted block:

Block header dump:  0x0000038b
 Object id on Block? Y
 seg/obj: 0x17bc3  csc: 0x00.4e9aed  itc: 2  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x388 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0007.01d.000001d0  0x00000987.0390.27  --U-    1  fsc 0x0000.004e9af4
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x0000038b
data_block_dump,data header at 0x7f140f335374
===============
tsiz: 0x1f98
hsiz: 0x14
pbl: 0x7f140f335374
     76543210
flag=--------
ntab=1
nrow=1
frre=-1
fsbo=0x14
fseo=0x1f8a
avsp=0x1f76
tosp=0x1f76
0xe:pti[0]      nrow=1  offs=0
0x12:pri[0]     offs=0x1f8a
block_row_dump:
tab 0, row 0, @0x1f8a
tl: 14 fb: --H-FL-- lb: 0x1  cc: 1
col  0: [10]  41 41 41 41 41 41 41 41 41 41
end_of_block_dump

For the count(*), there is no need to read the row data; the only thing needed is to read the row directory to fetch the number of rows (the nrow=1 line in the dump above). However, to do that, the block must be decrypted.

Now look at a block dump of a column encrypted data block:

Block header dump:  0x0000032b
 Object id on Block? Y
 seg/obj: 0x1821d  csc: 0x00.676d7e  itc: 2  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x328 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.007.000078a9  0x00000117.2246.07  --U-    1  fsc 0x0000.00676d7f
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x0000032b
data_block_dump,data header at 0x7f140f333264
===============
tsiz: 0x1f98
hsiz: 0x14
pbl: 0x7f140f333264
     76543210
flag=--------
ntab=1
nrow=1
frre=-1
fsbo=0x14
fseo=0x1f5d
avsp=0x1f49
tosp=0x1f49
0xe:pti[0]      nrow=1  offs=0
0x12:pri[0]     offs=0x1f5d
block_row_dump:
tab 0, row 0, @0x1f5d
tl: 59 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [52]
 fd e0 87 66 55 f7 e6 43 de be 31 f6 71 4f 7f 4e f1 75 fb 88 98 9d 13 ed 8e
 cb 69 02 bc 29 51 bd 21 ea 22 04 6b 70 e9 ec 01 9d d6 e4 5a 84 01 1d 90 b0
 e9 01
end_of_block_dump

The block and the row directory can be read normally without any need for decryption. The only thing encrypted is the column (“a”). That perfectly explains the absence of any functions that indicate decryption, because there isn’t any decryption taking place!
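The 52 bytes of column 1 fit a 10 byte value plus encryption overhead. The sketch below reproduces that number under assumptions that are mine and have not been verified against Oracle internals:
- AES padding always adds 1 to 16 bytes, up to a whole 16 byte block.
- A 16 byte salt is stored by default.
- A 20 byte SHA-1 integrity value (MAC) is stored unless ‘nomac’ is specified.

encrypted_value_bytes is a hypothetical helper:

```python
AES_BLOCK_BYTES = 16  # AES block size
SALT_BYTES = 16       # assumption: size of the default salt
MAC_BYTES = 20        # assumption: SHA-1 integrity value, dropped with 'nomac'


def encrypted_value_bytes(plain_bytes: int, salt: bool = True, mac: bool = True) -> int:
    """Estimate the stored size of one column-encrypted value (sketch, not Oracle's code)."""
    # Assumed padding: always 1..16 bytes, rounding up to the next whole AES block.
    padded = (plain_bytes // AES_BLOCK_BYTES + 1) * AES_BLOCK_BYTES
    return padded + (SALT_BYTES if salt else 0) + (MAC_BYTES if mac else 0)


print(encrypted_value_bytes(10))             # 52, matching "col  1: [52]" in the dump
print(encrypted_value_bytes(10, mac=False))  # 32 under the same assumptions
```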

Now let’s rewrite the SQL to touch the data, and thus involve decryption: ‘select avg(length(a)) from column_encryption_large’. This way the column value needs to be read and decrypted. This is what the output of a perf recording looks like:

# perf report --sort comm --max-stack 2
# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 65K of event 'cycles'
# Event count (approx.): 229042607170
#
# Children      Self  Command
# ........  ........  ...............
#
   100.00%   100.00%  oracle_6919_aob
            |--24.73%-- ztchsh1h
            |--14.91%-- ztchsh1n
            |--6.10%-- y8_ExpandRijndaelKey
            |--5.90%-- ownGetReg
            |--5.50%-- __intel_ssse3_rep_memcpy
            |--4.99%-- ztchsh1f
            |--4.28%-- ztcxi
            |--2.60%-- ipp_is_GenuineIntel
            |--1.88%-- _intel_fast_memcpy
            |--1.74%-- _intel_fast_memcpy.P
            |--1.52%-- kspgip
            |--1.16%-- kgce_init

The functions starting with ‘ztc’ are probably related to security (“zecurity”), and probably also to decryption. The function name “y8_ExpandRijndaelKey” is clearly related to encryption (Rijndael is the cipher that was standardised as AES). When you look up the function address of “ownGetReg”, it’s close to the “y8_ExpandRijndaelKey” function. The last group are memcpy related functions, which seems consistent with decrypting: moving data.

On the performance side, it’s clear that the majority of the time is spent in the functions ztchsh1h and ztchsh1n. In order to understand more about these functions, let’s expand the stack:

# perf report --sort comm
# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 65K of event 'cycles'
# Event count (approx.): 229035032972
#
# Children      Self  Command
# ........  ........  ...............
#
   100.00%   100.00%  oracle_6919_aob
            |
            |--25.01%-- ztchsh1h
            |          |
            |          |--99.63%-- ztchsh1n
            |          |          |
            |          |          |--50.85%-- ztchsh1f
            |          |          |          ztchf
            |          |          |          ztcxf
            |          |          |          ztcx
            |          |          |          kztsmohmwl
            |          |          |          kztsmhmwl
            |          |          |          kzekmetc
            |          |          |          kzecsqen
            |          |          |          kzecctex
            |          |          |          evaopn2
            |          |          |          evaopn2
            |          |          |          qesaAggNonDistSS
            |          |          |          kdstf00001010000000km
            |          |          |          kdsttgr
            |          |          |          qertbFetch
            |          |          |          qergsFetch
            |          |          |          opifch2
            |          |          |          kpoal8
------------------------------------------------------
            |--14.90%-- ztchsh1n
            |          |
            |          |--85.25%-- ztchsh1f
            |          |          ztchf
            |          |          ztcxf
            |          |          ztcx
            |          |          kztsmohmwl
            |          |          kztsmhmwl
            |          |          kzekmetc
            |          |          kzecsqen
            |          |          kzecctex
            |          |          evaopn2
            |          |          evaopn2
            |          |          qesaAggNonDistSS
            |          |          kdstf00001010000000km
            |          |          kdsttgr
            |          |          qertbFetch
            |          |          qergsFetch
            |          |          opifch2
            |          |          kpoal8

I fetched the stacks of the two functions in which the most time was spent. The most important thing to see is that the decryption now takes place as part of processing the fetched data (qesaAggNonDistSS probably has something to do with aggregating data, and evaopn2 probably is a function to evaluate operands) rather than as part of performing the (logical) IO; mind the absence of the kcbgtcr function.

The decryption is done during operand evaluation rather than during the IO because the data is stored encrypted in the block, and thus also in the buffer cache. So at IO time, there is no need to decrypt anything. This also has another rather important consequence: decryption needs to take place for every processed row that has an encrypted column. There does not seem to be any caching of the decrypted value for column encryption, which is logical from a security point of view, but has a severe performance consequence.

When doing a pin tools debugtrace of the above SQL for the processing of a single row (on the table ‘column_encryption’ rather than ‘column_encryption_large’), applying the sed filters and then grepping for a selective set of functions, this is what the processing looks like:

 | | | | > qergsFetch(0x294512030, 0x7f871c9fa2f0, ...)
 | | | | | > qeaeAvg(0x7f8717ce9968, 0xe, ...)
 | | | | | < qeaeAvg+0x000000000063 returns: 0  | | | | | > qertbFetch(0x294512178, 0x7f871ca08a68, ...)
 | | | | | | | | | | > kcbgtcr(0x7ffe2f9b3ae0, 0, ...)
 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | > pread64@plt(0x100, 0x1f428c000, ...)
 | | | | | | | | | | < kcbgtcr+0x000000003221 returns: 0x1f428c014  | | | | | | | | | | | | | | > kcbgtcr(0x7ffe2f9b35d0, 0, ...)
 | | | | | | | | | | | | | | < kcbgtcr+0x0000000009a1 returns: 0x1f428c014  | | | | | | > kdsttgr(0x7f871c9f9918, 0, ...)
 | | | | | | | > kdstf00001010000000km(0x7f871c9f9918, 0, ...)
 | | | | | | | | > kdst_fetch(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | > kdst_fetch0(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | | | > kcbgtcr(0x7f871c9f9930, 0, ...)
 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | > pread64@plt(0x100, 0x2b1115000, ...)
 | | | | | | | | | | | < kcbgtcr+0x000000003221 returns: 0x1e4aa6014
 | | | | | | | | | < kdst_fetch0+0x0000000004d0 returns: 0x1e4aa6076
 | | | | | | | | < kdst_fetch+0x000000000048 returns: 0x1e4aa6076  | | | | | | | | > qesaAggNonDistSS(0x7ffe2f9b45d0, 0x7fff, ...)
 | | | | | | | | | > evaopn2(0x294511ef0, 0x294512030, ...)
 | | | | | | | | | | > evaopn2(0x294511e68, 0x294512030, ...)
 | | | | | | | | | | | | | | | | | | | > ztchsh1n(0x7ffe2f9b1ef8, 0x11c4e8d0, ...)
 | | | | | | | | | | | | | | | | | | | > ztchsh1f(0x7ffe2f9b1ef8, 0x7ffe2f9b3100, ...)
 --> 168 times in total of ztchsh1n or ztchsh1f
 | | | | | | | | | | < evaopn2+0x0000000002dc returns: 0x7f871c9fa2c0  | | | | | | | | | | > evalen(0x294511ef0, 0x7f871c9fa2c0, ...)
 | | | | | | | | | | < evalen+0x000000000147 returns: 0x2
 | | | | | | | | | < evaopn2+0x0000000002dc returns: 0x7f871c9fa2d0  | | | | | | | | | > qeaeAvg(0x7f8717ce9968, 0xb, ...)
 | | | | | | | | | < qeaeAvg+0x000000000063 returns: 0x7f8717ce99c9
 | | | | | | | | < qesaAggNonDistSS+0x000000000193 returns: 0x7fff  | | | | | | | | > kdst_fetch(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | > kdst_fetch0(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | | | > kcbgtcr(0x7f871c9f9930, 0, ...)
 | | | | | | | | | | | < kcbgtcr+0x0000000009a1 returns: 0x1dec30014
 | | | | | | | | | < kdst_fetch0+0x0000000004d0 returns: 0x1dec30072
 | | | | | | | | < kdst_fetch+0x000000000048 returns: 0x1dec30072  | | | | | | | | > kdst_fetch(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | > kdst_fetch0(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | | | > kcbgtcr(0x7f871c9f9930, 0, ...)
 | | | | | | | | | | | < kcbgtcr+0x0000000009a1 returns: 0x1deca4014
 | | | | | | | | | < kdst_fetch0+0x0000000004d0 returns: 0x1deca4072
 | | | | | | | | < kdst_fetch+0x000000000048 returns: 0x1deca4072  | | | | | | | | > kdst_fetch(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | > kdst_fetch0(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | | | > kcbgtcr(0x7f871c9f9930, 0, ...)
 | | | | | | | | | | | < kcbgtcr+0x0000000009a1 returns: 0x1e4be0014
 | | | | | | | | | < kdst_fetch0+0x0000000004d0 returns: 0x1e4be0072
 | | | | | | | | < kdst_fetch+0x000000000048 returns: 0x1e4be0072  | | | | | | | | > kdst_fetch(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | > kdst_fetch0(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | | | > kcbgtcr(0x7f871c9f9930, 0, ...)
 | | | | | | | | | | | < kcbgtcr+0x0000000009a1 returns: 0x1dedb2014
 | | | | | | | | | < kdst_fetch0+0x0000000004d0 returns: 0x1dedb2072
 | | | | | | | | < kdst_fetch+0x000000000048 returns: 0x1dedb2072  | | | | | | | | > kdst_fetch(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | > kdst_fetch0(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | < kdst_fetch0+0x0000000011c9 returns: 0
 | | | | | | | | < kdst_fetch+0x000000000048 returns: 0
 | | | | | | | < kdstf00001010000000km+0x00000000035d returns: 0x7fff
 | | | | | | < kdsttgr+0x00000000085f returns: 0x7fff
 | | | | | < qertbFetch+0x000000001301 returns: 0x7fff  | | | | | > qeaeAvg(0x7f8717ce9968, 0x294511f78, ...)
 | | | | | < qeaeAvg+0x000000000063 returns: 0x2  | | | | | | > evaopn2(0x294511f78, 0, ...)
 | | | | | | < evaopn2+0x0000000002dc returns: 0x7f871c9fa2e0
 | | | | < qergsFetch+0x000000000f25 returns: 0

This is the explain plan of the ‘select avg(length(a)) from column_encryption’ SQL:

----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |       |       |     3 (100)|          |
|   1 |  SORT AGGREGATE    |                   |     1 |    53 |            |          |
|   2 |   TABLE ACCESS FULL| COLUMN_ENCRYPTION |     1 |    53 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

If you look back at the grepped debugtrace and the execution plan:
Line 1: the sort aggregate rowsource (qergsFetch).
Line 4: the table access full (qertbFetch).
Line 5: here a logical read (kcbgtcr) is issued, and because the block didn’t exist in the cache, it was physically read (line 6: pread64). This is the segment header; the “real” scan of data blocks has not started yet.
Line 11: this is the ultra fast full table scan (kdstf00001010000000km). My guess is that this is a full table scan function with certain decisions hard coded instead of made at runtime, so the CPU runs into fewer branch mispredictions.
Line 12: this is the part of the full table scan that fetches (loads) the data (kdst_fetch). What is special here is that a multiblock read is done: the kcbgtcr function triggers a single physical read for multiple logical blocks, which are later fetched per block (kdst_fetch and kcbgtcr functions starting from line 32, 38, etc).
Line 19: this function executes row based functions and aggregates the results per block/fetch (qesaAggNonDistSS).
Line 20: as part of fetching the row and executing functions, the row value is evaluated first (evaopn2).
Line 22/23: here the column is decrypted (made visible by the ztchsh1n/ztchsh1f functions, which are not necessarily the decryption functions themselves).
Line 26/29: here the length (evalen) and average (qeaeAvg) row functions are probably executed.
Line 32: the next block is processed, but no rows are found, and thus there is no need to execute rowsource (qe*) functions afterwards.

So, what do we know at this point regarding column encryption?
– Columns that are encrypted are stored encrypted in the block in the buffer cache.
– Which means they have to be decrypted every time the column values are read. This is different from tablespace encryption, where the block is encrypted, and is decrypted whenever it is read into the buffer cache.
– Functions related to column encryption specifically (different functions than seen with tablespace encryption) take roughly 40% of the time in my case.

Can the time spent on column decryption be optimised?
There are multiple ways to change how Oracle applies column encryption. First, there are four encryption algorithms: 3DES168, AES128, AES192 and AES256. The default is AES192.
Here are the query timings (minutes:seconds) of a ‘select avg(length(a)) from TABLE’ on my “large” table with 32 million rows, per algorithm:

3DES168 4:53
AES256 1:09
AES192 1:06
AES128 1:03

Another way to optimise column encryption is to disable the extra integrity verification by specifying ‘nomac’ in the encryption definition of the column. This also saves space: by default, a message authentication code (MAC), which is used for the verification, is stored with every encrypted column value. These are the timings with ‘nomac’ added to disable the verification:

3DES168 3:59
AES256 0:26
AES192 0:23
AES128 0:22

This shows a significant reduction in time. However, if no encryption at all is applied to the column, the query timing is 0:03.
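The following sketch puts these numbers side by side. It converts the minutes:seconds timings above into seconds, then computes how much ‘nomac’ saves per algorithm and how far each result still is from the unencrypted 0:03. The numbers are exactly the ones measured above; nothing new is measured here:

```python
def seconds(mm_ss: str) -> int:
    """Convert a 'minutes:seconds' timing such as '4:53' into seconds."""
    minutes, secs = mm_ss.split(":")
    return int(minutes) * 60 + int(secs)


# Timings of 'select avg(length(a))' on the 32 million row table, from above.
WITH_MAC = {"3DES168": "4:53", "AES256": "1:09", "AES192": "1:06", "AES128": "1:03"}
NOMAC = {"3DES168": "3:59", "AES256": "0:26", "AES192": "0:23", "AES128": "0:22"}
UNENCRYPTED = seconds("0:03")


def nomac_saving_pct(algorithm: str) -> float:
    """Percentage of query time saved by adding 'nomac' for one algorithm."""
    with_mac, without_mac = seconds(WITH_MAC[algorithm]), seconds(NOMAC[algorithm])
    return 100 * (with_mac - without_mac) / with_mac


for algorithm in WITH_MAC:
    factor = seconds(NOMAC[algorithm]) / UNENCRYPTED
    print(f"{algorithm:8} nomac saves {nomac_saving_pct(algorithm):4.1f}%, "
          f"still {factor:4.1f}x the unencrypted time")
```

With these timings, ‘nomac’ saves roughly 65% for AES192 and AES128, but only about 18% for 3DES168.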

Internals background information
The functions ztchsh1h/ztchsh1n are related to verification (as in the tablespace encryption blogpost, where the most time consuming functions were verification functions too). Once ‘nomac’ is specified in the encryption definition of the column, the ztchsh* functions vanish, and the top time consuming functions are y8_ExpandRijndaelKey and ownGetReg, which are clearly directly related to decryption. The performance gain of ‘nomac’ is smaller with 3DES168 encryption.

Conclusion
I think tablespace encryption is the encryption method of choice for a normal implementation. In most cases it will be too much work to figure out exactly which columns to encrypt. If you still consider column encryption, you should also be aware that the column value is stored encrypted in the block and, as a consequence, in the buffer cache. Every use of the encrypted column involves encryption or decryption, for which the overhead is significant, even with ‘nomac’ specified to disable the (additional) verification.

Recently, I was trying to set up TDE. Doing that, I found out the Oracle provided documentation isn’t overly clear: there is a way to do it pre-Oracle 12, using ‘alter system’ commands, and a new-ish way to do it in Oracle 12, using ‘administer key management’ commands. I am using version 12.1.0.2.170117, so I decided to use the ‘administer key management’ commands. This blogpost is about an exception that I encountered with the January 2017 (170117) PSU of the Oracle database. It does NOT happen in Oracle 12.2 (there are no PSUs for Oracle 12.2 at the time of writing), nor with the Oracle 12.1.0.2 April 2016 and October 2016 PSUs.

In order to test the wallet functionality for TDE, I used the following commands:

SQL> select status, wrl_parameter from v$encryption_wallet;

STATUS
------------------------------
WRL_PARAMETER
--------------------------------------------------------------------------------
NOT_AVAILABLE
/u01/app/oracle/admin/test/wallet

SQL> !mkdir /u01/app/oracle/admin/test/wallet

SQL> administer key management create keystore '/u01/app/oracle/admin/test/wallet' identified by "this_is_the_keystore_password";

keystore altered.

SQL> administer key management set keystore open identified by "this_is_the_keystore_password";

keystore altered.

SQL> administer key management set key identified by "this_is_the_keystore_password" with backup;
administer key management set key identified by "this_is_the_keystore_password" with backup
*
ERROR at line 1:
ORA-28374: typed master key not found in wallet

SQL> select status, wrl_parameter from v$encryption_wallet;

STATUS
------------------------------
WRL_PARAMETER
--------------------------------------------------------------------------------
CLOSED
/u01/app/oracle/admin/test/wallet

SQL> administer key management set keystore open identified by "this_is_the_keystore_password";

keystore altered.

SQL> select status, wrl_parameter from v$encryption_wallet;

STATUS
------------------------------
WRL_PARAMETER
--------------------------------------------------------------------------------
OPEN
/u01/app/oracle/admin/test/wallet

Notes:
Line 1-10: The DB_UNIQUE_NAME of the instance is ‘test’, and therefore the default wallet location is /u01/app/oracle/admin/test/wallet (ORACLE_BASE/admin/DB_UNIQUE_NAME/wallet). The wallet directory doesn’t exist by default, so I created it (line 10).
Line 12: Here the keystore/wallet is created with a password.
Line 16: After the wallet is created without auto-login, the wallet must be opened using the ‘set keystore open’ command.
Line 20: After the wallet has been created, it does not contain a master key yet. A master key is set using the ‘set key’ command. However, this throws an ORA-28374 error.
Line 26: After an error involving the wallet has occurred, the wallet closes.
Line 35: The wallet can simply be opened using the earlier used ‘set keystore open’ command.
Line 39: This is where the surprise is: after opening, the master key “magically” appeared (visible by the status ‘OPEN’, without a master key this would be ‘OPEN_NO_MASTER_KEY’).
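The notes above come down to two rules. First, the default wallet location is derived from ORACLE_BASE and DB_UNIQUE_NAME. Second, the STATUS column tells whether a master key is present. A small sketch captures just that, using hypothetical helper names and only the status values that are seen or mentioned in this post:

```python
import posixpath


def default_wallet_location(oracle_base: str, db_unique_name: str) -> str:
    """ORACLE_BASE/admin/DB_UNIQUE_NAME/wallet, the default location used above."""
    return posixpath.join(oracle_base, "admin", db_unique_name, "wallet")


# v$encryption_wallet STATUS values mentioned in this post, and what they meant here.
WALLET_STATUS = {
    "NOT_AVAILABLE": "no wallet found at the location (the directory did not exist yet)",
    "CLOSED": "wallet exists but is closed (here: after the ORA-28374)",
    "OPEN_NO_MASTER_KEY": "wallet is open, but no master key has been set",
    "OPEN": "wallet is open and a master key is present",
}


def master_key_usable(status: str) -> bool:
    """A master key can only be used when the wallet is OPEN; a CLOSED wallet may still contain one."""
    return status == "OPEN"


print(default_wallet_location("/u01/app/oracle", "test"))  # /u01/app/oracle/admin/test/wallet
```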

I have yet to start creating encrypted tablespaces. There might be more surprises; I can’t tell at this moment because I didn’t try it. However, once I discovered this oddity, I talked to my colleague Matt, who gave me his own runbook for enabling TDE. It turned out to be the exact same list of commands as the one I compiled, but he did not encounter the ORA-28374 that I did. I tested the same sequence of commands on 12.2.0.1, 12.1.0.2.161018 (October 2016) and 12.1.0.2.160419 (April 2016), and there the ORA-28374 was not raised during execution of the ‘set key’ command.

Update!
Reading through My Oracle Support note Master Note For Transparent Data Encryption ( TDE ) (Doc ID 1228046.1), I found the following text:

All the versions after 12.1.0.2

=====================

As of 12.1.0.2 If the key associated with the SYSTEM, SYSAUX or UNDO tablespaces is not present in the wallet you cannot associate a new master key with the database (i.e. you cannot activate that master key for the database) unless you set a hidden parameter :

SQL> administer key management use key 'AUQukK/ZR0/iv26nuN9vIqcAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' identified by "welcome1" with backup;
administer key management use key 'AUQukK/ZR0/iv26nuN9vIqcAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' identified by "welcome1" with backup
*
ERROR at line 1:
ORA-28374: typed master key not found in wallet

alter system set "_db_discard_lost_masterkey"=true;

SQL> administer key management use key 'AUQukK/ZR0/iv26nuN9vIqcAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' identified by "welcome1" with backup;

The heading and the first line read oddly. The heading indicates the paragraph is about ‘all the versions after 12.1.0.2’ (which to me means 12.2), while the first line of the paragraph says ‘as of 12.1.0.2’, which very clearly means version 12.1.0.2 and higher. However, a little further on it shows the exact error (ORA-28374) I encountered. It explains that if a current key is used in the data dictionary (mind: data dictionary, not wallet), you must set “_db_discard_lost_masterkey” to true before you can create and use another master key for a wallet when you start over (wipe or move the wallet directory).

This makes sense to me now! In my current 170117 PSU instance I repeatedly dropped and created new wallets, whereas with the other PSUs I only created an encryption wallet in a brand new, freshly created instance. So if I had done EXACTLY the same in the instances with the other PSUs, which is repeatedly creating and dropping a wallet for TDE, I would have encountered the same ORA-28374 error. Well…I see this as a safety mechanism, albeit not a very obvious one and not extensively documented. It probably causes more grief than it saves if you run into the need to change the master key.

Recently I was asked to analyse the security impact of the snmp daemon on a recent Exadata. This system was running Exadata image version 12.1.2.1.3. This blog article gives you an overview of a lot of the things that surround snmp and security.

First of all, which installed packages have something to do with snmp? A list can be obtained the following way:

# rpm -qa | grep snmp
net-snmp-utils-5.5-54.0.1.el6_7.1.x86_64
net-snmp-libs-5.5-54.0.1.el6_7.1.x86_64
net-snmp-5.5-54.0.1.el6_7.1.x86_64
sas_snmp-14.02-0103.x86_64

Essentially the usual net-snmp packages and a package called ‘sas_snmp’.

The next important thing is how the firewall is configured. By default, the firewall on the Exadata compute nodes is turned off:

# iptables -L -v
Chain INPUT (policy ACCEPT 437M packets, 216G bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 343M packets, 748G bytes)
 pkts bytes target     prot opt in     out     source               destination

So if something is listening on a network port that could benefit ‘attackers’, there is no firewall to stop them.
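Spotting this by eye is easy on one node, but across many nodes it can help to flag it automatically. What follows is a minimal Python sketch. It assumes the `iptables -L -v` text layout shown above: a `Chain NAME (policy ...)` line, a column header, then one line per rule. The function name is illustrative. It reports chains that accept everything by default and contain no rules that narrow this down.

```python
from __future__ import annotations

import re

# Matches "Chain INPUT (policy ACCEPT ...)"; for user-defined chains such as
# "Chain LOGDROP (0 references)" the policy group stays None.
CHAIN_RE = re.compile(r"^Chain (\S+) \((?:policy (\S+))?")


def wide_open_chains(iptables_output: str) -> list[str]:
    """Return chains whose default policy is ACCEPT and that contain no rules."""
    chains: dict[str, list] = {}  # chain name -> [policy, number of rules]
    current = None
    for line in iptables_output.splitlines():
        match = CHAIN_RE.match(line)
        if match:
            current = match.group(1)
            chains[current] = [match.group(2), 0]
            continue
        stripped = line.strip()
        # Skip blank lines and the column header printed under every chain.
        if current is None or not stripped or stripped.startswith("pkts"):
            continue
        chains[current][1] += 1  # any other line under a chain is a rule
    return [name for name, (policy, rules) in chains.items()
            if policy == "ACCEPT" and rules == 0]
```

On a node you would feed it the output of `iptables -L -v` (run as root). For the output above it returns INPUT, FORWARD and OUTPUT.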

The next obvious question is which snmp processes are actually running:

# ps -ef |grep snmp
root       7088      1  0 Aug16 ?        00:51:32 /usr/sbin/snmpd -LS0-6d -Lf /dev/null -p /var/run/snmpd.pid
root      33443      1  0 03:14 ?        00:00:49 /usr/sbin/lsi_mrdsnmpagent -c /etc/snmp/snmpd.conf
root      33454  33443  0 03:14 ?        00:00:00 /usr/sbin/lsi_mrdsnmpagent -c /etc/snmp/snmpd.conf

The snmpd process is the net-snmp snmp daemon. However, there are two additional processes running with ‘snmp’ in their name: one whose parent is init (PPID 1), and one that this process has spawned. The name ‘lsi_mrdsnmpagent’ probably means LSI MegaRaid SNMP agent. That gives a fair hint that this process is doing something snmp related specifically for the LSI MegaRaid adapter, which is the disk controller.

Are there any open ports related to snmp processes?

# netstat -anp | grep snmp
tcp        0      0 127.0.0.1:199               0.0.0.0:*                   LISTEN      7088/snmpd
udp        0      0 0.0.0.0:161                 0.0.0.0:*                               7088/snmpd
udp        0      0 0.0.0.0:22917               0.0.0.0:*                               7088/snmpd

1. tcp port 199
This is support for the SMUX protocol (RFC 1227), used to communicate with SMUX-based subagents. SMUX is deprecated in favour of AgentX, and it is considered a bug (https://bugzilla.redhat.com/show_bug.cgi?id=110931) that the daemon still opens this port. However, the port is opened on localhost (127.0.0.1) and is therefore not reachable from outside the machine, which means it is not a direct security problem.

2. udp port 161
This is the default snmpd port. On the compute node this port is open to the outside world. You can see this from the local address 0.0.0.0 (all interfaces) in the netstat output above. That the port is open can be verified from another machine with the ‘nmap’ tool:

$ sudo nmap -Pn -sU -p 161 311.1.1.1
Password:

Starting Nmap 6.47 ( http://nmap.org ) at 2016-10-26 15:00 CEST
Nmap scan report for 311.1.1.1
Host is up (0.087s latency).
PORT    STATE SERVICE
161/udp open  snmp

The status ‘open’ shows this udp port does respond to requests.

3. udp port 22917 (in this case; this port number is random)
This is a random port that is opened for the trapsink directive in /etc/snmp/snmpd.conf. A trap sink is the destination for the snmp traps that get triggered. Although this udp port is in use, it does not respond to network traffic:

$ sudo nmap -Pn -sU -p 22917 311.1.1.1
Password:

Starting Nmap 6.47 ( http://nmap.org ) at 2016-10-26 15:22 CEST
Nmap scan report for 311.1.1.1
Host is up.
PORT      STATE         SERVICE
22917/udp open|filtered unknown

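The status ‘open|filtered’ means nmap received no response to its probes. In other words, this udp port does not respond to requests, and nmap cannot tell whether it is open or filtered.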
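The three ports above involve two separate questions. The first is which address a socket is bound to, shown in the Local Address column of netstat. The second is how nmap interprets a UDP probe. For a UDP scan, nmap documents these rules:

- a UDP reply means open;
- ICMP port unreachable (type 3, code 3) means closed;
- other ICMP unreachable codes mean filtered;
- no answer at all means open|filtered.

The minimal Python sketch below covers both questions. It assumes the `netstat -anp` column layout shown earlier, and its function names are illustrative only.

```python
import ipaddress


def bind_scope(local_address: str) -> str:
    """Classify a netstat 'Local Address' such as 0.0.0.0:161 or :::8162."""
    host, _, _port = local_address.rpartition(":")
    ip = ipaddress.ip_address(host)
    if ip.is_unspecified:
        return "all interfaces"   # 0.0.0.0 or ::, reachable from the network
    if ip.is_loopback:
        return "localhost only"   # 127.0.0.0/8 or ::1
    return "single address"


def parse_netstat(lines):
    """Yield (proto, local address, pid/program, scope) for `netstat -anp` lines."""
    for line in lines:
        fields = line.split()
        # tcp lines have a State column, udp lines do not; both end in PID/Program.
        if len(fields) < 6 or not fields[0].startswith(("tcp", "udp")):
            continue
        proto, local, program = fields[0], fields[3], fields[-1]
        yield proto, local, program, bind_scope(local)


def udp_scan_state(udp_reply: bool, icmp_type=None, icmp_code=None) -> str:
    """Map the reply to a UDP probe to a port state, following nmap's -sU rules."""
    if udp_reply:
        return "open"
    if icmp_type == 3 and icmp_code == 3:
        return "closed"           # ICMP port unreachable
    if icmp_type == 3 and icmp_code in (0, 1, 2, 9, 10, 13):
        return "filtered"         # other ICMP unreachable errors
    return "open|filtered"        # silence: open but quiet, or dropped by a filter
```

For udp port 22917, the socket is bound to all interfaces, yet the probe gets no answer. That silence is exactly the case that ends up as open|filtered.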

Now let’s look at what the actual configuration file of the snmp daemon, /etc/snmp/snmpd.conf, looks like on Exadata:

snmp daemon configuration file:
trapcommunity public
trapsink 127.0.0.1 public
rocommunity public 127.0.0.1
rwcommunity public 127.0.0.1

access  RWGroup         ""      any       noauth    exact all all all
com2sec snmpclient      127.0.0.1               public
group   RWGroup                 v1                              snmpclient

pass .1.3.6.1.4.1.4413.4.1 /usr/bin/ucd5820stat
pass .1.3.6.1.4.1.3582 /usr/sbin/lsi_mrdsnmpmain

syscontact Root <root@localhost> (configure /etc/snmp/snmp.local.conf)
syslocation Unknown (edit /etc/snmp/snmpd.conf)

view    all             included      .1                80
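In net-snmp, the rocommunity and rwcommunity directives take a community string and an optional source: an address, a network, or `default` for any host. A request is only accepted when both the community and the sender’s address match. Below is a minimal sketch of that check, applied to lines like the ones above. It is a simplification: it ignores the com2sec/group/access (VACM) lines, which in this file also only map 127.0.0.1, and it assumes sources are IP addresses or networks rather than hostnames. The function names are hypothetical.

```python
import ipaddress


def community_rules(conf_text: str):
    """Collect (directive, community, source) from rocommunity/rwcommunity lines."""
    rules = []
    for line in conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in ("rocommunity", "rwcommunity"):
            source = fields[2] if len(fields) > 2 else "default"  # no source: any host
            rules.append((fields[0], fields[1], source))
    return rules


def matching_directives(rules, community: str, client_ip: str):
    """Return the directives that would accept this community from this client."""
    addr = ipaddress.ip_address(client_ip)
    matches = []
    for directive, rule_community, source in rules:
        if rule_community != community:
            continue
        # Assumption: sources are "default" or an IP address/network, not hostnames.
        if source == "default" or addr in ipaddress.ip_network(source, strict=False):
            matches.append(directive)
    return matches
```

For the configuration above, `public` sent from 127.0.0.1 matches both directives. The same community sent from any other address matches none.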

The snmpd.conf file shows:
– the trapsink destination (127.0.0.1, localhost) and community string (public).
– the ro and rw communities are set to ‘public 127.0.0.1’.
In general, it is advisable to change the community strings to something unique so they cannot be guessed easily. In this case, however, the ro and rw communities are each followed by a source restriction: 127.0.0.1. This means snmp access is restricted to localhost.
This can be verified by running snmpwalk from another machine:

$ snmpwalk -v 2c -c public 311.1.1.1
Timeout: No Response from 311.1.1.1

This means the snmp daemon cannot be queried from outside the machine. That matches the snmp daemon configuration file, which limits access to the snmp daemon to localhost.
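The same external check can also be done without the net-snmp tools. What follows is a minimal, hypothetical probe. It BER-encodes a single SNMPv2c GetRequest for sysDescr.0 (1.3.6.1.2.1.1.1.0) by hand, sends it over UDP and, like snmpwalk, treats silence as a timeout. The function names are illustrative, and the reply is returned raw rather than decoded.

```python
import socket


def encode_length(n: int) -> bytes:
    """BER length: short form below 128, long form (0x80 | byte count) otherwise."""
    if n < 0x80:
        return bytes([n])
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(raw)]) + raw


def tlv(tag: int, value: bytes) -> bytes:
    return bytes([tag]) + encode_length(len(value)) + value


def ber_int(value: int) -> bytes:
    # Minimal two's-complement encoding of a non-negative INTEGER.
    return tlv(0x02, value.to_bytes(value.bit_length() // 8 + 1, "big", signed=True))


def _base128(n: int) -> bytes:
    out = [n & 0x7F]                 # last byte has the high bit clear
    n >>= 7
    while n:
        out.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(out))


def ber_oid(dotted: str) -> bytes:
    parts = [int(p) for p in dotted.split(".")]
    body = _base128(40 * parts[0] + parts[1]) + b"".join(_base128(p) for p in parts[2:])
    return tlv(0x06, body)


def get_request(community: str, oid: str = "1.3.6.1.2.1.1.1.0", request_id: int = 1) -> bytes:
    """SNMPv2c GetRequest for one OID (default sysDescr.0)."""
    varbind = tlv(0x30, ber_oid(oid) + b"\x05\x00")          # OID with a NULL value
    pdu = tlv(0xA0, ber_int(request_id) + ber_int(0) + ber_int(0) + tlv(0x30, varbind))
    return tlv(0x30, ber_int(1) + tlv(0x04, community.encode()) + pdu)  # 1 = v2c


def snmp_probe(host: str, community: str = "public", port: int = 161, timeout: float = 2.0):
    """Return the raw reply, or None if nothing came back (snmpwalk's Timeout)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(get_request(community), (host, port))
        try:
            reply, _addr = sock.recvfrom(65535)
        except (socket.timeout, ConnectionError):
            return None
        return reply
```

Run `snmp_probe` from another machine against the compute node’s address. A `None` result is what the 127.0.0.1 restriction should produce. Keep in mind that silence alone cannot distinguish an access restriction from a firewall dropping the packet. Here the firewall is off, which is what points at the snmpd configuration.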

It seems the LSI megaraid snmp agent works together with snmpd:

root      33443      1  0 03:14 ?        00:01:01 /usr/sbin/lsi_mrdsnmpagent -c /etc/snmp/snmpd.conf
root      33454  33443  0 03:14 ?        00:00:00 /usr/sbin/lsi_mrdsnmpagent -c /etc/snmp/snmpd.conf

It obviously reads snmpd.conf (-c /etc/snmp/snmpd.conf, as seen above), but it also has a configuration file of its own. That file can be found by listing the files inside the sas_snmp rpm package (rpm -ql sas_snmp). The main evidence that the lsi_mrdsnmpagent process actually uses it, however, comes from its open file descriptors:

# ls -ls /proc/$(pgrep -f lsi_mrdsnmpagent | head -1)/fd
total 0
0 lr-x------. 1 root root 64 Oct 26 03:14 0 -> /dev/null
0 lr-x------. 1 root root 64 Oct 26 03:14 1 -> /etc/lsi_mrdsnmp/sas/sas_TrapDestination.conf
0 l-wx------. 1 root root 64 Oct 26 03:14 2 -> /var/log/cellos/cron_daily_cellos.stderr (deleted)
0 lrwx------. 1 root root 64 Oct 26 03:14 3 -> socket:[2923149143]
0 l-wx------. 1 root root 64 Oct 26 03:14 4 -> /var/log/cellos/cellos.log (deleted)
0 l-wx------. 1 root root 64 Oct 26 03:14 5 -> /var/log/cellos/cellos.trc (deleted)
0 lr-x------. 1 root root 64 Oct 26 03:14 6 -> /etc/snmp/snmpd.conf
0 lr-x------. 1 root root 64 Oct 26 03:14 7 -> /etc/redhat-release
0 lr-x------. 1 root root 64 Oct 26 03:14 8 -> /dev/megaraid_sas_ioctl_node
0 lr-x------. 1 root root 64 Oct 26 03:14 9 -> pipe:[2919419375]

File descriptor 1 points to ‘/etc/lsi_mrdsnmp/sas/sas_TrapDestination.conf’! Let’s look inside that configuration file:

# cat /etc/lsi_mrdsnmp/sas/sas_TrapDestination.conf
#################################################
# Agent Service needs the IP addresses to sent trap
# The trap destination may be specified in this file or
# using snmpd.conf file. Following indicators can be set
# on "TrapDestInd" to instruct the agent to pick the IPs
# as the destination.
# 1 - IPs only from snmpd.conf
# 2 - IPs from this file only
# 3 - IPs from both the files
#################################################
TrapDestInd 3
#############Trap Destination IP##################
# Add port no after IP address with no space after
# colon to send the SNMP trap message to custom port.
# Community is to be mentioned after IP. If no community
# is mentioned, default SNMP community 'public' shall be
# used. 'trapcommunity' token is also used in snmpd.conf.
# Alternatively, you can also use trapsink command
# in snmpd.conf to send the SNMP trap message to
# custom port, else default SNMP trap port '162' shall
# be used.
127.0.0.1	public
# 145.147.201.88:1234	public
# 145.146.180.20:3061	testComm
127.0.0.1:8162 public

It is a configuration file that works alongside the snmpd.conf configuration. What is important here is ‘TrapDestInd’, which is set to ‘3’. This means traps are sent to the trap destinations set in the snmpd.conf file AND to those set in the sas_TrapDestination.conf file. Two trap destinations are defined in the file. The first is 127.0.0.1 with community string public. Because no port is given, these traps go to the default SNMP trap port, udp 162, as the comments in the file explain. The most interesting one, however, is the second: traps are also sent to 127.0.0.1 at port 8162. That is a port number I do not know off the top of my head!
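The rules spelled out in the comments of sas_TrapDestination.conf are:

- TrapDestInd 1, 2 or 3;
- `IP[:port] [community]` destination lines;
- default community public;
- default trap port 162.

They can be sketched as follows. This is a reading of those comments, not LSI’s implementation. The function names are hypothetical, only IPv4 addresses are handled, and the snmpd.conf trapsink entries are passed in already parsed.

```python
from __future__ import annotations

DEFAULT_COMMUNITY = "public"  # per the comments in sas_TrapDestination.conf
DEFAULT_TRAP_PORT = 162       # likewise; also snmpd's default trapsink port


def parse_destination(line: str) -> tuple[str, int, str]:
    """Parse an 'IP[:port] [community]' line (IPv4 only in this sketch)."""
    fields = line.split()
    host, _, port = fields[0].partition(":")
    community = fields[1] if len(fields) > 1 else DEFAULT_COMMUNITY
    return host, (int(port) if port else DEFAULT_TRAP_PORT), community


def trap_destinations(sas_conf: str,
                      snmpd_sinks: list[tuple[str, int, str]]) -> list[tuple[str, int, str]]:
    """Combine trap destinations the way the TrapDestInd comment describes."""
    indicator = None
    own = []
    for raw in sas_conf.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comments and the '#####...' banner lines
        if line.startswith("TrapDestInd"):
            indicator = int(line.split()[1])
        else:
            own.append(parse_destination(line))
    if indicator == 1:
        return list(snmpd_sinks)          # 1 - IPs only from snmpd.conf
    if indicator == 2:
        return own                        # 2 - IPs from this file only
    if indicator == 3:
        return list(snmpd_sinks) + own    # 3 - IPs from both the files
    raise ValueError("TrapDestInd missing or not 1, 2 or 3")
```

Apply this to the file above with TrapDestInd 3, together with the `trapsink 127.0.0.1 public` line from snmpd.conf. The result is destinations on 127.0.0.1 port 162 (once from each file) and on 127.0.0.1 port 8162.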

Finding out what uses port 8162 is simple, though. The first thing to check is which process has that port open:

# netstat -anp | grep 8162
udp        0      0 :::8162                     :::*                                    15233/java

That’s a java process! Let’s grep for the process ID to see whether the full command line gives more clues about what this java process is:

# ps -ef | grep 15233
dbmsvc    15233  15136  0 Aug16 ?        05:32:25 /usr/java/jdk1.7.0_80/bin/java -client -Xms256m -Xmx512m -XX:CompileThreshold=8000 -XX:PermSize=128m -XX:MaxPermSize=256m -Dweblogic.Name=msServer -Djava.security.policy=/opt/oracle/dbserver_12.1.2.1.3.151021/dbms/deploy/wls/wlserver_10.3/server/lib/weblogic.policy -XX:-UseLargePages -XX:ParallelGCThreads=8 -Dweblogic.ListenPort=7878 -Djava.security.egd=file:/dev/./urandom -Xverify:none -da -Dplatform.home=/opt/oracle/dbserver_12.1.2.1.3.151021/dbms/deploy/wls/wlserver_10.3 -Dwls.home=/opt/oracle/dbserver_12.1.2.1.3.151021/dbms/deploy/wls/wlserver_10.3/server -Dweblogic.home=/opt/oracle/dbserver_12.1.2.1.3.151021/dbms/deploy/wls/wlserver_10.3/server -Dweblogic.management.discover=true -Dwlw.iterativeDev= -Dwlw.testConsole= -Dwlw.logErrorsToConsole= -Dweblogic.ext.dirs=/opt/oracle/dbserver_12.1.2.1.3.151021/dbms/deploy/wls/patch_wls1036/profiles/default/sysext_manifest_classpath weblogic.Server

That’s java running WebLogic, with the server name ‘msServer’. It is part of the daemons that serve dbmcli (similar to the daemons that serve cellcli on the cells)!

This actually makes sense. The daemons that manage the database server fetch hardware status information and hardware failures from the BMC using the IPMI device (/dev/ipmi0). However, the LSI MegaRaid adapter cannot provide its status in that way. To let the management daemons keep track of events on the LSI MegaRaid adapter (hardware issues), a daemon that works together with the snmp daemon is set up, which sends snmp traps if something occurs. The management daemon listens on a port for these traps.

I do not know whether the management daemon checks the community string of the traps it processes. However, the port number on which the daemon listens for traps is defined in ‘/opt/oracle/dbserver/dbms/deploy/config/cellinit.ora’ with the directive BMC_SNMP_PORT.
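Finding the owner of a port like 8162 works essentially the way netstat -p does it. The kernel’s socket tables (/proc/net/udp and /proc/net/udp6) give the local port and the socket inode. The owning process has a link of the form `socket:[inode]` in /proc/<pid>/fd, just like the `socket:[2923149143]` entry in the file descriptor listing earlier. Below is a minimal Python sketch of that lookup for a UDP port. The function names are hypothetical. It needs root to inspect other users’ processes, such as the java process owned by dbmsvc.

```python
from __future__ import annotations

import os


def port_and_inode(line: str) -> tuple[int, str]:
    """Extract (local port, socket inode) from one /proc/net/udp or udp6 data line."""
    fields = line.split()
    # fields[1] is 'HEXADDR:HEXPORT'; fields[9] is the socket inode.
    return int(fields[1].rsplit(":", 1)[1], 16), fields[9]


def udp_socket_inodes(port: int) -> set[str]:
    """Socket inodes bound to a local UDP port, on IPv4 or IPv6."""
    inodes = set()
    for table in ("/proc/net/udp", "/proc/net/udp6"):
        try:
            with open(table) as fh:
                next(fh, None)  # skip the column header
                for line in fh:
                    local_port, inode = port_and_inode(line)
                    if local_port == port:
                        inodes.add(inode)
        except FileNotFoundError:
            continue  # e.g. IPv6 disabled
    return inodes


def owning_processes(inodes: set[str]) -> dict[int, str]:
    """Map PID -> command line for processes holding an fd 'socket:[inode]'."""
    targets = {f"socket:[{inode}]" for inode in inodes}
    owners = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            links = {os.readlink(os.path.join(fd_dir, fd)) for fd in os.listdir(fd_dir)}
            if links & targets:
                with open(os.path.join("/proc", entry, "cmdline"), "rb") as fh:
                    cmdline = fh.read().replace(b"\0", b" ").decode(errors="replace")
                owners[int(entry)] = cmdline.strip()
        except OSError:
            continue  # process vanished, or not permitted (run as root)
    return owners
```

Run as root on the compute node, `owning_processes(udp_socket_inodes(8162))` should point at the same java process that netstat showed.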

Conclusion
As far as I can see, the snmp daemon is running so that the LSI MegaRaid SNMP agent process can run and send traps to the compute node’s management daemons. Since most Exadata compute nodes do not have the firewall enabled, udp port 161 is exposed. The settings of the snmp daemon itself, however, limit access to localhost.

In the first post of the hardening series, I described how to scan a website with Nikto. Another scanner is Nessus, which scans not only HTTP servers but also FTP, SMB/CIFS, telnet, ssh, the Oracle listener and operating systems. Let me say right away: Nessus started as an open source effort, but was made closed source some time ago. A fork of the open source version is called OpenVAS.

Nessus is free for personal and non-commercial use. This means you can download it and use it to see how it works. If you want to scan your corporate web servers or websites, or use it as a tool for consultancy work, you must purchase a ‘ProfessionalFeed’.

Nessus is a very advanced scanner with a repository of checks (called plugins). This repository is kept up to date by the company behind Nessus, ‘Tenable Network Security’. The updates can be downloaded manually (via a script), but by default they are fetched automatically every 24 hours.

These are the plugin families (from the nessus website):

  • CGI abuses – This plugin family checks for anything that is ‘CGI’ related, unless it is XSS (and only an XSS vulnerability), in which case it falls into the “CGI abuses : XSS” family. These checks use a combination of detection techniques, including checking the version of the application and testing for the actual vulnerability. The attacks include software detection, information disclosure, XSS, SQLi, LFI, RFI, overflows and more.
  • CGI abuses : XSS – Specific CGI checks for reflective and persistent XSS vulnerabilities in common web applications.
  • Database – Typically a web server will run a database that is used by various web applications.
  • FTP – Web pages need to be updated, and FTP is a popular protocol used to allow your web developers to send files to the server.
  • Gain a Shell Remotely – If you can obtain a shell on the remote web server, testing the application is somewhat moot.
  • Gain root remotely – Same thing as above, if you gain root, resolve this problem before the application is tested.
  • General – Contains the operating system fingerprinting plugins, including ones that will identify the OS over HTTP. Identifying the underlying operating system is very important for web application testing, as it will determine the syntax of commands sent via injection (command and SQL) attacks.
  • Remote file access – Includes checks for specific web server/application vulnerabilities that lead to remote file disclosure.
  • Service detection – Contains checks for several different services, including detecting Apache running HTTPS, HTTP CONNECT proxy settings and other services that may host web applications.
  • Web servers – Plugins in this family detect approximately 300 specific vulnerabilities in popular web servers, such as Apache, IIS and generic vulnerabilities associated with the HTTP protocol itself.

Nessus also checks for missing operating system updates if it is able to log on. It can log on either with credentials you specify, or with a working username/password combination it found from an internal list (for example, username ‘oracle’ with password ‘oracle’ is tested).

Even with all these tests, it is important to note that nothing beats proper design and validation of applications. Nessus (like most vulnerability scanning software I know) checks for known issues. Customizations and self-built applications are probably not thoroughly checked for vulnerabilities. This means Nessus can easily lead to a false sense of security.

The Oracle HTTP Server (OHS) is a version of the Apache HTTP daemon, modified by Oracle to work with its Application Server suite. As of version 11, the Application Server suite is called ‘Fusion Middleware’. The OHS can act as a frontend to application servers such as OC4J and WebLogic. In my opinion, an application server should always have an http server in front of it, to act as:
– Logger
– SSL offloader
– Firewall
In my opinion, it would also be very good to let the OHS act as a frontend for APEX.

The OHS is only a decent firewall if you make sure it can only do what you intend it to do. Sadly, this isn’t the case by default. The most recent version (11.1) is quite safe (stripped of functionality), but only in comparison with its predecessors, which were quite ‘open’.

Hardening is essentially done the same way as firewalling is: disable everything (disable all functionality) and enable only the needed functionality as limited as possible.

A decent way to check which functionality is enabled is Nikto 2, a web server scanner.

Here is an example:

$ ./nikto.pl -host example.com
- Nikto v2.1.1
---------------------------------------------------------------------------
+ Target IP: xxx.xxx.xxx.xxx
+ Target Hostname: example.com
+ Target Port: 80
+ Start Time: 2010-05-26 10:33:35
---------------------------------------------------------------------------
+ Server: Oracle-Application-Server-10g/10.1.2.0.2 Oracle-HTTP-Server OracleAS-Web-Cache-10g/10.1.2.0.2 (G;max-age=0+0;age=0;ecid=201564454590,0)
+ Uncommon header 'tcn' found, with contents: choice
+ ETag header found on server, inode: 3425418, size: 20042, mtime: 0x43e7685d;493cf3fe
+ Number of sections in the version string differ from those in the database, the server reports: oracle-application-server-10g/10.1.2.0.2oracle-http-serveroracleas-web-cache-10g/10.1.2.0.2(g;max-age=0+0;age=0;ecid=201564454590,0) while the database has: 10.1.3.1.0. This may cause false positives.
+ Oracle-Application-Server-10g/10.1.2.0.2Oracle-HTTP-ServerOracleAS-Web-Cache-10g/10.1.2.0.2(G;max-age=0+0;age=0;ecid=201564454590,0) appears to be outdated (current is at least 10.1.3.1.0)
+ Allowed HTTP Methods: GET, HEAD, OPTIONS, TRACE, POST, PUT, DELETE, CONNECT, PATCH, PROPFIND, PROPPATCH, MKCOL, COPY, MOVE, LOCK, UNLOCK
+ OSVDB-397: HTTP method ('Allow' Header): 'PUT' method could allow clients to save files on the web server.
+ OSVDB-5646: HTTP method ('Allow' Header): 'DELETE' may allow clients to remove files on the web server.
+ HTTP method ('Allow' Header): 'CONNECT' may allow server to proxy client requests.
+ HTTP method ('Allow' Header): 'PROPFIND' may indicate DAV/WebDAV is installed. This may be used to get directory listings if indexing is allow but a default page exists.
+ HTTP method ('Allow' Header): 'PROPPATCH' indicates WebDAV is installed.
+ OSVDB-5647: HTTP method ('Allow' Header): 'MOVE' may allow clients to change file locations on the web server.
+ OSVDB-877: HTTP TRACE method is active, suggesting the host is vulnerable to XST
+ OSVDB-27487: Apache is vulnerable to XSS via the Expect header
+ OSVDB-700: /fcgi-bin/echo?foo=alert('Vulnerable'): Fast-CGI has two default CGI programs (echo.exe/echo2.exe) vulnerable to Cross Site Scripting (XSS). http://www.cert.org/advisories/CA-2000-02.html.
+ OSVDB-3954: /fcgi-bin/echo2?foo=alert('Vulnerable'): Fast-CGI has two default CGI programs (echo.exe/echo2.exe) vulnerable to Cross Site Scripting (XSS). http://www.cert.org/advisories/CA-2000-02.html.
+ OSVDB-561: /server-status: This reveals Apache information. Comment out appropriate line in httpd.conf or restrict access to allowed hosts.
+ OSVDB-3233: /index.html.de: Apache default foreign language file found. All default files should be removed from the web server as they may give an attacker additional system information.
+ OSVDB-3233: /index.html.en: Apache default foreign language file found. All default files should be removed from the web server as they may give an attacker additional system information.
+ OSVDB-3233: /index.html.es: Apache default foreign language file found. All default files should be removed from the web server as they may give an attacker additional system information.
+ OSVDB-3233: /index.html.fr: Apache default foreign language file found. All default files should be removed from the web server as they may give an attacker additional system information.
+ OSVDB-3233: /index.html.it: Apache default foreign language file found. All default files should be removed from the web server as they may give an attacker additional system information.
+ OSVDB-3092: /fcgi-bin/echo: The FastCGI echo program may reveal system info or lead to other attacks.
+ OSVDB-3092: /fcgi-bin/echo2: The FastCGI echo2 program may reveal system info or lead to other attacks.
+ OSVDB-3233: /j2ee/: j2ee directory found--possibly an Oracle app server directory.
+ OSVDB-3233: /icons/README: Apache default file found.
+ 3823 items checked: 25 item(s) reported on remote host
+ End Time: 2010-05-26 10:44:18 (643 seconds)
---------------------------------------------------------------------------
+ 1 host(s) tested

There are all kinds of things to see here (some highlights):
-The server describes in detail what software it uses. That is not a problem in itself, but it gives away a lot of information. That information can be used to attack the server, or to determine whether the server is vulnerable:

+ Server: Oracle-Application-Server-10g/10.1.2.0.2 Oracle-HTTP-Server OracleAS-Web-Cache-10g/10.1.2.0.2

-Nikto reports that newer versions of the OHS exist:

+ Oracle-Application-Server-10g/10.1.2.0.2Oracle-HTTP-ServerOracleAS-Web-Cache-10g/10.1.2.0.2(G;max-age=0+0;age=0;ecid=201564454590,0) appears to be outdated (current is at least 10.1.3.1.0)

-Nikto has determined this host is vulnerable to Cross Site Scripting (XSS):

+ OSVDB-27487: Apache is vulnerable to XSS via the Expect header

-Nikto found some files that give away even more detailed information:

+ OSVDB-3092: /fcgi-bin/echo: The FastCGI echo program may reveal system info or lead to other attacks.

That is a lot of detailed information about the configuration of this server.
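Two of these findings are easy to re-check after hardening: the methods listed in the Allow header, and the version details in the Server header. The check follows the same deny-by-default idea, comparing what the server offers with an explicit allowlist. Below is a minimal Python sketch. The allowlist of GET, HEAD and POST is an assumption about what a typical site needs, and the function names are illustrative. Nikto checks far more than this.

```python
import http.client
import re

# Assumption: the site only needs these methods; adjust to what the application uses.
ALLOWED_METHODS = {"GET", "HEAD", "POST"}
VERSION_RE = re.compile(r"\d+(?:\.\d+)+")  # e.g. 10.1.2.0.2


def audit_headers(headers: dict) -> list:
    """Return findings for the headers of an OPTIONS response."""
    lowered = {name.lower(): value for name, value in headers.items()}
    findings = []
    allow = lowered.get("allow", "")
    offered = {m.strip().upper() for m in allow.split(",") if m.strip()}
    extra = sorted(offered - ALLOWED_METHODS)
    if extra:
        findings.append("methods beyond allowlist: " + ", ".join(extra))
    server = lowered.get("server", "")
    if VERSION_RE.search(server):
        findings.append("Server header discloses versions: " + server)
    return findings


def fetch_options_headers(host: str, port: int = 80) -> dict:
    """Send an OPTIONS request for / and return the response headers."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("OPTIONS", "/")
        return dict(conn.getresponse().getheaders())
    finally:
        conn.close()
```

For a server answering like the one above, `audit_headers(fetch_options_headers("example.com"))` would flag methods such as PUT, DELETE and TRACE, as well as the version string in the Server header.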
