
The Trouble with CA SSL Certificates and ESXi 5

Those of you who follow me on Twitter will know that I’ve been having some fun this week replacing the default VMware-generated SSL certificates on a greenfield deployment of vSphere 5 that will support a large public cloud. Changing certificates is nothing new, and in security-conscious environments it is common practice. However, in my experience changing certificates on ESX(i) and vCenter has always been a bit of a challenge (I have done it on vSphere 4.x before this). It can be very time consuming and error prone, especially if you haven’t done it before. One of the things that makes it hard to get right is that no single document or source of truth explains in sufficient detail what the requirements and supported configurations are, or how to implement CA-signed SSL certificates in ESX(i) and vCenter Server. This has tripped up many organizations, both large and small.

I’m hoping the information in this article will encourage more people to replace the default certs (to improve security), and make the process more reliable and easier to achieve with vSphere 5. This article focuses on replacing the default VMware SSL certificates on ESXi 5 hosts with CA-signed certificates issued by a Microsoft CA (it should also work with public and OpenSSL CAs, but I have not tested that yet).

If you want a way to fully manage the certificate lifecycle and replace certs automatically then you’ll want to check out vCert Manager – Changing VMware SSL Certs Made Easy. This will completely automate the SSL certificate process in vSphere environments. 

General Information on X.509 Certificates

For anyone who doesn’t know what an X.509 certificate is, here are a couple of links that explain it. The first is a good human-readable explanation and the second is the actual specification published by the Internet Engineering Task Force (IETF).

Wikipedia.org – X.509

IETF RFC 3280 – X.509

Each component in your vSphere infrastructure uses these X.509 SSL certificates for secure, encrypted communications. Each SSL certificate is uniquely generated for its component and tied to that component’s FQDN. This means every ESXi 5 server has a certificate generated for it based on its unique FQDN, as do vCenter, vCenter Update Manager, SRM, vShield Manager and any other components you may be using. This provides authentication: every system knows that it is communicating with the system it expects, and not an imposter. It also means you can’t just take one cert generated for vCenter, for example, and apply it to all of your hosts. I have not tested wildcard certificates (*.domain) with vSphere 5, but in earlier versions some components didn’t support them. From a security standpoint it is much better to have a single SSL cert tied to a single host by FQDN.
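To make the one-certificate-per-FQDN rule concrete, here is a minimal Python sketch that compares a certificate's Common Name with the host it is installed on. The function name and the example host names are made up for illustration, and the strict no-wildcard policy is an assumption that simply mirrors the preference stated above; it is not how vSphere itself validates certificates.

```python
def cert_matches_host(common_name: str, host_fqdn: str) -> bool:
    """Return True when the certificate Common Name is exactly the host FQDN.

    DNS names are compared case-insensitively and a trailing dot is ignored.
    Wildcard names are deliberately rejected in this sketch.
    """
    def normalise(name: str) -> str:
        return name.strip().rstrip(".").lower()

    cn = normalise(common_name)
    if "*" in cn:
        return False
    return cn == normalise(host_fqdn)


# Hypothetical inventory: host FQDN -> Common Name found in its certificate.
hosts = {
    "esxi01.lab.example.com": "esxi01.lab.example.com",   # per-host certificate
    "esxi02.lab.example.com": "vcenter.lab.example.com",  # vCenter cert reused
}
mismatched = [host for host, cn in hosts.items() if not cert_matches_host(cn, host)]
print(mismatched)  # ['esxi02.lab.example.com']
```

A host that shows up in the mismatched list is one where a certificate issued for another system has been applied.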

VMware CA SSL Certificate Resources

While generating and applying the certificates in the environment I was working in over the last week, I went through all of the resources below. All have some good information, but none is a complete end-to-end guide on how to generate and apply certificates that will work reliably with vSphere 5 (which is why I had to review them all). I went through some of the older material because it is largely still relevant, but there are some subtle changes to the way vSphere 5 works that you need to know about to be successful. As you read through these documents and KB articles you will also notice a lack of consistency.

vi_vcenter_certificates.pdf

How to use CA certificate to replace VMware certificate on ESX(i) 4 and vCenter

After upgrading to vSphere 5, you see the HA error: vSphere HA Cannot be configured on this host because its SSL thumbprint has not been verified

Configuring HA after upgrading to vCenter Server 5.0 fails with the error: Cannot complete the configuration of the vSphere HA agent on the host. Misconfiguration in the host setup

vSphere 5 Security Guide

Replacing vCenter Server 4.1 Certificates

Generating Domain Root CA signed certificates for vCenter Server

The Trouble with CA SSL Certificates and vCenter 5

Import an OpenSSL CSR into a Windows CA

Now for the Trouble

You’ll notice that the above list of resources includes a couple of VMware KB articles about issues with the new VMware HA after you replace the SSL certificates with CA-signed certificates, or after an upgrade where this was previously done. This is the problem I ran into both in my lab and in my customer’s environment (both are new builds with CA SSL certificates). However, following the steps in the KB was not successful.

The reason these errors occur in the first place is that FDM, the new VMware HA, enforces SSL certificate verification for communication whenever a host is configured for VMware HA. So you need to make sure that vCenter is also set to verify host certificates (vSphere Client, Menu Bar – Administration > vCenter Server Settings > SSL Settings > tick vCenter requires verified host SSL certificates, which is the default setting), otherwise you won’t be able to use HA. This is fantastic for security, but unfortunately there is a bug (expected to be fixed in vSphere 5.0 U1) that means the new thumbprints of hosts that have had their SSL certificates changed don’t end up in the vCenter database (not good). As a result, when you try to configure VMware HA after an upgrade where the certs have been changed, or straight after you’ve changed them in a new environment, VMware HA configuration will fail.
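A thumbprint is a digest of the certificate itself, so a replaced certificate necessarily produces a new thumbprint. As a rough illustration, and assuming the thumbprint is the colon-separated SHA-1 digest of the DER-encoded certificate (the form the vSphere Client displays), the Python sketch below computes it from the text of a PEM file and compares it with an expected value. The function names are made up for this example and are not part of any VMware tool.

```python
import hashlib
import ssl


def sha1_thumbprint(pem_text: str) -> str:
    """Return the SHA-1 thumbprint of a PEM certificate as AA:BB:CC... hex."""
    der = ssl.PEM_cert_to_DER_cert(pem_text)  # strips the PEM armour, decodes base64
    digest = hashlib.sha1(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def thumbprint_is_stale(expected: str, pem_text: str) -> bool:
    """True when the expected thumbprint no longer matches the certificate."""
    wanted = expected.replace(":", "").upper()
    actual = sha1_thumbprint(pem_text).replace(":", "")
    return wanted != actual
```

Feeding it the contents of the old and new rui.crt files should give two different thumbprints, which is the mismatch that vCenter trips over when its database still holds the old value.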

You might see an error message such as this:

[Image: vSphere HA Timeout Error]

You might also see something like this, an HA election that never ends:

[Image: vSphere HA Never Ending Election]

You might also see this in your fdm.log in /var/log on your ESXi Host (related to the above picture):

Feb  3 23:38:37 vmserver12 Fdm: [7D620B90 info 'Cluster' opID=SWI-6cc6b9b8] Change state to Startup:0
Feb  3 23:38:38 vmserver12 Fdm: [7D620B90 info 'Cluster' opID=SWI-6cc6b9b8] Change state to SlaveConnecting:146874673025
Feb  3 23:38:38 vmserver12 Fdm: [7D620B90 info 'Election' opID=SWI-6cc6b9b8] Slave to host @ 192.168.3.222
Feb  3 23:38:42 vmserver12 Fdm: [7D59EB90 info 'Cluster' opID=SWI-f5f44234] [ClusterManagerImpl::MainLoop] curState 4 lastState 3
Feb  3 23:38:44 vmserver12 Fdm: [7D620B90 info 'Cluster' opID=SWI-6cc6b9b8] Change state to Slave:146874673025
Feb  3 23:38:56 vmserver12 Fdm: [7D7E7B90 info 'Election'] MasterShutdown
Feb  3 23:38:59 vmserver12 Fdm: [7D6E3B90 info 'Message'] Destroying connection
Feb  3 23:39:00 vmserver12 Fdm: [7D620B90 info 'Cluster' opID=SWI-6cc6b9b8] Change state to SlaveConnecting:146874673025
Feb  3 23:39:04 vmserver12 Fdm: [7D59EB90 info 'Cluster' opID=SWI-f5f44234] [ClusterManagerImpl::MainLoop] curState 3 lastState 1
Feb  3 23:39:04 vmserver12 Fdm: [7D59EB90 info 'Cluster' opID=SWI-f5f44234] [ClusterManagerImpl::MainLoop] curState 4 lastState 3
Feb  3 23:39:24 vmserver12 Fdm: [7D5DFB90 warning 'Libs' opID=SWI-f67a1d5c] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
Feb  3 23:39:24 vmserver12 Fdm: [7D5DFB90 warning 'Libs' opID=SWI-f67a1d5c] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
Feb  3 23:39:44 vmserver12 Fdm: [7D620B90 info 'Election' opID=SWI-6cc6b9b8] Slave to host @ 192.168.3.222
Feb  3 23:39:46 vmserver12 Fdm: [7D6E3B90 warning 'Libs' opID=SWI-a257b9a0] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
Feb  3 23:39:58 vmserver12 Fdm: [7D620B90 info 'Election' opID=SWI-6cc6b9b8] Slave to host @ 192.168.3.222
Feb  3 23:40:13 vmserver12 Fdm: [7D620B90 info 'Cluster' opID=SWI-6cc6b9b8] Change state to Startup:0
Feb  3 23:40:21 vmserver12 Fdm: [7D620B90 info 'Cluster' opID=SWI-6cc6b9b8] Change state to Startup:0
Feb  3 23:40:38 vmserver12 Fdm: [7D59EB90 verbose 'Cluster' opID=SWI-f5f44234] [ClusterManagerImpl::CheckElectionState] Transitioned from Startup to SlaveConnecting
Feb  3 23:40:40 vmserver12 Fdm: [7D661B90 warning 'Libs' opID=SWI-15378378] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
Feb  3 23:40:40 vmserver12 Fdm: [7D661B90 warning 'Libs' opID=SWI-15378378] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
Feb  3 23:40:44 vmserver12 Fdm: [7D620B90 info 'Election' opID=SWI-6cc6b9b8] [ClusterElection::ChangeState] SlaveConnecting => Slave : SlaveConnectingStateFunc
Feb  3 23:40:50 vmserver12 Fdm: [7D765B90 warning 'Libs' opID=SWI-16099a2a] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error
Feb  3 23:40:54 vmserver12 Fdm: [7D6A2B90 info 'Election'] MasterShutdown
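If you would rather not eyeball the log, a short Python sketch like the one below can summarise it. It only looks for the two patterns visible in the excerpt above ("Change state to ..." and "SSL_VerifyX509"); the helper names and the "two or more returns to Startup" threshold are my own assumptions, not a rule published by VMware.

```python
import re
from collections import Counter

STATE_CHANGE = re.compile(r"Change state to (\w+)")


def summarise_fdm_log(lines):
    """Count FDM state changes and SSL verification warnings in log lines."""
    states = Counter()
    ssl_warnings = 0
    for line in lines:
        match = STATE_CHANGE.search(line)
        if match:
            states[match.group(1)] += 1
        if "SSL_VerifyX509" in line:
            ssl_warnings += 1
    return states, ssl_warnings


def looks_like_election_loop(states, threshold=2):
    """Heuristic only: repeated returns to Startup suggest the election restarts."""
    return states["Startup"] >= threshold


# Four lines taken from the excerpt above.
sample = [
    "Feb  3 23:38:37 vmserver12 Fdm: [7D620B90 info 'Cluster' opID=SWI-6cc6b9b8] Change state to Startup:0",
    "Feb  3 23:38:38 vmserver12 Fdm: [7D620B90 info 'Cluster' opID=SWI-6cc6b9b8] Change state to SlaveConnecting:146874673025",
    "Feb  3 23:39:46 vmserver12 Fdm: [7D6E3B90 warning 'Libs' opID=SWI-a257b9a0] SSL_VerifyX509: Certificate verification is disabled, so connection will proceed despite the error",
    "Feb  3 23:40:13 vmserver12 Fdm: [7D620B90 info 'Cluster' opID=SWI-6cc6b9b8] Change state to Startup:0",
]
states, ssl_warnings = summarise_fdm_log(sample)
print(dict(states), ssl_warnings, looks_like_election_loop(states))
```

To run it against a real host, copy fdm.log off the ESXi host and pass the open file object to summarise_fdm_log in place of the sample list.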

Now for the Fix

To resolve this situation you need to add one additional step to the process outlined in the VMware KB article “After upgrading to vSphere 5, you see the HA error: vSphere HA Cannot be configured on this host because its SSL thumbprint has not been verified”. The step is this:

After changing the certificates, restarting the management agents on the host, and exiting maintenance mode, wait for HA to configure and fail. Once the exit maintenance mode task has completed, disconnect and reconnect the host to vCenter.

Now you can use either of the methods mentioned in KB 2006210 to fix the SSL certificate thumbprint problem. My preference is to use the Perl script via the vSphere API, as this doesn’t require vCenter to be shut down. Once the fix has been applied as per the KB you will once again need to reconfigure VMware HA on the host. You will now notice that it is functioning correctly. Now for the step by step process I used.

Step by Step ESXi Host SSL Certificate Replacement using Windows and a Microsoft CA

You could execute a similar process to the one I’m about to describe using an OpenSSL or public CA and the Unix/Linux version of OpenSSL; however, this is how I did it successfully in my lab and with my customer. As mentioned in the vSphere 5 Security Guide, VMware uses X.509 v3 SSL certificates (base-64 encoded) for encrypting traffic between various components. If your CA has been set to support only the SHA512 hash that is fine; it will work, although the VMware documentation doesn’t mention it. The two key files for an ESXi host are rui.crt and rui.key.

To generate the certificates you’ll need a copy of OpenSSL x86 0.9.8r or higher, and access to a Microsoft CA (2000 or higher). The certificates will use a standard web server request template. On the system where you will generate the certificate signing request (rui.csr), make sure the Microsoft Visual C++ 2008 Redistributable Package (x86) is installed before installing OpenSSL. For the purposes of this process you will use the Microsoft CA web pages to submit the certificate request and download the resulting base-64 encoded certificate. You can also use the certreq command if you wish (not covered here). Ensure you have a vSphere Management Assistant v5 (vMA) deployed in your environment; you will use it to execute the HostReconnect.pl script, which saves you from having to shut down vCenter during the process (hopefully this won’t be needed when vSphere 5.0 U1 is available). Before applying the certificates to your environment, ensure that your clients and vCenter server trust your CA. If it’s an AD-integrated CA this should be automatic; otherwise you may have to pre-trust the root or intermediate CA by loading the CA public cert into your clients and vCenter server (not covered in this process).

Prerequisites:

Microsoft CA (2000 or above, with Web Server Template configured to your liking)
Microsoft Visual C++ 2008 Redistributable Package (x86) on the system where you will generate the certificate signing request (CSR)
OpenSSL 0.9.8r or above on the system you will use to generate the CSR
vSphere Management Assistant v5 (vMA)
FinalHostReconnect.rar, which contains HostReconnect.pl and can be obtained from VMware KB 2006210
PuTTY or other SSH client
WinSCP or other SFTP / SCP client
vCenter 5.0
ESXi 5.0
Assumes that the ESXi 5.0 hosts are in a cluster with VMware HA enabled.

Process Step by Step:

  1. After having installed Microsoft Visual C++ 2008 Redistributable Package (x86) and OpenSSL 0.9.8r or later on a management system (vCenter or other system, not the CA), open a command prompt and change to the OpenSSL\bin folder.
  2. Edit the openssl.cfg file and ensure it looks similar to the one included at the bottom of this article but with your organization specific information, save the configuration.
  3. Execute the following command: openssl req -new -nodes -out rui.csr -keyout rui.key -config openssl.cfg
  4. If you have specified all the relevant organization information in the OpenSSL configuration you will only have to specify the Common Name, which will be the FQDN of your ESXi host, and then press Enter twice (i.e. blank/no password) when it asks you for a password at the end.
  5. Copy or submit rui.csr to your CA, using the Web Server template, and download the base-64 encoded certificate to the system with OpenSSL that was used to generate the CSR (Screenshots of this available here: How to use CA certificate to replace VMware certificate on ESX(i) 4 and vCenter).
  6. Create a folder on the system used to generate the CSR to back up the existing VMware default certificates that are on the host.
  7. Enable SSH on the target host, ensure lockdown mode is disabled, and then put it into maintenance mode.
  8. Using WinSCP or other SFTP/SCP client, change directory on the target host to /etc/vmware/ssl and copy the rui.crt and rui.key files off the host to the backup folder that you created in step 6.
  9. Delete rui.crt and rui.key from the target host.
  10. Copy the new rui.crt and rui.key files that were generated to the target host in /etc/vmware/ssl, be sure to use Text Mode or ASCII Mode transfer, otherwise you will have problems with special characters (^M) ending up in the certificate file and the process will fail.
  11. Open up a console through the remote management card or KVM to the target host and log in as root to the Direct Console User Interface (DCUI – F2 on the console screen).
  12. Scroll down the screen till you reach Troubleshooting Options, then press enter.
  13. Scroll down to Restart Management Agents, then press enter.
  14. Press F11 to restart the management agents (vpxa etc).
  15. After the management agents are restarted press escape a couple of times till you log out of DCUI.
  16. Ensure that you have copied the HostReconnect.pl script to your vMA v5, you will need it soon.
  17. Take the target host out of maintenance mode, and wait for HA to reconfigure and fail (either the task times out, or the task completes but HA stays in the election state).
  18. Disconnect and then reconnect the host (this is currently the missing step from KB 2006210).
  19. Once the host is connected and HA agent reconfigured you need to log into your vMA as vi-admin and change directory to where you copied HostReconnect.pl.
  20. If this is the first time running HostReconnect.pl execute chmod u+x on HostReconnect.pl to ensure that you can run the command.
  21. Execute HostReconnect.pl --server <vcenter server fqdn>, and enter the username and password of a vCenter administrator when prompted.
  22. Monitor the output. You will notice that each host has been reconnected in the vCenter Tasks window. This script reconnects the hosts using their actual thumbprint and updates the expected thumbprint in the vCenter database. Without running this command, or stopping vCenter processes and manually editing the database, the thumbprints will not match and the configuration of HA will fail (as per KB 2006210).
  23. Reconfigure HA on the target host; you should notice that it works successfully and the host is back to normal.
  24. Repeat the above steps for subsequent ESXi hosts.
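The stray ^M characters mentioned in step 10 are Windows carriage returns. If you want to catch them before copying the files to the host, a small Python check along these lines can help. This is only a sketch: the function names are made up for this example, and the checks are limited to line endings and the PEM BEGIN/END markers, so it does not prove that the certificate or key is otherwise valid.

```python
PEM_BEGIN = b"-----BEGIN "
PEM_END = b"-----END "


def check_pem_bytes(data: bytes) -> list:
    """Return a list of problems found in the raw bytes of a PEM file."""
    problems = []
    if b"\r" in data:
        problems.append("contains carriage returns (^M); convert to LF line endings")
    stripped = data.strip()
    if not stripped.startswith(PEM_BEGIN):
        problems.append("does not start with a PEM BEGIN line")
    last_line = stripped.splitlines()[-1] if stripped else b""
    if not last_line.startswith(PEM_END):
        problems.append("does not end with a PEM END line")
    return problems


def to_unix_line_endings(data: bytes) -> bytes:
    """Convert CRLF (and any lone CR) line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# In-memory stand-in for a certificate saved with Windows line endings.
windows_style = b"-----BEGIN CERTIFICATE-----\r\nQUJD\r\n-----END CERTIFICATE-----\r\n"
print(check_pem_bytes(windows_style))
print(check_pem_bytes(to_unix_line_endings(windows_style)))
```

In practice you would read rui.crt and rui.key in binary mode, run check_pem_bytes on each, and write back the output of to_unix_line_endings if carriage returns are reported.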

Please let me know if you have any trouble with the above process, and also if it works for you, your comments and feedback are appreciated. Steps 16, 18 – 23 will hopefully not be needed when vSphere 5.0 U1 is available. I will write about it again once I’ve tested it.

Example OpenSSL Configuration file (openssl.cfg) without most of the normal comments and white space that is included:

#OpenSSL Configuration Start

HOME            = .
RANDFILE        = $ENV::HOME/.rnd
oid_section        = new_oids

[ new_oids ]

####################################################################
[ ca ]
default_ca    = CA_default        # The default ca section

####################################################################
[ CA_default ]
dir        = ./demoCA        # Where everything is kept
certs        = $dir/certs        # Where the issued certs are kept
crl_dir        = $dir/crl        # Where the issued crl are kept
database    = $dir/index.txt    # database index file.
#unique_subject    = no            # Set to 'no' to allow creation of
# several certificates with same subject.
new_certs_dir    = $dir/newcerts        # default place for new certs.
certificate    = $dir/cacert.pem     # The CA certificate
serial        = $dir/serial         # The current serial number
crlnumber    = $dir/crlnumber    # the current crl number
# must be commented out to leave a V1 CRL
crl        = $dir/crl.pem         # The current CRL
private_key    = $dir/private/cakey.pem# The private key
RANDFILE    = $dir/private/.rand    # private random number file
x509_extensions    = usr_cert        # The extensions to add to the cert
name_opt     = ca_default        # Subject Name options
cert_opt     = ca_default        # Certificate field options
default_days    = 5475            # how long to certify for (e.g. 15 years)
default_crl_days= 30            # how long before next CRL
default_md    = sha512            # which md to use.
preserve    = no            # keep passed DN ordering
policy        = policy_match

# For the CA policy
[ policy_match ]
countryName        = match
stateOrProvinceName    = match
organizationName    = match
organizationalUnitName    = optional
commonName        = supplied
emailAddress        = optional

[ policy_anything ]
countryName        = optional
stateOrProvinceName    = optional
localityName        = optional
organizationName    = optional
organizationalUnitName    = optional
commonName        = supplied
emailAddress        = optional
[ req ]
default_bits        = 2048
default_keyfile     = privkey.pem
distinguished_name    = req_distinguished_name
attributes        = req_attributes
x509_extensions    = v3_ca    # The extensions to add to the self signed cert
input_password = testpassword
output_password = testpassword
string_mask = nombstr

[ req_distinguished_name ]
countryName            = Country Name (2 letter code)
countryName_default        = NZ
countryName_min            = 2
countryName_max            = 2
stateOrProvinceName        = State or Province Name (full name)
stateOrProvinceName_default    = Auckland
localityName            = Locality Name (eg, city)
localityName_default        = Auckland
0.organizationName        = Organization Name (eg, company)
0.organizationName_default    = IT Solutions 2000 Ltd
organizationalUnitName        = Organizational Unit Name (eg, section)
organizationalUnitName_default    = IT
commonName            = Common Name (e.g. server FQDN or YOUR name)
commonName_max            = 64
emailAddress            = Email Address
emailAddress_max        = 64
emailAddress_default        = admin@yourdomain.com

[ req_attributes ]
challengePassword        = A challenge password
challengePassword_min        = 4
challengePassword_max        = 20
unstructuredName        = An optional company name

[ usr_cert ]
basicConstraints=CA:FALSE
nsComment            = "OpenSSL Generated Certificate"
subjectKeyIdentifier=hash
authorityKeyIdentifier=keyid,issuer

[ v3_req ]
basicConstraints = CA:FALSE
keyUsage                 = nonRepudiation, digitalSignature, keyEncipherment, dataEncipherment
extendedKeyUsage         = serverAuth, clientAuth

[ v3_ca ]
subjectKeyIdentifier=hash
authorityKeyIdentifier=keyid:always,issuer:always
basicConstraints = CA:true

[ crl_ext ]
authorityKeyIdentifier=keyid:always,issuer:always

[ proxy_cert_ext ]
basicConstraints=CA:FALSE
nsComment            = "OpenSSL Generated Certificate"
subjectKeyIdentifier=hash
authorityKeyIdentifier=keyid,issuer:always
proxyCertInfo=critical,language:id-ppl-anyLanguage,pathlen:3,policy:foo

#OpenSSL Configuration End
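Before generating a CSR it can be worth sanity-checking your edited openssl.cfg. The Python sketch below is a deliberately simplified reader for this style of file (it ignores $variable expansion and line continuations) plus a few checks on the [ req ] section. The function names and the checks chosen are assumptions for illustration. The last check relies on standard OpenSSL behaviour: extensions are only written into a CSR when req_extensions is set in [ req ], so treat that note as informational, since a CA template may apply its own extensions anyway.

```python
def parse_openssl_config(text: str) -> dict:
    """Parse OpenSSL-style config text into {section: {key: value}}."""
    sections = {"default": {}}
    current = "default"
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            sections.setdefault(current, {})
        elif "=" in line:
            key, value = line.split("=", 1)
            sections[current][key.strip()] = value.strip()
    return sections


def review_req_section(sections: dict) -> list:
    """Return human-readable notes about the [ req ] section."""
    notes = []
    req = sections.get("req", {})
    if int(req.get("default_bits", "0")) < 2048:
        notes.append("default_bits is below 2048")
    if req.get("distinguished_name") not in sections:
        notes.append("distinguished_name section is missing")
    if "req_extensions" not in req:
        notes.append("no req_extensions set; [ v3_req ] will not be added to the CSR")
    return notes


# Trimmed-down fragment in the same style as the file above.
sample = """
HOME = .
[ req ]
default_bits = 2048   # key size
distinguished_name = req_distinguished_name

[ req_distinguished_name ]
commonName = Common Name (e.g. server FQDN or YOUR name)
commonName_max = 64
"""
sections = parse_openssl_config(sample)
notes = review_req_section(sections)
print(notes)
```

To check a real file, read openssl.cfg as text and pass its contents to parse_openssl_config in place of the sample string.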

This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com, by Michael Webster. Copyright © 2012 – IT Solutions 2000 Ltd and Michael Webster. All rights reserved. Not to be reproduced for commercial purposes without written permission.

  1. Nick Evans
    February 6, 2012 at 8:59 pm

    Great post Michael. During my VCAP4-DCA study I went through changing SSL certs and it’s a lengthy process and like you said the documentation is all over the place!
    Will have to give it blast in my lab.

  2. Alain Dolbec
    February 10, 2012 at 10:25 am

    Hi Michael,

    Thanks for sharing. I had myself to go through this but I was able to go through my own CA using openssl. I need now to go through an approved CA and I have been trying to get the specific x509 certificate attributes (keyUsage) that need to be included in the CSR but cannot find anything official about this. I set the attributes as they were in the self-signed ones but the CA will not keep all of those attributes.

    Did you see this specified somewhere? Does it matter?

    Thanks

    Alain

    • February 10, 2012 at 10:35 am

      Hi Alain, I got the keyUsage requirements from the Update Manager KB and also the default certs. The other attributes were a combination of reviewing multiple documents and sources of information in addition to some trial and error. The keyUsage needs to be serverAuth and clientAuth.

  3. vcpguy
    May 16, 2012 at 6:44 am

    Hi, do we need to take all these steps, if there is already a in house PKI

    • May 17, 2012 at 12:42 am

      Yes the steps are required when you are using an in house PKI.

  4. Geoff
    May 25, 2012 at 1:57 pm

    For anyone doing a 5.0.1 (U1) update, once the cert is replaced and you’ve come out of maintenance mode the HA agent will install but the status will be “election” until you disconnect and reconnect the host. As soon as it reconnects the status will change to connected (master or slave) and you’re done. No need to run the HostReconnect script.

    • May 25, 2012 at 2:13 pm

      Hi Geoff,

      That procedure is if you don’t want to run the host reconnect script. Essentially the host reconnect script does the same thing. However if you’re on vCenter 5.0 U1 there is no need to even do that as the bug that caused this problem is fixed.

  5. Geoff
    May 26, 2012 at 12:05 am

    OK, got it, thanks Michael. I never looked at the script – thought it was doing more than just a disconnect/reconnect. But it’s interesting that my hosts stayed in “election” status until I disconnected and reconnected. Sounds like I shouldn’t have needed to do this with U1?

    • May 26, 2012 at 7:50 am

      Correct, the hosts should have been ok after exiting maintenance mode, as the expected SSL thumbprint should have been updated in the vCenter database.

  6. Ryan
    June 29, 2012 at 3:51 am

    Great post! We are using an internal CA. My question is, do we need to update the vCenter SSL cert first and then all of the hosts or does it matter?
