05 Sep 2015

BUG: Hyperic 5.8.4 Agent Causes 100% CPU Utilisation After Replacing Hyperic Server SSL Certificates

Following on from my previous blog post, where I mentioned that we had discovered a bug in the Hyperic 5.8.4 client (on both Windows and Linux), I think it’s only fair that I share our findings. We discovered the bug whilst deploying a very large vRealize Suite (two maximum-sized global clusters of vROPS, vRLI, Hyperic and vRA/vRO).

Whilst carrying out some testing in my lab surrounding the impact of replacing SSL certificates in Hyperic, I noticed that if for whatever reason authentication between the Hyperic agent and Hyperic server fails, the Hyperic agent increases CPU utilisation of the client machine it’s running on to between 85% and 100%. At first I thought it was an anomaly, but I was then able to reproduce the symptoms a further 3 times in proving to VMware GSS that the issue really does exist. To cut a long story short, VMware GSS has opened a bug ticket with engineering and I believe it should be resolved in a future release.

Now, if you are running Hyperic 5.8.4 and you are looking to replace SSL certificates for an implementation that is already running, the task is relatively straightforward, although I will not be covering how to replace the Hyperic SSL certificates as part of this post. Whether replacing the SSL certificates on the Hyperic server goes smoothly depends on how the Hyperic agents that currently report back into the Hyperic server were configured when they were deployed.

The default Hyperic agent configuration, which can be found in the agent.properties file, contains a configuration line that will ultimately determine if the agent will accept the new SSL certificate presented by the Hyperic server or not. If the Hyperic agent was pushed out using the default settings within the agent.properties file, with the exception of the “agent.setup.<setting>” lines, you will most probably encounter the bug when replacing your SSL certificates on the Hyperic Server.

Reproducing the issue

To reproduce the issue in a lab, simply:

1. Deploy a new Hyperic Server instance

2. Deploy a few new Windows and/or Linux servers with the agent installed, using the default agent.properties configuration.

3. Confirm that the agents are monitored from within Hyperic

4. Replace the Hyperic Server SSL Certificate

5. Monitor the platforms from the Hyperic web interface and confirm that they show as red (unavailable) after a few minutes

6. Monitor the agent machines’ CPU utilisation and agent.log

What’s the cause?

The default agent.properties file contains the following lines:

## Automatically accept unverified certificates
accept.unverified.certificates=false

With this accept.unverified.certificates (notice the plural) property set to false, the Hyperic agent will not accept the new Hyperic server certificate and will therefore log the following in the agent.log file:

[SenderThread] [AgentCallbackClient@168] javax.net.ssl.SSLPeerUnverifiedException: The authenticity of host 'vrhs01.spiesr.com' can't be established: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
javax.net.ssl.SSLPeerUnverifiedException: The authenticity of host 'vrhs01.spiesr.com' can't be established: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
        at org.hyperic.util.security.DefaultSSLProviderImpl$1.verify(DefaultSSLProviderImpl.java:139)
        at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:390)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
        at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149)
        at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
        at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:561)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
        at org.hyperic.util.http.HQHttpClient.post(HQHttpClient.java:81)
        at org.hyperic.util.http.HQHttpClient.post(HQHttpClient.java:57)
        at org.hyperic.lather.client.LatherHTTPClient.invoke(LatherHTTPClient.java:111)
        at org.hyperic.hq.bizapp.client.AgentCallbackClient.invokeLatherCall(AgentCallbackClient.java:162)
        at org.hyperic.hq.bizapp.client.AgentCallbackClient.invokeLatherCall(AgentCallbackClient.java:146)
        at org.hyperic.hq.bizapp.client.MeasurementCallbackClient.measurementSendReport(MeasurementCallbackClient.java:62)
        at org.hyperic.hq.measurement.agent.server.SenderThread.sendBatch(SenderThread.java:457)
        at org.hyperic.hq.measurement.agent.server.SenderThread.sendData(SenderThread.java:645)
        at org.hyperic.hq.measurement.agent.server.SenderThread.run(SenderThread.java:630)
        at java.lang.Thread.run(Thread.java:745)
Caused by: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
        at sun.security.ssl.SSLSessionImpl.getPeerCertificates(SSLSessionImpl.java:421)
        at org.apache.http.conn.ssl.AbstractVerifier.verify(AbstractVerifier.java:128)
        at org.hyperic.util.security.DefaultSSLProviderImpl$1.verify(DefaultSSLProviderImpl.java:137)
        ... 19 more

This is expected behaviour. What is not expected is what happens next: the agent keeps retrying indefinitely, looping through connecting, receiving the new SSL certificate and rejecting it. With this constant retrying, the agent uses up to 100% of the available CPU.
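If you want a quick way to confirm that an agent is stuck in this loop, counting how often the rejection is logged can help. The following is only a rough sketch in Python, not part of Hyperic: the function name is made up for illustration, and it assumes that log entries carry a "[Thread] [Class@line]" pair like the excerpt above (stack frames and "Caused by:" lines do not, so they are skipped).

```python
import re
from collections import Counter

# Exception name taken from the agent.log excerpt in this post.
MARKER = "SSLPeerUnverifiedException"

# Assumption: a log entry contains "[ThreadName] [Class@line]" somewhere on the line.
ENTRY = re.compile(r"\[([^\]]+)\]\s+\[[^\]]+\]")


def count_ssl_rejections(log_lines):
    """Count logged SSL rejections per agent thread, ignoring stack frames."""
    per_thread = Counter()
    for line in log_lines:
        match = ENTRY.search(line)
        if match and MARKER in line:
            per_thread[match.group(1)] += 1
    return per_thread
```

Running it against the same agent.log a minute or so apart (for example `count_ssl_rejections(open("agent.log", errors="replace"))`, where the path is whatever your agent installation uses) should show the count climbing quickly if the agent is in the retry loop.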

Whilst the agent is in this loop, you can simply edit the agent.properties file and change the line from:

## Automatically accept unverified certificates
accept.unverified.certificates=false

to read:

## Automatically accept unverified certificates
accept.unverified.certificates=true

Once the change is made, save the agent.properties file. Without having to restart the Hyperic agent, you’ll notice that the CPU utilisation has immediately dropped to normal levels and that the platform will show as green in Hyperic after a few minutes.
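With more than a handful of agents, making this edit by hand gets tedious. As a minimal sketch (my own helper, not a Hyperic tool), the Python function below rewrites a single property in the text of a properties file and leaves comments and every other line untouched. The function name is hypothetical, and it only handles simple `key=value` lines like the ones shown above.

```python
def set_property(text, key, value):
    """Return properties text with `key` set to `value`, preserving other lines."""
    out = []
    found = False
    for line in text.splitlines():
        stripped = line.strip()
        # Comment lines (# or !) and lines without "=" are copied through as-is.
        if not stripped.startswith(("#", "!")) and "=" in stripped:
            name = stripped.split("=", 1)[0].strip()
            if name == key:
                out.append(f"{key}={value}")
                found = True
                continue
        out.append(line)
    if not found:
        # The property was not present at all, so append it.
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

You would read agent.properties, pass its contents through `set_property(contents, "accept.unverified.certificates", "true")` and write the result back. Take a backup of the file first, and try it on a lab agent before touching anything in production.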

This is a major issue. If you have a thousand “platforms” in Hyperic all communicating back using this version of the agent configured to not accept unverified certificates (i.e. the default configuration), you’ll probably bring down all of those platforms in a reverse-DDoS-style internal attack (if a term like that even exists), simply by replacing that single SSL server certificate on the Hyperic server.

As mentioned before, we have opened a support request with VMware GSS and after having to reproduce the issue and upload DEBUG logs to GSS, they have now acknowledged that it is an issue and that a bug report has been submitted.

 

Deploying Hyperic?

When preparing the agent.properties file prior to rolling out the Hyperic agent to your estate, the accept.unverified.certificates property should NOT be confused with the agent.setup.acceptUnverifiedCertificate property. The agent.setup.acceptUnverifiedCertificate property is only used for the initial agent configuration, where it will accept the initial SSL certificate presented by the Hyperic server. Once this certificate has been accepted, a change to the Hyperic server certificate will only be accepted if the accept.unverified.certificates property has been set to true.
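To make the distinction concrete, here is a small Python sketch (again my own illustration, with hypothetical function names) that checks the text of an agent.properties file and reports whether that agent would reject a replaced server certificate. It deliberately ignores the agent.setup.acceptUnverifiedCertificate property, because only accept.unverified.certificates matters once the initial certificate has been accepted. It assumes a missing property behaves like the shipped default of false.

```python
def load_properties(text):
    """Parse simple key=value lines, skipping blanks and comment lines."""
    props = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "!")) or "=" not in stripped:
            continue
        key, value = stripped.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def rejects_replaced_certificate(text):
    """True if this configuration would refuse a changed server certificate."""
    props = load_properties(text)
    # Assumption: an absent property is treated the same as the default (false).
    runtime = props.get("accept.unverified.certificates", "false")
    return runtime.lower() != "true"
```

Running a check like this across the agent.properties files in your estate before replacing the server certificate gives you a list of the agents that need the property changed first.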

I really hope that those with the default agent configuration who wish to replace their Hyperic server certificates find this blog post, or at least test the change in a lab first, before attempting it, as it could cause major performance problems on all their servers (platforms) with this agent configuration in place.

 

Last modified on Saturday, 05 September 2015 11:21

Comments (3)

  1. Piotr

Hello,
I'm trying to replace ssl certificate in hyperic server and I get this in log:
Private key entry with alias hq differs from persisted version, overriding local file keystore (REQUIRES SYSTEM RESTART)
And keystore is overwritten.
Do you know how to replace the ssl certificate in hyperic server?

Regards,
Piotr

  1. Rynardt Spies    Piotr

Yes, I've replaced Hyperic Server certificates many times in the past. I've actually got a blog post drafted on the subject and will publish this soon.

The best thing to do is to make a backup of the original hyperic.keystore file and then replace the original keystore with a brand new keystore file which contains the CA roots, the new Hyperic server certificate and private key. I would suggest you do that, rather than adding your new certificate to the existing hyperic.keystore file.

Once the keystore has been changed, before starting the Hyperic server service, you need to remove the private key from the DB.
On the PostgreSQL database server:
1. Log into the database:
/opt/vmware/vpostgress/9.2/bin/psql HQ hqadmin
2. See what keys are currently in the keystore table:
SELECT id, alias_name, type FROM EAM_KEYSTORE WHERE TYPE='PrivateKeyEntry' AND alias_name='hq';
3. Delete the key from the database. When the server is started, a new key entry will be made to the table in order to replace the deleted key.
DELETE FROM eam_keystore WHERE TYPE='PrivateKeyEntry' AND alias_name='hq';
On the Hyperic Server
4. Start the hq-server service
/opt/hyperic/server-5.8.4-EE/bin/hq-server.sh start

I hope this helps


Regards

Rynardt

  Comment was last edited about 3 years ago by Rynardt Spies
  1. Piotr    Rynardt Spies

Hello!
This is priceless knowledge, this information does not exist anywhere on the Internet. I wondered many times how to remove the keys from the database. Thank you, it's really a great thing :-)
Hyperic is a powerful tool and we use it extensively. Maybe you have an idea how to replace VMware AppHA in vSphere 6? No AppHA in vCloud Suite 6 is a very serious matter.

Best regards,
Piotr
