Troubleshooting Tips

Stuck task on VM

This is by no means a new issue, but I still get support calls about tasks that get stuck on VMs. What do I mean by "stuck tasks"? Well, I've seen cases where a snapshot task initiated by VCB gets stuck in the state "Creating Virtual Machine Snapshot". The VM then goes down: it cannot be accessed via the console, does not respond to pings, and the status of VMware Tools turns to "Unknown". You cannot power on the VM either, as the "Creating Virtual Machine Snapshot" task is still showing as an active task. You can wait, but if it hasn't sorted itself out after 30 minutes, chances are that it won't, so user intervention is required!

This is normally the approach I take to sort this out:

1. Make sure that the VM is inaccessible to everyone and that it really is down.

2. Browse the datastore where the VM is located (best to do this via the CLI on the service console with "ls -lh") and check the time stamps of the files to see how long the snapshots, if any, have been sitting there.
3. In VirtualCenter (or "vCenter"), the VM will probably still be showing as powered on. Check which of your ESX hosts it is running on.
4. Log onto the service console of the ESX host that is running the VM. Elevate your privileges to root.
5. Now, as the VM has an active task, you won't be able to send any other commands to the VM, and you won't be able to use vmware-cmd to change its state either. Until the stuck task has completed, the ESX host cannot send any power commands to the VM. The only way to release the VM from its sorry state and get rid of the "Active task" is to kill the VM's running process from the service console. To do so, you need to find the PID of the "running" VM. To get the PID do:

The Syntax is:
ps -auxwww |grep <VM-NAME>

Example:
Suppose you have a VM called WKSTNL01. The command will be:
ps -auxwww |grep WKSTNL01

This should return something like this:

root 12322 0.0 0.4 3140 1320 ? S<s 13:32 0:03 /usr/lib/vmware/bin/vmkload_app --sched.group=host/user/pool1 /usr/lib/vmware/bin/vmware-vmx -ssched.group=host/user/pool1 -# name=VMware ESX;version=4.0.0;buildnumber=164009;licensename=VMware ESX Server;licenseversion=4.0 build-164009; -@ pipe=/tmp/vmhsdaemon-0/vmx673aca8b7403868b; /vmfs/volumes/489a1228-2bfd25b5-6a2c-000e0cc41e52/WKSTNL01/WKSTNL01.vmx

The PID in this instance is 12322. This is what we need to kill.

6. Kill the process ID with kill -9:

kill -9 12322

7. Delete any snapshots that were created.

8. Power On the VM.
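
Steps 5 and 6 can be wrapped in a small helper. What follows is only a sketch and not part of the procedure I normally use: the function name vmx_pid is made up for illustration, and it assumes the ps output looks like the sample above, with the PID in the second column and the path to the .vmx file at the very end of the line. Matching on the end of the line also skips the grep process itself, which would otherwise show up in a plain ps | grep.

```shell
# vmx_pid is a hypothetical helper, not a VMware tool.
# It reads "ps -auxwww" output on stdin and prints the PID (column 2) of the
# vmware-vmx process whose command line ends in /<VM-NAME>.vmx.
vmx_pid() {
    awk -v vmx="/$1.vmx" '
        index($0, "vmware-vmx") &&
        substr($0, length($0) - length(vmx) + 1) == vmx { print $2 }
    '
}

# Usage on the service console (left commented out so that pasting this
# block does not kill anything by accident):
#   pid=$(ps -auxwww | vmx_pid WKSTNL01)
#   [ -n "$pid" ] && kill -9 "$pid"
```

If the helper prints nothing, or more than one PID, stop and go back to reading the ps output by hand before killing anything.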

Published in vSphere: Virtual Machine Operations

UPDATE: VMware Tools reports incorrect status after VCB Backup

After patching some test ESX hosts with ESX 3.5 Update 4, the problem with VMware Tools being shown as "Not running" after a VCB backup operation seems to have been solved. This has cured some backup woes at least.

I will now run ESX 3.5 Update 4 in a test cluster (with virtual machines that will be backed up with VCB) for a few weeks before updating production ESX hosts to Update 4.

Published in vSphere: Virtual Machine Operations

ESX 3.5 Update 3 Problem with HA

There is a problem with HA in VMware ESX 3.5 Update 3. Virtual Machines may reboot unexpectedly when migrated with VMotion or after a Power On operation. This only happens when the Virtual Machine is running on an ESX 3.5 Update 3 host and the host has VMware HA enabled with the "Virtual Machine Monitoring" option active. To work around this problem:

Option 1: Disable Virtual Machine Monitoring

1. Select the VMware HA cluster and choose Edit Settings from the right-click menu.

2. In the Cluster Settings dialog box, select VMware HA in the left column.

3. Uncheck the Enable virtual machine monitoring check box.

4. Click OK.

Option 2: Set hostd heartbeat delay to 0

1. Disconnect the host from VC (right-click the host in the VI Client and select "Disconnect").

2. Log in as root to the ESX Server with SSH.

3. Using a text editor such as nano or vi, edit the file /etc/vmware/hostd/config.xml

4. Set the "heartbeatDelayInSecs" tag under "vmsvc" to 0 seconds as shown here:

<vmsvc>
<heartbeatDelayInSecs>0</heartbeatDelayInSecs>
<enabled>true</enabled>
</vmsvc>

5. Restart the management agents for this change to take effect.

service mgmt-vmware restart

6. Reconnect the host in VC (right-click the host in the VI Client and select "Connect").
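
If you have several hosts to change, steps 3 and 4 of Option 2 can be scripted instead of edited by hand. The following is a rough sketch under a few assumptions: both function names are made up, the heartbeatDelayInSecs tag is assumed to already exist in the file, and it is assumed to sit on a single line as in the snippet above. A copy of the original file is kept next to it with a .bak extension.

```shell
# Both helpers are hypothetical and assume the tag sits on a single line.

# Print the current value of heartbeatDelayInSecs in config file $1.
heartbeat_delay() {
    sed -n 's:.*<heartbeatDelayInSecs>\([0-9]*\)</heartbeatDelayInSecs>.*:\1:p' "$1"
}

# Keep a copy of the file as <file>.bak, then rewrite the value.
# $1 = config file, $2 = new delay in seconds
set_heartbeat_delay() {
    cp -p "$1" "$1.bak" || return 1
    sed "s:<heartbeatDelayInSecs>[0-9]*</heartbeatDelayInSecs>:<heartbeatDelayInSecs>$2</heartbeatDelayInSecs>:" \
        "$1.bak" > "$1"
}

# Usage on the ESX host, as root:
#   set_heartbeat_delay /etc/vmware/hostd/config.xml 0
#   heartbeat_delay /etc/vmware/hostd/config.xml
#   service mgmt-vmware restart
```

If heartbeat_delay prints nothing, the tag is not in the file in the expected form and the edit still has to be made by hand.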

Published in vSphere: DRS/HA/VMotion/FT

Strange VCB Snapshot Issue on Update 2

I don't know if anyone else has come across this issue, but since upgrading to ESX 3.5 Update 2 we've been having strange problems with VCB snapshots. I've not had much time for troubleshooting in the last 3 or so weeks, but I found a workaround. Before I get ahead of myself, let me just first explain the issue we are having.

When backing up our VMs with VCB, the snapshot delta files created by VCB need to be merged back into the main VMDK. However, for the past few weeks I've seen cases on some of our ESX hosts where the snapshots never get merged back, and the delta files just keep stacking up every time a snapshot is created. In other words, here’s what happens:

1. VCB creates a snapshot of a VM. This creates delta files such as VMNAME-000001.vmdk.
2. When the backup process completes, the snapshot delta VMDK is supposed to merge back into its parent (main) VMDK file but fails to do so. Now, this is not normally a problem, as you can just go and "delete" the snapshot using the snapshot manager in the VI Client.
3. However, when you go to the snapshot manager in the VI Client, there are no VCB snapshots, but there may be a "consolidate helper" snapshot. Even if I delete this snapshot, the process fails to merge the VMDK files back.
4. If I then create another snapshot manually using the snapshot manager, this creates a second set of delta files such as VMNAME-000002.vmdk.
5. When I then try to delete the snapshot, the VI Client reports the Virtual Machine as having no snapshots; however, when browsing the datastore, I can still see all the delta files. Also, when I log onto the ESX server where the VM is running and issue vmware-cmd /vmfs/volumes/<DATASTORE>/<VMNAME>/<VMNAME>.vmx hassnapshot, the ESX server returns no snapshots for that VM.

This is a strange problem. The Virtual Machine clearly has snapshot delta files in its datastore; however, the ESX host is unaware of any snapshots for that VM. I did find a workaround, but I've been unable to find the root cause, as I've been way too busy the last few weeks to have a good look at it.
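
To spot affected VMs from the service console, it helps to list the leftover snapshot files in a VM's directory and compare that with what hassnapshot reports. This is just a sketch: the function name is made up, the VMNAME-000001.vmdk pattern comes from the example above, and the matching -delta.vmdk data files are an assumption about how ESX 3.x names them.

```shell
# list_snapshot_vmdks is a hypothetical helper.
# $1 = the VM's directory on the datastore
# Prints every file that looks like a snapshot descriptor (NAME-000001.vmdk)
# or, by assumption, its data file (NAME-000001-delta.vmdk).
list_snapshot_vmdks() {
    for f in "$1"/*-[0-9][0-9][0-9][0-9][0-9][0-9].vmdk \
             "$1"/*-[0-9][0-9][0-9][0-9][0-9][0-9]-delta.vmdk; do
        # An unmatched glob stays literal, so only print files that exist.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Usage on the ESX host:
#   list_snapshot_vmdks /vmfs/volumes/<DATASTORE>/<VMNAME>
#   vmware-cmd /vmfs/volumes/<DATASTORE>/<VMNAME>/<VMNAME>.vmx hassnapshot
```

A VM that lists files here while hassnapshot reports no snapshots is in the state described above.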

The workaround is:

1. Log onto the console of the ESX host where the VM with snapshot problems is running.
2. Restart the management agent on that server with service mgmt-vmware restart.
3. In the VI Client, go to the snapshot manager and manually create a snapshot for the VM (without a memory snapshot).
4. Now, "Delete" all snapshots. This should merge all delta files back into the main VMDK file.
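
I did steps 3 and 4 in the VI Client, but the same thing can probably be done from the service console. The sketch below only prints the commands so you can review them first. It assumes that vmware-cmd on your build supports the createsnapshot (name, description, quiesce, memory) and removesnapshots operations; check the help output of vmware-cmd before relying on it. The function name and the snapshot name "workaround" are made up.

```shell
# print_workaround is a hypothetical helper: it only PRINTS the commands so
# they can be reviewed before being run by hand on the ESX host.
# $1 = full path to the VM's .vmx file
print_workaround() {
    printf '%s\n' \
        "service mgmt-vmware restart" \
        "vmware-cmd \"$1\" createsnapshot workaround \"temporary snapshot\" 0 0" \
        "vmware-cmd \"$1\" removesnapshots" \
        "vmware-cmd \"$1\" hassnapshot"
}

# Usage:
#   print_workaround /vmfs/volumes/<DATASTORE>/<VMNAME>/<VMNAME>.vmx
```

The two zeros ask for a snapshot without quiescing and without memory, which matches step 3 above.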

The workaround suggests that there is a problem with the management agent on the ESX hosts, but if so, it has to be something in the Update 2 release as my cluster has 16 hosts and the problem seems to be popping up on random hosts daily. I am now planning to upgrade to Update 3 to see if that will clear the problem.

If anyone else has come across a similar issue, please drop me an email or a comment.

Published in ESX/ESXi Server Admin and Maintenance

VMware ESXi 3.5 on HP Proliant ML110 G5

Ok, so we've looked at installing VMware ESX 3.5 on the ML110 G5. Now it's time to look at installing ESXi 3.5 Installable on the HP Proliant ML110 G5.

This should all work well, providing that you have the correct storage controller settings specified in the BIOS setup of the server.

To install ESXi 3.5 on the ML110 G5, follow these easy steps:

Published in VMware ESX on Whitebox Servers

ESX 3i: Host in HA Cluster must have userworld swap enabled

As of VMware VirtualCenter 2.5 Update 1, ESX Server 3i systems can only be added to an HA cluster if the system has swap enabled.

This article applies to:

VMware ESXi 3.5.x Embedded
VMware ESXi 3.5.x Installable
VMware VirtualCenter 2.5.x

ESX 3i Servers with swap not enabled will show the following message(s):

An error occured during configuration of the HA agent of the host.
HA Agent has an error : Host in HA cluster must have userworld swap enabled.

To enable swap on the ESX 3i Server:

Published in vSphere: DRS/HA/VMotion/FT

ESX 3.5 on HP Proliant ML115 G5

We now know that VMware ESX Server 3.5 works well with the HP Proliant ML110 G5, so I thought I'd try running it on the HP Proliant ML115 G5. This, however, did not go as well as I thought it would. The integrated NIC in both the ML110 and ML115 is not supported, which I knew, so I installed an Intel Pro1000 GT NIC in the ML115. This card is supported, so I was able to install VMware ESX 3.5 on the ML115 with no problems at all.

However, when the server booted up and started loading the VMKernel modules... that was where it all went horribly wrong. The server could not mount the root partition!!! Read on, because there is a fix to this problem.

Published in VMware ESX on Whitebox Servers

Veeam FastSCP on Windows Server 2003 x64

Those of us who have tried running Veeam FastSCP on 64-bit Windows found that it installs perfectly, but once you try to run the application, it comes up with an error, something like "Unable to connect to (local\VEEAM)".

It's sad to say, but Veeam FastSCP does not support 64-bit operating systems at this moment in time. However, I found a little workaround that will allow you to use the application on 64-bit Windows. It has to be said that this is unsupported and at your own risk.

So here's how:

1. Download the Microsoft .NET Framework 2.0 SDK (THE 64-BIT VERSION!) from the Microsoft Website. This download is about 300MB if I remember correctly.

2. Install the SDK on the 64-bit machine that you would like to run Veeam FastSCP on.

3. Now, open a command prompt (Start -> Run -> Type "cmd" -> OK)
4. Change directory to: C:\Program Files\Microsoft.NET\SDK\v2.0 64bit\Bin
5. Now Run: corflags "C:\Program Files (x86)\Veeam\Veeam Backup and FastSCP\VeeamShell.exe" /32BIT+
6. Now when you try and run Veeam FastSCP again, it should work fine.

Published in General 3rd Party Applications

Patching ESX 3.5 Using esxupdate

Just a quick guide to patching a standalone ESX 3.5 Server using esxupdate from the service console:

Download all available patches to your local computer from the following link: (ALSO DOWNLOAD THE FILE CALLED contents.zip)

VMWare Patch Download Page

Decide where to place the patches on the target ESX Server: pick a partition with enough space to accommodate them. It is not recommended to use the root ( / ) partition at all. A good strategy is to create a directory called updates under the /var partition.
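
Before copying anything over, it is worth checking how much room the target partition really has. A minimal sketch, assuming the /var/updates directory suggested above; the function names and the 1 GB threshold in the usage example are only for illustration, so size the threshold to the patches you actually downloaded.

```shell
# Both helpers are hypothetical.

# Print the free space, in KB, of the filesystem that holds directory $1.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if directory $1 has at least $2 KB free.
has_room() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Usage on the ESX host:
#   mkdir -p /var/updates
#   has_room /var/updates 1048576 && echo "at least 1 GB free"
```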

Published in ESX/ESXi Server Admin and Maintenance
