Rynardt Spies

17 AUGUST 2019

In ESX 3.5, when trying to create a new snapshot, the following error is reported in the VI Client:

An invalid snapshot configuration was detected


Also, when reading the virtual machine log file (vmware.log), located in the same directory as the VM, you may find references to delta files such as "vm_name-000001.vmdk", but when you browse the datastore, the delta file does not exist. Furthermore, running vmware-cmd hassnapshot returns "hassnapshot ()= ", which means that the VM has no snapshots in place.

If this is truly the case, why is "An invalid snapshot configuration was detected" returned when trying to create a new snapshot?

VMware records snapshot information about the VM in a .vmsd file, which is stored on the datastore alongside the rest of the virtual machine's configuration and VMDK files. The file is normally called vm_name.vmsd. It contains information even if your VM has no snapshots in place.

In some cases, ESX fails to clean up properly after previous snapshots were removed, so information about those snapshots may still be recorded in the .vmsd file. The file may indicate that you still have snapshots in place, although all previous snapshots were removed and the delta files have been merged. When you then try to create a snapshot, the .vmsd file tells the ESX host that a delta file is already in place and that it has to create a second or third delta file. When the ESX host interrogates the VMFS file system, it is unable to find the snapshot delta files specified in the .vmsd file and therefore fails with "An invalid snapshot configuration was detected."
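To see whether a .vmsd file still refers to snapshot files that are no longer there, you can compare the file names it records with what is actually in the VM's directory. The function below is a minimal sketch under stated assumptions: it assumes the .vmsd stores file references as quoted key = "value" entries (such as snapshot0.filename), and check_vmsd is a hypothetical helper name, not a VMware command.

```shell
#!/bin/bash
# Sketch: report files referenced in a .vmsd that do not exist in the VM's directory.
# Assumption: references are stored as quoted values, e.g.
#   snapshot0.filename = "vm_name-Snapshot1.vmsn"
# check_vmsd is a hypothetical helper name, not a VMware command.
check_vmsd() {
    local vmsd="$1" vmdir ref path
    vmdir=$(dirname "$vmsd")
    # Every quoted value ending in .vmdk or .vmsn, without the quotes
    grep -oE '"[^"]*\.(vmdk|vmsn)"' "$vmsd" | tr -d '"' | sort -u |
    while read -r ref; do
        # Relative names live next to the .vmsd; absolute paths are used as-is
        case "$ref" in
            /*) path="$ref" ;;
            *)  path="$vmdir/$ref" ;;
        esac
        [ -e "$path" ] || echo "MISSING: $ref"
    done
}
```

For example, check_vmsd /vmfs/volumes/<datastore>/vm_name/vm_name.vmsd prints a "MISSING:" line for each referenced .vmdk or .vmsn file that cannot be found; no output means every reference resolves.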


Today I was planning on testing the 16 new patches released by VMware for vSphere 4. I wanted to install these on my second ESX host. I normally place my ESX hosts in maintenance mode before I remediate updates. As I placed esx2 in maintenance mode, the VMs, as expected, started to migrate to the other hosts in the cluster with VMotion. The VMotion migration of two of my VMs running Windows XP failed with the following error message:

A general system error occurred: Failed to write checkpoint data (offset 33558328, size 16384): Limit exceeded

It turns out that a VM must have less than 30MB of video RAM (VRAM) assigned in order to be compatible with VMotion. As I normally run these two VMs at a resolution of 1680 x 1050, I went all out and assigned the maximum amount allowed, 128MB, as VRAM, hence the VMotion failure.
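To check a VM's VRAM setting before placing its host in maintenance mode, you can read it from the .vmx file. This is a minimal sketch that assumes the VRAM size is stored in the .vmx as svga.vramSize, in bytes, and uses the 30MB limit described above; vram_check and fb_bytes are hypothetical helper names. For comparison, a single 32-bit frame at 1680 x 1050 takes 1680 x 1050 x 4 = 7,056,000 bytes (about 6.7MB).

```shell
#!/bin/bash
# Sketch: compare a VM's VRAM setting with the 30MB VMotion limit.
# Assumption: the .vmx stores VRAM as svga.vramSize, in bytes.
# vram_check and fb_bytes are hypothetical helper names.
VMOTION_VRAM_LIMIT=$((30 * 1024 * 1024))

# Bytes for one 32-bit (4 bytes per pixel) frame at a given resolution
fb_bytes() {
    echo $(( $1 * $2 * 4 ))
}

# Prints the configured VRAM; returns 1 if it is at or above the limit
vram_check() {
    local vmx="$1" vram
    vram=$(grep -i '^svga\.vramSize' "$vmx" | head -n 1 | cut -d= -f2 | tr -d '" \r')
    if [ -z "$vram" ]; then
        echo "$vmx: svga.vramSize not set (default used)"
        return 0
    fi
    if [ "$vram" -ge "$VMOTION_VRAM_LIMIT" ]; then
        echo "$vmx: VRAM $((vram / 1048576))MB, too large for VMotion"
        return 1
    fi
    echo "$vmx: VRAM $((vram / 1048576))MB, OK for VMotion"
}
```

For example, vram_check /vmfs/volumes/<datastore>/<VM-NAME>/<VM-NAME>.vmx reports the configured size, and fb_bytes 1680 1050 prints 7056000.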


VMware ESX / ESXi 4 does work out of the box on the HP ProLiant ML110 G5. I know, because I'm using ML110s with ESX 4. However, I still see posts like this one, http://communities.vmware.com/thread/163029, where people have problems with the ML110 in combination with ESX 4, and even ESX 3.

Now, if you look at the link to VMware Communities, you will notice in one of the attached screenshots that the CPU is a 1.8GHz Dual Core. The ML110 G5 is available with more than one CPU, so beware: when you decide to get one of these HP ProLiant ML110 G5s for running ESX 4, make sure you get one with the Intel Xeon 3065 CPU that runs at 2.3GHz. The reason is that not all Intel CPUs shipped with the ML110 support Intel VT, and this is a requirement for ESX 4.

Now I know you're probably saying: "The guy in the post is not using ESX 4." My point is simple: It doesn't matter what version of ESX the post refers to. If you want ESX 4 to run on the ML110, make sure you have the correct processor.
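If you want to check a particular ML110 before installing ESX 4, one option is to boot it from a Linux live CD and look for the vmx flag, which is how Linux reports Intel VT-x in /proc/cpuinfo. This is a minimal sketch; has_intel_vt is a hypothetical helper name, and the flag only tells you the CPU supports VT, which may also have to be enabled in the BIOS.

```shell
#!/bin/bash
# Sketch: check whether the CPU reports Intel VT-x (the "vmx" flag in /proc/cpuinfo).
# has_intel_vt is a hypothetical helper; the file argument defaults to /proc/cpuinfo.
has_intel_vt() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    # Show which CPU model we are looking at
    grep -m 1 '^model name' "$cpuinfo"
    # Intel VT-x is listed as the whole word "vmx" on the CPU "flags" lines
    if grep '^flags' "$cpuinfo" | grep -qw vmx; then
        echo "vmx flag found: this CPU supports Intel VT (it may also need enabling in the BIOS)"
        return 0
    fi
    echo "no vmx flag: this CPU does not report Intel VT"
    return 1
}
```

Run has_intel_vt with no arguments on the live system; a return code of 1 means the CPU does not report VT-x.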


I recently had to renew the self-signed SSL certificate I use to publish Outlook Web Access (OWA) with Microsoft ISA Server 2004. As it’s been a while since I’ve done OWA publishing, I found myself scrambling for information on the internet until I eventually managed to compile this document. As I would like to use it again in the future, I thought I'd post it here for reference.

I always used to use the Microsoft Windows Certification Authority to sign my own SSL certificates, but as I don’t really like the way the Windows Certification Authority does things, and I do like the way OpenSSL does things, I opted to use OpenSSL on good old trustworthy openSUSE Linux to:

  • Create a new Certification Authority that I can use for all my private sites
  • Create a new x509 SSL Certificate to replace the current soon-to-expire SSL certificate in use by my OWA setup.
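The two steps above map onto a handful of OpenSSL commands. The function below is a minimal sketch, not the exact procedure from the full post: the key sizes, SHA-256 digest and lifetimes are example choices, make_ca_and_cert is a hypothetical helper name, and older Windows servers may need different settings, so check what your ISA/Windows version accepts.

```shell
#!/bin/bash
# Sketch: create a private CA and a server certificate signed by it.
# Assumptions: OpenSSL is installed; names, key sizes, digest and lifetimes are examples.
# make_ca_and_cert is a hypothetical helper name. Keys are written unencrypted.
make_ca_and_cert() {
    local dir="$1" host="$2"
    [ -n "$dir" ] && [ -n "$host" ] || { echo "usage: make_ca_and_cert <dir> <hostname>" >&2; return 1; }
    mkdir -p "$dir" || return 1

    # Minimal config: CA extensions for the root, server extensions for the OWA cert
    cat > "$dir/openssl.cnf" <<EOF
[ req ]
distinguished_name = dn
prompt = no
[ dn ]
CN = Private CA
[ v3_ca ]
basicConstraints = critical, CA:TRUE
keyUsage = critical, keyCertSign, cRLSign
[ v3_server ]
basicConstraints = CA:FALSE
keyUsage = critical, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth
subjectAltName = DNS:$host
EOF

    # 1. Certification Authority: private key and self-signed certificate (10 years)
    openssl genrsa -out "$dir/ca.key" 2048 2>/dev/null || return 1
    openssl req -x509 -new -key "$dir/ca.key" -sha256 -days 3650 \
        -config "$dir/openssl.cnf" -extensions v3_ca -out "$dir/ca.crt" || return 1

    # 2. Server certificate: key, signing request, then signed by the CA (1 year)
    openssl genrsa -out "$dir/server.key" 2048 2>/dev/null || return 1
    openssl req -new -key "$dir/server.key" -subj "/CN=$host" \
        -config "$dir/openssl.cnf" -out "$dir/server.csr" || return 1
    openssl x509 -req -in "$dir/server.csr" -CA "$dir/ca.crt" -CAkey "$dir/ca.key" \
        -CAcreateserial -days 365 -sha256 \
        -extfile "$dir/openssl.cnf" -extensions v3_server -out "$dir/server.crt" || return 1

    # Confirm the new certificate chains to the new CA
    openssl verify -CAfile "$dir/ca.crt" "$dir/server.crt"
}
```

For example, make_ca_and_cert ~/owa-pki mail.example.com (a hypothetical host name) writes ca.key/ca.crt and server.key/server.crt under ~/owa-pki. Because the keys are unencrypted, protect that directory. To import the result into Windows, the key and certificate are typically bundled into a PKCS#12 file with openssl pkcs12 -export -in server.crt -inkey server.key -certfile ca.crt -out server.pfx, and clients need ca.crt in their trusted root store.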

This is by no means a new issue. However, I still get support calls regarding tasks that get stuck on VMs. What do I mean by "stuck tasks"? Well, I've seen cases where a snapshot task initiated by VCB got stuck in the "Creating Virtual Machine Snapshot" state. The VM then goes down: it cannot be accessed via the console, does not respond to pings, and the VMware Tools status turns to "Unknown". You cannot power on the VM either, as the "Creating Virtual Machine Snapshot" task is still showing as an active task. You can wait, but after 30 minutes, chances are that it won't sort itself out, so user intervention is required!

This is normally the approach I take to sort this out:

1. Make sure that the VM is inaccessible to everyone and that it really is down.

2. Browse the datastore where the VM is located (best done via the CLI on the service console with "ls -lh") and check the time stamps of the files to see how long the snapshots, if any, have been sitting there.
3. In VirtualCenter (or "vCenter"), the VM will probably still be showing as powered on. Check which of your ESX hosts it is running on.
4. Log onto the service console of the ESX host that is running the VM. Elevate your privileges to root.
5. Now, as the VM has an active task, you won't be able to send any other commands to it, and you won't be able to use vmware-cmd to change its state either. Until the task that's stuck in progress has completed, the ESX host will not be able to send any power commands to the VM. The only way to release the VM from its sorry state and get rid of the "active task" is to kill the VM's running process from the service console. To do so, you need to find the PID of the "running" VM. To get the PID, run:

The syntax is:
ps auxwww | grep <VM-NAME>

Example:
Suppose you have a VM called WKSTNL01. The command will be:
ps auxwww | grep WKSTNL01


This should return something like this:

root     12322  0.0  0.4   3140  1320 ?        S<s  13:32   0:03 /usr/lib/vmware/bin/vmkload_app --sched.group=host/user/pool1 /usr/lib/vmware/bin/vmware-vmx -ssched.group=host/user/pool1 -# name=VMware ESX;version=4.0.0;buildnumber=164009;licensename=VMware ESX Server;licenseversion=4.0 build-164009; -@ pipe=/tmp/vmhsdaemon-0/vmx673aca8b7403868b; /vmfs/volumes/489a1228-2bfd25b5-6a2c-000e0cc41e52/WKSTNL01/WKSTNL01.vmx

The PID in this instance is 12322. This is what we need to kill.

6. Kill the process ID with kill -9:

kill -9 12322


7. Delete any snapshots created

8. Power On the VM.
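Steps 5 and 6 can be wrapped in a small helper so you don't have to pick the PID out of the ps output by eye. This is a minimal sketch; vm_pids is a hypothetical helper name, and it assumes, as in the example above, that the VM's process runs vmware-vmx with the .vmx path as the last argument and that the .vmx file is named after the VM. It only prints the PID; the kill -9 is deliberately left as a manual step.

```shell
#!/bin/bash
# Sketch of steps 5 and 6: print the PID of a VM's vmware-vmx process.
# Assumptions: the .vmx path is the last argument on the process's command line and
# the .vmx file is named after the VM, as in the example above.
# vm_pids is a hypothetical helper; pass a saved "ps auxwww" listing as $2 to test offline.
vm_pids() {
    local vm="$1" listing="$2"
    if [ -n "$listing" ]; then
        cat "$listing"
    else
        ps auxwww
    fi | awk -v vmx="/$vm.vmx" '
        # $2 is the PID column and $NF is the last word of the command line.
        # Only vmware-vmx lines whose last word ends in the VM .vmx name are printed,
        # so the grep or awk process itself is never matched.
        /vmware-vmx/ && substr($NF, length($NF) - length(vmx) + 1) == vmx { print $2 }
    '
}
```

With the process listing shown in step 5, vm_pids WKSTNL01 prints 12322, which you would then kill with kill -9 12322.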


You can use esxcfg-auth to easily configure your ESX server to allow network-based authentication, as well as to set password complexity rules for the machine. It supports authentication against an Active Directory server (authentication only, not user management), as well as against a NIS server, a Kerberos server or an LDAP server.


The command-line tool esxcfg-advcfg can be used to query and modify advanced options for a wide variety of aspects of the ESX 3/4 VMkernel, such as resources, networking, storage and global settings. Note that this page is written for ESX 4, and not all commands may work with ESX 3.
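Because advanced options affect the whole VMkernel, it is worth recording the current value before changing one. This is a minimal sketch that uses only esxcfg-advcfg's -g (get) and -s (set) options; set_advcfg and the default log path /root/advcfg-changes.log are hypothetical choices, and /Disk/MaxLUN in the usage line is just an example option.

```shell
#!/bin/bash
# Sketch: log an advanced option's value before and after changing it.
# Uses esxcfg-advcfg -g <option> (get) and -s <value> <option> (set).
# set_advcfg and the default log path are hypothetical choices.
set_advcfg() {
    local option="$1" value="$2" log="${3:-/root/advcfg-changes.log}"
    # Record the current value so the change can be reverted later
    echo "$(date '+%Y-%m-%d %H:%M:%S') $option before: $(esxcfg-advcfg -g "$option")" >> "$log"
    esxcfg-advcfg -s "$value" "$option" || return 1
    echo "$(date '+%Y-%m-%d %H:%M:%S') $option after: $(esxcfg-advcfg -g "$option")" >> "$log"
}
```

For example, set_advcfg /Disk/MaxLUN 64 sets the option and leaves a before/after record in the log file.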


The esxcfg-addons CLI command is used to enable and disable addons in ESX 4.

Usage: esxcfg-addons [action] [parameter(s)]


Just last week I found myself asking this very annoying question again: which one of the servers is holding a lock on a virtual machine log file that was last modified 3 months ago?

Last week I came across a problem where VCB failed a job while trying to perform a full backup of one of the VMs. One of the log files for the virtual machine was locked on the SAN, so VCB was unable to copy the log file to the backup server and failed the entire job.

Normally, a simple VMotion of the virtual machine to another host will solve this issue, but I wasn’t as lucky this time. So I thought powering off the VM would do it... It didn’t work! No matter what I did, I just couldn’t get the lock on that file released. One of the ESX hosts in the cluster was holding on to the log file, but how do I go about finding out which one of the 20 ESX hosts it was? To me, this sounded like a job for vmkfstools, and indeed it was. Well, sort of. Using vmkfstools, I was able to retrieve the MAC address of the ESX host in the cluster that was holding on to the 3-month-old log file.

The command is:

vmkfstools -D /filename

In my case this was:

vmkfstools -D /vmfs/volumes/iscsi-002-vmfs/WKSTN01/vmware.log

The output is then written to /var/log/vmkernel.

To get the output, simply do:

tail /var/log/vmkernel

This returned:

Jun 20 15:35:33 esx1 vmkernel: 23:02:22:35.020 cpu0:4174)FS3: 142:
Jun 20 15:35:33 esx1 vmkernel: 23:02:22:35.020 cpu0:4174)Lock [type 10c00001 offset 29190144 v 7, hb offset 4083712
Jun 20 15:35:33 esx1 vmkernel: gen 1881, mode 1, owner 4a2128d2-86a81c3a-ce30-000e0cc41e98 mtime 893]
Jun 20 15:35:33 esx1 vmkernel: 23:02:22:35.020 cpu0:4174)Addr , gen 6, links 1, type reg, flags 0x0, uid 0, gid 0, mode 644
Jun 20 15:35:33 esx1 vmkernel: 23:02:22:35.021 cpu0:4174)len 312433, nb 1 tbz 0, cow 0, zla 1, bs 1048576
Jun 20 15:35:33 esx1 vmkernel: 23:02:22:35.021 cpu0:4174)FS3: 144:  

The MAC address of the host locking the file is the last part of the owner UUID in the third line:

000e0cc41e98
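If you would rather not read the owner out of the log by eye, the MAC can be extracted with grep and awk. This is a minimal sketch, assuming the log format shown above, where the owner UUID follows the word "owner" and its last dash-separated group is the MAC; lock_owner_mac is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: extract the lock owner's MAC from vmkfstools -D output in /var/log/vmkernel.
# Assumption: the owner UUID follows the word "owner" and ends in the host's MAC.
# lock_owner_mac is a hypothetical helper; it reads a file given as $1, or stdin.
lock_owner_mac() {
    # Keep the last "owner <uuid>" seen and print the final dash-separated group
    grep -o 'owner [0-9a-f-]*' "${1:--}" | tail -n 1 |
        awk '{ n = split($2, part, "-"); print part[n] }'
}
```

For the output shown above, tail -n 20 /var/log/vmkernel | lock_owner_mac prints 000e0cc41e98.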

Now, this is the bit where I can’t make it any easier for you. Unless you write a script (and I don’t have that much time at the moment), the only way to find the host with that MAC is to log onto each host via SSH and run:

esxcfg-info | grep -i 'system uuid'

This returns the system UUID of the host you are on. If the last part of the UUID matches the MAC retrieved using vmkfstools, then you know that the process keeping the lock is on that server.
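The script mentioned above could look something like this. It is a minimal sketch: it assumes root SSH access to each host, the host names are whatever you pass in, and find_lock_host and uuid_has_mac are hypothetical helper names. It runs the same esxcfg-info command on every host and reports the host whose system UUID ends in the MAC found with vmkfstools.

```shell
#!/bin/bash
# Sketch: find which ESX host's system UUID ends in the MAC of the lock owner.
# Assumptions: root SSH access to every host; host names are passed as arguments.
# uuid_has_mac and find_lock_host are hypothetical helper names.

# Succeeds if the esxcfg-info "System UUID" line ends in "-<mac>" (case-insensitive)
uuid_has_mac() {
    local line mac
    line=$(printf '%s' "$1" | tr -d '[:space:]' | tr 'A-F' 'a-f')
    mac=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ -n "$mac" ] || return 1
    case "$line" in
        *"-$mac") return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: find_lock_host <mac> <host> [host ...]
find_lock_host() {
    local mac="$1" host line
    shift
    for host in "$@"; do
        # Same command as above, run remotely on each host
        line=$(ssh "root@$host" "esxcfg-info | grep -i 'system uuid'" 2>/dev/null)
        if uuid_has_mac "$line" "$mac"; then
            echo "$host holds the lock: $line"
        fi
    done
}
```

For example, find_lock_host 000e0cc41e98 esx1 esx2 esx3 (hypothetical host names) prints the matching host, if any; each host will prompt for a password unless SSH keys are in place.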

So which process is locking the file? That I can’t tell you, but I can give you some tips on how to find it.
1. Power off the VM in vCenter;
2. Log onto the service console of the host that’s locking the file;
3. Try to move or delete the locked file from the service console of the locking host. This worked for me. If it works for you, good. If not, go to step 4;
4. Check whether there’s a process running with the name of the locked file:

ps auxwww | grep <locked-filename>

If it returns a line (other than the grep line itself), kill the process with “kill -9 <PID>”;

5. If it doesn’t return any processes for that filename, search for a PID with the name of the VM that has the locked file:

ps auxwww | grep <VM-NAME>

If it returns a PID, kill it, as your VM was already powered off in step 1 and should therefore not have a PID on any host;

6. If it still doesn’t work, leave a comment and we'll have a look at it ;-)
