vSphere ESXi, vSwitch0, vmk0 and host profiles

9 03 2011

So I’ve noticed a rather worrying feature of ESXi (I’m on 4.1 U1) in the configuration that I have. I’ve created our ESXi servers to use a vDS for VM traffic but for management, vMotion and FT I am sticking with the local vSwitch0. vSwitch0 has three active NICs and three port-groups, one for each of the above VMkernel tasks (management, vMotion and FT).

These port-groups have the NICs set to a specific order with only 1 active and 2 in standby ensuring that, although only 1 NIC is actively used for each portgroup, the two other NICs are there for failover. A few pics may explain this better….

vSwitch0 Overview

vSwitch0 vSwitch NIC configuration

Management port-group configuration

 

Management NIC configuration

It’s also worth noting here that, since we have Enterprise Licenses, I decided to make use of Host Profiles to help me set up other ESXi hosts. So I configured one ESXi server the way I wanted it and used it to create a baseline host profile. This profile was then applied to all remaining hosts. Nice, quick deployment of a standard config. Good times!

Server Down!

My problems started when I wanted to change the IP address or VLAN of the Management port-group. Actually problems arose when I made a change to ANY port group in vSwitch0. My most recent problems were when I wanted to change the VLAN that the vMotion port-group used. So I changed the VLAN and associated IP address for the vMotion PG and suddenly my host went offline!

What seemed to happen was that the new IP and VLAN I applied to the vMotion PG were actually applied to the Management PG!!! Seriously not good. Especially when your cluster has HA enabled, which means that vCenter dutifully starts powering down VMs and bringing them online on other hosts in your cluster! Cue lots of alerts and irate application owners.

After some digging around and finding some VMware Community posts I realised that the problem was due to an unconfirmed bug. It seems to be down to the way that Host Profiles are applied; specifically the order in which port-groups are created on the host.

If you look again at my first pic you’ll notice that the Management PG does not use the first vmkernel NIC, ‘vmk0’. This is actually assigned to the vMotion PG.

To me this is a massive problem and from my point of view, the “fix” is to make sure that the Management PG is using vmk0.

Reassigning vmk0

  1. The first thing you want to do is disable HA on your cluster. Otherwise, when you make your change and temporarily lose connection to your host, HA will start restarting all your VMs elsewhere as soon as the host is detected as offline.
  2. Next, you’ll need either physical or iLO access to the DCUI. SSH won’t cut it as we’re going to lose network connection for a while. Once you’re on the machine hit ALT-F1 to go to the local TSM (assuming you have it enabled) and log on as root or an admin user.
  3. Now we want to remove all port-groups apart from the Management PG. We could do this via the VI Client, but let’s do this at the command line being as we’ve made the effort to get here… List your vSwitch0 port-groups (and show vmknics):

    Here, again, you can see that the Management port-group is using vmk3. This is bad, mmmkay? The other odd thing, and something I’m assuming is not good, is that the MAC shown for the Management PG is that of the physical NIC. This shouldn’t be the case. It should be a virtual MAC in the VMware-specific range, starting 00:05.
  4. The following commands removed the vmknics from my vMotion and Fault Tolerance port-groups. The effect of this is that their vmknics are destroyed, while the port-groups themselves stay on the vSwitch:
    esxcfg-vmknic -d vMotion
    esxcfg-vmknic -d 'Fault Tolerance'

    Now the fun part. We run exactly the same command but on our Management PG. Yes. You will lose network connectivity to the host, but that’s why we’re using iLO, right? :)

    esxcfg-vmknic -d Management
  5. After we’ve blown away the Management vmknic we can recreate it. When we do, it will automatically be given vmk0:
    esxcfg-vmknic -a -i 10.32.202.11 -n 255.255.255.0 Management


    You’ll also notice our new PG has a correctly assigned virtual MAC (starting 00:05).

  6. At this point you’ll want to go back to the VI client and configure the new PG the way you want, including adding additional pNICs to the vSwitch and setting the failover order for the PGs.

Now I’m able to make changes to my vMotion and FT port-groups without affecting my Management PG.
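A minimal sketch of how the "is Management on vmk0?" check could be scripted. It assumes that `esxcfg-vmknic -l` prints one row per vmknic with the interface name first, then the port-group name (which may contain spaces), then the IP family (IPv4 or IPv6), and that awk is available in your shell. The function name vmk_for_portgroup is made up for illustration.

```shell
# Print the vmk interface bound to the port group named in $1.
# Reads `esxcfg-vmknic -l` style output on stdin.
# ASSUMPTION: each row is "<vmkN> <port group name> IPv4|IPv6 ...".
vmk_for_portgroup() {
    awk -v pg="$1" '
        {
            # Rebuild the port-group name from field 2 up to the IP family.
            name = ""
            for (i = 2; i <= NF; i++) {
                if ($i == "IPv4" || $i == "IPv6") break
                name = (name == "" ? $i : name " " $i)
            }
            if (name == pg) { print $1; exit }
        }'
}

# On the host (not run here):
#   esxcfg-vmknic -l | vmk_for_portgroup Management
```

If this prints anything other than vmk0 for the Management port-group, the host is in the state described above.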

Let me know if you’ve been affected by this and how you’ve got around it.

My next step is to take another host-profile baseline and ensure that, when reapplying it to the other hosts, the vmknic assignments are consistent.





Change disk persistence mode on the fly in vSphere using PowerCLI

4 11 2010

I’ve been plagued, for some time, by a really annoying problem in my home lab. Being as I run everything from the one VM host (ESXi) it is also home to my Windows 2008 domain controller.

Recently I had installed Veeam Backup and Replication to finally start backing up my VMs (even though it’s my home/test environment I would be a broken man if I lost everything on there!). I installed Veeam on the Windows 2008 DC – against all common sense of course, but RAM is low in my ML110 and I couldn’t afford to create another Windows VM.

I had configured Veeam to back up my guests using the vStorage API – Virtual Appliance mode. Backups of my other Windows guests ran well but I noticed some problems with the DC VM backup after a short while. It was around this time that I also noticed that the hard disks of this guest OS had automatically changed to Independent Nonpersistent! This would have been bad enough for any machine (as soon as the guest powers down you lose all changes), but for a DC – as you can imagine – it’s a nightmare!

This post on Experts Exchange confirmed – to some extent – that Veeam was the culprit and there wasn’t some other weird force at work: http://www.experts-exchange.com/Software/VMWare/Q_26310902.html

I disabled the DC backup via Veeam but the problem remained that my VM’s disks were Independent. The vSphere Infrastructure Client does not allow you to change disk modes on the fly so I was stuck – as soon as I shut my VM down to change the disk type I would lose all changes:

Thankfully, PowerCLI came to my rescue!

After connecting to my ESXi host using

Connect-VIServer <esxi_ipaddress>

All that was needed was a simple command:

Get-HardDisk -VM <vm_name> | Set-HardDisk -Persistence "Persistent"

Success!! At least partially. My primary (system) disk had successfully reverted to a persistent disk. However, my second disk had not, and PowerCLI had actually thrown an error:

CapacityKB Persistence                                                    Filename
---------- -----------                                                    --------
68157440   Persistent                       [localdisk01] vm_guest/disk1.vmdk
Set-HardDisk : 04/11/2010 00:49:44    Set-HardDisk        Another task is already in progress.
At line:1 char:42

I haven’t yet figured out what task it is that is supposedly in progress, but I certainly can’t find one. It may be worth noting that my primary disk is on local SATA while the second disk is on an NFS share.

Now, at least, when I run Get-HardDisk, it shows that my system drive is Persistent:





RDM mapping of local SATA storage for ESXi

25 10 2010

This post has been sat in my WordPress Drafts folder for some time since I no longer use local storage this way. I decided to post it however as (a) it’s a good learning curve for ESXi work and (b) others may have more luck than me.

I recently acquired three 1TB drives and decided to do something about my lack of storage at home. Always trying to make best use of existing kit (and save money) I decided to stick the drives in to my HP ML110 and try something in a VM instead of doing the sensible thing of lobbing them in to a dedicated NAS box.

After wasting a few hours I realised that the onboard SATA RAID controller of the ML110 just can’t do RAID5 and to make matters worse, when I gave up and created a RAID1 array with a hot spare, vSphere 4.1 didn’t recognise the array and instead saw the drives as 3 individual drives. I saw this as a chance to try out the WAFL-alike ZFS file system. FreeNAS had been my NAS of choice recently so I chose to try ZFS in that.

I point blank refused to create three 1TB VMDKs (one on each of the three drives) so I set about figuring out how to create Raw Device Mappings (RDMs) of the local SATA drives. There were a couple of posts on the net that got me a little closer, but no guide/article had the whole thing down, so that’s my aim with this blog post.

Step 1

Once you have your drives installed, SSH to your ESXi box (now even easier in vSphere 4.1) and go to the /dev/disks directory. There, if you perform an ls -l, you’ll see your drives listed:

Ignore the vml.***** entries; they are just symlinks to the same devices. We want to look at the raw devices.
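A minimal sketch of a filter that trims the listing down to the whole raw devices. It assumes the symlinks start with vml. and that partition entries end in a colon followed by a number. The function name raw_devices is made up for illustration.

```shell
# Read device names (one per line) and keep only whole raw devices.
# ASSUMPTION: symlinks start with "vml." and partitions end in ":<number>".
raw_devices() {
    grep -v '^vml\.' | grep -v ':[0-9][0-9]*$'
}

# On the host (not run here):
#   ls /dev/disks | raw_devices
```

What is left should be one line per physical drive, ready to copy for Step 3.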

Step 2

Now move to the /vmfs/volumes folder. Here you can see your existing local datastore(s). If, like me, you had a solitary hard-drive, you’ll just see localdisk01 or whatever you chose to name the local datastore:

Step 3

Now we are going to use the vmkfstools utility to create our RDMs. Remember that an RDM is just another VMDK, but instead of the VMDK pointing to a xxx-flat.vmdk file (which is the actual virtual hard disk), the VMDK points to our physical device. Being as we still need to create this VMDK file we need to save it somewhere. Since we just have the one local datastore, we are going to create the RDM VMDK files in its root.

The following command creates the RDM VMDK for us:

vmkfstools -z /vmfs/devices/disks/<name of RAW device from Step 1> <location to store VMDK>/<RDM name>.vmdk

In my personal example below, I am creating an RDM called rdm_WD2DWCAVU0477582.vmdk and storing it in /vmfs/volumes/localdisk01/. I chose the name of the VMDK to match the serial number of the physical drive (as shown in Step 1) to help with troubleshooting in the future when I get an inevitable drive failure. You can call your RDMs whatever you wish.

The name of the RAW device (t10.ATA____WDC_WD10EARS2D00Z5B1__________________________WD2DWCAVU0477582 in my example) is the one you noted in Step 1 when you listed all local devices attached to your ESXi host. This is why the tech Gods created Copy n Paste! You will want to copy the full device name as shown in Step 1 into the vmkfstools command.
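With several drives to map, the command line can be generated rather than typed. This is a minimal sketch that only prints the vmkfstools command for review; it does not run it. It assumes the drive serial is whatever follows the last underscore in the device name, and the function name rdm_command and the rdm_ prefix are just the naming convention used in this post.

```shell
# Print (do not run) the vmkfstools command that maps one raw device.
#   $1 = raw device name from Step 1
#   $2 = directory on the datastore that will hold the mapping VMDK
# ASSUMPTION: the serial is the text after the last underscore in $1.
rdm_command() {
    dev=$1
    store=${2%/}            # drop one trailing slash, if present
    serial=${dev##*_}       # keep only what follows the last underscore
    printf 'vmkfstools -z /vmfs/devices/disks/%s %s/rdm_%s.vmdk\n' \
        "$dev" "$store" "$serial"
}

# Example (prints one command line to check before pasting it):
#   rdm_command <name of RAW device from Step 1> /vmfs/volumes/localdisk01
```

Printing first keeps a mistyped device name from ever reaching vmkfstools.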

Step 4

Once you have repeated the steps for all of your local SATA drives, you can navigate to where you created the RDM’s (in my case /vmfs/volumes/localdisk01) and perform an ls -l *.vmdk to see the new VMDK’s you have created:

Don’t panic – the xxx-rdmp.vmdk files will reflect the size of the RAW devices they are mapping to, but rest assured each one takes up no more than a few bytes on your local disk!

Step 5

You can now add your RDM’s to an existing VM. vSphere doesn’t recognise this as a true RDM (to a SAN) so you just browse the local disk datastore for the VMDK files that we created.

Edit the properties of an existing VM and click Add…

Step 6

Select Use an existing virtual disk and click Next >

Step 7

Click Browse. You now need to navigate your local datastore and select the VMDKs that we created in Step 3.

Once complete you will be shown a confirmation window. Repeat Steps 5 through 7 to add additional RDM’s to your VM.

Step 8

You should now see your new Hard Disks in your VM and vSphere will correctly identify them as Mapped Raw LUN.

NOTE: One thing I forgot to show in the screen shots, is that you should create your RDM’s on a new SCSI controller! You do this by simply selecting a new SCSI ID starting with 1:x instead of 0:x. Existing VMDK’s should be on SCSI Controller 0. Your RDM’s should be on SCSI controller 1. Although my screenshot shows 0:3 this should read 1:3.

You can now save your VM configuration. Your VM will now access the RAW SATA drives and be able to use things like SMART to monitor their health.

See below; I am adding my three 1TB drives to FreeNAS to create a new ZFS pool.

Stay tuned for an upcoming blog post on FreeNAS and NexentaStor which may or may not put you off ZFS [in a VM] altogether!





Netapp Rapid Cloning Utility v3.0

26 02 2010

Netapp have released a new version of their Rapid Cloning Utility – a vCenter plugin which allows you to provision new datastores and clone hosts (including VMware View 4 VDIs) with ease right inside the Virtual Infrastructure Client. vCenter 4 is needed but it is compatible with ESX 3.5 and 4.

The great thing is that all the storage processing is offloaded from vCenter and is performed entirely on the filers. I’ve not had a chance to play with the RCU yet but this just looks utterly awesome! Check out this preview blog post and video from Netapp:

http://blogs.netapp.com/virtualstorageguy/2009/12/preview-rapid-cloning-utility-30-vcenter-plug-in.html

If you don’t have time to sit and read, then at least check out the video. I dare you not to be impressed!


Do any other storage vendors have similar tools?

Source: http://www.ntapgeek.com/2010/02/netapp-updates-rcu-for-vmware-vsphere-4.html






