So I’ve noticed a rather worrying feature of ESXi (I’m on 4.1 U1) in the configuration that I have. I’ve configured our ESXi servers to use a vDS for VM traffic, but for management, vMotion and FT I am sticking with the local vSwitch0. vSwitch0 has three active NICs and three port-groups, one for each of the above VMkernel tasks (management, vMotion and FT).
These port-groups have the NICs set to a specific order with only 1 active and 2 in standby ensuring that, although only 1 NIC is actively used for each portgroup, the two other NICs are there for failover. A few pics may explain this better….
[Image: vSwitch0 overview]
[Image: vSwitch0 NIC configuration]
[Image: Management port-group configuration]
[Image: Management NIC configuration]
It’s also worth noting here that, since we have Enterprise Licenses, I decided to make use of Host Profiles to help me set up other ESXi hosts. So I configured one ESXi server the way I wanted it and used it to create a baseline host profile. This profile was then applied to all remaining hosts. Nice, quick deployment of a standard config. Good times!
Server Down!
My problems started when I wanted to change the IP address or VLAN of the Management port-group. Actually problems arose when I made a change to ANY port group in vSwitch0. My most recent problems were when I wanted to change the VLAN that the vMotion port-group used. So I changed the VLAN and associated IP address for the vMotion PG and suddenly my host went offline!
What seemed to happen was that the new IP and VLAN I applied to the vMotion PG were actually applied to the Management PG!!! Seriously not good. Especially when your cluster has HA enabled, which means that vCenter dutifully starts powering down VMs and bringing them online on other hosts in your cluster! Cue lots of alerts and irate application owners.
After some digging around and finding some VMware Community posts I realised that the problem was due to an unconfirmed bug. It seems to be down to the way that Host Profiles are applied; specifically the order in which port-groups are created on the host.
If you look again at my first pic you’ll notice that the Management PG does not use the first vmkernel NIC, ‘vmk0’. This is actually assigned to the vMotion PG.
To me this is a massive problem and from my point of view, the “fix” is to make sure that the Management PG is using vmk0.
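To see why creation order matters, here is a toy model of how vmknic names might get handed out. It assumes that a new vmknic takes the lowest-numbered free vmkN, which fits what I saw when I recreated the Management vmknic (it came back as vmk0). That rule and the function name next_vmk are my assumptions for illustration, not documented ESXi behaviour.

```shell
# Toy model only: NOT an ESXi command.
# Assumption: a new vmknic is given the lowest-numbered free vmkN.
# next_vmk prints the name the next vmknic would get, given the
# vmk names already in use (passed as arguments).
next_vmk() {
    n=0
    while :; do
        free=1
        for used in "$@"; do
            if [ "$used" = "vmk$n" ]; then
                free=0
            fi
        done
        if [ "$free" -eq 1 ]; then
            echo "vmk$n"
            return 0
        fi
        n=$((n + 1))
    done
}

next_vmk vmk0 vmk1 vmk3   # prints vmk2
next_vmk                  # prints vmk0 (nothing in use yet)
```

Under that assumption, whichever port-group gets its vmknic created first on a clean host ends up with vmk0, so the Management vmknic has to be created before the others.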
Reassigning vmk0
- The first thing you want to do is disable HA on your cluster. Otherwise, when you make your change and temporarily lose connection to your host, vCenter will detect that the host has gone offline and start restarting all your VMs on other hosts.
- Next, you’ll need either physical or iLO access to the DCUI. SSH won’t cut it as we’re going to lose network connection for a while. Once you’re on the machine hit ALT-F1 to go to the local TSM (assuming you have it enabled) and log on as root or an admin user.
- Now we want to remove the vmknics from all port-groups apart from the Management PG. We could do this via the VI Client, but let’s do this at the command line seeing as we’ve made the effort to get here… List your vSwitch0 port-groups (and show vmknics):
Here, again, you can see that the Management port-group is using vmk3. This is bad, mmmkay? The other odd thing, and something I’m assuming is not good, is that the MAC shown for the Management PG is that of the physical NIC. This shouldn’t be the case; it should be a virtual MAC in the VMware-specific range, starting 00:05.
- The following commands removed the VMkernel NICs from my vMotion and Fault Tolerance port-groups. The effect of this is that their vmknics are destroyed:
esxcfg-vmknic -d vMotion
esxcfg-vmknic -d 'Fault Tolerance'
Now the fun part. We run exactly the same command but on our Management PG. Yes, you will lose network connectivity to the host, but that’s why we’re using iLO, right?
esxcfg-vmknic -d Management
- After we’ve blown away the Management vmknic we can recreate it. When we do, it will automatically be given vmk0:
esxcfg-vmknic -a -i 10.32.202.11 -n 255.255.255.0 Management
You’ll also notice our new PG has a correctly assigned virtual MAC (starting 00:05).
- At this point you’ll want to go back to the VI client and configure the new PG the way you want, including adding additional pNICs to the vSwitch and setting failover order for the PGs.
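For reference, the command-line steps above can be strung together roughly like this. It’s only a sketch: the port-group names, IP and netmask are the ones from my host, and the DRY_RUN switch and the run and reassign_vmk0 helpers are names I made up for this example, not part of ESXi. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch of the vmk0 reassignment, meant for the local console (not SSH).
# Assumptions: port-groups are named vMotion and 'Fault Tolerance' as on my
# host; DRY_RUN, run and reassign_vmk0 are hypothetical helpers.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Usage: reassign_vmk0 <management-portgroup> <ip> <netmask>
reassign_vmk0() {
    mgmt_pg="$1"
    mgmt_ip="$2"
    mgmt_mask="$3"

    run esxcfg-vmknic -l                      || return 1  # vmknics before
    run esxcfg-vmknic -d vMotion              || return 1  # remove the other vmknics first
    run esxcfg-vmknic -d "Fault Tolerance"    || return 1
    run esxcfg-vmknic -d "$mgmt_pg"           || return 1  # network drops here
    run esxcfg-vmknic -a -i "$mgmt_ip" -n "$mgmt_mask" "$mgmt_pg" || return 1
    run esxcfg-vmknic -l                                   # vmknics after
}

reassign_vmk0 Management 10.32.202.11 255.255.255.0
```

Check the printed commands first, and only set DRY_RUN=0 once HA is disabled and you’re on the console via iLO or physically at the box. The vMotion and FT vmknics still need recreating afterwards.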
Now I’m able to make changes to my vMotion and FT port-groups without affecting my Management PG.
Let me know if you’ve been affected by this and how you’ve got around it.
My next step is to take another host-profile baseline and ensure that, when reapplying it to the other hosts, the vmknic assignments are consistent.




