troubleshooting

Office Arguments – Maximum VMDK size is NOT 2TB-512bytes

…if you want to use snapshots

Pop Quiz

Q: What’s the maximum size VMDK you can create in vSphere 5.1 or earlier?

A: Most people who have studied for the VCP will know the maximum VMDK size is 2TB minus 512 bytes. If you create a disk in the GUI, it lets you choose 2TB, but it’s smart enough to subtract 512 bytes. So technically that’s the maximum VMDK size, but you should NOT create it that big.
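To make the "2TB minus 512 bytes" figure concrete, here is a minimal sketch of the arithmetic. It assumes "2TB" is meant in binary units (2 × 1024^4 bytes); the function name is purely illustrative and not part of any VMware tooling.

```python
# Assumption: "2TB" here means 2 TiB (binary units), and the GUI trims
# exactly 512 bytes off the nominal limit, as the text describes.
TRIM_BYTES = 512
TIB = 1024 ** 4


def max_vmdk_bytes(limit_tib: int = 2) -> int:
    """Return the nominal limit in bytes minus the 512 bytes the GUI subtracts."""
    return limit_tib * TIB - TRIM_BYTES


if __name__ == "__main__":
    print(max_vmdk_bytes())  # 2199023255040
```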

Maximum Disks Per SCSI Controller is NOT 15

Pop Quiz

Q: What’s the maximum number of disks per SCSI controller?

A: It depends. In your VCP exam, you would have said 15. Correct. However, if you want to clone a VM or take a quiesced snapshot of it, the maximum is 7 disks per SCSI controller. Each SCSI controller can control 15 disks, and quiesced snapshots in Windows 2008 require one available slot per existing disk. If you have more than 7 disks, the clone / quiesce operation will fail, and you’ll have the following errors in vCenter and the VM’s vmware.
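The limit of 7 follows from the slot arithmetic described above: each existing disk needs one free slot on a 15-slot controller. A rough sketch of that check, with hypothetical function names chosen only for illustration:

```python
# Assumption: 15 usable disk slots per SCSI controller, and one free slot
# required per existing disk for a quiesced snapshot, as the text states.
SLOTS_PER_CONTROLLER = 15


def max_disks_with_quiesce(slots: int = SLOTS_PER_CONTROLLER) -> int:
    """Largest disk count that still leaves one free slot per disk."""
    return slots // 2


def can_quiesce(disk_count: int, slots: int = SLOTS_PER_CONTROLLER) -> bool:
    """True if every existing disk can be paired with a free slot."""
    return disk_count * 2 <= slots


if __name__ == "__main__":
    print(max_disks_with_quiesce())  # 7
    print(can_quiesce(8))            # False
```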

SRM – IP Customisation error

During an SRM (5.0) failover, IP customisation of a VM failed at step 11. It was strange, as we hadn’t seen this error in quite a while.

11. Power On Priority 3 VMs
Error – Cannot complete customization, possibly due to a scripting runtime error or invalid script parameters (Error code: -1). IP settings may have been partially applied.

The SRM logs pointed to an error in C:\Windows\TEMP\vmware-imc\guestcust.

vSphere Client Storage Views tab not showing any information

The Storage Views tab in the vSphere Client disappeared, and vCenter System Services displayed some of the following errors:

unable to retrieve health data from https://localhost:443/vsm/health.xml
unable to retrieve health data from https://localhost:443/eam/eamService-web/health.xml
unable to retrieve health data from https://localhost:443/SMS/health.xml

VMware KB article 2016177 (vCenter Server Health status reports the error: Error retrieving health from url) had the fix. This issue and KB article apply only to vCenter 5.

Running Dell DSET remotely on ESXi

For those using Dell hardware: when you log a job with Dell Support, they’ll ask you to run a DSET report. This collects various information about the server, including the service tag, all hardware devices, firmware versions, etc. There are three ways to get DSET info:

1) Install DSET locally
2) Run the DSET LiveCD
3) Run DSET remotely and create a report on a local server

Each option has its pros and cons.

Missing VM NIC

The Disappearing Act

A VM went off the network, and actually lost the NIC from the VM’s hardware. Poring through logs (some thanks to Log Insight, more on that later), I discovered in vmware-xx.log:

2013-11-19T07:33:01.246Z| vcpu-0| Powering off Ethernet0
2013-11-19T07:33:01.246Z| vcpu-0| Hot removal done.

Ah ha! This shows Ethernet0 was removed via the “Safely Remove Hardware” icon in the Windows system tray. The solution is to add a new NIC of the same type.
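If you need to check many VMs for the same symptom, a small script can scan a vmware log for this pair of messages. The sketch below assumes the "timestamp| thread| message" layout seen in the two lines quoted above; the function name and regular expressions are illustrative, and real logs may contain other variants of these messages.

```python
import re
from typing import Iterable, List, Tuple

# Assumed layout, based only on the two quoted lines:
#   <timestamp>| <thread>| <message>
ENTRY = re.compile(r"^(?P<ts>[^|]+)\|\s*(?P<thread>[^|]+)\|\s*(?P<msg>.*)$")
POWER_OFF = re.compile(r"^Powering off (?P<device>\S+)")


def find_hot_removals(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, device) for each 'Powering off X' followed by 'Hot removal done'."""
    removals = []
    pending = None  # device named by the most recent "Powering off" line
    for line in lines:
        entry = ENTRY.match(line.strip())
        if not entry:
            continue
        msg = entry.group("msg").strip()
        powered_off = POWER_OFF.match(msg)
        if powered_off:
            pending = powered_off.group("device")
        elif msg.startswith("Hot removal done") and pending:
            removals.append((entry.group("ts").strip(), pending))
            pending = None
    return removals


if __name__ == "__main__":
    sample = [
        "2013-11-19T07:33:01.246Z| vcpu-0| Powering off Ethernet0",
        "2013-11-19T07:33:01.246Z| vcpu-0| Hot removal done.",
    ]
    print(find_hot_removals(sample))
```

In practice you would pass the lines of the VM's log file instead of the sample list.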

Constant Alarm ‘Network Uplink Redundancy Lost’

It’s amazing how much is going on when you dig through logs. On this occasion I was looking at “Tasks & Events” of a host and noticed a lot of network errors:

Alarm ‘Network uplink redundancy lost’ on triggered an action

The error was occurring every 5 minutes. This was made visual with Log Insight, my new favourite tool. I couldn’t find anything wrong with this particular ESXi host, vSwitch or uplink.

Force mount missing datastores

By accident, while in Cluster Settings / Datastore Heartbeating, I noticed a datastore wasn’t available on one of the hosts. Trying to mount it from the vSphere Client failed with a popup:

Call “HostStorageSystem.ResolveMultipleUnresolvedVmfsVolumes” for object “storageSystem-326” on vCenter Server “vcenter” failed.

The command to persistently force mount a snapshot changed between ESX(i) 4.x and ESXi 5.0. The details are at http://kb.vmware.com/kb/1011387. Use SSH or ESXi Shell and run:

Slope info magic number check failed

While using Splunk I noticed one ESX host had a huge number of log entries compared with the others in that cluster. Looking into it, every hour there were about 23,000 entries for:

storageRM: Slope info magic number check failed. On disk 0x0, expected 0x1df5e76.

There was only one hit on Google, from a Twitter conversation with @northlandboy & @blomjoh asking if anyone knew what the error was. I logged an SR for it, but magically the next day the error stopped appearing in the logs.