Monday, February 20, 2012

VCP Passed

I took my VCP510 exam today, and passed! I am chuffed as I really didn't put enough effort in.

I think VMware have moved the exam in a much better direction than VCP410, with less emphasis on maximums and more on learnt knowledge. I am especially happy because we only use FC and haven't implemented many of the new features. I also don't have a lab at home and haven't had time to run through them in the office. As a result I haven't had any significant real-world, hands-on time with vCSA, Auto Deploy, iSCSI, SDRS, etc. To fill the gap I read most of Scott Lowe's excellent book "Mastering VMware vSphere 5", which gave me a good basic understanding of these new features, and then read the PDFs to get more detailed knowledge.

I do have one wish: the mock test is just too easy and doesn't give a good representation of where you are on the knowledge tree.

Going to work towards VCAP-DCD now.

Thursday, February 2, 2012

Nasty "bug" in ESXi 5 with hot-add cpu and 8+ cores

We recently did some load testing with a very large SQL Server VM running under ESXi, to check the viability of using our VMware environment to provision extra capacity for one of our production physical databases. The VM was set up with hot-add CPU and hot-plug memory so we could increase capacity as required. We had a host dedicated to this role and were pushing the VM up to 32 cores and 128GB RAM, a beast! The underlying server was a Dell M910 with 64 threads and 192GB RAM.


The problem became very noticeable as memory use increased. While SQL Server was filling its buffer pool, at about 36-38GB it would stop responding normally: privileged CPU usage would increase dramatically, as would the rate at which it filled the buffer pool. Memory use would carry on climbing until it hit the SQL Server max memory value, and then the server would return to normal operation. We noticed that the number of vCPUs affected the point at which the issue appeared: the fewer vCPUs, the longer the server stayed okay.

VMware were originally happy with the configuration of the VM, so it was over to Microsoft and the SQL Server core support team. Eventually we identified that if we disabled SQL Server NUMA with the startup trace flag -T8015, the issue did not happen. So it was back over to VMware support, who came back with some more information.


It turns out that CPU hot-add and vNUMA are currently incompatible. When CPU hot-add is enabled, vNUMA is disabled, and the following is seen in the vmware.log for the VM: "NUMA and VCPU hot add are incompatible.  Forcing UMA"
The guest OS tries to use NUMA but the VM cannot, and then the trouble starts.
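
If you want to check whether a VM has been hit by this, a quick way is to search its vmware.log for that message. Here is a minimal sketch in Python. The message text is the one quoted above; the function name and the sample log lines are made up for illustration, and I match the whitespace loosely since I'm not certain how many spaces the log uses after the full stop.

```python
import re

# Message text as quoted in the post. Whitespace after the full stop is
# matched loosely (an assumption) in case the log uses one or two spaces.
FORCED_UMA = re.compile(r"NUMA and VCPU hot add are incompatible\.\s+Forcing UMA")


def find_forced_uma(lines):
    """Return the log lines that report vNUMA being forced off (UMA)."""
    return [line.rstrip("\n") for line in lines if FORCED_UMA.search(line)]


# Hypothetical sample lines; a real vmware.log will look different.
sample = [
    "vmx| some unrelated line\n",
    "vmx| NUMA and VCPU hot add are incompatible.  Forcing UMA\n",
]
for hit in find_forced_uma(sample):
    print(hit)
```

To run it against a real log, pass an open file object instead of the sample list, e.g. `find_forced_uma(open(path))` where `path` points at the VM's vmware.log.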

I hope VMware create a KB article to make people aware of this, as you wouldn't really expect this behaviour. Alternatively, they could alert users to the incompatibility of hot-add and vNUMA on VMs with more than 8 vCPUs.
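
Until then, you can do a rough check yourself by looking at the VM's .vmx file. The sketch below flags a VM that has CPU hot-add enabled and more than 8 vCPUs. It assumes the relevant .vmx keys are `vcpu.hotadd` and `numvcpus`, and it takes the 8 vCPU threshold from the behaviour described in this post; the function names are my own, so treat it as an illustration rather than an official check.

```python
import re

# A .vmx file is a list of key = "value" lines.
VMX_LINE = re.compile(r'^\s*([\w.]+)\s*=\s*"(.*)"\s*$')


def parse_vmx(text):
    """Parse .vmx text into a dict with lower-cased keys."""
    settings = {}
    for line in text.splitlines():
        match = VMX_LINE.match(line)
        if match:
            settings[match.group(1).lower()] = match.group(2)
    return settings


def hot_add_disables_vnuma(text, threshold=8):
    """True if CPU hot-add is on and the VM has more than `threshold` vCPUs.

    Assumes the keys vcpu.hotadd and numvcpus; missing keys are treated
    as hot-add off and a single vCPU.
    """
    settings = parse_vmx(text)
    hot_add = settings.get("vcpu.hotadd", "FALSE").upper() == "TRUE"
    vcpus = int(settings.get("numvcpus", "1"))
    return hot_add and vcpus > threshold


# Hypothetical .vmx fragment for a VM like the one in this post.
example = 'numvcpus = "32"\nvcpu.hotadd = "TRUE"\nmem.hotadd = "TRUE"\n'
print(hot_add_disables_vnuma(example))
```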