Advantages of the whitebox appliance model

Recently I saw the following tweet from a well-known VMware blogger:


It gave me flashbacks to some nightmares from my time at Cisco.  There we commonly had to build custom ISOs to handle a variety of custom drivers (Nexus 1000v VEM, VIC drivers for Eth/FC, the VM-FEX driver, etc.).  If you didn't do this right you were SOL, and if you chose "ignore compatibility warnings" and upgraded anyway, you were likely going to have to reinstall.  The official guide to creating a custom ISO has a lot of steps to get right, and you won't know whether you succeeded until you try to install.


First, a little background.  This check is looking for drivers that won't work after an upgrade.  Many drivers and VIBs are only supported with a particular release of ESXi.  For example, a network driver VIB might only support ESXi 5.5.  When you upgrade a host (say, to 6.0), the bundle you are upgrading with needs to include a 6.0 version of this VIB; otherwise the compatibility check will fail.
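To make the idea concrete, here is a rough sketch of the kind of comparison such a check performs.  This is not VMware's actual implementation; the Vib class, the find_incompatible_vibs function, and the VIB names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vib:
    """Hypothetical, simplified stand-in for a VIB's metadata."""
    name: str
    version: str
    supported_releases: frozenset  # ESXi releases this build supports


def find_incompatible_vibs(installed, bundle, target_release):
    """Return names of installed VIBs that would not work after the upgrade.

    A VIB is flagged when its installed build does not support the target
    release and the upgrade bundle carries no build of it that does.
    """
    covered = {v.name for v in bundle if target_release in v.supported_releases}
    return sorted(
        v.name
        for v in installed
        if target_release not in v.supported_releases and v.name not in covered
    )


# Illustrative data only: one 5.5-only driver and one that already supports 6.0.
installed = [
    Vib("net-driver", "1.0", frozenset({"5.5"})),
    Vib("storage-driver", "2.1", frozenset({"5.5", "6.0"})),
]
stock_bundle = []  # bundle without a 6.0 build of net-driver
custom_bundle = [Vib("net-driver", "2.0", frozenset({"6.0"}))]

print(find_incompatible_vibs(installed, stock_bundle, "6.0"))   # ['net-driver']
print(find_incompatible_vibs(installed, custom_bundle, "6.0"))  # []
```

With the stock bundle the 5.5-only driver gets flagged, which is the warning you would be tempted to ignore.  With a bundle that carries a 6.0 build of the same VIB, the check comes back clean.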

So what if you ignore this warning?  The host will upgrade, but then force-load the 5.5 VIB.  ESXi can't just pull out the VIB, as there could be other things that rely on it.  When it does this force-load, the driver could fail to load properly and leave the host in a limbo state.  You can't install the 6.0 VIB because a VIB with the same name is already loaded, and you can't uninstall the 5.5 VIB because it never properly loaded, leaving you with a useless host.  Now you've got to reinstall ESXi or mess around with the recovery shell, turning an upgrade into a recovery operation.

So what does this have to do with a whitebox appliance model?  Because Nutanix uses well-known commodity hardware, the drivers are included in ESXi by default - there's no special card or special NIC we need to make a custom ISO for.  With our configure-to-order model we maintain a small list of appliances that can be fully customized for any need, making it easy for our QA to test every combination.  Once QA has finished testing, we release a .json file that can be used to upgrade with our one-click hypervisor upgrade.  This guarantees that the bundle you use is the exact same one QA used and tested - no messing with custom ISOs or driver incompatibilities.  Before the upgrade, NCC ensures that the system is ready; afterwards, it ensures that everything is up to date and that there are no misconfigurations.
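One way to picture "the exact same bundle QA tested" is a checksum comparison against the released metadata.  The sketch below is only that, a sketch: the "sha256" field and the bundle_matches_metadata function are assumptions for illustration, not the actual layout of the Nutanix .json file or how the one-click upgrade works internally.

```python
import hashlib
import json


def bundle_matches_metadata(bundle_bytes, metadata_json):
    """Return True if the bundle's SHA-256 digest equals the one in the metadata.

    The "sha256" key is a hypothetical field name used for this example.
    """
    metadata = json.loads(metadata_json)
    actual = hashlib.sha256(bundle_bytes).hexdigest()
    return actual == metadata["sha256"]


# Illustrative data only: a stand-in for the bundle QA tested.
tested_bundle = b"hypervisor bundle as tested by QA"
metadata_json = json.dumps({"sha256": hashlib.sha256(tested_bundle).hexdigest()})

print(bundle_matches_metadata(tested_bundle, metadata_json))             # True
print(bundle_matches_metadata(b"some other bundle", metadata_json))      # False
```

The point is simply that a published fingerprint lets the upgrade refuse anything other than the bundle that was actually tested.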

All of this works in the background towards one goal - you can click a single button in Prism and have your hosts upgrade their hypervisor with complete confidence that your production cluster will upgrade successfully, without any additional intervention and without affecting your business-critical workloads.  That behind-the-scenes work is what makes this one button so powerful.

My ProTip of the day: Use Nutanix One-Click upgrades!  Just one more way Nutanix is making the infrastructure invisible.