Basic Flow of Data

In my last post we covered the Nutanix Overview but didn't go deep into how NDFS is built.  In this post we will cover the basic data flow, including how the CVMs connect to the network.  Many of these diagrams are inspired by the Nutanix Bible, which is a great resource for all things Nutanix.

One Node

Just like every Lego model begins with the first block, every NDFS cluster begins with the first node.


Remember that the SCSI controller is passed through to the CVM using Intel VT-d.  The green arrow above shows the flow of data for a read (this example assumes that all data is local):

  1. The guest VM sends the read command to the hypervisor.
  2. The hypervisor sends the read to the CVM using the chosen storage protocol (NFS for ESXi, SMB for Hyper-V, iSCSI for KVM).
  3. The CVM finds the requested data on the disks and sends it back to the hypervisor.
  4. The hypervisor returns it to the guest VM as normal (see the sketch after this list).
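
To make those four steps concrete, here is a rough Python sketch of the single-node read path.  The class and method names are my own illustration, not anything from Nutanix code; they simply mirror the steps above.

```python
# Hypothetical sketch of the single-node read path described above.
# These classes are illustrative stand-ins, not Nutanix components.

class Disks:
    """Stands in for the local SSD/HDD tier owned by the CVM via VT-d passthrough."""
    def __init__(self, blocks):
        self.blocks = blocks  # block_id -> bytes

    def read(self, block_id):
        return self.blocks[block_id]


class CVM:
    """Controller VM: owns the passed-through SCSI controller and serves storage I/O."""
    def __init__(self, disks):
        self.disks = disks

    def handle_read(self, block_id):
        # Step 3: find the requested data on the local disks
        return self.disks.read(block_id)


class Hypervisor:
    """ESXi/Hyper-V/KVM host; talks to the CVM over NFS, SMB, or iSCSI respectively."""
    def __init__(self, cvm):
        self.cvm = cvm

    def guest_read(self, block_id):
        # Step 2: forward the guest's read to the CVM over the storage protocol
        data = self.cvm.handle_read(block_id)
        # Step 4: hand the result back to the guest VM as a normal disk read
        return data


# Step 1: the guest VM issues a read, which lands on the hypervisor's virtual SCSI layer
node = Hypervisor(CVM(Disks({"block-42": b"guest data"})))
print(node.guest_read("block-42"))
```
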
This isn't a particularly exciting example, as each Nutanix cluster has a minimum of three nodes.  The additional nodes add complexity (and redundancy).  Before diving into how a full cluster works, let's take a quick look at the networking setup (using ESXi terminology - there will be a dedicated Hyper-V post later).


Networking



Here we see that there are two vSwitches: one internal vSwitch for communication between the CVM and the hypervisor, and one external vSwitch for ESXi management, guest VM networking, and CVM management.  The vmk1 interface always has the IP 192.168.5.1 and the CVM's internal interface is always 192.168.5.2, which is how the cluster tricks the ESXi hosts into thinking they're mounting one big datastore - every host mounts 192.168.5.2.
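
To make the "one big datastore" trick concrete, here is a tiny illustration.  The container and datastore names below are placeholders I made up; the point is simply that every host mounts the same 192.168.5.2 address, and on each host it is that host's own CVM answering over the private vSwitch.

```python
# A toy illustration of the "one big datastore" trick: every host mounts NFS
# from the same internal address, 192.168.5.2, but that address lives on the
# private vSwitch, so each host's request is answered by its own local CVM.
# "ctr1" and "NTNX-datastore" are made-up placeholder names.

hosts = ["esxi-01", "esxi-02", "esxi-03"]

for host in hosts:
    # Identical mount target on every host => vSphere sees one shared datastore,
    # even though a different CVM is actually serving the I/O on each host.
    print(f"{host}: mount 192.168.5.2:/ctr1 as datastore 'NTNX-datastore'")
```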

Each node has four interfaces (two 10Gb and two 1Gb).  I only show one connection in the diagram above, but you can use as many or as few as you like.  I'll go deeper into networking in a later section; the most important thing to know for now is that there is an internal network and an external network.

One Block



Building on what we just learned, let's look at how reads happen in this case (a rough sketch in code follows the list).
  1. The guest VM sends the read to the hypervisor.
  2. Using the private network, the hypervisor sends the request to the CVM (NFS/SMB/iSCSI).
  3. The CVM looks at its metadata database to determine where the data is stored.
  4. If the data is local, the CVM reads it from the local drives.
  5. If the data is remote, the CVM requests it over the network from the CVM that has it.  The data is then stored locally so that future reads are fast.
  6. No matter how the data was gathered, the CVM returns it to the hypervisor, which returns it to the guest VM.
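
Here is a rough sketch of that clustered read path in Python.  It is an illustration only: the names are mine and the shared metadata is just a dict, while the real metadata layer is a distributed Cassandra store; the local/remote branching and the local caching follow the steps above.

```python
# Rough sketch of the clustered read path (steps 1-6 above). Illustrative only:
# the names are invented and the shared metadata is a plain dict standing in
# for the real distributed Cassandra store.

class ClusterCVM:
    def __init__(self, node_id, local_data, metadata, peers):
        self.node_id = node_id
        self.local_data = local_data  # block_id -> bytes held on this node's drives
        self.metadata = metadata      # shared map: block_id -> node_id that holds it
        self.peers = peers            # node_id -> ClusterCVM, reachable over the 10Gb network

    def handle_read(self, block_id):
        # Step 3: consult the metadata database for the data's location
        owner = self.metadata[block_id]
        if owner == self.node_id:
            # Step 4: the data is local, read it straight from the drives
            return self.local_data[block_id]
        # Step 5: the data is remote, fetch it from the CVM that has it...
        data = self.peers[owner].local_data[block_id]
        # ...then keep a local copy (and update the metadata) so future reads are local
        self.local_data[block_id] = data
        self.metadata[block_id] = self.node_id
        return data  # Step 6: handed back to the hypervisor, then to the guest VM


# Two-node example: node-A reads a block that initially lives on node-B.
metadata = {"block-7": "node-B"}
cvm_b = ClusterCVM("node-B", {"block-7": b"remote data"}, metadata, peers={})
cvm_a = ClusterCVM("node-A", {}, metadata, peers={"node-B": cvm_b})
print(cvm_a.handle_read("block-7"))  # fetched from node-B, cached on node-A
print(cvm_a.handle_read("block-7"))  # second read served locally
```
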
This is the core of what makes NDFS "distributed".  Using the normal 1Gb/10Gb network, the CVMs are able to communicate with each other and distribute data completely transparently to the guest OS.  The hypervisor has no idea that it is accessing the local hard drives of another host, nor does it matter.  The CVMs quickly and efficiently deliver the requested data.

You may have noticed another new topic here - metadata.  Metadata is, basically, information about the data: the CVMs use a Cassandra database to store and share information about the data sitting on each node's local disks.  Using this, they are able to quickly find the data they need, no matter which node it is on or how many nodes are in the cluster.
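
As a purely conceptual example, you can think of a metadata entry as answering the question "where does this piece of data live?"  The fields below are my own illustration, not the real Cassandra schema.

```python
# Purely conceptual example of the kind of question the metadata layer answers.
# The field names and the use of a plain dict are illustrative only; the real
# Cassandra schema looks nothing like this.
extent_location = {
    "extent_id": "vm1-vdisk0-extent-0001",  # hypothetical identifier for a chunk of a vDisk
    "node": "node-C",                       # which node's CVM currently holds the data
    "disk": "ssd-1",                        # which physical device on that node
}

# Because every CVM shares this information, any node can route a read to the
# right place, regardless of where the guest VM happens to be running.
print(f"read {extent_location['extent_id']} from {extent_location['node']}/{extent_location['disk']}")
```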

Writes

In the sections above I covered reads, but what about writes?  Since we are just writing to a disk, do we need to use the network?

Yes.  In order to satisfy redundancy (RF2 in the current versions of the code) we cannot have just one copy of the data, so a write is not acknowledged until another copy exists on another node.  This means that, unlike reads, every write uses the 10Gb network to ensure we are fully redundant.
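
Here is a hedged sketch of that write path.  Again, the names are illustrative rather than Nutanix internals; the point is that the acknowledgement only happens after the second copy lands on another node.

```python
# Sketch of the RF2 write path: the write is acknowledged only after a second
# copy exists on another node. Names are illustrative, not Nutanix internals.

class WriteCVM:
    def __init__(self, node_id):
        self.node_id = node_id
        self.local_data = {}  # block_id -> bytes on this node's drives

    def write_local(self, block_id, data):
        self.local_data[block_id] = data

    def handle_write(self, block_id, data, replica_peer):
        # Copy 1: write to this node's local drives
        self.write_local(block_id, data)
        # Copy 2: replicate over the 10Gb network to a CVM on another node
        replica_peer.write_local(block_id, data)
        # Only now is the write acknowledged back to the hypervisor and guest VM
        return "ack"


local_cvm = WriteCVM("node-A")
remote_cvm = WriteCVM("node-B")
print(local_cvm.handle_write("block-9", b"new data", remote_cvm))  # -> ack
print("replica on node-B:", "block-9" in remote_cvm.local_data)    # -> True
```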

I'm not going to go deep into the details of how we write, as that requires an understanding of our different data structures (which I haven't covered yet).  If you're looking for an in-depth guide right now, the Nutanix Bible has all that and more.

Conclusion

In this post we talked about how the hypervisor and the CVM communicate (over the private vSwitch), how the guest VMs read from and write to the disks through the CVMs, and how the CVMs know where all the data is stored.  Next time we will cover the networking configuration in depth and see how we can integrate the Nexus 1000v into the system.

Thanks for reading!