In follow-up posts, I hope to give more examples of how the physical network should adapt to help optimize the virtual network.
Prior to server virtualization, the server OS (Windows, Linux, etc.) was already decoupled from the hardware, and applications were written for each OS. Server virtualization was a huge win because it didn’t require any change to the existing server operating systems. Key point: the hypervisor was a layer of abstraction; it wasn’t a new server OS.
Before *new* network virtualization (and still today), the network OS (Cisco IOS, NX-OS, Junos) is not decoupled from the hardware. Cisco NX-OS runs in software on the Nexus 1000V, but you still can’t load it as the control stack on another device. It just so happens that the closest thing to this is Open vSwitch (OVS), whose lead developers came from Nicira; it can run as the control stack for both a vswitch and a general-purpose hardware switch. OVS is portable. And there really aren’t any apps at all for current network OSs.
Network virtualization can accomplish a lot. I’m a huge fan, but given today’s technologies, it is fundamentally different from server virtualization, especially for VMware. Remember, as I said above, VMware offered a server hypervisor that didn’t require changes to existing server OSs. In the world of networking, they are offering a network operating system, not really a hypervisor. These are different things. VMware isn’t at fault. It is nearly impossible to replicate for networking exactly what they did for servers, because network operating systems are tightly coupled to the hardware, at least today. I don’t follow Microsoft and the Linux distributions closely, but how much were they impacted by server virtualization? Was the impact positive or negative? I personally don’t know, so feel free to comment below with your thoughts. Then compare that to the impact network virtualization can have on current network operating systems.
Hardware vs. Software
It shouldn’t be a battle of hardware vs. software. You will always need both, because the new access layer is definitely in the server: it has moved from physical switch to blade switch to virtual switch. The battle will be about complete solutions, ease of integration, and product interoperability.
Let’s examine some more of the things I think about when comparing server virtualization and network virtualization. This will lead into what “complete solutions” could look like in the future (pure speculation on my part).
A hypervisor manager lets a server admin spin VMs up and down, and the admin chooses which OS to load on each VM. In networking, we are talking about ONE network OS in which virtual segments (and their associated L4-L7 services) are spun up and down. The difference here is the lack of flexibility in choosing the OS. An application owner can still RDP into a guest virtual machine to manage the OS and application. Can a local network admin still SSH into their virtual network segment to manage their network? Not from what I can tell, although I’ve never seen the Nicira solution, so I could be wrong. Big Switch could possibly do this. In fact, I’ve seen a demo from them that did something similar with network slicing (MAC filtering), but I’m not sure whether that works when using overlays. Feel free to comment if you have more details.
“Virtual machines to virtual segments” is not the best comparison, although we indirectly make it every day.
What do we really need?
Using current Cisco and VMware technologies as examples, we need technology that merges Nexus Virtual Device Contexts (VDCs) with the Nexus 1000V (or VMware distributed switches). VMware used to go to market with several vswitches per physical server. Then, after the network guys got involved, it usually became one vswitch using VLANs. That makes total sense, and I still recommend it.
But the more I think about this, the more I ask: why not keep the multiple-vswitch concept per physical host, but in a distributed manner? Each local vswitch (the Virtual Ethernet Module, or VEM, for the Cisco folks) could be managed by a different control plane, or by a multi-tenant control plane. This means a different distributed switch and Virtual Supervisor Module (VSM) per tenant. It would increase the number of VSMs, but offer greater flexibility in configuration, administration, and per-tenant admin control, should that be desired. It would also give each tenant its own 16M segments (the 24-bit VXLAN segment ID space, 2^24 = 16,777,216). The goal would still be to offer a high-level manager with an easy-to-use UI, such as a Data Center-wide Virtual Supervisor Module (DC-VSM) that spins tenant VSMs (T-VSMs) up and down as needed to manage new tenants and applications. At the same time, this design shrinks the fault domain of each network controller. Here, I am loosely calling the VSM a network controller. It must evolve into one, although Cisco has never called the VSM a controller.
Post Update 2/27/13: With what I am describing, multiple vswitches would exist on a single host, but they would not each require dedicated physical NICs as uplinks.
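To make the DC-VSM/T-VSM idea a bit more concrete, here is a minimal Python sketch under loose assumptions. The class and method names (`Host`, `Vem`, `TenantVsm`, `DcVsm`, `spin_up`, `spin_down`) are hypothetical and do not correspond to any real Cisco or VMware API. The 16M figure is assumed to come from a VXLAN-style 24-bit segment ID.

The sketch shows three things:

- A data-center-wide manager spins tenant VSMs up and down.
- Each tenant gets its own VEM per host and its own full segment space.
- Every tenant VEM on a host shares that host's physical uplinks instead of owning dedicated NICs.

```python
from dataclasses import dataclass, field

# Hypothetical model: these classes are illustrative, not a Cisco or VMware API.
SEGMENT_ID_BITS = 24                    # assumed VXLAN-style 24-bit segment ID
MAX_SEGMENTS = 2 ** SEGMENT_ID_BITS     # 16,777,216 segments per tenant


@dataclass
class Host:
    name: str
    uplinks: list[str]                  # physical NICs, shared by every tenant vswitch
    vems: dict[str, "Vem"] = field(default_factory=dict)   # tenant -> VEM


@dataclass
class Vem:
    tenant: str
    host: Host = field(repr=False, compare=False)

    @property
    def uplinks(self) -> list[str]:
        # No dedicated NICs: each tenant's VEM rides the host's existing uplinks.
        return self.host.uplinks


@dataclass
class TenantVsm:
    tenant: str
    vems: list[Vem] = field(default_factory=list)
    segments: set[int] = field(default_factory=set)

    def add_segment(self, segment_id: int) -> None:
        # Each tenant owns the whole 24-bit space, so IDs may repeat across tenants.
        if not 0 <= segment_id < MAX_SEGMENTS:
            raise ValueError(f"segment {segment_id} is outside the 24-bit space")
        self.segments.add(segment_id)


class DcVsm:
    """Data Center-wide manager that spins tenant VSMs (T-VSMs) up and down."""

    def __init__(self, hosts: list[Host]) -> None:
        self.hosts = {h.name: h for h in hosts}
        self.tenants: dict[str, TenantVsm] = {}

    def spin_up(self, tenant: str, host_names: list[str]) -> TenantVsm:
        if tenant in self.tenants:
            raise ValueError(f"tenant {tenant} already has a VSM")
        tvsm = TenantVsm(tenant)
        for name in host_names:
            host = self.hosts[name]
            vem = Vem(tenant, host)
            host.vems[tenant] = vem       # one tenant vswitch per host
            tvsm.vems.append(vem)
        self.tenants[tenant] = tvsm
        return tvsm

    def spin_down(self, tenant: str) -> None:
        # Removing one tenant's VSM leaves every other tenant untouched
        # (a smaller fault domain per controller).
        tvsm = self.tenants.pop(tenant)
        for vem in tvsm.vems:
            del vem.host.vems[tenant]
```

In this model, two tenants can both use segment 5000 on the same host without colliding, because each T-VSM owns its own segment space. The trade-off is more VSMs to run.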
Now that specific VMs can plug into a *dedicated* distributed virtual switch, there should be a way to take a port from a physical switch and place it under the same management domain, controlled by the Nexus Virtual Supervisor Module (VSM). This is where we merge in the VDC concept. Rather than calling it a VDC, we can think of the Nexus switch (5K/6K/7K) as running multiple instances of the 1KV VEM across the access layer switches. The physical Nexus switches would then act as line cards, with the VSM as the main supervisor. For all intents and purposes, the physical supervisor becomes a DFC (Distributed Forwarding Card).
In this design, there are two models for interconnecting VMs on different hosts in the same L2 subnet, which is the ultimate requirement for vMotion. The first option is to use overlays, handled by the tenant-based VSM. The second option is to leverage a high-capacity 1KV VEM or VDC concept on the intermediate switches in the data path. I say high-capacity because VDCs, as we know them today, would never scale to that number of tenants. The 1KV VEM approach would help it scale.
The first option simplifies the physical fabric and removes concerns such as MAC address scale, L2 in the physical network, and large fault domains. The second option, however, may seem *cleaner* to the network engineer who wants to use L2 and have visibility at every switch hop in the network.
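For the overlay option, here is a rough Python sketch of the kind of state a tenant VSM might hold. It assumes, hypothetically, that the controller tracks each VM's location as the tunnel endpoint (VTEP) IP of its current host, per segment. The names `TenantOverlayTable`, `vm_moved`, `forward`, and `fabric_addresses` are illustrative only.

What the sketch shows is the reason the first option sidesteps MAC address scale: VM MACs live only in the tenant controller's table. The physical fabric only needs IP reachability between host VTEPs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tunnel:
    segment_id: int   # overlay segment (e.g. a 24-bit VXLAN VNI)
    src_vtep: str     # tunnel endpoint IP on the sending host
    dst_vtep: str     # tunnel endpoint IP on the receiving host


class TenantOverlayTable:
    """Hypothetical per-tenant location table kept by a tenant VSM."""

    def __init__(self) -> None:
        # (segment_id, vm_mac) -> VTEP IP of the host currently running that VM
        self._locations: dict[tuple[int, str], str] = {}

    def vm_moved(self, segment_id: int, mac: str, host_vtep: str) -> None:
        """Record where a VM lives, e.g. after power-on or vMotion."""
        self._locations[(segment_id, mac.lower())] = host_vtep

    def forward(self, segment_id: int, src_vtep: str, dst_mac: str) -> Optional[Tunnel]:
        """Pick the tunnel for a frame, or None if the VM is on the same host."""
        dst_vtep = self._locations.get((segment_id, dst_mac.lower()))
        if dst_vtep is None:
            # A real overlay would flood or query the controller here.
            raise LookupError(f"{dst_mac} unknown in segment {segment_id}")
        if dst_vtep == src_vtep:
            return None       # local delivery by the VEM, no tunnel needed
        return Tunnel(segment_id, src_vtep, dst_vtep)

    def fabric_addresses(self) -> set[str]:
        """All the physical fabric must reach: host VTEP IPs, never VM MACs."""
        return set(self._locations.values())
```

A vMotion is then just a `vm_moved` call that updates one table entry; no physical switch has to relearn the VM's MAC. A production overlay would also need a real strategy for broadcast, unknown unicast, and multicast traffic, which this sketch reduces to a `LookupError`.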
Key point: if the argument is visibility, deploying overlays in hardware TOR switches doesn’t change anything. It still doesn’t give a customer visibility between TOR switches. Overlays are overlays. APM tools that could look inside tunnels, such as those encapsulated by VXLAN, and dissect them to reveal the real source/destination MAC and IP addresses would be very valuable in any type of overlay model.
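As a rough illustration of what such a tool would have to do, here is a minimal Python sketch that dissects a VXLAN payload using the header layout from RFC 7348. That layout is an 8-byte VXLAN header followed by the inner Ethernet frame. In the header, the I flag (0x08) marks a valid VNI, and the 24-bit VNI sits in bytes 4 to 6.

The sketch makes two assumptions:

- The outer IP/UDP headers have already been stripped. VXLAN uses UDP destination port 4789, so a capture filter would select on that port first.
- The inner frame is untagged IPv4.

The function names are hypothetical, not from any particular APM product.

```python
import ipaddress
import struct

ETHERTYPE_IPV4 = 0x0800
VXLAN_HEADER_LEN = 8
ETH_HEADER_LEN = 14


def fmt_mac(raw: bytes) -> str:
    """Render 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def dissect_vxlan(udp_payload: bytes) -> dict:
    """Decode a VXLAN UDP payload: 8-byte VXLAN header + inner Ethernet frame."""
    if len(udp_payload) < VXLAN_HEADER_LEN + ETH_HEADER_LEN:
        raise ValueError("payload too short for VXLAN + inner Ethernet headers")
    if not udp_payload[0] & 0x08:
        raise ValueError("I flag not set; VNI is not valid")
    vni = int.from_bytes(udp_payload[4:7], "big")   # 24-bit segment ID

    inner = udp_payload[VXLAN_HEADER_LEN:]
    dst_mac, src_mac = inner[0:6], inner[6:12]      # inner frame: dst MAC first
    (ethertype,) = struct.unpack("!H", inner[12:14])
    result = {"vni": vni, "src_mac": fmt_mac(src_mac), "dst_mac": fmt_mac(dst_mac)}

    ip = inner[ETH_HEADER_LEN:]
    if ethertype == ETHERTYPE_IPV4 and len(ip) >= 20:
        # IPv4 source/destination sit at byte offsets 12 and 16 of the IP header.
        result["src_ip"] = str(ipaddress.IPv4Address(ip[12:16]))
        result["dst_ip"] = str(ipaddress.IPv4Address(ip[16:20]))
    return result
```

A real APM tool would also need to handle 802.1Q-tagged inner frames, IPv6, and truncated captures. The point here is only that the real VM addresses are recoverable once you look past the outer headers.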
Different models suit different types of customers. The first option may be better suited to a cloud service provider, while the second may be a better fit for an enterprise environment. It will always depend. In this case, it depends on a lot, considering this is all speculation on my part.
I didn’t mention SDN at all in this post. Good or bad?
Regards,
Jason
Follow me on Twitter: @jedelman8