Visibility in the Network Going Forward

4/14/2013

What sort of insight should the physical network fabric offer network operators when it comes to deploying network virtualization? It is a great question, and the answer is really going to vary based on who answers it. Martin Casado and co. recently voiced their perspective here. As always, Martin’s blogs are a great read and I encourage you to follow him at NetworkHeresy if you aren't already, although there haven’t been many posts since the Nicira acquisition. It looks like he is making it a community-based blog going forward, so let’s hope to see more soon.

We know virtualization, both server and network, offers a means of abstracting the underlying physical hardware. Once the hardware is abstracted, though, how much visibility should there be into the virtual networks or virtual servers?

As I said before, I think it’ll come down to the type of customer and what they require. A large-scale multi-tenant public cloud will likely view the network as an IP fabric. Enterprises, I could see wanting to maintain end-to-end visibility, because that is what they know and because they fear not being able to troubleshoot when there is a problem. But in the long run, that could well change. Time will tell.

How much information about virtual servers can be gleaned from accessing a physical server OS?

While I usually agree with much of what Martin says, I’m not sure I like comparing voice to network virtualization, at least for the Enterprise customer. As he has stated, with advanced codecs, abundant bandwidth, and some QoS, voice on an IP network has come a long way, to the point that RSVP no longer comes up when discussing IP voice. Phone calls are peer to peer, with call setup done by a call controller, much like tunnel setup in network virtualization. One could draw further similarities between conference calls, which require multicast to join multiple parties, and what the current control plane, or lack thereof, looks like in some forms of network virtualization.

With all of that said, when voice and video traffic is kept local to an Enterprise, network operators have full visibility into this traffic on a hop-by-hop basis. This comes in handy for those trying to troubleshoot why their CXO just had a call dropped for the third time in two hours. In current forms of network virtualization, where an overlay is created between hypervisor switches, this wouldn’t be possible. Rather, the network operator would likely troubleshoot a tunnel. Tunnel is up, network is up. Can it be that simple? The big question is: how many organizations, and which types, will deploy network virtualization without regard for being able to troubleshoot individual flows in the physical network?

What about carriers and Telcos? What about Internet-based voice applications like Skype? Are those services good enough?

That doesn’t paint the complete picture, though, because some visibility can still be maintained within network virtualization; it just has to be done locally on the vswitch. For example, NetFlow data can still be pulled from the virtual switch. In addition, APM/NPM tools such as Riverbed Cascade (as an example) can be deployed on each physical host to correlate data entering the tunnel (pre-encapsulation) with data exiting the tunnel (post-decapsulation), tying both back to a single flow within a particular tenant. This would allow a network operator to see end-to-end latency and packet loss, along with full traffic analysis/statistics. If SLAs were being offered, it would still be simple to prove whether they are being adhered to or broken. This type of network visibility could be offered under or over the cloud. If the physical network had different paths in its IP Core, the operator could then *possibly* shift that traffic appropriately based on traditional routing metrics.
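
To make the pre-encapsulation/post-decapsulation idea a bit more concrete, here is a minimal Python sketch of that correlation step. It is not how Riverbed Cascade or any other product works; the record layout (`FlowKey`, `FlowSample`) and the `correlate` function are hypothetical names I made up for illustration, and it assumes the two hosts have synchronized clocks so one-way latency is meaningful.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowKey:
    """Hypothetical identity of one tenant flow (tenant ID plus inner 5-tuple)."""
    tenant_id: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


@dataclass
class FlowSample:
    """Hypothetical record exported by an agent at one end of the tunnel."""
    key: FlowKey
    packets: int
    timestamp_ms: float  # assumes both hosts share a synchronized clock


def correlate(ingress, egress):
    """Match pre-encapsulation samples to post-decapsulation samples by flow key."""
    egress_by_key = {sample.key: sample for sample in egress}
    report = {}
    for sent in ingress:
        received = egress_by_key.get(sent.key)
        if received is None:
            continue  # flow never observed leaving the tunnel; skipped in this sketch
        lost = max(sent.packets - received.packets, 0)
        report[sent.key] = {
            "latency_ms": received.timestamp_ms - sent.timestamp_ms,
            "loss_pct": 100.0 * lost / sent.packets if sent.packets else 0.0,
        }
    return report


# Usage: one flow in tenant-a, 1000 packets sent, 990 received 2.5 ms later.
key = FlowKey("tenant-a", "10.0.0.5", "10.0.1.9", 49152, 5060, 17)
report = correlate([FlowSample(key, 1000, 100.0)], [FlowSample(key, 990, 102.5)])
print(report[key])  # {'latency_ms': 2.5, 'loss_pct': 1.0}
```

The point is simply that the tenant ID plus the inner headers are enough to stitch the two observations together, even though the physical network only ever sees the outer tunnel headers.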

Random: In general, APM/NPM tools are often deployed in a passive manner (via SPAN/tap) using something like the RVBD Cascade mentioned above, but for those who have access to the actual servers to load agents on, Boundary has a slick solution that is able to gather flow-level data from every server and correlate it together. This makes Boundary a great solution if servers are deployed in multiple data centers and in the public cloud. How Boundary got that domain name is what I want to know. :)

In a recent post, I talked about Network vMotion and Network Driven DRS. I still believe there could be benefit in integrating the virtual network and the physical network. In this model, intelligence would be exchanged between the network virtualization manager (NVP, etc.) and the physical top of rack switches, not the full network including Core/Distribution. Just the virtual edge and the physical edge.

Resource pools are created within vSphere today, and a virtual server admin would not add a new VM to a resource pool that is already at 98%. So why would we allow a VM, or group of VMs, to be added to physical hosts that connect to a Top of Rack (TOR) switch that already has its uplinks at 98% utilization? Simple answer: we can’t prevent it today, because there is no communication between the physical and virtual networks to know this utilization or dynamically react to it. Hence Network Driven DRS. Rather than placement being driven by the network, though, network I/O should be an attribute the hypervisor uses to calculate proper VM placement.
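
As a thought experiment, if TOR uplink utilization were exposed to the placement logic, the check itself would be small. The Python sketch below is purely illustrative: `Host`, `pick_host`, and the 90% threshold are hypothetical, and this is not how DRS or any hypervisor scheduler actually works. It only shows network I/O acting as one more placement attribute alongside CPU.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical view of a physical host as seen by the placement logic."""
    name: str
    tor_switch: str      # TOR switch the host uplinks to
    cpu_free_pct: float  # stand-in for the compute metrics used today


def pick_host(hosts, tor_uplink_util_pct, max_uplink_util_pct=90.0):
    """Return the host with the most free CPU whose TOR uplinks are below the threshold.

    A TOR with unknown utilization is treated as saturated (100%), so the
    sketch never places a VM behind a switch it knows nothing about.
    """
    eligible = [
        host for host in hosts
        if tor_uplink_util_pct.get(host.tor_switch, 100.0) < max_uplink_util_pct
    ]
    if not eligible:
        return None  # no rack has uplink headroom; reject the placement
    return max(eligible, key=lambda host: host.cpu_free_pct)


# Usage: esx01 has more free CPU, but its TOR uplinks are at 98%.
hosts = [Host("esx01", "tor-a", 80.0), Host("esx02", "tor-b", 35.0)]
chosen = pick_host(hosts, {"tor-a": 98.0, "tor-b": 40.0})
print(chosen.name)  # esx02
```

The hard part is not this logic. It is getting the utilization numbers from the physical edge to wherever this decision is made.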

How do we mitigate this? Can we deploy something like Call Admission Control (CAC) for network virtualization?

Ensure there is no oversubscription, and effectively unlimited bandwidth, in the data center by using a non-blocking fabric; this may be the recommendation from vendors who sell only network virtualization
Use optical in the data center to bring bandwidth where it’s needed – a moving partial mesh; this would be the recommendation for those who sell physical networks with optical integration that tie back into the hypervisor
Load a hypervisor (VMware, etc.) agent on each physical TOR switch (VMware tools-ish) so total bandwidth capacity and utilization are known and bandwidth can be a factor in VM placement; while this sounds interesting, I’m not sure incumbent network vendors would entertain this. Possibly on open platforms or whitebox solutions.
Because what I described is applicable to network virtualization using VXLAN or VLANs, there could be parallel communication between the hypervisor manager (vCenter) and a focal point of physical network control. Hmmm --- we don’t have controllers in the physical network, yet. This could be an ideal solution once there is a controller in the data center communicating/controlling TOR physical switches. This solution would differentiate a particular vendor’s hardware solution.
Develop rack awareness for physical hosts and monitor total network bandwidth utilization for all VMs in a given rack. This could be something a server virtualization vendor can accomplish fairly simply with CDP/LLDP and bandwidth tracking.
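
The last option, rack awareness, is the easiest one to sketch. Below is a rough Python illustration, assuming the host-to-TOR mapping has already been learned from CDP/LLDP neighbor data and per-VM bandwidth is already being tracked. The function name and all of the input dictionaries are hypothetical, and the sketch takes the worst case that every bit a VM sends leaves the rack over the TOR uplinks.

```python
from collections import defaultdict


def rack_utilization(host_to_tor, vm_to_host, vm_mbps, tor_uplink_capacity_mbps):
    """Estimate uplink utilization (percent) per TOR from per-VM bandwidth.

    host_to_tor: host name -> TOR switch name (assumed learned via CDP/LLDP)
    vm_to_host: VM name -> host name
    vm_mbps: VM name -> measured bandwidth in Mbps
    tor_uplink_capacity_mbps: TOR switch name -> total uplink capacity in Mbps

    Worst-case assumption: all VM traffic crosses the TOR uplinks, which
    overstates utilization when traffic stays inside the rack.
    """
    demand_mbps = defaultdict(float)
    for vm, host in vm_to_host.items():
        tor = host_to_tor.get(host)
        if tor is None:
            continue  # no neighbor data for this host; skip it
        demand_mbps[tor] += vm_mbps.get(vm, 0.0)
    return {
        tor: 100.0 * demand_mbps[tor] / capacity
        for tor, capacity in tor_uplink_capacity_mbps.items()
        if capacity > 0
    }


# Usage: two hosts behind tor-a, one behind tor-b.
utilization = rack_utilization(
    host_to_tor={"esx01": "tor-a", "esx02": "tor-a", "esx03": "tor-b"},
    vm_to_host={"web1": "esx01", "db1": "esx02", "app1": "esx03"},
    vm_mbps={"web1": 2000.0, "db1": 3000.0, "app1": 1000.0},
    tor_uplink_capacity_mbps={"tor-a": 10000.0, "tor-b": 40000.0},
)
print(utilization)  # {'tor-a': 50.0, 'tor-b': 2.5}
```

Notice that nothing here requires cooperation from the physical switch beyond the neighbor advertisements it already sends, which is why this option could be done by the server virtualization vendor alone.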

What do you think? I’d love to hear your thoughts.

Thanks,
Jason

Twitter: @jedelman8
