Justifying VDI – Part One

As Chris Murphy explained in his video blog post, EMC IT began last year to implement a virtual desktop infrastructure based on VMware View. The VDI concept is pretty straightforward, and sounds compelling: reduce desktop management complexity, more cost-effectively update aging desktops (and their operating systems), and give users greater platform choice—and “anywhere, anytime” universal access.

Can VDI really deliver on its user-experience promise? How much would it really benefit our company in cost savings and increased flexibility? EMC IT came up with answers to those questions, and got a “green light” for deploying a production VDI environment during the second half of this year.

First Step: Learn The Technology

As I wrote previously, VDI “rules of thumb” generally estimate cost savings based on several assumptions and fairly ideal conditions. A number of vendors and consultants offer online “ROI calculators,” some with fairly sophisticated models. However, calculating the actual cost/benefit ratio we’d likely see from a VDI solution in a real environment is no small undertaking.

EMC IT chose to go about this using a phased approach. Their first goal was to learn about VDI and how to make best use of it. Then they could incrementally explore benefits—and risks—at increasingly large scale.

Last year, IT began executing its learning plan with a pilot of VMware View 3.1. They did this using existing server equipment, so there were no additional capital equipment costs.

The VDI pilot had a small number of users. They were mostly contract workers and other outside-partner employees needing access to certain EMC internal systems. Those users had previously been using virtual desktops through Citrix Desktop Broker, and their feedback showed VDI bringing an overall improvement in performance and stability.

Encouraged, IT moved on to a proof-of-concept (POC) in late summer. This time, additional Massachusetts-based VMware View 3.1 servers were provisioned to house a 200-user mix that included EMC employees at home, training facility desktops, and remote eLearning.

Feedback from this POC was still positive, but users were comparing their VDI experience with physical desktops, not Citrix terminal-server sessions. Just over half the POC users would use VDI for a secondary desktop, but only a small minority of those said they’d pick a virtual desktop to replace their current machine.

Not great news. Why the reluctance? Slower performance when printing locally and accessing local drives—especially for users in Europe and Asia. EMC IT did some testing in their lab, and found a significant performance bottleneck in Microsoft Windows’ Remote Desktop Protocol (RDP) on links with long network delays—or latency. Such transmission delays are unavoidable when accessing virtual desktops halfway around the globe. Something about that darned speed of light thing.

The latency “pain threshold” ended up being around 100 milliseconds. For local users, where network latencies are typically much shorter, virtual desktop performance was similar to a physical PC. For remote users with latencies exceeding 100ms, remote desktop performance was consistently slower than a local PC.
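For a rough sense of why distance alone can push a session toward that threshold, here is a back-of-the-envelope sketch in Python. It assumes signals travel through fiber at about two-thirds of the vacuum speed of light, and the distances are illustrative round numbers, not measurements from EMC's network. Real round-trip times run higher, because routes are not straight lines and network equipment adds its own delay.

```python
# Best-case round-trip time (RTT) over fiber for a given one-way distance.
# Assumptions (not from the POC): signal speed is ~2/3 of c, and the path
# is a straight line with no routing, queuing, or equipment delay.

SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBER_FACTOR = 2 / 3               # common approximation for optical fiber
PAIN_THRESHOLD_MS = 100            # threshold reported in the text


def min_rtt_ms(distance_km: float) -> float:
    """Return the lower bound on round-trip time, in milliseconds."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    # Illustrative distances only.
    for label, km in [("same region", 500),
                      ("across an ocean", 5_000),
                      ("halfway around the globe", 20_000)]:
        rtt = min_rtt_ms(km)
        side = "over" if rtt > PAIN_THRESHOLD_MS else "under"
        print(f"{label:>26}: at least {rtt:6.1f} ms ({side} the threshold)")
```

Even this idealized floor lands above 100 ms for the longest path, before any real-world overhead is added.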

One application, however, actually works better running on a virtual desktop halfway around the world than locally: Microsoft Outlook. Another surprise for our IT folks was the number of people using media-intense collaboration tools such as Office Communicator, Live Meeting and WebEx to share applications and entire desktops. IT ended up adding applications to its VDI “base image” because of how many people used them regularly. It also caused IT to adjust how much weight it gave to concerns like network latency.

Overall, VMware View showed promise, and would work well for locally based task workers. But that’s a small minority of EMC’s employee population. So IT concluded that version 3.1 would not be suitable for enterprise-wide deployment. Luckily enough, View 4.0 was due to become available by year-end, bringing “PCoverIP,” a protocol that promised to overcome RDP’s latency challenges.

Finding VDI’s Limits

In mid-September, towards the end of the VDI pilot, IT began testing a beta release of View 4.0 in their lab. IT’s VDI lab contains network test gear that can add latency, drop network packets, and simulate a wide variety of network conditions and pathologies. The test team was particularly interested in the (hopefully positive) impact of PCoverIP at higher connection latencies. The results, including “stopwatch tests” that measure wall-clock performance of virtual-desktop tasks, looked promising.
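A stopwatch test of this kind could be sketched roughly as follows in Python. This is not EMC IT's harness, and the `stopwatch` helper is a hypothetical name. It simply times a repeatable task by wall clock and summarizes the samples; in a real lab the task would drive an action inside the virtual desktop, and the runs would be repeated under each simulated network condition.

```python
import statistics
import time


def stopwatch(task, repeats=5):
    """Run `task` several times and summarize wall-clock durations in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()   # monotonic, high-resolution clock
        task()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    # Placeholder task; a real test would open a document, scroll, print, etc.
    result = stopwatch(lambda: time.sleep(0.01), repeats=3)
    print({name: round(seconds, 4) for name, seconds in result.items()})
```

Reporting the median alongside the extremes keeps one unusually slow run from dominating the comparison.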

IT launched a View 4.0-based POC in early November, and it’s running as I write this with 500 users (yours truly being one of them). To improve application performance for knowledge workers, the bulk of EMC’s employees, memory allocated to each virtual machine was doubled to 2 GB.

But that didn’t need to mean cutting the number of desktops per server in half. You can see in this slide from an EMC IT deck that Cisco UCS (a key Vblock component, by the way) enabled IT to double the number of virtual desktops per CPU core. How? Custom memory ASICs. You gotta love machines purpose-built for virtual infrastructure.
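The arithmetic behind that trade-off can be sketched in a few lines of Python. Every number here is hypothetical and chosen only to show the calculation; none comes from EMC IT's deck, and the sketch ignores memory-sharing techniques a hypervisor may apply. The point is that a host holds only as many desktops as its scarcest resource allows, so doubling per-desktop memory halves density only when memory is the limit and host memory stays the same.

```python
def desktops_per_host(cores, desktops_per_core, host_ram_gb, vm_ram_gb,
                      reserved_ram_gb=0):
    """Return how many desktops fit, limited by CPU or by memory."""
    cpu_limit = cores * desktops_per_core
    ram_limit = (host_ram_gb - reserved_ram_gb) // vm_ram_gb
    return min(cpu_limit, ram_limit)


if __name__ == "__main__":
    # Hypothetical host: 8 cores, 8 desktops per core.
    print(desktops_per_host(8, 8, host_ram_gb=96, vm_ram_gb=1))   # CPU-bound
    print(desktops_per_host(8, 8, host_ram_gb=96, vm_ram_gb=2))   # memory-bound
    print(desktops_per_host(8, 8, host_ram_gb=192, vm_ram_gb=2))  # CPU-bound again
```

In this toy model, a host that can address more memory moves the bottleneck back to the CPU, which is one way to read the density gain described above.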

In addition to testing more use cases and larger scale, IT team members are gathering data to help with a tough decision: whether to use a centralized or regional strategy for a worldwide VDI deployment.

A fully centralized approach has obvious advantages from a cost and complexity standpoint. But a regional deployment has performance advantages because it significantly lowers network latency for users in regions far away from EMC’s U.S. headquarters. So a small server infrastructure was added in Cork, Ireland.

PCoverIP improved performance for most applications, even streaming video, on higher-latency connections. So maybe centralized could work, after all?

Packet loss, on the other hand, quickly renders a PCoverIP VDI session unusable. And packet loss is a greater threat over greater distances.

Bottom line: the jury’s still out on centralized vs. regional deployment.

Do any of you have experience you can share on this debate?

The good news is that deploying virtual machines and updating base system software was indeed, as promised, easier. Templates, linked clones—all cool stuff.

But some things got harder. VDI changes everything. Unspoken assumptions in desktop management can suddenly become invalid. And if we’re not paying attention, VDI’s biggest advantage—consolidation—can become our biggest source of pain.

A really unpleasant example of this started happening every day around 11:30am. Performance would suddenly tank, with seemingly random responsiveness that was so bad it could take minutes for typed characters to be displayed on screen. This typically lasted till 1:30pm. (I have to admit I usually reverted to my local desktop. It was just too hard to get anything done.) Just what the heck was going on?

One big contributor turned out to be anti-virus software. Centrally managed by EMC IT, our company’s desktops have been programmed to automatically update their virus definitions, and then scan local files every day during that two-hour period around lunchtime. Scattered over hundreds of machines across a network, the performance impact is barely noticeable. It’s a classic example of the power of distributed work.

When a couple hundred virtual machines suddenly start scanning all of their “local files” on a single ESX server, the impact is dramatic. Remember, VDI’s big advantage is consolidation. Furthermore, it’s about much more than merely pooling previously distributed resources in a central place. Virtualization’s true power (in desktops, servers, networking and storage) is in oversubscribing, also known as thin provisioning. In other words, providing more virtual resources than physically exist.

For average desktop workloads, with their largely random and sporadic resource demands, oversubscribed resources translate into much greater efficiencies with little to no loss of performance. But when a large majority of a server’s VMs simultaneously start virus scanning, resources are quickly starved. Adding insult to injury, most of each VM’s files are actually physically located in a single place on disk (using “linked clone” technology), meaning hundreds of virus scanners are competing for access to the same files.
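A toy model in Python makes the difference concrete. The figures are invented for illustration (the VM count echoes the "couple hundred" mentioned earlier, but the I/O numbers are pure assumptions), and `demand_ratio` is a hypothetical helper, not a vendor sizing formula. It compares aggregate demand with what the host can physically supply.

```python
def demand_ratio(vm_count, active_fraction, iops_per_active_vm, host_iops):
    """Return aggregate I/O demand divided by host capacity.

    A result above 1.0 means the VMs are asking for more than the host has.
    """
    demand = vm_count * active_fraction * iops_per_active_vm
    return demand / host_iops


if __name__ == "__main__":
    # Assumed numbers: 200 VMs, 50 IOPS per busy VM, host supplies 2,000 IOPS.
    normal = demand_ratio(200, active_fraction=0.10,
                          iops_per_active_vm=50, host_iops=2_000)
    storm = demand_ratio(200, active_fraction=0.90,
                         iops_per_active_vm=50, host_iops=2_000)
    print(f"sporadic workload: {normal:.1f}x capacity")
    print(f"scan storm:        {storm:.1f}x capacity")
```

Under these assumptions the same oversubscribed host sits at half its capacity on a normal morning and is asked for several times its capacity once most VMs start scanning together.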

But anti-virus scanning isn’t the whole story. After dispersing scans over a much wider timeframe—and even temporarily eliminating them—the daily VDI lunchtime crunch continues. Apparently, a lot of automated activities were created, scheduled, and forgotten years ago. Fearless IT archeologists have managed to unearth a few, but this particular mystery remains.
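The effect of dispersing scans can be sketched with a short Python model. The VM count, scan duration, and window length are assumptions for illustration only, and the function names are hypothetical. The model spaces scan start times evenly across a window and then counts how many scans overlap at the worst moment.

```python
def staggered_starts(vm_count, window_minutes):
    """Return evenly spaced scan start offsets, in minutes, across the window."""
    return [i * window_minutes / vm_count for i in range(vm_count)]


def peak_concurrency(starts, scan_minutes):
    """Return the largest number of scans running at the same instant."""
    events = []
    for start in starts:
        events.append((start, 1))                  # scan begins
        events.append((start + scan_minutes, -1))  # scan ends
    # Tuples sort by time, then delta, so an end is counted before a
    # start that falls at exactly the same moment.
    events.sort()
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


if __name__ == "__main__":
    vms, scan_minutes = 200, 15  # assumed values
    print("all at once:", peak_concurrency([0.0] * vms, scan_minutes))
    print("over 8 hours:", peak_concurrency(staggered_starts(vms, 480), scan_minutes))
```

The model only covers the scanning load, which is consistent with the observation above that spreading the scans out did not by itself end the lunchtime crunch.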

This has sparked a debate in EMC IT. When we start deploying Windows 7, should we abandon the modifications and settings built up over the years and start fresh? Or should we pursue this problem and find its cause, no matter how much time and effort is required?

Whether or not you’re planning to use VDI, which route are you taking?

Refining Goals

For this POC, IT started using our internal social-network community as well as formal user surveys to gather feedback and suggestions. That’s important, because survey data only answers questions you already know to ask.

For example, EMC IT was considering a bring-your-own-PC program pilot as part of a later VDI deployment stage. But IT wanted to target early adopters in the next stage.

Guess what? We have a few thousand users in EMC who have purchased their own Macs. They’re using VMware Fusion to run corporate virtual desktop images, created by EMC IT, to use business applications available only on Windows. They’re already BYOPC users, the majority of them are early adopters, and they’re largely self-supporting. What better audience for testing a BYOPC program for VDI? Or for testing a completely virtual Windows 7 desktop deployment? Sounds to me like a lot of bang for the buck. And that’s exactly how EMC IT sees it.

In my next post, we’ll finally look at the TCO/ROI numbers that helped gain approval for the next phase of EMC IT’s VDI project: a production rollout for 5,000 of EMC’s 40,000 desktop users.

As always, I welcome your thoughtful comments.

This entry was posted in Private Cloud, Virtual Desktop by David Freund.

About David Freund

David Freund is EMC’s Corporate Virtual Architect, working on strategic aspects of EMC’s technology, business development and marketing. David joined EMC in 2006 as CTO of Infrastructure Software, and has 30 years of IT experience in operations, engineering, marketing, management and services. He’s been a customer, vendor, VAR—and even an industry analyst, quoted too many times by trade- and business-press organizations for him to get away with denying it.

13 thoughts on “Justifying VDI – Part One”

  1. Hi David,

    Thanks very much for sharing, as this info helps confirm for me that “Client Virtualization” is still very much in its infancy, and while it may still have a way to go, I believe it will gain momentum over time.

    Cheers,
    David Caddick

  4. Great post sharing your environment, expectations, rationale, approach, results and refinement; well done! And thanks.

    -Navneeth

  6. Hi David,
    Very interesting post, and I can feel the pain of the challenges you have faced, AV scanning being one.
    On one of your thoughts/questions: Win 7 migration/VDI
    The majority of customers I speak with are also giving this a lot of thought. The general consensus is to create a new corporate image, as it gives them the chance to “clean up” the environment and also gives them a solid base for future “virtualization” of the client environment, such as user personality/profile virtualization and of course application virtualization.
    Cheers
    Dave

  7. One of the issues raised here was latency, especially for remote users. The fact is that PCoIP is certainly a great display protocol. However in some scenarios of slow remote connections (like over certain WANs) there may be issues where PCoIP doesn’t function quite as well. In those cases, you can complement the VMware View deployment with Ericom Blaze, a software-based RDP acceleration and compression product that provides improved performance over WANs. Besides delivering higher frame rates and reducing screen freezes and choppiness, Blaze accelerates RDP performance by up to 10-25 times, while significantly reducing network bandwidth consumption over low-bandwidth/high latency connections.

    You can use VMware View with PCoIP for your LAN and fast WAN users, and at the same time use VMware View with Blaze over RDP for your slow WAN users. This combined solution can provide enhanced performance in both types of environments, letting you get the best out of VMware View for your users.

    Read more about Blaze and download a free evaluation at:
    http://www.ericom.com/ericom_blaze.asp?URL_ID=708

    Adam

  8. Great post, David. We are starting off on a VDI PoC too, so this is an excellent read. Can you comment on the use cases you chose for the PoC? I would like to chat with you further. Do you have an email contact?

    Thanks
    -Neeraj

  9. Hi David, this is a very good insight into your thought processes and issues. It is great to share this data. I have one question: has EMC thought about or experimented with voice traffic on VDI devices using, say, a softphone application?
    Interested to hear your experiences if you have.
    Kevin

  10. Great post David
    We are also launching a pilot, although on a much smaller scale. We do have a unique situation, though: we do not control the network infrastructure at our remote offices (our client does), so for us the pilot will consist of a combination of VDI and DaaS.
