Sunday, June 26, 2011

VM Sprawl: A Burning Issue!

Server virtualization has been extremely successful. It has reduced physical server counts, increased business flexibility and made DR planning simpler. But server virtualization has also brought its own set of challenges, one of which is virtual machine (VM) sprawl. VM sprawl is the ‘weed-like’ growth in VMs that, similar to ‘NT server sprawl’ a decade ago, has become a management problem for IT administrators everywhere.

VM sprawl is a burning issue across the virtual world. People who have yet to implement virtualization, or who are still in the planning phase, may not have come across this buzzword. But friends of mine who are already sailing in the virtualization boat are worried about it. The whole TCO and ROI case for virtualization is proving to be a failure within just a 2-3 year time frame. Infrastructure investments usually return their maximum gains and savings after about five years, but after just three years sprawl starts draining a hefty amount from your budget, just like petrol prices.
Now some of you will want to know what VM sprawl is, so here you go.

The web definition reads as follows:
The number of virtual machines (VMs) running in a virtualized infrastructure increases over time simply because of the ease of creating new VMs, not because those VMs are absolutely necessary for the business. The concerns with VM sprawl are overuse of the infrastructure when it is not needed and the cost of licenses for virtual machines that may not have been required.
Because of this ease of deployment, virtual servers are routinely ‘stood up’ almost as soon as they’re requested. This often seems to happen without much thought being given to how important the application is or how long the VMs need to stay deployed. There are cases of VM growth rates approaching 125% per year, with the majority of those VMs being servers that never existed before the switch to virtualization.
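To see why a 125% annual growth rate matters, here is a minimal Python sketch that compounds it over a few years. It assumes the rate stays constant, which real environments rarely do, and the starting fleet of 20 VMs is a hypothetical figure.

```python
def project_vm_count(start: int, annual_growth: float, years: int) -> list:
    """Return the projected VM count for year 0 through `years`.

    Assumes a constant compound growth rate: 1.25 means +125% per year,
    i.e. the fleet is multiplied by 2.25 every year.
    """
    return [round(start * (1 + annual_growth) ** year) for year in range(years + 1)]


if __name__ == "__main__":
    # Hypothetical starting point: 20 VMs growing at the 125% rate quoted above.
    for year, count in enumerate(project_vm_count(20, 1.25, 3)):
        print(f"Year {year}: {count} VMs")
```

With a hypothetical 20 VMs at year 0, the projection gives 45, 101 and 228 VMs at the end of years 1 to 3.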

Definition for tech gigs
When the number of virtual machines (VMs) on the network reaches the point where an administrator can't manage them effectively -- or where the VMs start demanding excessive host resources -- that means there is virtual sprawl.
Virtualization sprawl, or VM sprawl, is defined as a large number of virtual machines on your network without proper IT management or control. For example, multiple departments that own servers may begin creating virtual machines without proper procedures or control over their release.
Now, if you let months go by, bottlenecks begin to appear on servers, or crashes occur because system resources are low. The IT department then starts investigating and begins to understand the nightmare that has become its reality.
Hypothetically, a company that had 10 physical servers one year ago might have dropped that number down to eight with virtualization. But today, that company might now have 25 VMs running on those eight servers. The number of physical servers the company needs to manage has dropped by 20%, but the number of operating system instances has increased by 150%!
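The percentages in this example can be checked with a few lines of Python. The figures come straight from the hypothetical company above, and the calculation assumes one operating system instance per physical server before virtualization.

```python
def percent_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`; negative means a reduction."""
    return (after - before) * 100 / before


physical_servers = (10, 8)  # (one year ago, today)
os_instances = (10, 25)     # assumes one OS per physical server before virtualization

if __name__ == "__main__":
    print(f"Physical servers: {percent_change(*physical_servers):+.0f}%")  # -20%
    print(f"OS instances:     {percent_change(*os_instances):+.0f}%")      # +150%
    print(f"VMs per host:     {os_instances[1] / physical_servers[1]:.3f}")
```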
The reason for this growth is simple: engineers and users have gotten used to the ease with which they can deploy a virtual machine. Application users continually ask for their own server, and with VMware, engineers can easily accommodate those requests. Savvy users realize how easy it is to get dedicated server space, so the number of VMs keeps increasing.

What is the cost of an orphaned VM?
The truth is that orphaned VMs are not really idle. They still consume memory and CPU cycles, and they burden the hypervisor, which must continually check in to see whether the VM needs additional resources. They also consume disk resources, which can be quite high thanks to the practice of using templates to make setup easier. Most administrators set a “safe” disk size in their VM templates to make sure there’s always enough capacity, so idle VMs can easily tie up terabytes of excess disk space in a typical environment. Most server virtualization environments have also made the extra investment in shared storage for VM flexibility, which means this wasted capacity comes at a premium price.
Orphaned VMs also add unnecessarily to the cost and complexity of data protection. The data associated with these orphaned systems is often included in a default replication strategy, which takes up disk space at the DR site. Orphaned VMs also consume backup resources: they are saved when full backups run and examined during each incremental backup to confirm that nothing has changed since the last one. They can also affect the performance of other VMs on the same server, so it is critical that administrators keep track of all the VMs sharing and drawing on the same resources.
Compared with a physical server, the effort required to identify, turn off and archive an orphaned virtual machine is minimal. Physical machines need to be powered off, de-racked and physically stored or securely discarded, and if an application suddenly needs to be brought back, the physical deployment has to be done all over again. A virtual system can be turned off and its image archived to less expensive storage. Returning it to operation takes only a few clicks and the time for a disk-to-disk transfer.
If you are in the implementation or planning phase, there are five basic questions you should answer before taking any decision; those questions are very well listed at … There are also various ways to bring the VM sprawl issue under control. Some of them are described below for ease of reference.
Identifying orphaned VMs should be the first step in getting VM sprawl under control. These are server instances that were set up for a specific purpose but quickly outlived their usefulness and were abandoned. For example, a request may come in for a VM to test a new version of an application. The server is only needed for about 30 days, but after testing is done it just sits idle: another orphaned server with no task to perform.

Reactive measures
In its simplest form, reactive resolution of VM sprawl can start with general housekeeping. This doesn’t require you to purchase a product, since VirtualCenter can quite easily accomplish it and target reductions if needed. For example, some VMs might not be registered on ESX hosts; others might have been replicated or spun off to a clone because of operational issues when the app team or ISV originally deployed the VM. You may also find that the VMDKs presented to your VMs are heavily under-filled, so they can be shrunk to regain space.
On the consumed-storage issue, vSphere 4 introduces a few pieces of functionality that will help reduce this in future; any recommendations here are based on current releases. The main feature is thin provisioning of VMs, which lets VM usage grow without leaving what is effectively unusable whitespace inside your VMDKs.
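The under-filled VMDK check can be sketched in Python as below. It assumes you already have the provisioned and used size of each virtual disk (for example from an inventory export); the 40% fill threshold, the 25% headroom and the VM names are hypothetical, and the code only suggests candidates rather than resizing anything.

```python
from dataclasses import dataclass


@dataclass
class VirtualDisk:
    vm_name: str
    provisioned_gb: float  # size presented to the guest, often a "safe" template default
    used_gb: float         # space the guest has actually written


def shrink_candidates(disks, max_fill_ratio=0.4, headroom=1.25):
    """Return (vm_name, reclaimable_gb) for disks filled below `max_fill_ratio`.

    The suggested target size keeps `headroom` x the used space, so a disk is
    never proposed for shrinking right up against its current contents.
    """
    candidates = []
    for disk in disks:
        if disk.used_gb / disk.provisioned_gb < max_fill_ratio:
            target_gb = disk.used_gb * headroom
            candidates.append((disk.vm_name, disk.provisioned_gb - target_gb))
    return candidates


# Hypothetical inventory rows.
disks = [
    VirtualDisk("app-test-01", provisioned_gb=100, used_gb=12),
    VirtualDisk("sql-prod-01", provisioned_gb=200, used_gb=150),
    VirtualDisk("web-02", provisioned_gb=60, used_gb=20),
]

if __name__ == "__main__":
    for name, gb in shrink_candidates(disks):
        print(f"{name}: about {gb:.0f} GB reclaimable")
```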
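The effect of thin provisioning on consumed datastore space can be approximated as follows. This is a simplification: it assumes a thick disk consumes its full provisioned size and a thin disk consumes only what the guest has written, ignoring swap files, snapshots and metadata overhead. The disk sizes are hypothetical.

```python
# (provisioned_gb, used_gb) per virtual disk; hypothetical figures.
DISKS = [(100, 12), (200, 150), (60, 20)]


def consumed_gb(disks, thin: bool) -> int:
    """Approximate datastore space consumed by a set of virtual disks.

    Thick: the full provisioned size is allocated up front.
    Thin: only blocks the guest has written are allocated (overheads ignored).
    """
    return sum(used if thin else provisioned for provisioned, used in disks)


if __name__ == "__main__":
    thick = consumed_gb(DISKS, thin=False)
    thin = consumed_gb(DISKS, thin=True)
    print(f"Thick: {thick} GB, thin: {thin} GB, saved: {thick - thin} GB")
```

Because thin disks let a datastore be overcommitted, the free space on that datastore still needs to be monitored as the guests grow.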

The first step is to identify these virtual machines and archive them out of the environment, or at least turn them off. A monitoring tool like Vizioncore’s vFoglight can provide data on resource utilization, template efficiency and deployment strategies. These tools can monitor from a virtual machine view, a vCenter view or a data center view, which is essential for detecting inactive virtual machines. They also allow close monitoring of specific resources, which can provide additional clues for identifying orphaned VMs. The ability to examine a VM over time is critical: low memory and CPU utilization for one night does not justify decommissioning a VM, but over the course of a few weeks it likely does.
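The rule that one quiet night proves nothing, while a few weeks of low utilization probably does, can be expressed directly. The Python sketch below assumes you can export one average CPU and memory utilization figure per VM per day from your monitoring tool; the 21-day window and the 5% CPU / 10% memory thresholds are hypothetical and should be tuned to your environment.

```python
def is_idle(daily_cpu_pct, daily_mem_pct, min_days=21, cpu_threshold=5.0, mem_threshold=10.0):
    """Flag a VM as a decommissioning candidate only after a sustained quiet period.

    Every one of the last `min_days` daily samples must be below both
    thresholds; a single busy day resets the verdict to "not idle".
    """
    if len(daily_cpu_pct) < min_days or len(daily_mem_pct) < min_days:
        return False  # not enough history yet: one quiet night proves nothing
    recent_cpu = daily_cpu_pct[-min_days:]
    recent_mem = daily_mem_pct[-min_days:]
    return max(recent_cpu) < cpu_threshold and max(recent_mem) < mem_threshold


if __name__ == "__main__":
    print(is_idle([1.0], [2.0]))                      # one quiet night
    print(is_idle([1.0] * 21, [3.0] * 21))            # three quiet weeks
    print(is_idle([1.0] * 20 + [40.0], [3.0] * 21))   # quiet except for one busy day
```

Using the maximum rather than the average is a deliberately conservative choice: a VM that wakes up once a week for a batch job will not be flagged.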

Once the orphaned VMs are identified they can then be dealt with. For VMs that will likely see resumed use, simply tag and turn them off. This is a key advantage over physical systems. If there are applications in the environment that are only run quarterly, for example, it’s easy to turn them off and on as needed. Physical systems require physical interaction and typically are not used on an as-needed basis like this.
VMs that are deemed highly unlikely to be needed in the future can be archived to a secondary disk tier that’s lower in cost per GB and more power efficient. Using tools like Vizioncore’s vRanger, the archived VMs can be recalled with a few clicks. This frees up all the disk resources discussed earlier and stores the server in a secure state, in case there’s a need to show chain of custody in a legal action.
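The two outcomes described above (tag and power off when reuse is likely, archive when it is not) can be captured in a small decision function. This is a hypothetical sketch: how "idle" and "likely reuse" are determined, and the actual power-off or archive steps, are left to your own tools and policy.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep running"
    POWER_OFF = "tag and power off"
    ARCHIVE = "archive to lower-cost storage tier"


def triage(idle: bool, likely_reuse: bool) -> Action:
    """Decide what to do with a VM once its utilization history is known."""
    if not idle:
        return Action.KEEP
    if likely_reuse:
        # e.g. a quarterly application: cheap to power back on when needed
        return Action.POWER_OFF
    # Unlikely to be needed again: free primary disk, keep a recoverable copy
    return Action.ARCHIVE


if __name__ == "__main__":
    print(triage(idle=True, likely_reuse=True).value)
```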

Proactive planning and prevention
Every virtualized environment should have at least some kind of documented audit. If you have not got a CMDB, then in its simplest form an Excel spreadsheet provides a simple view of your virtual infrastructure and its allocation. VirtualCenter has exportable reporting built in that can help you build even a simple spreadsheet; to see this in action within your VirtualCenter today, go to “VM and Template View”, select the highest-level folder, then select “File > Export > Export List”. Some VI admins may be quite clever with PowerShell scripts or SQL queries, but this is quick, easy and intuitive. You can also use this type of audit to help with capacity planning: it lets you monitor how much space you have left and perform simple “what if” analysis of how much disk, RAM and CPU resource you would have left after adding a newly requested machine.
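The “what if” analysis described above amounts to subtracting current allocations and the new request from total capacity. The Python sketch below assumes the audit has already been typed into simple records; the capacity figures, VM rows and column names are hypothetical (they are not the Export List format), and vCPU is treated as a hard budget with no overcommit ratio.

```python
# Hypothetical cluster totals and per-VM allocations from a spreadsheet audit.
CAPACITY = {"vcpu": 64, "ram_gb": 256, "disk_gb": 4000}
ALLOCATIONS = [
    {"vm": "web-01", "vcpu": 4, "ram_gb": 16, "disk_gb": 200},
    {"vm": "sql-01", "vcpu": 8, "ram_gb": 32, "disk_gb": 500},
    {"vm": "test-03", "vcpu": 2, "ram_gb": 8, "disk_gb": 100},
]


def what_if(capacity, allocations, request):
    """Return the headroom left per resource if `request` were provisioned."""
    return {
        resource: total - sum(vm[resource] for vm in allocations) - request[resource]
        for resource, total in capacity.items()
    }


if __name__ == "__main__":
    new_vm = {"vcpu": 4, "ram_gb": 16, "disk_gb": 250}
    remaining = what_if(CAPACITY, ALLOCATIONS, new_vm)
    print(remaining)
    # Any negative headroom means the request does not fit.
    print("Fits" if all(v >= 0 for v in remaining.values()) else "Does not fit")
```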

Control and Automation
Once the existing environment has been cleared of orphaned systems, the next step is to put procedures in place to keep VM sprawl from happening in the future. With products like Vizioncore’s vControl and the public domain scripting capabilities of VESI, the whole process can be automated. For example, more granular use of templates can be instituted: during VM creation the administrator can be prompted for the required disk size to keep utilization efficient, as well as for a VM expiration date and the name of the VM requester. This information can be embedded in the notes section of the VM. A subsequent task could then check for expired VMs and email the requester for authorization to turn them off, and a final task could turn off all expired and confirmed VMs. Especially in server virtualization, this kind of broad automation is critical to enabling system administrators to increase the number of VMs they can manage.
Effective management of VM sprawl is enabled by having the right tools. Some of these capabilities exist within the server virtualization software but need the help of automation tools to let administrators take full advantage of them. For exacting control, however, third-party programs that can monitor and archive these virtual machines are required. Through this combination of internal utilities and external software tools, the ‘great VM sprawl challenge’ can be managed and the ROI of server virtualization projects increased.
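The “check for expired VMs” task can be sketched in Python as below. It assumes the requester and expiry date were written into each VM’s notes as `key=value` pairs separated by semicolons; that notes format, the VM names and the addresses are hypothetical, and the notes are passed in as a plain dictionary rather than read from vCenter. Emailing the requester and powering off confirmed VMs are left out.

```python
from datetime import date


def parse_notes(notes: str) -> dict:
    """Parse hypothetical 'key=value; key=value' pairs from a VM's notes field."""
    fields = {}
    for part in notes.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def expired_vms(inventory: dict, today: date) -> list:
    """Return (vm_name, requester) for every VM whose recorded expiry has passed."""
    expired = []
    for vm_name, notes in inventory.items():
        fields = parse_notes(notes)
        if "expires" not in fields:
            continue  # no expiry recorded: report these separately rather than guess
        if date.fromisoformat(fields["expires"]) < today:
            expired.append((vm_name, fields.get("requester", "unknown")))
    return expired


# Hypothetical VM name -> notes mapping.
INVENTORY = {
    "uat-app-07": "requester=alice@example.com; expires=2011-06-01",
    "web-prod-01": "requester=bob@example.com; expires=2012-01-01",
    "legacy-db": "built before the tagging policy",
}

if __name__ == "__main__":
    for vm, requester in expired_vms(INVENTORY, date(2011, 6, 26)):
        print(f"{vm} has expired; ask {requester} to confirm power-off")
```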
Some good VM sprawl management tools are listed at …
VMware VirtualCenter will also, at some point this year, provide this functionality through a module called CapacityIQ, accessible from within the vCenter console (for more information, see …). I’ve seen it in action and it’s great: it provides out-of-the-box functionality that will certainly help with what I’ve described in this post.
The ease of creating VMs makes it that much easier for them to be launched and moved willy-nilly, regardless of security issues and software licensing costs, to name just two common problems. Vendors, of course, have been hip to these challenges. This month, Embotics Corp. released version 2.0 of its V-Commander management software, designed to automatically nip virtual sprawl in the bud. One way the software does this is by automatically enforcing policies that dictate such things as VM expiration dates, and through role-based security access that defines just who can do what in terms of VM creation and migration.
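Policy enforcement of this kind (role-based permissions plus a mandatory expiration date) can be illustrated with a short Python sketch. It is a generic example, not V-Commander’s actual model: the roles, permissions and 90-day maximum lease are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical role model: which actions each role may perform.
ROLE_PERMISSIONS = {
    "vm_admin": {"create", "migrate", "power_off", "delete"},
    "app_owner": {"power_off"},
    "developer": {"create"},
}

# Hypothetical policy: no VM may be created with an expiry more than 90 days out.
MAX_LEASE = timedelta(days=90)


def is_allowed(role: str, action: str) -> bool:
    """True if the role's permission set includes the action."""
    return action in ROLE_PERMISSIONS.get(role, set())


def authorize_create(role: str, expires, today: date) -> bool:
    """Allow creation only for permitted roles with an expiry inside the lease."""
    if not is_allowed(role, "create") or expires is None:
        return False
    return today < expires <= today + MAX_LEASE


if __name__ == "__main__":
    today = date(2011, 6, 26)
    print(authorize_create("developer", date(2011, 7, 31), today))
    print(authorize_create("developer", None, today))
```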

The Rolls Royce solution
For larger, enterprise-sized virtual environments, keeping track of constant demand and growth is impossible; to succeed, IT services ideally need to be self-service, with the end user or customer able to request what they want through a web mechanism. It may sound foolish to give end users control and so worsen the sprawl problem you are already experiencing. To combat this, however, a self-service portal (SSP) can be provided with delegated privileges, predefined object-creation controls and approval processes that escalate to higher-level management or project support offices. An SSP can also provide proactive benefits such as what-if analysis and tombstoning of virtual machines. All policy applied within the technology is set by IT governance and defined in the tools according to business requirements.
Two example products that provide self-service portals include:
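The approval side of a self-service portal can be sketched as a routing rule: small, short-lived requests are approved automatically and everything else is escalated. This is a hypothetical Python illustration, not how any particular portal works; the request fields and auto-approval limits are assumptions standing in for whatever your IT governance policy defines.

```python
from dataclasses import dataclass


@dataclass
class VmRequest:
    requester: str
    vcpu: int
    ram_gb: int
    disk_gb: int
    lease_days: int  # hypothetical: when the lease ends the portal would tombstone the VM


# Hypothetical governance limits for requests that need no human approval.
AUTO_APPROVE_LIMITS = {"vcpu": 2, "ram_gb": 8, "disk_gb": 100, "lease_days": 90}


def route(request: VmRequest) -> str:
    """Auto-approve small, short-lived requests; escalate everything else."""
    within_limits = all(
        getattr(request, field) <= limit for field, limit in AUTO_APPROVE_LIMITS.items()
    )
    return "auto-approve" if within_limits else "escalate for management approval"


if __name__ == "__main__":
    print(route(VmRequest("alice", vcpu=2, ram_gb=4, disk_gb=50, lease_days=30)))
    print(route(VmRequest("bob", vcpu=8, ram_gb=32, disk_gb=500, lease_days=365)))
```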
These technologies currently have rather low uptake and adoption within organisations. There may be more technologies on the market, but given the functionality of the example products above, we will certainly see more and more of them as IT departments struggle with the business’s demands for infrastructure. I also predict that these technologies will become known, as VMware has, as the killer app for reducing lost productivity within organisations and project teams.
The issue with these products today is that they carry medium to large price tags, which puts off the typical bean counter when business cases are put forward. So before building any proposal, research the product and identify where it can cut current tedious and expensive business processes, reduce VM sprawl and improve your budget projections, so that this can be translated into a measurable ROI after the product is deployed.
