To ensure SQL Server works at least as well as you expect, if not better, when you deploy it on a virtual server, you need to make design decisions that you wouldn’t have to make with a physical server deployment. These considerations, as you will see, cover a range of environment components that you may not have had to think about before, yet they can have considerable influence on the performance of your database server instance.
Architecting Virtual Database Servers vs. Physical Database Servers
In the physical server world, assigning
and reserving hardware resources such as CPU and memory is relatively
straightforward. Ultimately, a server chassis only ever runs one
instance of an operating system that has every piece of hardware
resource in the server available to it. For example, if a server has 16
physical CPU cores and 32GB of memory installed in it, then Windows
would also have that amount of resources available to it. This
“dedicated resource” approach not only makes server design easier, it
also makes troubleshooting performance issues easier, as whatever data
Performance Monitor shows is the truth.
For most users of SQL Server, the introduction of
storage area networks (SANs) was the first time they had to continuously
depend on a shared resource. Remote network shares had previously been
used for backups, but as long as their performance was “good enough” no
one really questioned it. SANs, however, were different because not only
were they often highly influential on SQL Server’s performance, they
were also usually shared among several servers at a time. This meant
that the activities of one server could potentially affect the
performance of all the other servers also using it. In small
environments this wasn’t so bad, as the DBA could probably see all the
other servers connected to the SAN and quickly identify the cause of any
problems; but in much larger environments the DBA had to rely on the
SAN administrator to provide assurances and performance data that ruled
out SAN contention as a potential cause of performance issues for SQL
Server.
Most virtualization environments use SANs as their
shared storage, so the scenario just described is something many
administrators will be familiar with, especially those responsible for
making instances of SQL Server perform as fast as possible. However, as
shown in Figure 1,
with virtualization not only is the SAN a shared resource, so are the host server’s hardware resources, such as the CPU and memory. Now, the
DBA has to consider not only other users of the SAN affecting SQL
Server’s performance, but also other virtual servers running on the same
physical host server.
The diagram below shows how a virtualization administrator could inadvertently allocate the same physical storage to several virtual servers, inviting performance issues if the configuration of that shared storage resource can’t meet their future performance requirements.
Sharing resources doesn’t always have to be a problem; in fact, if it is a problem then something is wrong. Adequately resourced host servers and sensibly configured virtual servers are the foundation of a successful, well-performing virtualization environment. However, that success can be assured only by consistently applying and monitoring the design considerations that follow.
Virtual Database Server Design
This section covers the most important
areas of a virtual server’s design. Understanding these elements will
ultimately help you deploy SQL Server in a virtual server successfully.
Memory
Of course, the requirement for SQL
Server to have sufficient memory to deliver the performance levels
expected of it isn’t eliminated when it’s virtualized. If anything, this
requirement can become harder to fulfill because as you just saw,
virtualization creates more opportunities — through contention — for
memory to be taken away from SQL Server. Because the performance of SQL
Server is so dependent on adequately sized and provisioned memory, it
ranks first in my list of design considerations. It should be yours too.
For database servers, you should usually ensure
that for every GB of memory allocated to the virtual server there is one
GB of uncontended physical memory in the host server, and any future
host servers the virtual server might run on for whatever reason. If
your hypervisor allows it, as VMware’s does, a memory reservation should
be set to ensure that the virtual server always has reserved for it the
memory it’s allocated. This is particularly important in VMware
environments where it’s possible to over-allocate a host server’s memory
in the hope that all the virtual servers won’t need it at the same
time, yet performance problems might occur if they do.
Microsoft’s Hyper-V technology in Windows Server
2008 R2 Service Pack 1 and later, however, offers Dynamic Memory, which
enables you to allocate any unused physical memory in the host server to
virtual servers should they ever be detected to have low-memory
conditions.
With Hyper-V, a virtual server could be configured to boot with 16GB assigned to it but have an upper memory limit of 32GB. If, while running, the virtual server gets close to using all of its initial 16GB of memory, the Hyper-V hypervisor will respond by increasing the memory allocated to the virtual server, provided sufficient free physical memory is available. It won’t immediately assign the full 32GB; instead, it will gradually increase the allocation by the amount the virtual server needs to eliminate the low-memory condition. As we mentioned earlier, SQL Server will then respond to the extra memory becoming available by expanding the buffer pool.
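You can watch this happen from inside the guest by comparing SQL Server’s memory manager counters. The following query, a minimal sketch against the sys.dm_os_performance_counters DMV, shows how much memory SQL Server is currently using (Total Server Memory) versus how much it would like to use (Target Server Memory):

    -- Total Server Memory creeping up toward Target Server Memory after the
    -- hypervisor adds memory is the buffer pool expanding.
    SELECT counter_name,
           cntr_value / 1024 AS memory_mb
    FROM   sys.dm_os_performance_counters
    WHERE  object_name LIKE '%Memory Manager%'
           AND counter_name IN ('Total Server Memory (KB)', 'Target Server Memory (KB)');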
While it’s good that extra memory might be available when it’s most needed, take care not to assume that it will always be available, or that extra memory, once allocated, won’t be taken back by the balloon driver if a virtual server with a higher weighting needs it instead.
Dynamic Memory is also a good way to size a new database server’s memory requirement: if, after a few weeks or a complete cycle of business activity, the memory allocated to the database server hasn’t increased above 17GB, you can be confident that 18GB, rather than 16GB, is an acceptable memory allocation for that virtual server.
VMware’s memory overcommitment feature works slightly differently: the virtual server is told it has a certain amount of memory assigned to it, but only the memory it is currently using is actually allocated to it. You can still size a database server’s potential memory requirement, but the memory utilization data will have to come from VMware’s performance counters rather than straight from those provided in Windows.
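That said, SQL Server’s own view of its memory use is still worth sampling from inside the guest; captured periodically over a full business cycle, it gives a reasonable picture of how much memory the instance really needs. A minimal sketch using the sys.dm_os_process_memory DMV:

    -- How much physical memory the SQL Server process is using right now,
    -- plus flags indicating whether memory pressure has been signaled.
    SELECT physical_memory_in_use_kb / 1024 AS physical_memory_in_use_mb,
           memory_utilization_percentage,
           process_physical_memory_low,
           process_virtual_memory_low
    FROM   sys.dm_os_process_memory;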
Storage
Storage is usually the second most
important part of a virtual server’s design in order to ensure SQL
Server has the performance it needs to deliver the results expected of
it.
Assigning storage to a virtual server is
accomplished by attaching a virtual hard drive to it. A virtual drive is
just a flat file stored and managed by the hypervisor but presented to
the virtual server’s operating system as though it were a physical disk.
From that point, Windows can create a partition on it that is then
mounted as drive D, formatted as NTFS, and so on.
When you deploy SQL Server in a physical
environment you know it can benefit hugely by having multiple sets of
unshared and uncontended hard drives available for its data storage. You
typically see these used to distribute the system files, data files, log files, and tempdb across different sets of spindles. The same consideration should be given to your virtualization environment’s storage design: if at all possible, each of SQL Server’s virtual hard drives should be placed on a separate, uncontended group of physical disks. Of course, that’s not always possible, as some SANs now pool all of their disks into one large group, in which case you should work with your storage team or vendor to understand how to get the best possible concurrent performance out of it.
Even though hypervisors have different ways of
presenting storage to virtual servers, whether they use fixed or dynamic
virtual disks, or raw device mappings, you ultimately need to ensure
that SQL Server’s I/O activity isn’t negatively affected by competing
workloads at the spindle level. Another possible performance impact, unique to virtual environments, can occur when the same set of spindles holds not only the database server’s virtual hard drives but also the virtual hard drives of entirely separate virtual servers. This is a typical deployment practice for virtual environments, and for most virtual servers it is usually acceptable. However, SQL Server sometimes doesn’t fit in the “usual” category; being as sensitive as it is to storage latency, it will suffer if placed on busy virtual hard drive storage.
Therefore, as a best practice, for active and busy servers I recommend the following storage design principles:
- Ensure that every logical drive within the Windows operating system is a separate virtual hard disk, and if you’re able to, place each of them on a separate physical SAN partition, known as a LUN.
- Test any use of thin provisioning, both at the hypervisor and
storage layers, before you deploy it. Thin provisioning enables the virtual server to operate as though it has much more disk space allocated to it than has actually been provisioned. For example, in Windows we might see a 100GB logical drive, but if
it only has 28GB of data stored on it, then the hypervisor or the SAN
may only have actually provisioned 28GB. This enables system
administrators to over-allocate storage but only deploy the amount of
storage required, rather than the amount provisioned, which may never be
used. While this makes commercial sense, there’s always an element of
uncertainty regarding how much of a performance impact thin provisioning
might have on SQL Server when it performs a write. I’m sure the vendors
promote their thin-provisioning technologies in a way that reassures
you this won’t be an issue; but when you consider that the most
performance-sensitive action in SQL Server can be a transaction log
write, you never want to let anything unnecessarily impact the
performance of that write.
- Consider contention on your storage infrastructure’s networking. If
you deploy SQL Server on a dedicated physical server and use SAN
storage, then SQL Server is likely to have almost the entire capacity of
the server’s host bus adapter and fabric switch port dedicated to it.
This helps support large numbers of concurrent reads and writes to the
physical disks SQL Server needs to use. When running on a virtual server
it’s likely that several virtual servers will be using the host
server’s HBA for all their I/O activity, so it’s possible that a virus
scan of a mail server’s file system could use the same HBA that your
instance of SQL Server uses. This sharing of HBAs gets potentially even
worse when blade servers are used as there might only be two HBAs for an
entire chassis of 16 blade servers, and is particularly relevant as
blade servers are a popular choice in medium and large virtualization
environments. The easy solution to this is to ensure that your physical
host server has adequate HBAs installed to support its total peak I/O
workload.
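Whichever storage design you end up with, verify it from inside SQL Server once the workload is running. The query below is a minimal sketch using the sys.dm_io_virtual_file_stats DMV; average latencies that climb whenever neighboring virtual servers get busy are a strong hint of contention on shared spindles or HBAs.

    -- Average read and write latency per database file since the instance started.
    SELECT DB_NAME(vfs.database_id) AS database_name,
           mf.physical_name,
           vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
           vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
    FROM   sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN   sys.master_files AS mf
           ON  mf.database_id = vfs.database_id
           AND mf.file_id = vfs.file_id
    ORDER BY avg_write_latency_ms DESC;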
As you can see, you need to consider a few
additional factors when deploying storage for a virtual database server
compared to a physical server, but none of them should be a challenge
for an adequately resourced virtualization environment.
CPU
Configuring a virtual server’s virtual
CPUs, also known as vCPUs, is one of the easier tasks to perform, and
until recently it was the first capacity ceiling users hit when they
sized their virtual servers.
A vCPU is what is presented by the hypervisor to
the virtual server as a “physical” CPU core; for example, Task Manager
in a virtual server with 4 vCPUs would show 4 CPUs. However, although a
vCPU acts and behaves like a physical CPU core, it doesn’t perform like
one. That’s because there is no one-to-one mapping between a vCPU and a
physical CPU core. Instead, a vCPU is a logical thread that is put on a
physical CPU core’s queue when it has something to execute. The more
requests in the queue, the longer it takes for a thread to have its
workload executed.
Additionally, another phenomenon unique to
virtualization appears when a virtual server is configured with multiple
vCPUs. The hypervisor knows that if a physical server has four physical
CPU cores, then the operating system expects to be able to execute an
instruction on all four simultaneously; a very abstract example would be
SQL Server executing a parallel query across four logical CPUs, in
which case SQL Server would expect all four CPUs to be available
concurrently, rather than have to wait for CPUs to become available.
VMware’s relaxed co-scheduling algorithm is clever
enough to know that if only two of a virtual server’s four vCPUs need
to perform work, then only two physical CPU cores need to be found to
execute the virtual server’s work. This makes scheduling concurrent virtual server workloads more efficient, as the number of physical CPU cores needed is often much lower than you would imagine. There can be issues with this approach, though, when a virtual server that really does need a large number of vCPUs has to compete with a large number of smaller virtual servers with a low number of vCPUs. VMware’s scheduling behavior means these smaller virtual machines can jump on and off the physical CPUs faster, delaying the scheduling of the single, much larger request for CPU time.
However, other hypervisors will only “run” the
virtual server when enough physical CPU cores in the host server are
free to run each of the vCPUs concurrently. For example, if the virtual
server has four vCPUs assigned and the host server has 16 physical CPU
cores, then the hypervisor would have to wait until four of the 16
physical CPU cores were available before it could process any CPU
instructions from the virtual server. In this example, if the host server were only running that virtual server, there shouldn’t be any unnecessary latency; with 16 physical cores available, four will almost always be free. It’s only when virtual servers start competing with each other for a limited number of physical CPU cores that performance problems occur.
As a consequence of this potential for CPU access latency, a general recommendation is to configure virtual servers with only as many vCPUs as they actually need. That’s not to say they can’t have a large number if they need them, but the assumption that the more vCPUs you have, the faster the server will be, is not always correct.
Be aware that the virtual server itself is totally
unaware of this latency, so it can only be monitored from outside of
the virtual server. High CPU latency will not, for example, incur high SOS_SCHEDULER_YIELD wait stats within SQL Server.
The latest generation of hypervisors now supports large numbers of vCPUs per virtual server. VMware’s vSphere 5 and
Microsoft’s Hyper-V in Windows Server 2012 both support assigning 32
vCPUs to a virtual server, an incredible amount of computing power but a
capability that is critical to their goal of being able to virtualize
any workload. My only advice is to ensure that you have adequate
physical CPU cores in the host servers to support such a large number of
vCPUs. The physical CPU cores in the host server should not be
overcommitted — that is, try to ensure that no more than two vCPUs are
assigned to running virtual servers per physical CPU core, although this
limit may need to be lower in demanding workload environments.
When considering how many vCPUs to assign to your
virtual database server, I recommend using previously captured
performance data along with an understanding of your SQL Server
workload. If you are running a lot of queries serially, then a large
number of vCPUs may not be required if the underlying clock speed of the
physical CPU cores is high. However, a workload with a higher
proportion of parallel queries needs additional vCPUs. Performance testing with Performance Monitor, combined with monitoring the SOS_SCHEDULER_YIELD wait stat within SQL Server, is a good way to determine whether you have assigned sufficient vCPUs.
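Checking that wait stat is straightforward; the following query, a minimal sketch against the sys.dm_os_wait_stats DMV, returns the cumulative figures since the instance last started:

    -- A large and steadily growing SOS_SCHEDULER_YIELD wait time suggests the
    -- schedulers (vCPUs) are oversubscribed for the workload running on them.
    SELECT wait_type,
           waiting_tasks_count,
           wait_time_ms,
           signal_wait_time_ms
    FROM   sys.dm_os_wait_stats
    WHERE  wait_type = 'SOS_SCHEDULER_YIELD';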
Some hypervisors, such as VMware’s vSphere, allow
MHz limits to be placed on vCPUs, as well as a specific number of vCPUs
to be allocated. For example, the virtual server may be allowed to
execute four concurrent CPU threads across four vCPUs, yet each of those
vCPUs may be limited to operating at 1GHz despite the fact that the
clock speed of the physical CPU core it’s using is much higher.
At the other end of the CPU sizing topic, however,
is the issue of assigning too few vCPUs to a virtual server. The only
recommendation I make here is to always assign at least two to any
virtual server running Windows and SQL Server. Much of the code shipped
by Microsoft is multi-threaded today and designed for servers with more
than one logical CPU. Even the difference in the time it takes to
install Windows on a virtual server with two vCPUs rather than one can
be noticeable.
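If you want to confirm what SQL Server actually sees once the virtual server is running, the sys.dm_os_sys_info DMV reports the logical CPU count, the number of schedulers created on top of them, and, in recent versions, whether the instance has detected a hypervisor; a minimal sketch:

    -- What SQL Server believes it is running on.
    SELECT cpu_count,
           scheduler_count,
           hyperthread_ratio,
           virtual_machine_type_desc
    FROM   sys.dm_os_sys_info;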
Networking
Networking, or rather adequate network
bandwidth, is often overlooked when considering a virtual server’s
design, probably because it only becomes a source of performance issues if the server performs a lot of data transfer. My only recommendation
here is that if you know that your database server transfers a lot of
data over the network or is sensitive to any network latency, then
ensure that you have sufficient network interface cards in the host
server. In the same way that the host bus adapters needed for SAN
storage can become contended, so can the NICs. Examples of SQL Server
workloads that would warrant reviewing your host server’s NIC sizing
include large ETL imports or exports, or the use of synchronous database
mirroring or AlwaysOn availability groups.
High Availability
Moving away from resource allocation but
still an equally important design consideration is how, if at all, you
will provide high availability for the SQL Server instance.
Before I review the HA services available to SQL
Server 2012 in a virtual environment, I want to reiterate some advice
provided earlier: If possible, ensure that you only ever have a single
HA service monitoring, controlling, and reacting to a failure. Having
just one place to look when the system goes down saves time!
The next pair of high-availability services are the same as those you would deploy with regular physical servers: AlwaysOn Failover Cluster Instances and AlwaysOn Availability Groups. For those considering deploying these on virtual servers running SQL Server, my recommendations are as follows:
- Configure the “server affinity” settings for your virtual servers
such that the hypervisor ensures that the virtual servers that are part
of your cluster or AlwaysOn availability groups are never run on the
same physical host server at the same time. The idea is to protect
against host server failure, so you want to remove any single points of
failure.
- If you deploy any synchronous database mirroring, then ensure that you have adequate network bandwidth available on all the host servers on which you will run virtualized SQL Server instances; a monitoring sketch for availability groups follows this list.
- Likewise, for any servers involved in synchronous mirroring, ensure
that you have adequate free physical CPU resources available on the host
servers so that any latency to which vCPUs are exposed as they wait for
physical CPU time is kept to a minimum.
- Finally, will your virtualization environment have any SAN-level
replication deployed in order to replicate your storage system to
another SAN infrastructure, typically off-site, for disaster recovery
purposes? If so, you should consider whether it is using synchronous or
asynchronous mirroring and what performance and data consistency impact
that may have on SQL Server. It is critical to maintain storage-level
transactional consistency between all the drives a SQL Server database
uses; there is no point to having an updated data file drive at a
storage level at the remote site if the transaction log drive is a few
write transactions behind.
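For the availability group case, whether the network and host servers are keeping up with synchronous replication can be checked from inside SQL Server. The query below is a minimal sketch using the sys.dm_hadr_database_replica_states DMV; a log send queue that keeps growing on a synchronous replica suggests the network or the replica’s host server is struggling.

    -- Log send and redo queue sizes (in KB) for each availability database.
    SELECT DB_NAME(drs.database_id) AS database_name,
           ar.replica_server_name,
           drs.synchronization_state_desc,
           drs.log_send_queue_size,
           drs.redo_queue_size
    FROM   sys.dm_hadr_database_replica_states AS drs
    JOIN   sys.availability_replicas AS ar
           ON ar.replica_id = drs.replica_id;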
Operating System Enhancements
When an instance of Windows is deployed on a virtual server, there’s nothing specific to virtualization that needs to be configured within the operating system, other than ensuring that you have the correct hardware device drivers and that the hypervisor’s tools we discussed earlier are installed.
SQL Server Memory Configuration
Like Windows, SQL Server can also be
installed on a virtual server and will operate quite successfully
without any specific tuning or configuration. This is further proof of
just how well a virtual server can emulate a physical server, assuming
you have the resources, such as CPU and memory, available to SQL Server
that it needs to run optimally for your workload.
However, you may want to consider configuring the
Max Server Memory setting within SQL Server, although you’d probably do
this on a well-tuned physical server as well. In SQL Server 2012, this setting places a working limit on the total memory SQL Server uses, whereas in previous versions it only influenced the size of the buffer pool.
If you are using VMware, their recommendation is
to set SQL Server’s max server memory value to be based on the size of
the memory reserved for the virtual server. For example, allow 2GB of
memory for the operating system and assign the rest to SQL Server on a
dedicated database server.
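As an illustration of that guidance, and purely as a sketch, a virtual server with a 16GB memory reservation might leave roughly 2GB for the operating system and cap SQL Server at 14GB:

    -- Example values only; size them for your own reservation and workload.
    EXEC sys.sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sys.sp_configure 'max server memory (MB)', 14336;  -- 14GB
    RECONFIGURE;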
If you are deploying your virtual servers in a
Microsoft Hyper-V environment, then the advice is slightly different. If
you are not using Microsoft’s Dynamic Memory feature, then you can be assured that whatever memory your virtual server appears to have, it actually does have, so you should configure your max server memory setting based on that value.
If you are using Dynamic Memory, then you should
set the Startup RAM value in Hyper-V to represent as much memory as SQL
Server will normally need, and set the Maximum RAM value to allow for
any extra memory you think the virtual server might be allocated in a
peak workload situation. In my experience, setting Max Server Memory to
be 2GB lower than the Maximum RAM value set in the Hyper-V configuration
allows SQL Server to increase its memory utilization as more memory is
allocated. Note that this situation requires the Standard or Enterprise
Editions of SQL Server 2012, or the Enterprise Edition of previous
versions.
Common Virtualization Design Mistakes
In my role as a consultant, I regularly
work with many different instances of SQL Server, an increasing number
of which now run on virtual servers. Each virtual environment I see them
run in has its strengths and weaknesses. The following list describes
some common design decisions that have a negative impact on SQL Server:
- Too many vCPUs — As mentioned
earlier, the more vCPUs a virtual server has, the longer the virtual
server potentially has to wait for sufficient underlying physical CPU
cores to become available in order for it to execute a CPU instruction.
I’ve seen virtual servers running fairly light SQL Server workloads that
have 4, 8, or even 16 vCPUs assigned to them “because they could” while
in fact they could have performed comfortably with just two.
- Unmanaged memory configurations —
Sometimes the configuration options in the hypervisor make it seem like
you can assign as much memory as you like, and it will take care of
finding and allocating all the memory required. There’s some truth in
that but you still need to account for all the memory assigned to
virtual servers, even if some of the burst capability you give them is
contended/shared with other virtual servers. In one environment I saw,
when a virtual server running the backup software got busy at night, the
performance of all the other virtual servers dropped severely! It was
difficult explaining how such a simple misconfiguration completely
unrelated to SQL Server was causing this. The solution is to know where
all your critically needed memory will come from during the busiest of
workload periods.
- One big LUN with one big partition —
SQL Server, even when running in a virtual server, benefits from having
multiple uncontended drives for the different types of files it uses. A
default deployment option for some virtualization software is to use a
single large pool of physical drives and create a single large partition
on it; onto that are put all the files for the entire virtualization
environment. This can quickly lead to storage hotspots, workload
contention, and having to adopt the same storage configuration settings
for every virtual server, such as the storage cache policy. It also makes performance monitoring difficult because you can’t always easily identify a specific virtual server’s workload. Ideally, a SAN should be configurable so that different controller settings can be applied to different storage partitions, allowing the storage that SQL Server will ultimately use to be optimized wherever possible. As you saw earlier, the solution here is to distribute a busy database server’s storage across multiple virtual hard drives, provided they can be placed on different groups of physical drives.
- Saturated host bus adapters — It is
common to see a host server with only a single HBA running several
I/O-intensive virtual servers. Not only is this a single point of
failure, but the HBA can easily get saturated, causing I/O requests to
be queued and introducing storage latency.
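One way to spot that queuing from inside SQL Server is to look at the I/O requests it has issued that the operating system hasn’t yet completed; the query below is a minimal sketch joining sys.dm_io_pending_io_requests to sys.dm_io_virtual_file_stats. A steady stream of rows with high io_pending_ms_ticks values points to queuing somewhere in the storage path, such as a saturated HBA.

    -- I/O requests currently outstanding, mapped back to database files.
    SELECT DB_NAME(vfs.database_id) AS database_name,
           vfs.file_id,
           pir.io_type,
           pir.io_pending,
           pir.io_pending_ms_ticks
    FROM   sys.dm_io_pending_io_requests AS pir
    JOIN   sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
           ON pir.io_handle = vfs.file_handle;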