To ensure SQL Server works at least as well as you expect, if not better, when you deploy it on a virtual server, you need to make design decisions that you wouldn’t have to make with a physical server deployment. These considerations, as you will see, cover a range of environment components that you may not have had to think about before, yet they can have considerable influence on the performance of your database server instance.
Architecting Virtual Database Servers vs. Physical Database Servers
In the physical server world, assigning
and reserving hardware resources such as CPU and memory is relatively
straightforward. Ultimately, a server chassis only ever runs one
instance of an operating system that has every piece of hardware
resource in the server available to it. For example, if a server has 16
physical CPU cores and 32GB of memory installed in it, then Windows
would also have that amount of resources available to it. This
“dedicated resource” approach not only makes server design easier, it
also makes troubleshooting performance issues easier, as whatever data
Performance Monitor shows is the truth.
For most users of SQL Server, the introduction of
storage area networks (SANs) was the first time they had to continuously
depend on a shared resource. Remote network shares had previously been
used for backups, but as long as their performance was “good enough” no
one really questioned it. SANs, however, were different because not only
were they often highly influential on SQL Server’s performance, they
were also usually shared among several servers at a time. This meant
that the activities of one server could potentially affect the
performance of all the other servers also using it. In small
environments this wasn’t so bad, as the DBA could probably see all the
other servers connected to the SAN and quickly identify the cause of any
problems; but in much larger environments the DBA had to rely on the
SAN administrator to provide assurances and performance data that ruled
out SAN contention as a potential cause of performance issues for SQL
Server.
Most virtualization environments use SANs as their
shared storage, so the scenario just described is something many
administrators will be familiar with, especially those responsible for
making instances of SQL Server perform as fast as possible. However, as
shown in Figure 1,
with virtualization not only is the SAN a shared resource, so are the host server’s hardware resources, such as the CPU and memory. Now, the
DBA has to consider not only other users of the SAN affecting SQL
Server’s performance, but also other virtual servers running on the same
physical host server.
The diagram below shows how a virtualization administrator could inadvertently allocate the same physical storage to several virtual servers, inviting performance issues if the configuration of that shared storage resource can’t meet their future performance requirements.
Sharing resources doesn’t always have to be a problem; in fact, if it is a problem then something is wrong. Adequately resourced host servers and sensibly configured virtual servers are the foundation of a successful, well-performing virtualization environment. However, that success can be assured only by consistently applying and monitoring the design considerations that follow.
Virtual Database Server Design
This section covers the most important
areas of a virtual server’s design. Understanding these elements will
ultimately help you deploy SQL Server in a virtual server successfully.
Memory
Of course, the requirement for SQL
Server to have sufficient memory to deliver the performance levels
expected of it isn’t eliminated when it’s virtualized. If anything, this
requirement can become harder to fulfill because as you just saw,
virtualization creates more opportunities — through contention — for
memory to be taken away from SQL Server. Because the performance of SQL
Server is so dependent on adequately sized and provisioned memory, it
ranks first in my list of design considerations. It should be yours too.
For database servers, you should usually ensure
that for every GB of memory allocated to the virtual server there is one
GB of uncontended physical memory in the host server, and any future
host servers the virtual server might run on for whatever reason. If
your hypervisor allows it, as VMware’s does, a memory reservation should
be set to ensure that the virtual server always has reserved for it the
memory it’s allocated. This is particularly important in VMware
environments where it’s possible to over-allocate a host server’s memory
in the hope that all the virtual servers won’t need it at the same
time, yet performance problems might occur if they do.
Microsoft’s Hyper-V technology in Windows Server
2008 R2 Service Pack 1 and later, however, offers Dynamic Memory, which
enables you to allocate any unused physical memory in the host server to
virtual servers should they ever be detected to have low-memory
conditions.
With Hyper-V, a virtual server could be configured to boot with 16GB assigned to it but have an upper memory limit of 32GB. If, while running, the virtual server gets close to using all of its initial 16GB of memory, the Hyper-V hypervisor will respond by increasing the memory allocated to the virtual server, provided sufficient free physical memory is available. It won’t immediately assign the full 32GB; instead, it will gradually increase the allocation by the amount the virtual server needs to eliminate the low-memory condition. As we mentioned earlier, SQL Server will then respond to the extra memory becoming available by expanding the buffer pool.
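You can watch this happen from inside the guest by comparing SQL Server’s memory manager counters. The following query, a minimal sketch against the sys.dm_os_performance_counters DMV, shows how much memory SQL Server is currently using (Total Server Memory) versus how much it would like to use (Target Server Memory):

    -- Total Server Memory creeping up toward Target Server Memory after the
    -- hypervisor adds memory is the buffer pool expanding.
    SELECT counter_name,
           cntr_value / 1024 AS memory_mb
    FROM   sys.dm_os_performance_counters
    WHERE  object_name LIKE '%Memory Manager%'
           AND counter_name IN ('Total Server Memory (KB)', 'Target Server Memory (KB)');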
While it’s good that extra memory might be available when it’s most needed, take care not to assume that it will always be available, or that extra memory, once allocated, won’t be taken back by the balloon driver if a virtual server with a higher weighting needs it instead.
Dynamic Memory is also a good way to size a new database server’s memory requirement: if, after a few weeks or a complete cycle of business activity, the memory allocated to the database server hasn’t increased above 17GB, you can be confident that 18GB, rather than 16GB, is an acceptable memory allocation for that virtual server.
VMware’s memory overcommitment feature works slightly differently: the virtual server is told it has a certain amount of memory assigned to it, but only the memory it is currently using is actually allocated to it. You can still size a database server’s potential memory requirement, but the memory utilization data will have to come from VMware’s performance counters rather than straight from those provided in Windows.
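That said, SQL Server’s own view of its memory use is still worth sampling from inside the guest; captured periodically over a full business cycle, it gives a reasonable picture of how much memory the instance really needs. A minimal sketch using the sys.dm_os_process_memory DMV:

    -- How much physical memory the SQL Server process is using right now,
    -- plus flags indicating whether memory pressure has been signaled.
    SELECT physical_memory_in_use_kb / 1024 AS physical_memory_in_use_mb,
           memory_utilization_percentage,
           process_physical_memory_low,
           process_virtual_memory_low
    FROM   sys.dm_os_process_memory;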
Storage
Storage is usually the second most
important part of a virtual server’s design in order to ensure SQL
Server has the performance it needs to deliver the results expected of
it.
Assigning storage to a virtual server is
accomplished by attaching a virtual hard drive to it. A virtual drive is
just a flat file stored and managed by the hypervisor but presented to
the virtual server’s operating system as though it were a physical disk.
From that point, Windows can create a partition on it that is then
mounted as drive D, formatted as NTFS, and so on.
When you deploy SQL Server in a physical
environment you know it can benefit hugely by having multiple sets of
unshared and uncontended hard drives available for its data storage. You
typically see these used to distribute the system files, data files, log files, and tempdb across different sets of spindles. The same consideration should be given to your virtualization environment’s storage design: if at all possible, each of SQL Server’s virtual hard drives should be placed on a separate, uncontended group of physical disks. Of course, that’s not always possible, as some SANs now pool all of their disks into one large group, in which case you should work with your storage team or vendor to understand how to get the best possible concurrent performance out of it.
Even though hypervisors have different ways of
presenting storage to virtual servers, whether they use fixed or dynamic
virtual disks, or raw device mappings, you ultimately need to ensure
that SQL Server’s I/O activity isn’t negatively affected by competing
workloads at the spindle level. Another possible performance impact, unique to virtual environments, can occur when the same set of spindles holds not only the database server’s virtual hard drives but also the virtual hard drives of entirely separate virtual servers. This is a typical deployment practice for virtual environments, and for most virtual servers it is usually acceptable. However, SQL Server sometimes doesn’t fit in the “usual” category; being as sensitive as it is to storage latency, it will suffer if placed on busy virtual hard drive storage.
Therefore, as a best practice, for active and busy servers I recommend the following storage design principles:
- Ensure that every logical drive within the Windows operating system is a separate virtual hard disk, and if you’re able to, place each of them on a separate physical SAN partition, known as a LUN.
- Test any use of thin provisioning, both at the hypervisor and
storage layers, before you deploy it. Thin provisioning enables the virtual server to operate as though it has much more disk space allocated to it than has actually been provisioned. For example, in Windows we might see a 100GB logical drive, but if
it only has 28GB of data stored on it, then the hypervisor or the SAN
may only have actually provisioned 28GB. This enables system
administrators to over-allocate storage but only deploy the amount of
storage required, rather than the amount provisioned, which may never be
used. While this makes commercial sense, there’s always an element of
uncertainty regarding how much of a performance impact thin provisioning
might have on SQL Server when it performs a write. I’m sure the vendors
promote their thin-provisioning technologies in a way that reassures
you this won’t be an issue; but when you consider that the most
performance-sensitive action in SQL Server can be a transaction log
write, you never want to let anything unnecessarily impact the
performance of that write.
- Consider contention on your storage infrastructure’s networking. If
you deploy SQL Server on a dedicated physical server and use SAN
storage, then SQL Server is likely to have almost the entire capacity of
the server’s host bus adapter and fabric switch port dedicated to it.
This helps support large numbers of concurrent reads and writes to the
physical disks SQL Server needs to use. When running on a virtual server
it’s likely that several virtual servers will be using the host
server’s HBA for all their I/O activity, so it’s possible that a virus
scan of a mail server’s file system could use the same HBA that your
instance of SQL Server uses. This sharing of HBAs gets potentially even
worse when blade servers are used as there might only be two HBAs for an
entire chassis of 16 blade servers, and is particularly relevant as
blade servers are a popular choice in medium and large virtualization
environments. The easy solution to this is to ensure that your physical
host server has adequate HBAs installed to support its total peak I/O
workload.
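Whichever storage design you end up with, verify it from inside SQL Server once the workload is running. The query below is a minimal sketch using the sys.dm_io_virtual_file_stats DMV; average latencies that climb whenever neighboring virtual servers get busy are a strong hint of contention on shared spindles or HBAs.

    -- Average read and write latency per database file since the instance started.
    SELECT DB_NAME(vfs.database_id) AS database_name,
           mf.physical_name,
           vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
           vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
    FROM   sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN   sys.master_files AS mf
           ON  mf.database_id = vfs.database_id
           AND mf.file_id = vfs.file_id
    ORDER BY avg_write_latency_ms DESC;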
As you can see, you need to consider a few
additional factors when deploying storage for a virtual database server
compared to a physical server, but none of them should be a challenge
for an adequately resourced virtualization environment.
CPU
Configuring a virtual server’s virtual
CPUs, also known as vCPUs, is one of the easier tasks to perform, and
until recently it was the first capacity ceiling users hit when they
sized their virtual servers.
A vCPU is what is presented by the hypervisor to
the virtual server as a “physical” CPU core; for example, Task Manager
in a virtual server with 4 vCPUs would show 4 CPUs. However, although a
vCPU acts and behaves like a physical CPU core, it doesn’t perform like
one. That’s because there is no one-to-one mapping between a vCPU and a
physical CPU core. Instead, a vCPU is a logical thread that is put on a
physical CPU core’s queue when it has something to execute. The more
requests in the queue, the longer it takes for a thread to have its
workload executed.
Additionally, another phenomenon unique to
virtualization appears when a virtual server is configured with multiple
vCPUs. The hypervisor knows that if a physical server has four physical
CPU cores, then the operating system expects to be able to execute an
instruction on all four simultaneously; a very abstract example would be
SQL Server executing a parallel query across four logical CPUs, in
which case SQL Server would expect all four CPUs to be available
concurrently, rather than have to wait for CPUs to become available.
VMware’s relaxed co-scheduling algorithm is clever
enough to know that if only two of a virtual server’s four vCPUs need
to perform work, then only two physical CPU cores need to be found to
execute the virtual server’s work. This makes scheduling concurrent virtual server workloads more efficient, as the number of physical CPU cores needed is often much lower than you would imagine. There can be issues with this approach, though, when a virtual server that really does need a large number of vCPUs has to compete with a large number of smaller virtual servers with a low number of vCPUs. VMware’s scheduling behavior means these smaller virtual machines can jump on and off the physical CPUs faster, delaying the scheduling of the single, much larger request for CPU time.
However, other hypervisors will only “run” the
virtual server when enough physical CPU cores in the host server are
free to run each of the vCPUs concurrently. For example, if the virtual
server has four vCPUs assigned and the host server has 16 physical CPU
cores, then the hypervisor would have to wait until four of the 16
physical CPU cores were available before it could process any CPU
instructions from the virtual server. In this example, if the host server were only running that virtual server, there shouldn’t be any unnecessary latency; with 16 physical cores available, four will almost always be free. It’s only when virtual servers start competing with each other for a limited number of physical CPU cores that performance problems occur.
As a consequence of this potential for CPU access latency, a general recommendation is to configure virtual servers with only as many vCPUs as they actually need. That’s not to say they can’t have a large number if they need them, but the assumption that the more vCPUs you have, the faster the server will be, is not always correct.
Be aware that the virtual server itself is totally
unaware of this latency, so it can only be monitored from outside of
the virtual server. High CPU latency will not, for example, incur high SOS_SCHEDULER_YIELD wait stats within SQL Server.
The latest generation of hypervisors now supports large numbers of vCPUs per virtual server. VMware’s vSphere 5 and
Microsoft’s Hyper-V in Windows Server 2012 both support assigning 32
vCPUs to a virtual server, an incredible amount of computing power but a
capability that is critical to their goal of being able to virtualize
any workload. My only advice is to ensure that you have adequate
physical CPU cores in the host servers to support such a large number of
vCPUs. The physical CPU cores in the host server should not be
overcommitted — that is, try to ensure that no more than two vCPUs are
assigned to running virtual servers per physical CPU core, although this
limit may need to be lower in demanding workload environments.
When considering how many vCPUs to assign to your
virtual database server, I recommend using previously captured
performance data along with an understanding of your SQL Server
workload. If you are running a lot of queries serially, then a large
number of vCPUs may not be required if the underlying clock speed of the
physical CPU cores is high. However, a workload with a higher
proportion of parallel queries needs additional vCPUs. Performance testing with Performance Monitor, combined with monitoring the SOS_SCHEDULER_YIELD wait stat within SQL Server, is a good way to determine whether you have assigned sufficient vCPUs.
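Checking that wait stat is straightforward; the following query, a minimal sketch against the sys.dm_os_wait_stats DMV, returns the cumulative figures since the instance last started:

    -- A large and steadily growing SOS_SCHEDULER_YIELD wait time suggests the
    -- schedulers (vCPUs) are oversubscribed for the workload running on them.
    SELECT wait_type,
           waiting_tasks_count,
           wait_time_ms,
           signal_wait_time_ms
    FROM   sys.dm_os_wait_stats
    WHERE  wait_type = 'SOS_SCHEDULER_YIELD';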
Some hypervisors, such as VMware’s vSphere, allow
MHz limits to be placed on vCPUs, as well as a specific number of vCPUs
to be allocated. For example, the virtual server may be allowed to
execute four concurrent CPU threads across four vCPUs, yet each of those
vCPUs may be limited to operating at 1GHz despite the fact that the
clock speed of the physical CPU core it’s using is much higher.
At the other end of the CPU sizing topic, however,
is the issue of assigning too few vCPUs to a virtual server. The only
recommendation I make here is to always assign at least two to any
virtual server running Windows and SQL Server. Much of the code shipped
by Microsoft is multi-threaded today and designed for servers with more
than one logical CPU. Even the difference in the time it takes to
install Windows on a virtual server with two vCPUs rather than one can
be noticeable.
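If you want to confirm what SQL Server actually sees once the virtual server is running, the sys.dm_os_sys_info DMV reports the logical CPU count, the number of schedulers created on top of them, and, in recent versions, whether the instance has detected a hypervisor; a minimal sketch:

    -- What SQL Server believes it is running on.
    SELECT cpu_count,
           scheduler_count,
           hyperthread_ratio,
           virtual_machine_type_desc
    FROM   sys.dm_os_sys_info;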
Networking
Networking, or rather adequate network
bandwidth, is often overlooked when considering a virtual server’s
design, probably because it only becomes a source of performance issues if the server performs a lot of data transfer. My only recommendation
here is that if you know that your database server transfers a lot of
data over the network or is sensitive to any network latency, then
ensure that you have sufficient network interface cards in the host
server. In the same way that the host bus adapters needed for SAN
storage can become contended, so can the NICs. Examples of SQL Server
workloads that would warrant reviewing your host server’s NIC sizing
include large ETL imports or exports, or the use of synchronous database
mirroring or AlwaysOn availability groups.
High Availability
Moving away from resource allocation but
still an equally important design consideration is how, if at all, you
will provide high availability for the SQL Server instance.
Before I review the HA services available to SQL
Server 2012 in a virtual environment, I want to reiterate some advice
provided earlier: If possible, ensure that you only ever have a single
HA service monitoring, controlling, and reacting to a failure. Having
just one place to look when the system goes down saves time!
The next pair of high-availability services are the same as those you would deploy with regular physical servers: AlwaysOn Failover Cluster Instances and AlwaysOn Availability Groups. For those considering deploying these on virtual servers running SQL Server, my recommendations are as follows:
- Configure the “server affinity” settings for your virtual servers
such that the hypervisor ensures that the virtual servers that are part
of your cluster or AlwaysOn availability groups are never run on the
same physical host server at the same time. The idea is to protect
against host server failure, so you want to remove any single points of
failure.
- If you deploy any synchronous database mirroring, then ensure that you have adequate network bandwidth available on all the host servers on which you will run virtualized SQL Server instances; a monitoring sketch for availability groups follows this list.
- Likewise, for any servers involved in synchronous mirroring, ensure
that you have adequate free physical CPU resources available on the host
servers so that any latency to which vCPUs are exposed as they wait for
physical CPU time is kept to a minimum.
- Finally, will your virtualization environment have any SAN-level
replication deployed in order to replicate your storage system to
another SAN infrastructure, typically off-site, for disaster recovery
purposes? If so, you should consider whether it is using synchronous or
asynchronous mirroring and what performance and data consistency impact
that may have on SQL Server. It is critical to maintain storage-level
transactional consistency between all the drives a SQL Server database
uses; there is no point to having an updated data file drive at a
storage level at the remote site if the transaction log drive is a few
write transactions behind.
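For the availability group case, whether the network and host servers are keeping up with synchronous replication can be checked from inside SQL Server. The query below is a minimal sketch using the sys.dm_hadr_database_replica_states DMV; a log send queue that keeps growing on a synchronous replica suggests the network or the replica’s host server is struggling.

    -- Log send and redo queue sizes (in KB) for each availability database.
    SELECT DB_NAME(drs.database_id) AS database_name,
           ar.replica_server_name,
           drs.synchronization_state_desc,
           drs.log_send_queue_size,
           drs.redo_queue_size
    FROM   sys.dm_hadr_database_replica_states AS drs
    JOIN   sys.availability_replicas AS ar
           ON ar.replica_id = drs.replica_id;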
Operating System Enhancements
When an instance of Windows is deployed on a virtual server, there’s nothing specific to virtualization that needs to be configured within the operating system, other than ensuring that you have the correct hardware device drivers and that the hypervisor’s tools we discussed earlier are installed.
SQL Server Memory Configuration
Like Windows, SQL Server can also be
installed on a virtual server and will operate quite successfully
without any specific tuning or configuration. This is further proof of
just how well a virtual server can emulate a physical server, assuming
you have the resources, such as CPU and memory, available to SQL Server
that it needs to run optimally for your workload.
However, you may want to consider configuring the
Max Server Memory setting within SQL Server, although you’d probably do
this on a well-tuned physical server as well. In SQL Server 2012, this setting places a working limit on the total memory SQL Server uses, whereas in previous versions it only influenced the size of the buffer pool.
If you are using VMware, their recommendation is
to set SQL Server’s max server memory value to be based on the size of
the memory reserved for the virtual server. For example, allow 2GB of
memory for the operating system and assign the rest to SQL Server on a
dedicated database server.
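As an illustration of that guidance, and purely as a sketch, a virtual server with a 16GB memory reservation might leave roughly 2GB for the operating system and cap SQL Server at 14GB:

    -- Example values only; size them for your own reservation and workload.
    EXEC sys.sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sys.sp_configure 'max server memory (MB)', 14336;  -- 14GB
    RECONFIGURE;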
If you are deploying your virtual servers in a
Microsoft Hyper-V environment, then the advice is slightly different. If
you are not using Microsoft’s Dynamic Memory feature, then you can be assured that whatever memory your virtual server appears to have, it actually does have, so you should configure your max server memory setting based on that value.
If you are using Dynamic Memory, then you should
set the Startup RAM value in Hyper-V to represent as much memory as SQL
Server will normally need, and set the Maximum RAM value to allow for
any extra memory you think the virtual server might be allocated in a
peak workload situation. In my experience, setting Max Server Memory to
be 2GB lower than the Maximum RAM value set in the Hyper-V configuration
allows SQL Server to increase its memory utilization as more memory is
allocated. Note that this situation requires the Standard or Enterprise
Editions of SQL Server 2012, or the Enterprise Edition of previous
versions.
Common Virtualization Design Mistakes
In my role as a consultant, I regularly
work with many different instances of SQL Server, an increasing number
of which now run on virtual servers. Each virtual environment I see them
run in has its strengths and weaknesses. The following list describes
some common design decisions that have a negative impact on SQL Server:
- Too many vCPUs — As mentioned
earlier, the more vCPUs a virtual server has, the longer the virtual
server potentially has to wait for sufficient underlying physical CPU
cores to become available in order for it to execute a CPU instruction.
I’ve seen virtual servers running fairly light SQL Server workloads that
have 4, 8, or even 16 vCPUs assigned to them “because they could” while
in fact they could have performed comfortably with just two.
- Unmanaged memory configurations —
Sometimes the configuration options in the hypervisor make it seem like
you can assign as much memory as you like, and it will take care of
finding and allocating all the memory required. There’s some truth in
that but you still need to account for all the memory assigned to
virtual servers, even if some of the burst capability you give them is
contended/shared with other virtual servers. In one environment I saw,
when a virtual server running the backup software got busy at night, the
performance of all the other virtual servers dropped severely! It was
difficult explaining how such a simple misconfiguration completely
unrelated to SQL Server was causing this. The solution is to know where
all your critically needed memory will come from during the busiest of
workload periods.
- One big LUN with one big partition —
SQL Server, even when running in a virtual server, benefits from having
multiple uncontended drives for the different types of files it uses. A
default deployment option for some virtualization software is to use a
single large pool of physical drives and create a single large partition
on it; onto that are put all the files for the entire virtualization
environment. This can quickly lead to storage hotspots, workload
contention, and having to adopt the same storage configuration settings
for every virtual server, such as the storage cache policy. It also makes performance monitoring difficult because you can’t always easily identify a specific virtual server’s workload. Ideally, a SAN should be configurable so that different controller settings can be applied to different storage partitions, allowing the storage that SQL Server will ultimately use to be optimized wherever possible. As you saw earlier, the solution here is to distribute a busy database server’s storage across multiple virtual hard drives, provided they can be placed on different groups of physical drives.
- Saturated host bus adapters — It is
common to see a host server with only a single HBA running several
I/O-intensive virtual servers. Not only is this a single point of
failure, but the HBA can easily get saturated, causing I/O requests to
be queued and introducing storage latency.
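One way to spot that queuing from inside SQL Server is to look at the I/O requests it has issued that the operating system hasn’t yet completed; the query below is a minimal sketch joining sys.dm_io_pending_io_requests to sys.dm_io_virtual_file_stats. A steady stream of rows with high io_pending_ms_ticks values points to queuing somewhere in the storage path, such as a saturated HBA.

    -- I/O requests currently outstanding, mapped back to database files.
    SELECT DB_NAME(vfs.database_id) AS database_name,
           vfs.file_id,
           pir.io_type,
           pir.io_pending,
           pir.io_pending_ms_ticks
    FROM   sys.dm_io_pending_io_requests AS pir
    JOIN   sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
           ON pir.io_handle = vfs.file_handle;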