SQL Server 2012 : Running SQL Server in A Virtual Environment - MONITORING VIRTUALIZED DATABASE SERVERS

8/3/2013 11:35:12 AM

Hopefully the previous section has given you sufficient guidance to architecting and deploying your first virtualized database servers, even if only in non-production environments. This section focuses on real-world monitoring of your virtualized database servers, identifying specific changes you may need to make to your monitoring processes and confirming which aspects of your current monitoring can remain unchanged.

Traditionally we have monitored Windows servers and servers running SQL Server with tools such as Performance Monitor. These tools have counters that are designed to expose the true utilization of a server’s hardware and the operating system’s demands on it. For example, we can look at the workload of a server’s CPUs by monitoring the % utilization values shown in Performance Monitor. Likewise, we can see how much memory the server has both used and available by looking at similar counters. These counters were perfect in the physical server world because we knew if Windows booted up and saw 4 logical CPUs and 16GB of memory then all of that resource would be available to the operating system and usually SQL Server as well.

This can cause issues; what does 100% of CPU utilization or 8GB of available memory actually represent in the virtual world? In environments where no restriction, contention, or over-allocation of resources has been configured, some certainty can be found from performance data. In larger, more complex environments, contention ratios or memory allocation might be changing on a minute-by-minute basis.

The example shown in Figure 1 demonstrates how in a Hyper-V environment, the same performance metric monitored in two places can be so different because of an underlying resource limitation in place. VMware provides its own Performance Monitor counters through the VM Memory and VM Processor objects.

FIGURE 1

Information and Misinformation from Performance Monitor

Clearly, the same tools that used to reliably provide us with monitoring information can now be sources of misinformation as you’ll now see.

Performance Monitor is still the most efficient way to monitor a virtual server’s performance. The only caveat is to ensure that you monitor the right counters from the right source.

Some Performance Monitor counters collected from within the virtual server are as valid and useful as they are on a physical server. Logical storage performance data, for example, will help you monitor the virtual server’s I/O workload and enables you to measure what percentage of the host server’s HBA capacity SQL Server is using, as well as ensure that SQL Server’s read and write latencies are acceptable.

While the role of the hypervisor is to make the virtual server believe it is running on dedicated hardware and totally abstract it from the underlying physical hardware, some calls to specific hardware-related APIs are passed by the hypervisor straight through to the physical hardware. An example of this is retrieving technical information about the CPU that an instance of Windows is using. Figure 2 shows an example of information retrieved by Windows running on a Hyper-V virtual server.

FIGURE 2

These hardware query requests are passed straight through to the hardware because it would be difficult for virtualization vendors to know what “artificial” value to pass back today and in the future in order to guarantee compatibility with any applications that check the version of the CPU on which they’re running. This behavior enables you to put the information that Windows or a tool like CPU-Z returns into perspective, particularly as it’s able to find the clock speed of the physical CPU even though the hypervisor might be limiting your access to only a portion of the available clock speed.

SQL Server wait stats are another area to consider when you are determining your sources of information or misinformation. However, even in the physical world, wait stats identify only the symptom of a system issue, not its cause. Therefore, in the virtual world they are still excellent indicators of potential issues hindering SQL Server’s performance, and wait stats are a good source of information, rather than potentially misleading misinformation.

Agent job runtimes are another source of excellent information within SQL Server you can use for performance monitoring. By creating jobs that perform the same tasks with the same volumes of data repeatedly, you can compare the time they took to run today with the time they took to run yesterday.

If, for example, you have a job to back up a database approximately 20GB in size, and for six weeks it took 20 minutes to run but in the last few days it started taking longer, you may have identified a reduction in the host server’s I/O capabilities. This information on its own may not be of significant operational value, but if your SQL Server instance has also started reporting a much greater occurrence of pageiolatch_xx wait stats, you may well want to start looking outside of your virtual server first.

Likewise, if you have a very CPU-intensive SQL Server agent job, then looking for changes in that job’s runtime might also help you detect signs of CPU contention at the host server level.

Detecting Hypervisor Memory Reclamation

Knowing how much memory your virtual server has access to at any point in time is something you should be able find the moment you suspect a previously healthy SQL Server instance has issues. While different hypervisors have different ways to assign memory to virtual servers — based on either demand, static allocations, or host server load — they usually all use a balloon driver to reclaim memory if they ever have to start taking memory back from a virtual server.

Although Windows and SQL Server have hot-add memory features that enable you to add memory to a running instance of SQL Server, they don’t have an opposite feature whereby memory can be taken away from Windows and SQL Server, yet this is sometimes a requirement in the virtual world when memory is to be reclaimed by a balloon driver.

Monitoring this reclamation happening can be tricky because the amount of “physical” memory the virtual server’s operating system thinks it has never decreases. However, when the balloon driver “inflates,” the amount of available memory within Windows begins to drop, and when it falls below an internal threshold SQL Server begins releasing memory to prevent the operating system from running out. Figure 3 shows a before and after representation of the balloon drive inflating.

FIGURE 3

One way to detect changes in the allocation of the memory SQL Server is using is to look for falls in the Page Life Expectacy counter value or increases in the Pages/sec value.

If you’re using Hyper-V, then another way is to query how SQL Server’s memory utilization has potentially changed in response to activity by the Dynamic Memory feature. The following query returns not how much memory SQL Server is currently using but what percentage of your virtual server’s memory it’s using. The reason for monitoring a percentage rather than an absolute value is that the percentage is proportional to the total amount of the virtual server’s memory. If the hypervisor grants the virtual server more memory at any time, you would expect the percentage of memory being used to remain constant even though SQL Server is using more memory. Likewise, if the balloon driver begins to inflate, you would expect the percentage of total memory SQL Server is using to begin to drop. You could use absolute values, but monitoring a percentage takes into account the physical memory size. The way this query works is dependant on Hyper-V’s Dynamic Memory model so it won’t work in VMware based environments.

select (m.total_physical_memory_kb /1024)
as 'Physical Memory (MB)',
convert(decimal(3,1),(convert(decimal(10,1),m.available_physical_memory_kb / 1024)
/ convert(decimal(10,1),m.total_physical_memory_kb / 1024)) * 100)
as 'Available Memory as % of Physical Memory'
from sys.dm_os_sys_memory m,sys.dm_os_sys_info i
 
select convert(decimal(10,1),(convert(decimal(10,1),i.committed_target_kb / 1024)
       /convert(decimal(10,1),m.total_physical_memory_kb / 1024))
       * 100) as 'Committed Target as % of Physical Memory',
       convert(decimal(3,1),(convert(decimal(10,1),i.committed_kb  / 1024)
       /convert(decimal(10,1),m.total_physical_memory_kb / 1024))
       * 100) as 'Total Committed as % of Physical Memory'
From   sys.dm_os_sys_memory m, sys.dm_os_sys_info i

Examples of output from this script run on a Hyper-V virtual server are shown in Figure 4 and Figure 5. The first instance was run just after SQL Server started, the second after a heavy workload when the hypervisor allocated additional memory to the virtual server using Dynamic Memory.

FIGURE 4

FIGURE 5

Recommended Performance Monitor Counters

I recommend monitoring the following three general groups of Performance Monitor counters to track the health of your virtual database server.

On the Host Server

Collecting performance information from the host server your SQL Server instance is running on will help you understand whether SQL Server itself should be performing as you expect it to be. For example, how busy are the host server’s physical CPU cores and how much of the HBA’s capacity is currently being used? It may also be necessary to collect performance data from the host server about a virtual server’s resource utilization. This depends on your hypervisor but with Hyper-V, for example, a virtual server’s true CPU utilization data is exposed at Virtual Processor counters on the host server. Interestingly, VMware does the opposite, exposing performance data about the host server to Performance Monitor in the virtual server.

Monitor the following performance counters on the host server:

Percentage of physical CPU utilization
Percentage of physical memory utilization
Any CPU latency experienced by the hypervisor (VMware calls this CPU Ready Time)
Total amount of memory currently allocated to the host’s virtual servers
Physical storage MB/s and IOPS that the hypervisor is using
Physical storage read and write latency times
Percentage of HBA bandwidth utilization
Percentage of NIC bandwidth utilization

On the Virtual Server

Performance data gathered from within a virtual server is likely to be more meaningful if it refers to logical objects or is measuring the proportion of the virtual server’s resource currently being used. For example, the % CPU utilization values that Task Manager and Performance Monitor display inside a virtual server indicate how much of the processor resource available to the virtual server is currently being used. Unfortunately, those values cannot be related to a specific physical and quantifiable hardware sizing metric — i.e., they don’t indicate whether 100% is a throttled 1GHz of CPU resource or the entire native 3GHz of the physical CPU. These metrics still have value, however. If you look at Task Manager on a virtual server and see that your anti-virus software is using 60% of your CPU resource, that’s an issue whether the server is virtual or physical.

The counters you should monitor here are as follows:

% logical CPU utilization
Total physical memory (MB)
Available memory (MB)
Logical storage MB/s and IOPS that the operating system is using
Logical storage read and write latency times

On the SQL Server Instance

The monitoring you perform within SQL Server when it’s running on a virtual server shouldn’t be very different from your regular SQL Server monitoring. The only difference now is the potential for new or more causes of performance issues to investigate.

The DMV sys.dm_os_sys_info has been changed in SQL Server 2012 and offers some new columns that provide information for virtual environments. First, it confirms that SQL Server recognized that it started on a virtual server. This is shown in the virtual_machine_type and virtual_machine_type_desc columns; and although it may not seem to offer much benefit, it can be useful if you ever forget which of your servers are virtual and which are physical! The other interesting column in this DMV is committed_kb, which now describes the total amount of memory SQL Server is using at the time you query the DMV. This is important to know if you operate in a demand-based memory allocation environment.

The traditional performance counters you probably monitor already are included in the list of SQL Server–related performance metrics that can be used to monitor a virtual server:

SQL Server file I/O latencies from sys.dm_io_virtual_file_stats
Committed and Target Committed Memory values from sys.dm_os_sys_info
Batch requests/sec as a benchmark of server activity levels

Other -----------------

- SQL Server 2012 : Running SQL Server in A Virtual Environment - ARCHITECTING SUCCESSFUL VIRTUAL DATABASE SERVERS

- SQL Server 2012 : Running SQL Server in A Virtual Environment - IDENTIFYING CANDIDATES FOR VIRTUALIZATION

- SQL Server 2012 : Running SQL Server in A Virtual Environment - MANAGING CONTENTION

- Microsoft Content Management Server Development : A Placeholder Control to Store All HTML Tags (part 2)

- Microsoft Content Management Server Development : A Placeholder Control to Store All HTML Tags (part 1)

- Sharepoint 2013 : Create a Team Site, Create an Enterprise Wiki Site in SharePoint Server, Create a Blog Site

- Sharepoint 2013 : Create a Subsite

- SQL server 2008 R2 : Reverting to a Database Snapshot for Recovery

- SQL server 2008 R2 : Setup and Breakdown of a Database Snapshot

- Windows Home Server 2011 : Maintaining Windows Home Server - Checking Free Disk Space on the System Drive