3. Monitoring Memory
Memory, like a processor, is divided into segments
for each process running on the server. If there is a demand for more
memory than is available, the operating system has to use virtual memory
to supplement the physical memory. Virtual memory is storage allocated
on the hard disk; it is named PAGEFILE.SYS under Windows.
It is important to remember that when the operating
system or SQL Server isn’t able to use memory to find something and has
to use virtual memory stored on the disk, performance degrades. You need
to work on minimizing this situation, known as swapping or page faulting.
Monitoring Memory with Performance Monitor
Table 3 reflects the main performance objects and counters that are best utilized to monitor memory for SQL Server.
Table 3. Memory-Related Performance Objects and Counters
Performance Monitor Object | Description |
---|
Process: Working Set|sqlservr | The set of memory pages touched recently by the threads in the process (SQL Server, in this case). |
SQLSERVER:Buffer Manager: Buffer cache hit ratio | The percentage of pages that were found in the buffer pool without having to incur a read from disk. |
SQLSERVER:Buffer Manager: Total Pages | The total number of pages in the buffer pool, including database pages, free pages, and stolen pages. |
SQLSERVER:Memory Manager: Total Server Memory(KB) | The total amount of dynamic memory the server is currently consuming. |
SQLSERVER:Memory Manager: SQL Cache Memory(KB) | The total amount of dynamic memory the SQL Server cache is currently consuming. |
SQLSERVER:Plan Cache: Cache hit ratio | The ratio between lookups and cache hits for query plans in the plan cache. |
SQLSERVER:Buffer Manager: Total Pages | The total number of pages in the plan cache. |
Memory: Pages/sec | The
number of pages read from or written to disk to resolve hard page
faults. This value usually gives a direct indication of memory issues. |
Memory: Pages Read/sec | The number of times the disk was read to resolve hard page faults. |
Memory: Page Faults/sec | The overall rate at which faulted pages are handled by the processor. |
Process: Page Faults/sec|sqlservr | The rate of page faults occurring in the threads associated with a process (SQL Server, in this case). |
To observe the level of the page faulting, you can look at the Memory: Page Faults/sec and Process: Page Faults (for a SQL Server instance) counters. Next in line are the MSSQL Buffer Manager: Buffer Cache hit ratio and MSSQL Buffer Manager: Total Pages
counters. They directly indicate how well SQL Server is finding data in
its controlled memory (cache). You want to achieve a near 90% or higher
ratio here.
If the Memory: Pages/sec counter is greater than 0 or the Memory: Page Reads/sec counter is greater than 5, the operating system is being forced to use the disk to resolve memory references. These are called hard faults. The Memory: Pages/sec
counter is one of the best indicators of the amount of paging that
Windows is doing and the adequacy of SQL Server’s current memory
configuration.
Because the memory used by SQL Server 2008
dynamically grows and shrinks, you might want to track the exact usage
by using either Process: Working Set: SQLServer or MSSQL: Memory Manager: Total Server Memory (KB)
counters. These counters indicate the current size of the memory used
by the SQL Server process. If these numbers are consistently high,
compared to the amount of physical memory in the machine, you are
probably ready to install more memory on the server. If you see a
performance degradation because SQL Server must continually grow and
shrink its memory, you should either remove some of the other services
or processes running or configure SQL Server to use a fixed memory
allocation.
Monitoring Memory in SSMS
If you are collecting performance data into the MDW
using the Data Collector for a SQL Server instance, you can view more
detailed memory usage and memory wait information in the Server Activity
History report . You can click on the Memory Usage graph to drill down and examine the details of the total memory usage in SQL Server
and the cache and page ratios. It also displays a list of all processes
in the server and the total working set size. You can use this
information to see if other running processes may be using memory that
should be left available for SQL Server.
DMVs or System Views for Monitoring Memory
The DMVs associated with memory are numerous. The
ones you’ll most likely utilize are memory clerks, memory pools, and
cache counters.
You can find memory allocations by type by using the sys.dm_os_memory_clerks DMV view:
SELECT type, sum(multi_pages_kb) from sys.dm_os_memory_clerks
WHERE multi_pages_kb <> 0
GROUP BY type
order by 2 desc
To see how the cache is being used, you can query sys.dm_os_memory_cache_counters:
SELECT substring(name,1,25) AS Name, single_pages_kb,
single_pages_in_use_kb
FROM sys.dm_os_memory_cache_counters
order by single_pages_kb desc
Finally, when you want to see the total pages allocated to the different objects in memory, you use the sys.dm_os_memory_objects DMV:
SELECT substring(type,1,25) as Type,
sum(pages_allocated_count) as Total_Memory_Allocated
FROM sys.dm_os_memory_objects
group by type
order by 2 desc
Several new DMVs were introduced in SQL Server 2008
to simplify retrieval of information that can be helpful with memory
troubleshooting. In some cases, the newly introduced DMVs provide
information that was previously available only in DBCC MEMORYSTATUS output. Table 4 provides a summary of the new DMVs available for memory troubleshooting.
Table 4. New Memory-Related DMVs in SQL Server 2008
DMV | Description |
---|
sys.dm_os_memory_brokers | Provides
information about memory allocations using the internal SQL Server
memory manager. The information provided can be useful in determining
very large memory consumers. |
sys.dm_os_memory_nodes and sys.dm_os_memory_node_access_stats | Provide
information about physical nonuniform memory access (NUMA) memory nodes
and node access statistics grouped by the type of the page. (sys.dm_os_memory_node_access_stats is populated only if dynamic trace flag 842 is enabled due to its performance impact.) |
sys.dm_os_nodes | Provides
information about CPU node configuration for SQL Server. This DMV also
reflects software NUMA (soft-NUMA) configuration. |
sys.dm_os_process_memory | Provides
overview information about SQL Server memory usage, including the total
physical memory in use and the memory utilization percentage |
sys.dm_os_sys_memory | Provides
overview information about the system memory usage including total
physical memory on the system and the available physical memory. |
sys.dm_resource_governor_configuration, sys.dm_resource_governor_resource_pools, and sys.dm_resource_governor_workload_groups | Provide
information about the state of the Resource Governor feature of SQL
Server 2008. Some of the configuration parameters of Resource Governor
affect how SQL Server allocates memory; you should check these
parameters during memory troubleshooting. |
4. Monitoring the Disk System
SQL
Server performance depends heavily on the I/O subsystem. SQL Server is
constantly reading pages from and writing pages to disk via the data
cache. It is also constantly writing to disk via the transaction log.
Focusing on the database data files, transaction log files, and
especially tempdb can yield great performance for your SQL Server platform. Table 5 lists the essential performance objects and counters related to monitoring the disk system.
Table 5. Disk Usage-Related Performance Objects and Counters
Performance Monitor Object | Description |
---|
Physical Disk: Current Disk Queue Length | The number of outstanding requests (read/write) for a disk. |
Physical Disk: Avg. Disk Queue Length | The average number of both read and write requests queued for disks. |
Physical Disk: Disk Read Bytes | The rate at which bytes are transferred from the disk during read operations. |
Physical Disk: Disk Write Bytes | The rate at which bytes are transferred to the disk during write operations. |
Physical Disk: % Disk Time | The percentage of elapsed time that the selected disk drive is busy servicing read or write requests. |
Logical Disk: Current Disk Queue Length | The number of outstanding requests (read/write) for a disk. |
Logical Disk: Avg. Disk Queue Length | The average number of both read and write requests queued for disks. |
Logical Disk: Disk Read Bytes | The rate at which bytes are transferred from the disk during read operations. |
Logical Disk: Disk Write Bytes | The rate at which bytes are transferred to the disk during write operations. |
Logical Disk: % Disk Time | The percentage of elapsed time that the selected disk drive is busy servicing read or write requests. |
Slow disk I/O causes a reduction in the transaction
throughput. To identify which disks are receiving all the attention, you
should monitor both the Physical Disk and Logical Disk
performance objects. You have many more opportunities to tune at the
disk level than with other components, such as processors. This has long
been the area where database administrators and system administrators
have been able to get better performance. You can start by looking at
the behavior of the Physical Disk: Current Disk Queue Length and Physical Disk: Avg. Disk Queue Length
counters for all disks or for each particular disk. This way, you can
identify where most of the activity is, from a disk-usage point of view.
As you monitor each individual disk, you might see
that some drives are not as busy as others. You can relocate heavily
used resources to minimize these long queue lengths that you have
uncovered and spread out the disk activity. Common techniques for this
are to relocate indexes away from tables, isolate read-only tables away
from volatile tables, and so on. You need to take special care with tempdb. The best practice is to isolate it away from all other disk I/O processing.
The Physical Disk: % Disk Time
counter for each physical disk drive shows the percentage of time that
the disk is active; a continuously high value could indicate an
underperforming disk subsystem.
Of course, the monitoring up to this point shows only
half the picture if drives are partitioned into multiple logical
drives. To see the work on each logical drive, you need to examine the
logical disk counters; in fact, you can monitor read and write activity
separately with Logical Disk: Disk Write Bytes/sec and Logical Disk: Disk Read Bytes/sec.
You should be looking for average times below 20ms. If the averages are
over 50ms, the disk subsystem is in serious need of replacement,
reconfiguration, or redistribution.
If you use RAID, you need to know how many physical
drives are in each RAID array to figure out the monitored values of disk
queuing for any one disk. In general, you just divide the disk queue
value by the number of physical drives in the disk array. This
calculation gives you a fairly accurate number for each physical disk’s
queue length.
5. Monitoring SQL Server’s Disk Activity
In the preceding section, we looked at monitoring
overall disk activity. In this section, we examine what SQL Server’s
contribution is to all this disk activity. Disk activity can be
categorized into reads and writes. SQL Server carries out writes to the
disk for the following processes:
Logging is a constant occurrence in any database that
allows modifications, and SQL Server attempts to optimize this process
by batching a number of writes together. To see how much work is done on
behalf of the database logs, you can examine the SQLServer:Databases:Log Bytes Flushed and SQLServer:Databases:Log Flushes/sec counters. The first tells you the quantity of the work, and the second tells you the frequency.
The third kind of write occurs to make space within
the page cache. This is carried out by the Lazy Writer process, which
you can track with the counter SQLServer:Buffer Manager:Lazy Writes.
You also can easily monitor the amount of reading SQL Server is doing by using the counter SQLServer:Buffer Manager:Page Reads.
Monitoring Disk Performance in SSMS
If you are collecting performance data into the MDW
using the Data Collector for a SQL Server instance, you can view more
detailed disk performance in the Server Activity History report.
You can click on the Disk Usage graph to drill down and examine the
details on Disk Response Time, Average Disk Queue Length, and Disk Transfer
Rate. In addition, the System Disk Usage detail report also lists the
top 20 processes with the highest I/O writes or highest I/O reads per
second. This information can help you determine which processes besides
SQL Server are performing a large number of disk reads and/or writes
that could be affecting the I/O performance of SQL Server.
DMVs or System Views for Monitoring Disk System Items
There are several I/O-related DMVs and functions.
They cover backup tape I/O, pending I/O requests, I/O on cluster shared
drives, and virtual file I/O statistics.
The best of these is the sys.dm_io_virtual_file_stats
function, which allows you to see the file activity within a database
allocation. You supply the database ID as the first parameter, along
with the file ID of the database file as the second parameter. This
yields an accumulating set of statistics that can be used to isolate and
characterize heavy I/O:
SELECT cast(db_name(database_id) as varchar(12)) as dbname,
file_id,
num_of_reads as numreads,
num_of_bytes_read as bytesread,
num_of_bytes_written as byteswritten,
size_on_disk_bytes as size
FROM sys.dm_io_virtual_file_stats (5,1)
GO
dbname file_id numreads bytesread byteswritten size
------------ ------- -------- --------- ------------ ----------
UnleashedMDW 1 7022 411140096 759291904 1048576000
In addition, the OS wait stats for I/O latch waits
are great for identifying when reading or writing of a page is not
available from the data cache. These latch waits account for the
physical I/O waits when a page is accessed for reading or writing. When
the page is not found in the cache, an asynchronous I/O gets posted. If
there is any delay in the I/O, the PAGEIOLATCH_EX or PAGEIOLATCH_SH
latch waits are affected. An increased number of latch waits indicates
that an I/O bottleneck exists. The following query reveals this latch
wait information:
SELECT substring(wait_type,1,15) AS Latch_Waits, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type like 'PAGEIOLATCH%'
ORDER BY wait_type
GO
Latch_Waits waiting_tasks_count wait_time_ms
------------- -------------------- --------------------
PAGEIOLATCH_DT 0 0
PAGEIOLATCH_EX 2871 61356
PAGEIOLATCH_KP 0 0
PAGEIOLATCH_NL 0 0
PAGEIOLATCH_SH 7305 98120
PAGEIOLATCH_UP 1372 7318
6. Monitoring Other SQL Server Performance Items
In
addition to the main system items that affect performance which have
been covered already, there are other items in SQL Server that affect
system performance that you should include in your performance
monitoring approach.
Monitoring Locks
One of the often-overlooked areas of performance
degradation is locking. You need to ensure that the correct types of
locks are issued and that the worst kind of lock, a blocking lock, is
kept to a minimum. A blocking lock, as
its name implies, prevents other users from continuing their own work.
An easy way to identify the level of blocking locks is to use the
counter SQLServer:Memory Manager:Lock Blocks. If this counter frequently indicates a value greater than 0, you need to examine the queries being executed or even revisit the database design.
Monitoring Users
Even though you cannot always trace performance
problems directly to the number of users connected, it is a good idea to
occasionally monitor how this number fluctuates. It is fairly easy to
trace one particular user who is causing a massive performance problem.
The leverage point here is to see the current number of user connections with the SQLServer:General Statistics:User Connections
counter in conjunction with other objects and counters. It is easy to
say that the disk subsystem is a bottleneck, but how many users is SQL
Server supporting at the time?