Verification using unit, functional, and regression
tests is extremely important for thoroughly testing an application, but
it is performance testing that really gets the attention of most
developers. Performance testing is imperative for ensuring a positive
user experience. Users don't want to wait any longer than absolutely
necessary for data.
Developing
high-performance database applications is getting more difficult. As
applications grow larger, and especially as disk space continues to get
cheaper, the amount of data that must be processed by the typical
application has grown to be enormous. Whereas 5 years ago applications
with hundreds of gigabytes of data seemed large, today that figure is in
the terabytes—and that number will only continue to grow. Jim Gray of
Microsoft Research estimates that the cost for a petabyte (that's 1000
terabytes) of storage will be down to $1000 within 15 to 20 years.
The key to dealing with
large amounts of data in a timely manner is to reduce bottlenecks.
Unfortunately, problem areas can often be subtle, manifesting themselves
only under load—almost always, it seems, when a key user is trying to
get some extremely important piece of data. The only way to ensure that
this doesn't happen is to exercise the application to the highest level
of expected load and determine which areas need improvement.
Capturing Baseline Metrics
Just as with unit
and functional testing, having an overall process in place is extremely
important when it comes to performance evaluation. Performance tests
should be repeatable and should be done in an environment that can be
rolled back to duplicate the same conditions for multiple test runs.
Keep in mind that any
component in the application may be contributing in some way to
performance, so starting from the same point each time is imperative. I
recommend using a test database that can be restored to its original
state each time, as well as rebooting all servers involved in the test
just before beginning a run, in order to make sure that the test starts
from the same initial conditions. Another option that might be
easier than backing up and restoring a test database is using SQL
Server 2005's database snapshot feature. Try each technique in your
environment to determine which fits better into your testing system.
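As an illustration, here is a minimal sketch of a snapshot-based reset; the database name TestDB, the logical file name TestDB_Data, and the snapshot path are hypothetical and would need to match your environment:
-- Create a snapshot of the test database before the run begins
CREATE DATABASE TestDB_Baseline
ON ( NAME = TestDB_Data, FILENAME = 'C:\Snapshots\TestDB_Baseline.ss' )
AS SNAPSHOT OF TestDB;

-- After the run, revert the database to its pre-test state
RESTORE DATABASE TestDB
FROM DATABASE_SNAPSHOT = 'TestDB_Baseline';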
In addition to making
sure the servers are in the same state, you should also collect exactly
the same performance counters, query trace data, and other metrics in
precisely the same way for each test run. Consistency is the key to not
only validating that changes are effective, but also measuring how
effective they are.
During a testing process, the first test that is run should be used as a baseline.
The metrics captured during the baseline test will be used to compare
results for later runs. As problems are solved, or if test conditions
change (for instance, if you need to collect more performance counters
in a certain area), you should establish a new baseline from which to go
forward. Keep in mind that fixing issues in one area of an application
might have an impact on performance of another area. For instance, a
given query may be I/O-bound, whereas another is CPU-bound. By fixing
the I/O problems for the first query, you may introduce greater CPU
utilization, which in turn will cause the other query to degrade in
performance if they are run simultaneously.
Baselining
metrics in a database environment is generally a fairly straightforward
process. Server-side traces should be used to capture performance data,
including query duration and resources used. This data can then be
aggregated in order to determine minimum, maximum, and average
statistics for queries. In order to determine which resources are
starved, performance counters can be used to track server utilization.
As changes are made to fix performance issues, the baseline data can be
analyzed against other test data in order to establish performance
trends.
Profiling Using Traces and SQL Server Profiler
SQL Server
2005's tracing capabilities allow DBAs and developers to store or view
real-time information about activity taking place on the server. With
almost 200 events available for monitoring, there is a wealth of
information about the state of the server available through traces.
However, for most performance monitoring work, there are only a few key
events that you'll need to worry about.
When initially baselining an application, I'll generally start by looking at only the SQL:BatchCompleted and RPC:Completed events. Each of these events fires on completion of queries; the only difference between them is that RPC:Completed fires on completion of a remote procedure call (RPC), whereas SQL:BatchCompleted fires on completion of a SQL batch. Different access methods, same end result.
The most valuable columns available for both of these events are Duration, CPU, Reads, and Writes.
The Duration
column reports the total time elapsed for the call, in microseconds
(note, the Profiler tool shows this column in milliseconds by default).
The duration of a query is a direct reflection of the user experience,
so this is generally the one to start with. If the application is
performing slowly, find the worst offenders using this column.
The CPU
column reports the amount of CPU time, in milliseconds, spent parsing,
compiling, and executing the query. Because this column includes
compilation time, it is common to see the reported time drop on
consecutive executions of the same query thanks to plan caching.
The Reads column reports the number of logical
reads performed by the query. A logical I/O occurs any time SQL
Server's query engine requests data, whether from the physical disk or
from the buffer cache. If you see high numbers in this column, it may
not necessarily indicate a performance problem, because the majority of
data may be read from cache. However, even reading data from memory does
cost the server in terms of CPU time, so it is a good idea to try to
keep any kind of reads to a minimum.
The Writes column reports the number of physical
writes performed by the query. This means that only writes that were
actually persisted to disk during the course of the query will be
reported.
By using these basic
columns, you can isolate potential candidates for further investigation.
First, think about limits that need to be set for any given query in
the system. What is the maximum amount of time that a query can be
allowed to run? What should the average amount of run time be? By
aggregating the Duration column, you can determine whether these times have been exceeded.
Once you've isolated possible problem areas (see the "Granular Analysis" section), you can delve deeper with more specific sets of events. For instance, the Scan:Started
event can be used to identify possible queries that are making
inefficient use of indexes and therefore may be causing I/O problems.
The SP:Recompile event, on the
other hand, indicates queries that are getting recompiled by the query
optimizer, and may therefore be consuming larger than necessary amounts
of CPU time.
Server-Side Traces
It is important to
choose your tool wisely when tracing SQL Server during performance
tests. The Profiler tool that ships with SQL Server 2005 is extremely
useful, and the ease with which it can be used to set up and monitor
traces cannot be beat. However, in order to facilitate real-time data
collection and display, SQL Server needs to continually stream the data
back to the tool—and there is definitely overhead associated with doing
so.
In an extremely
high-transaction performance test, you should strive to minimize the
impact of monitoring on results of the test by using server-side traces
instead of the Profiler tool. A server-side trace runs in the
background on the SQL Server, saving its results to a local file instead
of streaming them to Profiler.
To create a server-side
trace, first use Profiler to define the events, columns, and filters
needed, and then click the Run button. Immediately stop the trace, click
the File menu, expand the Export option, and then expand the Script
Trace Definition option. Choose For SQL Server 2005 and select a
filename to save the script.
Once the script has been
saved, open it for editing in SQL Server Management Studio. The
following line of the script must be edited, and a valid path must be
specified, including a filename:
exec @rc = sp_trace_create @TraceID output, 0, N'InsertFileNameHere',
@maxfilesize, NULL
The specified filename should not
include an extension of any kind. One will automatically be added by
the trace. You might also wish to modify the default maximum file size
set at the top of the script, which is 5MB. Increasing it will help to
minimize the number of rollover files created during the trace. I
generally set this to 200MB as a starting point.
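The maximum file size is set near the top of the script that Profiler generates; the relevant lines look something like the following, with the value changed from the default of 5 to 200:
declare @maxfilesize bigint
set @maxfilesize = 200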
Once you have finished
editing, the script can be run, and the trace will begin collecting data
in the background. The generated script will also select back a trace
identifier, which you should make note of so that you can easily control the trace later.
When you are done
tracing, use the trace identifier to stop and close the trace, using the
following T-SQL (in this case the trace identifier is listed as 99):
EXEC sp_trace_setstatus @traceid=99, @status=0
EXEC sp_trace_setstatus @traceid=99, @status=2
Once the trace is stopped and closed, the fn_trace_gettable
function can be used to read the data from the trace file. This
function takes two arguments: a full path to the trace file
name—including the .trc extension
automatically added by SQL Server—and the maximum number of rollover
files to read. The following T-SQL would be used to read the trace file
from the path C:\Traces\myTrace.trc.
The number of rollover files is set high enough that all of the data
will be read back, even if the trace rolled over to new files several
times:
SELECT *
FROM ::fn_trace_gettable('C:\Traces\myTrace.trc', 999)
Once selected in this way,
the trace data can be used just like any other data in SQL Server. It
can be inserted into a table, queried, or aggregated in any number of
ways in order to evaluate which queries are potentially causing
problems.
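For example, the following is a minimal sketch of loading the trace into a work table and then finding the worst offenders by average duration; the table name TraceData is arbitrary, and grouping on the first 256 characters of the query text is only a rough way of bucketing similar queries:
-- Load the trace file (and any rollover files) into a work table
SELECT *
INTO TraceData
FROM ::fn_trace_gettable('C:\Traces\myTrace.trc', 999);

-- Aggregate by query text to find the worst offenders by average duration
-- (Duration is reported in microseconds, so divide by 1000 for milliseconds)
SELECT TOP (25)
    CONVERT(NVARCHAR(256), TextData) AS QueryText,
    COUNT(*) AS Executions,
    AVG(Duration) / 1000 AS AvgDurationMs,
    MAX(Duration) / 1000 AS MaxDurationMs,
    AVG(CPU) AS AvgCPU,
    AVG(Reads) AS AvgReads,
    AVG(Writes) AS AvgWrites
FROM TraceData
GROUP BY CONVERT(NVARCHAR(256), TextData)
ORDER BY AvgDurationMs DESC;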
Evaluating Performance Counters
For a bigger-picture view of
the overall performance of a server, performance counters are an
invaluable resource. Similar to SQL Server trace events, there are
hundreds of counters from which to choose—but only a handful generally
need to be monitored when doing an initial performance evaluation of a
SQL Server installation.
The following counters are a
good starting point for determining what kinds of performance issues to
look for (note, the MSSQL$ counter categories include the name of the
SQL Server instance, denoted here by <instance_name>):
Processor:% Processor Time
reports the total processor time with respect to the available capacity
of the server. If this counter is above 70% during peak load periods,
it may be worthwhile to begin investigating which routines are making
heavy use of CPU time.
PhysicalDisk:Avg. Disk Queue Length
indicates whether processes have to wait to use disk resources. As a
disk is fulfilling requests (i.e., reading and writing data), requests
that cannot be immediately filled are queued. Too many simultaneous
requests results in wait times, which can mean query performance
problems. It's a good idea to make sure that queue lengths stay below 1
(meaning, effectively, that there is no queue) whenever possible.
PhysicalDisk:Disk Read Bytes/sec and PhysicalDisk:Disk Write Bytes/sec
report the number of bytes read from and written to the disk,
respectively. These figures are not especially interesting on their own,
but coupled with Avg. Disk Queue Length can help to explain problems.
Slow SELECT queries coupled with
high physical reads and low queue lengths can indicate that the buffer
cache is not being effectively utilized. Slow DML queries coupled with
high physical writes and high queue lengths are a typical indication of
disk contention, and a good sign that you might want to evaluate how to
reduce index fragmentation in order to decrease insert and update times.
MSSQL$<instance_name>:Locks:Average Wait Time (ms)
reports the average amount of time that queries are waiting on locks.
Decreasing lock contention can be quite a challenge, but many shops have
discovered that it can be solved in many cases by using either dirty
reads (the READ UNCOMMITTED isolation level) or row versioning (the
SNAPSHOT isolation level).
MSSQL$<instance_name>:Buffer Manager:Page life expectancy
is the average amount of time, in seconds, that pages remain in buffer
cache memory after they are read off of the disk. This counter, coupled
with Disk Read Bytes/sec, can help to indicate where disk bottlenecks
are occurring—or, it might simply indicate that your server needs more
RAM! Either way, values below 300 (i.e., five minutes) may indicate that
you have a problem in this area.
MSSQL$<instance_name>:Plan Cache:Cache Hit Ratio and MSSQL$<instance_name>:Plan Cache:Cached Pages
are counters that deal with the query plan cache. The Cache Hit Ratio
counter is the ratio of cache hits to lookups—in other words, what
percentage of issued queries are already in the cache. During a
performance run, this number should generally start out low (assuming
you've rebooted the SQL Server before starting in order to put it into a
consistent state) and go up during the course of the run. Toward the
end, you should see this number fairly near to 100, indicating that
almost all queries are cached. The Cached Pages counter indicates how
many 8KB pages of memory are being used for the procedure cache. A low
Cache Hit Ratio combined with a high Cached Pages value means that you
need to consider fixing the dynamic SQL being used by the system.
These counters can be read using the System Monitor console (a.k.a. performance monitor or perfmon),
although many load testing tools have built-in counter collection and
reporting mechanisms. SQL Server Profiler also has the ability to import
performance counter logs in order to correlate them with traces. This
can be useful for helping to pinpoint the cause of especially large
spikes in areas such as CPU time and disk utilization.
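The SQL Server-specific counters (though not the operating system counters, such as Processor and PhysicalDisk) can also be spot-checked from within the server itself via the sys.dm_os_performance_counters view. A minimal sketch:
-- Read a few SQL Server-specific counters directly from the server
-- (ratio counters such as Cache Hit Ratio must be divided by their
-- corresponding "base" counters to be meaningful)
SELECT object_name, counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Page life expectancy', 'Average Wait Time (ms)');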
Big-Picture Analysis
Once you have set up
performance counters and traces, you are ready to begin actual
performance testing. But this raises the question, "Where to begin?"
Especially in a large legacy application, running an end-to-end
performance test can be a daunting task.
A first step is to
determine what kinds of unit and functional tests exist, and evaluate
whether they can be used as starting points for performance tests. Some
load testing tools, such as Microsoft's Visual Studio 2005 Team System,
have the ability to directly load test prebuilt unit tests. However,
most commercial load tools are designed to exercise applications or web
code directly. Try to collect as many tests as possible to cover the
most-used parts of the application. Absolute coverage is nice, but is
unrealistic in many cases.
The next step is to implement
a load testing suite using the prebuilt unit and functional tests.
Depending on which load tool you are using, this can take some time. The
key is to make sure that the load tool passes random or semirandom
inputs into any function that goes back to the database, in order to
simulate real traffic and make sure that caching does not play too big a
part in skewing numbers. Nonrandom inputs can mask disk I/O issues
caused by buffer cache recycling.
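As a simple illustration, a test harness might select a random existing key value for each simulated call rather than reusing a constant; the dbo.Customers table and dbo.GetCustomerOrders procedure below are hypothetical names:
-- Pick a random existing customer for each simulated call so that repeated
-- iterations do not keep hitting the same cached pages
DECLARE @CustomerID INT;

SELECT TOP (1) @CustomerID = CustomerID
FROM dbo.Customers
ORDER BY NEWID();

EXEC dbo.GetCustomerOrders @CustomerID = @CustomerID;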
Set goals for the test to
determine what level of load you need to test at. If you are testing on a
system that mirrors the application's production environment, try to
test at a load equal to what the application encounters during peak
periods. If the test servers are less powerful than the production
systems, scale back appropriately. Note that it can be difficult to test
against servers that aren't scaled the same as production systems. For
instance, if the production database system has eight processors and is
attached to a dedicated SAN, and the test database system has four
processors and internal drives, there may be an I/O mismatch. In this
situation it might be advisable to modify SQL Server's processor
affinity on the test system so that less processor power is available,
bringing the ratio of available processor power to available disk I/O
more closely in line with the actual environment in which the code will run.
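For instance, something like the following could be used to restrict SQL Server to four of the processors on a test system; the affinity mask is a bitmask, so 15 (binary 1111) enables the first four CPUs. This is only a sketch, and affinity should normally be left at its default outside of this kind of scenario:
-- Restrict SQL Server to the first four processors on the test system
-- (affinity mask is an advanced option)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'affinity mask', 15;
RECONFIGURE;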
In addition to making sure
that the test and production systems are scaled similarly, make sure
that the SQL Server configurations in both systems are similar. For
example, ensure that the maximum degree of parallelism is set similarly
so that processors will be used the same way in queries on both the test
and production systems. Likewise, you should monitor the RAM options to
ensure that they are configured to equivalent percentages on both
systems—so if your production system has 16GB of RAM but SQL Server's
maximum server memory setting is 12GB, you'll want to set your test
system to use 75 percent of the available RAM as well.
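Continuing that example, if the test server has 8GB of physical RAM, setting max server memory to 6GB (6144MB) preserves the same 75 percent ratio; the same sp_configure approach applies to max degree of parallelism (the value of 4 below is only an example, and both settings assume advanced options are already enabled as shown above):
-- Match the production memory ratio: 75 percent of 8GB is 6GB
EXEC sp_configure 'max server memory (MB)', 6144;
RECONFIGURE;

-- Keep the degree of parallelism consistent with production
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;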
Once user goals are set,
load tests should generally be configured to step up load slowly, rather
than immediately hit the server with the peak number of users. Stepping
up load more closely matches most production environments, in which the
server may get rebooted or reset during a maintenance period, then
slowly accept user requests and warm up its caches before encountering a
larger number of requests during more active times of the day. Note
that step testing may not accurately model a situation such as a cluster
failover, in which a server may be subjected to a full load immediately
upon starting up.
The goal of a big-picture
test is to see how the system scales overall. Try to look at general
trends in the performance counters to determine whether the system can
handle load spikes, or generally sustained load over long periods of
time (again, depending on actual application usage patterns, if
possible). SQL Server traces should be run during these tests in order
to capture data that can be used later for more granular analysis of
specific components.
Granular Analysis
If the results of a
big-picture test show that certain areas need work, a more granular
investigation into specific routines will generally be necessary. Using
aggregated trace data collected from a full-system test, it's important
to evaluate both queries and groups of queries that are long running or resource intensive.
While it is often
tempting to look only at the worst offending queries—for instance, those
with the maximum duration—this may not tell the complete story. For
instance, you may notice that certain stored procedures are taking
longer than others to run during peak load. This may translate into
longer user interface wait times, but may not indicate the longest user
interface wait times due to stored procedures. This is due to the fact that
many applications call more than one stored procedure every time the
interface needs to be updated with additional data. In these cases it is
important to group procedures that are called together and aggregate
their total resource utilization.
If stored procedures are
called sequentially, duration should be totaled in order to determine
the total user wait time for that group, and maximum resource
utilization should be noted. If, on the other hand, they are called
simultaneously (for instance, on different connections), resource
utilization should be totaled in order to determine the group's overall
impact on the system, and the maximum duration for the group should be
noted.
For example, assume that
in a given system, whenever a user logs in, three different stored
procedures are called to get data for the first screen. Table 1 shows the average data collected for these stored procedures.
Table 1. Stored Procedures Called After Login, with Averaged Data
Stored Procedure | Duration (ms) | CPU | Reads | Writes
--- | --- | --- | --- | ---
LogSessionStart | 422 | 10 | 140 | 1
GetSessionData | 224 | 210 | 3384 | 0
GetUserInfo | 305 | 166 | 6408 | 0
If the system calls these
stored procedures sequentially, the total duration that should be
recorded for this group is 951 ms. Since each is called individually,
total system impact at any given time will only be as much as the
maximum values for each of the given columns. So we record 210 for CPU,
6408 for Reads, and 1 for Writes.
On the other hand, if these
stored procedures are called simultaneously, the impact will be much
different. Total duration will only be as much as the longest running of
the three—422 (assuming, of course, that the system has enough
resources available to handle all three requests at the same time).
However, CPU time during the run should be recorded as 386, Reads as
9932, and Writes as 1.
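This kind of grouping can also be expressed directly against the trace data. The following is a minimal sketch, assuming the trace has been loaded into a table called TraceData (as shown earlier), that the ObjectName column was captured in the trace, and that a hypothetical mapping table, FeatureGroups, associates each stored procedure with a feature:
-- Aggregate every captured call by feature, computing both the sequential
-- figures (total duration, maximum resource use) and the simultaneous
-- figures (maximum duration, total resource use); which set applies
-- depends on how the procedures are actually called
SELECT
    fg.FeatureName,
    SUM(td.Duration) / 1000 AS TotalDurationMs,   -- sequential calls
    MAX(td.Duration) / 1000 AS MaxDurationMs,     -- simultaneous calls
    MAX(td.CPU) AS MaxCPU,                        -- sequential calls
    SUM(td.CPU) AS TotalCPU,                      -- simultaneous calls
    MAX(td.Reads) AS MaxReads,
    SUM(td.Reads) AS TotalReads,
    MAX(td.Writes) AS MaxWrites,
    SUM(td.Writes) AS TotalWrites
FROM TraceData AS td
JOIN FeatureGroups AS fg ON fg.ProcedureName = td.ObjectName
GROUP BY fg.FeatureName;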
By grouping stored
procedures in this way, the total impact for a given feature can be
assessed. It may be the case that individually long-running stored
procedures are not the primary performance culprits, and are actually
being overshadowed by groups of seemingly less resource-intensive stored
procedures. This can also be an issue with cursors that are doing a
large number of very small fetch operations. Each individual fetch may
fall under the radar, but taken as a whole, it may become clear that the
cursor is using a lot of resources.
Another benefit of this
kind of grouping is that further aggregation is possible. For instance,
given these figures, it is possible to determine how much impact a
certain number of users logging in simultaneously would have on the
system. That information can be useful when trying to reach specific
scalability goals.
Fixing Problems: Is Focusing on the Obvious Issues Enough?
When evaluating the
performance of a system and trying to determine where to look to fix
problems, it can be tempting to focus on the obvious worst offenders
first. However, some care should be taken to make effective use of your
time; in many cases what appear to be the obvious problems are actually
side effects of other, more subtle issues.
Looking at duration alone
is often the easiest mistake to make when analyzing performance issues.
Duration tells a major part of the story, but a long duration does not
necessarily indicate a performance problem with the query itself. It may
indicate that the query had to wait for some other query—or queries—to
complete, or that the query was competing with other queries for
resources. When performance tuning, it is best to be suspicious of
long-running queries with very low reported resource utilization. These
are often not the real culprits at all.
By using the granular
analysis technique and aggregating, it is often possible to find the
real offenders more easily. For instance, in one fairly high-transaction
system a procedure was getting called from time to time that was
writing 10MB of data to the disk. This procedure reported a high
duration, which was interpreted as a possible performance problem.
Unfortunately, there wasn't much to tune in that stored procedure, but
further aggregate analysis revealed another stored procedure in the
system that was getting called over 1000 times a minute and writing as
much as 50KB to the disk each time it was called. Each call to the
second stored procedure reported a small enough duration that it did not
appear to be causing performance problems, yet as it turned out it was
causing issues in other areas. By tuning it and reducing the amount of
data it was writing on each call, the average duration of the first
stored procedure was reduced dramatically.