Out-of-Date or Insufficient Statistics
Admittedly, having
out-of-date or unavailable statistics is not as big a problem as it was
in SQL Server releases prior to 7.0. Back in those days, the first
question asked when someone was complaining of poor performance was,
“When did you last update statistics?” If the answer was “Huh?” we
usually found the culprit.
With the
Auto-Update Statistics and Auto-Create Statistics features in SQL Server
2008, this problem is not as prevalent as it used to be. If SQL Server
detects during optimization that the statistics a query needs are missing
or out of date, it creates or updates them and then generates the query
plan based on the new statistics.
Note
If
statistics are missing or out of date, the first query that
detects this condition might run a bit more slowly as it creates or
updates the statistics first, especially if the table is relatively
large or has been configured to use FULLSCAN when statistics are updated.
However, SQL Server 2008 provides the AUTO_UPDATE_STATISTICS_ASYNC database option. When this option is set to ON,
queries do not wait for the statistics to be updated before compiling.
Instead, the out-of-date statistics are put on a queue for updating by a
worker thread in a background process, and the query and any other
concurrent queries compile immediately, using the existing out-of-date
statistics. Although the query does not have to wait for the statistics
update, the out-of-date statistics may cause the Query Optimizer to
choose a less efficient query plan; the trade-off is more predictable
query response times. Any queries invoked after the updated statistics
are ready use them when generating a query plan, which may cause the
recompilation of any cached plans that depend on the older statistics.
You should consider setting the AUTO_UPDATE_STATISTICS_ASYNC option to ON
when your applications have experienced client request timeouts caused
by queries waiting for updated statistics, or when it is acceptable to
run queries with less efficient plans based on outdated statistics in
exchange for predictable query response times.
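For example, you could enable the option with ALTER DATABASE statements like the following sketch (the AdventureWorks2008 database name is used here only for illustration; AUTO_UPDATE_STATISTICS must also be ON for the asynchronous option to have any effect):
ALTER DATABASE AdventureWorks2008 SET AUTO_UPDATE_STATISTICS ON
ALTER DATABASE AdventureWorks2008 SET AUTO_UPDATE_STATISTICS_ASYNC ON

-- Verify the current settings for the database
SELECT name, is_auto_update_stats_on, is_auto_update_stats_async_on
FROM sys.databases
WHERE name = 'AdventureWorks2008'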
You could have
insufficient statistics to properly optimize a query if the sample size
used when the statistics were generated wasn’t large enough. Depending
on the nature of your data and size of the table, the statistics might
not accurately reflect the actual data distribution and cardinality. If
you suspect that this is the case, you can update statistics by
specifying the FULLSCAN option or a larger sample size, so SQL Server examines more records to derive the statistics.
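For example, assuming a table named titles (the table name is illustrative), either of the following commands forces SQL Server to examine more rows than the default sampling would:
-- Examine every row in the table when rebuilding the statistics
UPDATE STATISTICS titles WITH FULLSCAN

-- Or examine a larger, but still partial, sample of the rows
UPDATE STATISTICS titles WITH SAMPLE 50 PERCENT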
Poor Index Design
Poor index design is another
reason—often a primary reason—why queries might not optimize as you
expect them to. If no supporting indexes exist for a query, or if a
query contains SARGs that cannot be optimized effectively to use the
available indexes, SQL Server ends up performing a table scan or an
index scan, or falls back on a less efficient hash or merge join
strategy. If this appears to be the problem, you need to reevaluate
your indexing decisions or rewrite the query so it can take advantage of
an available index.
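For example, if queries frequently filter on a column that has no supporting index, adding a nonclustered index on that column may let the Query Optimizer replace a scan with an index seek. The following sketch uses hypothetical table and column names:
-- Hypothetical supporting index for queries that filter on ord_date
CREATE NONCLUSTERED INDEX idx_sales_ord_date ON sales (ord_date)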
Search Argument Problems
It’s
the curse of SQL that there are a number of ways to write a query and
get the same result set. Some queries, however, might not be as
efficient as others. A good understanding of the Query Optimizer can
help you avoid writing search arguments that SQL Server can’t optimize
effectively. The following sections highlight some of the common
“gotchas” encountered in SQL Server SARGs that can lead to poor or
unexpected query performance.
Using Optimizable SARGs
A search argument (SARG) is a predicate in the WHERE
clause that compares a column with a constant. The SARGs that optimize
most effectively are those in which the constant is a literal value
rather than an expression or a variable, and in which no operation or
function is applied to the column itself. The following is an example:
SELECT column1
FROM table1
WHERE column1 = 123
You should try to avoid using any negative logic in your SARGs (for example, !=, <>, NOT IN) or performing operations on, or applying functions to, the columns in the SARG.
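For example, applying a function to the column prevents SQL Server from seeking on an index on that column, whereas an equivalent range predicate on the bare column does not (the sales table and its columns here are illustrative):
-- Not easily optimized: the function hides the column from the index
SELECT ord_num FROM sales WHERE YEAR(ord_date) = 2008

-- Optimizable rewrite: the column stands alone on one side of the comparison
SELECT ord_num FROM sales
WHERE ord_date >= '20080101' AND ord_date < '20090101'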
No SARGs
You need to watch out for queries in which the SARG might have been left out inadvertently, such as this:
SELECT title_id FROM titles
A SQL query with no search argument (that is, no WHERE clause) always performs a table scan or a clustered index scan unless a nonclustered index can be used to cover the query. If you don't want the query to
scan the entire table, be sure to specify a valid SARG
that matches an index on the table.
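As a sketch of the covering-index case, an index like the following would let SQL Server satisfy the preceding query by scanning the smaller nonclustered index rather than the entire table:
-- With this index in place, the query above is a covered query:
-- SQL Server can scan the index leaf level instead of the table
CREATE NONCLUSTERED INDEX idx_titles_title_id ON titles (title_id)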
Unknown Values in WHERE Clauses
You need to watch out for
expressions in which the search value in the SARG cannot be evaluated
until runtime. In these expressions, often the search value is a local
variable or subquery that can be materialized to a single value.
SQL Server treats
these expressions as SARGs but can’t use the statistics histogram to
estimate the number of matching rows because it doesn’t have a value to
compare against the histogram values during query optimization. The
values for the expressions aren’t known until the query is actually
executed. In this situation, the Query Optimizer uses the index density
information. The Query Optimizer is generally able to better estimate
the number of rows affected by a query when it can compare a known value
against the statistics histogram
than when it has to use the index density to estimate the average
number of rows that match an unknown value. This is especially true if
the data in a table isn’t distributed evenly. When you can, you should
try to avoid using constant expressions that can’t be evaluated until
runtime so that the statistics histogram can be used rather than the
density value.
To avoid using constant expressions in WHERE
clauses that can’t be evaluated until runtime, you should consider
putting the queries into stored procedures and passing in the constant
expression as a parameter. Because SQL Server evaluates the value of a
parameter before it optimizes the queries in the stored procedure, the
Query Optimizer can take that value into account when generating the
query plan.
For best results when
writing queries inside stored procedures, you should use stored
procedure parameters rather than local variables in your SARGs whenever
possible. This strategy allows the Query Optimizer to optimize the query
by using the statistics histogram, comparing the parameter value
against the statistics histogram to estimate the number of matching
rows. If you use local variables as SARGs in stored procedures, the
Query Optimizer is restricted to using index density, even if the local
variable is assigned the value of a parameter.
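The following sketch illustrates the difference (the titles table and price column are illustrative). In the first procedure, the parameter value is known when the query is optimized, so the statistics histogram can be used; in the second, the local variable's value is unknown at optimization time, even though it is assigned directly from the parameter, so only the density information is available:
-- Histogram can be used: @price is known at optimization time
CREATE PROCEDURE get_titles_by_price @price money
AS
SELECT title_id FROM titles WHERE price = @price
GO

-- Only density can be used: the value of @local_price is not known
-- until the statement actually executes
CREATE PROCEDURE get_titles_by_price2 @price money
AS
DECLARE @local_price money
SET @local_price = @price
SELECT title_id FROM titles WHERE price = @local_price
GO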
Other types of
constructs for which it is difficult for the Query Optimizer to
accurately estimate the number of qualifying rows or the data
distribution using the statistics histogram include aggregations in
subqueries, scalar expressions, user-defined functions, and noninline
table-valued functions.
Data Type Mismatches
Another common problem is
data type mismatches. If you attempt to join tables on columns of
different data types, the Query Optimizer might not be able to
effectively use indexes to evaluate the join. This can result in a less
efficient join strategy because SQL Server has to convert the values in
one of the join columns before it can process the query. You should avoid this situation
by maintaining data type consistency across the join key columns in your
database.
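As a hypothetical example, suppose the customer ID is stored as varchar(10) in one table and as int in the other:
-- The join below forces SQL Server to implicitly convert
-- customers.cust_id (varchar) to int, which prevents an index seek
-- on that column
CREATE TABLE customers (cust_id varchar(10) PRIMARY KEY, cust_name varchar(40))
CREATE TABLE orders (ord_num int PRIMARY KEY, cust_id int)

SELECT c.cust_name, o.ord_num
FROM customers c
JOIN orders o ON c.cust_id = o.cust_id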
Large Complex Queries
For complex queries with a
large number of tables and join conditions, the number of possible
execution plans can be enormous. The full optimization phase of the
Query Optimizer has a time limit to restrict how long it spends
analyzing all the possible query plans. There is no known general and
effective shortcut to arrive at the optimal plan. To cope with such a
large search space, SQL Server 2008 implements a number of heuristics
for very large queries that attempt to come up with an efficient query
plan within the time available. When it is not possible to analyze the
entire set of plan alternatives and the heuristics are applied, it is
not uncommon for a suboptimal query plan to be chosen.
When
is your query large enough to be a concern? Answering this question is
difficult because the answer depends on the number of tables involved,
the form of filter and join predicates, and the operations performed. If
a query involves more than 12 tables, the Query Optimizer likely has to
rely on heuristics and shortcuts to generate a query plan and may miss
some optimal strategies.
In general, you get more optimal query plans if you can simplify your queries as much as possible.
Triggers
If you are using triggers on INSERT, UPDATE, or DELETE, it is possible that the triggers themselves are causing performance problems. You might think that an INSERT, UPDATE, or DELETE statement
is performing poorly when actually it is the trigger that needs to be
tuned. In addition, you might have triggers that fire other triggers. If
you suspect that you are having performance problems with triggers,
you can use SQL Server Profiler to monitor the SQL statements they
execute, their response times, and the execution plans generated for
the statements within the triggers.
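In addition to Profiler, SQL Server 2008 provides the sys.dm_exec_trigger_stats dynamic management view, which you can query to identify expensive triggers. The following is a minimal sketch; note that the view reports only triggers whose plans are currently in the plan cache:
-- Find the triggers that have consumed the most CPU time
SELECT OBJECT_NAME(object_id, database_id) AS trigger_name,
       execution_count,
       total_worker_time,
       total_elapsed_time
FROM sys.dm_exec_trigger_stats
ORDER BY total_worker_time DESC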