Out-of-Date or Insufficient Statistics
Admittedly, having
out-of-date or unavailable statistics is not as big a problem as it was
in SQL Server releases prior to 7.0. Back in those days, the first
question asked when someone was complaining of poor performance was,
“When did you last update statistics?” If the answer was “Huh?” we
usually found the culprit.
With the
Auto-Update Statistics and Auto-Create Statistics features in SQL Server
2008, this problem is not as prevalent as it used to be. If SQL Server
detects during optimization that the statistics a query needs are missing
or out of date, it creates or updates them and then generates the query
plan based on the new statistics.
Note
If
statistics are missing or out of date, the first query that
detects this condition might run a bit more slowly as it creates or
updates the statistics first, especially if the table is relatively
large or has been configured to use FULLSCAN when statistics are updated.
However, SQL Server 2008 provides the AUTO_UPDATE_STATISTICS_ASYNC database option. When this option is set to ON,
queries do not wait for the statistics to be updated before compiling.
Instead, the out-of-date statistics are put on a queue for updating by a
worker thread in a background process, and the query and any other
concurrent queries compile immediately, using the existing out-of-date
statistics. Although the query does not have to wait for the statistics
update, the out-of-date statistics may cause the Query Optimizer to
choose a less efficient query plan; the trade-off is more predictable
query response times. Any queries invoked after the updated statistics
are ready use them when generating a query plan, which may cause the
recompilation of any cached plans that depend on the older statistics.
You should consider setting the AUTO_UPDATE_STATISTICS_ASYNC option to ON
when your applications have experienced client request timeouts caused
by queries waiting for updated statistics, or when it is acceptable to
run queries with less efficient plans based on outdated statistics in
exchange for predictable query response times.
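For example, you could enable the option with ALTER DATABASE statements like the following sketch (the AdventureWorks2008 database name is used here only for illustration; AUTO_UPDATE_STATISTICS must also be ON for the asynchronous option to have any effect):
ALTER DATABASE AdventureWorks2008 SET AUTO_UPDATE_STATISTICS ON
ALTER DATABASE AdventureWorks2008 SET AUTO_UPDATE_STATISTICS_ASYNC ON

-- Verify the current settings for the database
SELECT name, is_auto_update_stats_on, is_auto_update_stats_async_on
FROM sys.databases
WHERE name = 'AdventureWorks2008'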
You could have
insufficient statistics to properly optimize a query if the sample size
used when the statistics were generated wasn’t large enough. Depending
on the nature of your data and size of the table, the statistics might
not accurately reflect the actual data distribution and cardinality. If
you suspect that this is the case, you can update statistics by
specifying the FULLSCAN option or a larger sample size, so SQL Server examines more records to derive the statistics.
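For example, assuming a table named titles (the table name is illustrative), either of the following commands forces SQL Server to examine more rows than the default sampling would:
-- Examine every row in the table when rebuilding the statistics
UPDATE STATISTICS titles WITH FULLSCAN

-- Or examine a larger, but still partial, sample of the rows
UPDATE STATISTICS titles WITH SAMPLE 50 PERCENT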
Poor Index Design
Poor index design is another
reason—often a primary reason—why queries might not optimize as you
expect them to. If no supporting indexes exist for a query, or if a
query contains SARGs that cannot be optimized effectively to use the
available indexes, SQL Server ends up performing a table scan or an
index scan, or falls back on a less efficient hash or merge join
strategy. If this appears to be the problem, you need to reevaluate
your indexing decisions or rewrite the query so it can take advantage of
an available index.
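For example, if queries frequently filter on a column that has no supporting index, adding a nonclustered index on that column may let the Query Optimizer replace a scan with an index seek. The following sketch uses hypothetical table and column names:
-- Hypothetical supporting index for queries that filter on ord_date
CREATE NONCLUSTERED INDEX idx_sales_ord_date ON sales (ord_date)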
Search Argument Problems
It’s
the curse of SQL that there are a number of ways to write a query and
get the same result set. Some queries, however, might not be as
efficient as others. A good understanding of the Query Optimizer can
help you avoid writing search arguments that SQL Server can’t optimize
effectively. The following sections highlight some of the common
“gotchas” encountered in SQL Server SARGs that can lead to poor or
unexpected query performance.
Using Optimizable SARGs
A search argument (SARG) is a predicate in the WHERE
clause that compares a column with a constant. The SARGs that optimize
most effectively are those in which the constant is a literal value
rather than an expression or a variable, and in which no operation or
function is applied to the column itself. The following is an example:
SELECT column1
FROM table1
WHERE column1 = 123
You should try to avoid using any negative logic in your SARGs (for example, !=, <>, NOT IN) or performing operations on, or applying functions to, the columns in the SARG.
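For example, applying a function to the column prevents SQL Server from seeking on an index on that column, whereas an equivalent range predicate on the bare column does not (the sales table and its columns here are illustrative):
-- Not easily optimized: the function hides the column from the index
SELECT ord_num FROM sales WHERE YEAR(ord_date) = 2008

-- Optimizable rewrite: the column stands alone on one side of the comparison
SELECT ord_num FROM sales
WHERE ord_date >= '20080101' AND ord_date < '20090101'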
No SARGs
You need to watch out for queries in which the SARG might have been left out inadvertently, such as this:
SELECT title_id FROM titles
A SQL query with no search argument (that is, no WHERE clause) always performs a table scan or a clustered index scan unless a nonclustered index can be used to cover the query. If you don't want the query to
scan the entire table, be sure to specify a valid SARG
that matches an index on the table.
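As a sketch of the covering-index case, an index like the following would let SQL Server satisfy the preceding query by scanning the smaller nonclustered index rather than the entire table:
-- With this index in place, the query above is a covered query:
-- SQL Server can scan the index leaf level instead of the table
CREATE NONCLUSTERED INDEX idx_titles_title_id ON titles (title_id)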
Unknown Values in WHERE Clauses
You need to watch out for
expressions in which the search value in the SARG cannot be evaluated
until runtime. In these expressions, often the search value is a local
variable or subquery that can be materialized to a single value.
SQL Server treats
these expressions as SARGs but can’t use the statistics histogram to
estimate the number of matching rows because it doesn’t have a value to
compare against the histogram values during query optimization. The
values for the expressions aren’t known until the query is actually
executed. In this situation, the Query Optimizer uses the index density
information. The Query Optimizer is generally able to better estimate
the number of rows affected by a query when it can compare a known value
against the statistics histogram
than when it has to use the index density to estimate the average
number of rows that match an unknown value. This is especially true if
the data in a table isn’t distributed evenly. When you can, you should
try to avoid using constant expressions that can’t be evaluated until
runtime so that the statistics histogram can be used rather than the
density value.
To avoid using constant expressions in WHERE
clauses that can’t be evaluated until runtime, you should consider
putting the queries into stored procedures and passing in the constant
expression as a parameter. Because SQL Server evaluates the value of a
parameter before it optimizes the queries in the stored procedure, the
Query Optimizer can take that value into account when generating the
query plan.
For best results when
writing queries inside stored procedures, you should use stored
procedure parameters rather than local variables in your SARGs whenever
possible. This strategy allows the Query Optimizer to optimize the query
by using the statistics histogram, comparing the parameter value
against the statistics histogram to estimate the number of matching
rows. If you use local variables as SARGs in stored procedures, the
Query Optimizer is restricted to using index density, even if the local
variable is assigned the value of a parameter.
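The following sketch illustrates the difference (the titles table and price column are illustrative). In the first procedure, the parameter value is known when the query is optimized, so the statistics histogram can be used; in the second, the local variable's value is unknown at optimization time, even though it is assigned directly from the parameter, so only the density information is available:
-- Histogram can be used: @price is known at optimization time
CREATE PROCEDURE get_titles_by_price @price money
AS
SELECT title_id FROM titles WHERE price = @price
GO

-- Only density can be used: the value of @local_price is not known
-- until the statement actually executes
CREATE PROCEDURE get_titles_by_price2 @price money
AS
DECLARE @local_price money
SET @local_price = @price
SELECT title_id FROM titles WHERE price = @local_price
GO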
Other types of
constructs for which it is difficult for the Query Optimizer to
accurately estimate the number of qualifying rows or the data
distribution using the statistics histogram include aggregations in
subqueries, scalar expressions, user-defined functions, and noninline
table-valued functions.
Data Type Mismatches
Another common problem is
data type mismatches. If you attempt to join tables on columns of
different data types, the Query Optimizer might not be able to
effectively use indexes to evaluate the join. This can result in a less
efficient join strategy because SQL Server has to convert the values in
one of the join columns before it can process the query. You should avoid this situation
by maintaining data type consistency across the join key columns in your
database.
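As a hypothetical example, suppose the customer ID is stored as varchar(10) in one table and as int in the other:
-- The join below forces SQL Server to implicitly convert
-- customers.cust_id (varchar) to int, which prevents an index seek
-- on that column
CREATE TABLE customers (cust_id varchar(10) PRIMARY KEY, cust_name varchar(40))
CREATE TABLE orders (ord_num int PRIMARY KEY, cust_id int)

SELECT c.cust_name, o.ord_num
FROM customers c
JOIN orders o ON c.cust_id = o.cust_id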
Large Complex Queries
For complex queries with a
large number of tables and join conditions, the number of possible
execution plans can be enormous. The full optimization phase of the
Query Optimizer has a time limit to restrict how long it spends
analyzing all the possible query plans. There is no known general and
effective shortcut to arrive at the optimal plan. To cope with such a
large search space, SQL Server 2008 implements a number of heuristics
for very large queries that attempt to come up with an efficient query
plan within the time available. When it is not possible to analyze the
entire set of plan alternatives and the heuristics are applied, it is
not uncommon for a suboptimal query plan to be chosen.
When
is your query large enough to be a concern? Answering this question is
difficult because the answer depends on the number of tables involved,
the form of filter and join predicates, and the operations performed. If
a query involves more than 12 tables, the Query Optimizer likely has to
rely on heuristics and shortcuts to generate a query plan and may miss
some optimal strategies.
In general, you get more optimal query plans if you can simplify your queries as much as possible.
Triggers
If you are using triggers on INSERT, UPDATE, or DELETE, it is possible that the triggers themselves are causing performance problems. You might think that an INSERT, UPDATE, or DELETE statement
is performing poorly when actually it is the trigger that needs to be
tuned. In addition, you might have triggers that fire other triggers. If
you suspect that you are having performance problems with triggers,
you can use SQL Server Profiler to monitor the SQL statements they
execute, their response times, and the execution plans generated for
the statements within the triggers.
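In addition to Profiler, SQL Server 2008 provides the sys.dm_exec_trigger_stats dynamic management view, which you can query to identify expensive triggers. The following is a minimal sketch; note that the view reports only triggers whose plans are currently in the plan cache:
-- Find the triggers that have consumed the most CPU time
SELECT OBJECT_NAME(object_id, database_id) AS trigger_name,
       execution_count,
       total_worker_time,
       total_elapsed_time
FROM sys.dm_exec_trigger_stats
ORDER BY total_worker_time DESC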