In addition to the optimization strategies covered so
far, SQL Server also has some additional strategies it can apply for
special types of queries. These strategies are used to help further
reduce the cost of executing various types of queries.
Predicate Transitivity
You might be familiar
with the transitive property from algebra. The transitive property
simply states that if A=B and B=C, then A=C. SQL Server supports the
transitive property in its query predicates. Predicate transitivity
enables SQL Server to infer a join equality from two given equalities.
Consider the following example:
SELECT *
FROM table1 t1
join table2 t2 on t1.column1 = t2.column1
join table3 t3 on t2.column1 = t3.column1
Using the principle of predicate transitivity, SQL Server is able to infer that t1.column1 is equal to t3.column1. This capability provides the Query Optimizer with another join strategy to consider when optimizing this query. This might result in a much cheaper execution plan.
The transitive property can also be applied to SARGs used on join columns. Consider the following query:
select *
from sales s
join stores st on s.stor_id = st.stor_id
and s.stor_id = 'B199'
Again, using transitive closure, it follows that st.stor_id is also equal to 'B199'.
SQL Server recognizes this and can compare the search value against the
statistics on both tables to more accurately estimate the number of
matching rows from each table.
Group by Optimization
One way SQL Server can process GROUP BY
results is to retrieve the matching detailed data rows into a worktable
and then sort the rows and calculate the aggregates on the groups
formed. In SQL Server 2008, the Query Optimizer also may choose to use
hashing to organize the data into groups and then compute the
aggregates.
The hash aggregation
strategy uses the same basic method for grouping and calculating
aggregates as for a hash join. At the point where the probe input row is
checked to determine whether it already exists in the hash bucket, the
aggregate is computed if a hash match is found. The following pseudocode
summarizes the hash aggregation strategy:
create a hash table
for each row in the input table
read the row
hash the key value
search the hash table for matches
if match found
aggregate the value into the old record
else
insert the hashed key into the hash bucket
scan and output the hash table contents
drop the hash table
For some join queries that contain GROUP BY
clauses, SQL Server might perform the grouping operation before
processing the join. This could reduce the size of the input table to
the join and lower the overall cost of executing the query.
Note
One important point to keep in mind is that regardless of the GROUP BY
strategy employed, the rows are not guaranteed to be returned in sorted
order by the grouping column(s) as they were in earlier releases. If
the results must be returned in a specific sort order, you need to use
the ORDER BY clause with GROUP BY to ensure ordered results. You might want to get into the habit of doing this regularly.
Queries with DISTINCT
When the DISTINCT
clause is specified in a query, SQL Server can eliminate duplicate rows
by the sorting the result set in a worktable to identify and remove the
duplicates, similar to how a worktable is used for GROUP BY queries. In SQL Server 2008, the Query Optimizer can also employ a hashing strategy similar to that used for GROUP BY to return only the distinct rows before the final result set is determined.
In addition, if the Query
Optimizer can determine at compile time that there will be no
possibility of duplicate rows in the result set (for example, each row
contains the table’s primary key), the strategies for removing duplicate
rows are skipped altogether.
Queries with UNION
When you specify UNION
in a query, SQL Server merges the result sets, applying one of the
merge or concatenation operators with sorting strategies to remove any
duplicate rows. Figure 1 shows an example similar to the OR strategy where the rows are concatenated and then sorted to remove any duplicates.
If you specify UNION ALL
in a query, SQL Server simply appends the result sets together. No
intermediate sorting or merge step is needed to remove duplicates. Figure 2 shows the same query as in Figure 1, except that a UNION ALL is specified.
When you know that you do not need to worry about duplicate rows in a UNION result set, always specify UNION ALL to eliminate the extra overhead required for sorting.
When a UNION is
used to merge large result sets together, SQL Server 2008 may opt to use
a merge join or hash match operation to remove any duplicate rows. Figure 3 shows an example of a UNION query where the rows are concatenated, and then a hash match operation is used to remove any duplicates.