SQL SErver 2008 R2 : Other Query Processing Strategies

11/18/2011 3:16:29 PM

In addition to the optimization strategies covered so far, SQL Server also has some additional strategies it can apply for special types of queries. These strategies are used to help further reduce the cost of executing various types of queries.

Predicate Transitivity

You might be familiar with the transitive property from algebra. The transitive property simply states that if A=B and B=C, then A=C. SQL Server supports the transitive property in its query predicates. Predicate transitivity enables SQL Server to infer a join equality from two given equalities. Consider the following example:

SELECT *
   FROM table1 t1
   join table2 t2 on t1.column1 = t2.column1
   join table3 t3 on t2.column1 = t3.column1

Using the principle of predicate transitivity, SQL Server is able to infer that t1.column1 is equal to t3.column1. This capability provides the Query Optimizer with another join strategy to consider when optimizing this query. This might result in a much cheaper execution plan.

The transitive property can also be applied to SARGs used on join columns. Consider the following query:

select *
   from sales s
   join stores st on s.stor_id = st.stor_id
   and s.stor_id = 'B199'

Again, using transitive closure, it follows that st.stor_id is also equal to 'B199'. SQL Server recognizes this and can compare the search value against the statistics on both tables to more accurately estimate the number of matching rows from each table.

Group by Optimization

One way SQL Server can process GROUP BY results is to retrieve the matching detailed data rows into a worktable and then sort the rows and calculate the aggregates on the groups formed. In SQL Server 2008, the Query Optimizer also may choose to use hashing to organize the data into groups and then compute the aggregates.

The hash aggregation strategy uses the same basic method for grouping and calculating aggregates as for a hash join. At the point where the probe input row is checked to determine whether it already exists in the hash bucket, the aggregate is computed if a hash match is found. The following pseudocode summarizes the hash aggregation strategy:

create a hash table
for each row in the input table
    read the row
    hash the key value
    search the hash table for matches
    if match found
        aggregate the value into the old record
    else
        insert the hashed key into the hash bucket
scan and output the hash table contents
drop the hash table

For some join queries that contain GROUP BY clauses, SQL Server might perform the grouping operation before processing the join. This could reduce the size of the input table to the join and lower the overall cost of executing the query.

Note

One important point to keep in mind is that regardless of the GROUP BY strategy employed, the rows are not guaranteed to be returned in sorted order by the grouping column(s) as they were in earlier releases. If the results must be returned in a specific sort order, you need to use the ORDER BY clause with GROUP BY to ensure ordered results. You might want to get into the habit of doing this regularly.

Queries with DISTINCT

When the DISTINCT clause is specified in a query, SQL Server can eliminate duplicate rows by the sorting the result set in a worktable to identify and remove the duplicates, similar to how a worktable is used for GROUP BY queries. In SQL Server 2008, the Query Optimizer can also employ a hashing strategy similar to that used for GROUP BY to return only the distinct rows before the final result set is determined.

In addition, if the Query Optimizer can determine at compile time that there will be no possibility of duplicate rows in the result set (for example, each row contains the table’s primary key), the strategies for removing duplicate rows are skipped altogether.

Queries with UNION

When you specify UNION in a query, SQL Server merges the result sets, applying one of the merge or concatenation operators with sorting strategies to remove any duplicate rows. Figure 1 shows an example similar to the OR strategy where the rows are concatenated and then sorted to remove any duplicates.

Figure 1. An execution plan for a UNION query.

If you specify UNION ALL in a query, SQL Server simply appends the result sets together. No intermediate sorting or merge step is needed to remove duplicates. Figure 2 shows the same query as in Figure 1, except that a UNION ALL is specified.

Figure 2. An execution plan for a UNION ALL query.

When you know that you do not need to worry about duplicate rows in a UNION result set, always specify UNION ALL to eliminate the extra overhead required for sorting.

When a UNION is used to merge large result sets together, SQL Server 2008 may opt to use a merge join or hash match operation to remove any duplicate rows. Figure 3 shows an example of a UNION query where the rows are concatenated, and then a hash match operation is used to remove any duplicates.

Figure 3. An execution plan for a UNION query, using a hash match to eliminate duplicate rows.

Other -----------------

- Configuring and Using Active Directory Certificate Services (part 4) - Protecting Your AD CS Configuration

- Configuring and Using Active Directory Certificate Services (part 3) - Considerations for the Use and Management of AD CS & Working with Enterprise PKI

- Configuring and Using Active Directory Certificate Services (part 2) - Finalizing the Configuration of an Online Responder

- Configuring and Using Active Directory Certificate Services (part 1) - Finalizing the Configuration of an Issuing CA

- Windows Server 2003 : The Security Configuration Wizard

- Windows Server 2003 : Understanding Security Considerations

- Microsoft Content Management Server : Moving Postings

- Microsoft Content Management Server : Copying Postings

- Upgrading to Systems Management Server 2003 - Upgrading a Primary Site & Upgrading a Secondary Site

- Exchange Server 2007 : Securing Access to ActiveSync Using Internet Security and Acceleration (ISA) Server 2006