The unique nature of tempdb as a shared
resource for temporary objects makes it more prone to specific
performance problems than other databases. This section describes the
most common issues that tempdb is vulnerable to and how to troubleshoot
or even avoid them.
Latch Contention
Compared to a normal database, tempdb’s
use as a temporary storage area makes the workload pattern likely to
contain a disproportionate amount of the creation and destruction of
many small objects. This type of workload can lead to latch contention
on the pages required to allocate objects in a database.
You can’t influence latching behavior by changing
the isolation level or by using “hints,” as you can with normal locks;
latches are used automatically behind the scenes to protect pages in
memory from being modified by another task while the content or
structure is being changed or read from disk.
Allocation Pages
When you create an object such as a
temporary table in tempdb, it needs to be allocated space in exactly
the same way as creating a table in a normal database. You need to be
aware of three pages in the allocation process: Page Free Space, Global
Allocation Map, and Shared Global Allocation Map, all of which are
covered in the following sections.
PFS (Page Free Space)
The PFS page stores 1 byte of
information for each page, indicating how much free space is on it and
what it’s used for, which means that a single PFS page can store
information about roughly 64MB of pages. Therefore, you’ll find a new
PFS page at close to 64MB intervals throughout a database data file.
The first page on any database data file is
always a PFS page, so it’s easy to spot the page in an error message.
If you see “2:1:1” anywhere, it’s referring to the first page on the
first data file in database_id 2, which is tempdb; “5:3:1” would be the first PFS page in file_id 3 in database_id 5.
GAM (Global Allocation Map)
The GAM page tracks 1 bit per extent
(an extent is eight pages), indicating which extents are in use and
which are empty. SQL Server reads the page to find free space to
allocate a full extent to an object.
Storing only 1 bit for each extent (instead of 1
byte per page like the PFS page) means that a single GAM page can track
a lot more space, and you’ll find a new GAM page at roughly 4GB
intervals in a data file. However, the first GAM page in a data file is
always page number 2, so “2:1:2” would refer to the first GAM page in
tempdb.
SGAM (Shared Global Allocation Map)
The SGAM page (pronounced ess-gam)
also stores 1 bit per extent but the values represent whether the
extent is a mixed extent with free space or a full extent. SQL Server
reads this page to find a mixed extent with free space to allocate
space to a small object.
A single SGAM can track 4GB of pages, so you’ll
find them at 4GB intervals just like GAM pages. The first SGAM page in
a data file is page 3, so “2:1:3” is tempdb’s first SGAM page.
Allocation Page Contention
Imagine that you take an action within
an application that needs to create a temporary table. To determine
where in tempdb to create your table, SQL Server will read the SGAM
page (2:1:3) to find a mixed extent with free space to allocate to the
table.
SQL Server takes out an exclusive latch on the SGAM page while it’s updating the page
and then moves on to read the PFS page to find a free page within the
extent to allocate to the object.
An exclusive latch will also be taken out on the
PFS page to ensure that no one else can allocate the same data page,
which is then released when the update is complete.
This is quite a simple process (but maybe not to
explain) and it works very well until tempdb becomes overloaded with
allocation requests. The threshold can be hard to predict and the next
section describes several things you can do to proactively avoid it.
The issue itself manifests as a PAGELATCH wait, with 2:1:1 or 2:1:3 as the resource description. Figure 1 shows contention on the allocation pages because multiple users are trying to allocate many objects at the same time.
Allocation Page Contention: An Example
All the code in this section uses the Ch8_3TempdbContention.sql code file.
In order to demonstrate page contention I’ve created a couple of stored procedures and a table in an empty database called tempdbdemo.
If you want to step through the example yourself, we have provided all
the necessary steps and scripts in the associated code file.
-- Create stored procedure that creates a temp table, a clustered index and
populates with 10 rows
-- The script expects a database called tempdbdemo to exist
USE [tempdbdemo] ;
GO
CREATE PROCEDURE [dbo].[usp_temp_table]
AS
CREATE TABLE #tmpTable
(
c1 INT,
c2 INT,
c3 CHAR(5000)
) ;
CREATE UNIQUE CLUSTERED INDEX cix_c1 ON #tmptable ( c1 ) ;
DECLARE @i INT = 0 ;
WHILE ( @i < 10 )
BEGIN
INSERT INTO #tmpTable ( c1, c2, c3 )
VALUES ( @i, @i + 100, 'coeo' ) ;
SET @i += 1 ;
END ;
GO
-- Create stored procedure that runs usp_temp_table 50 times
CREATE PROCEDURE [dbo].[usp_loop_temp_table]
AS
SET nocount ON ;
DECLARE @i INT = 0 ;
WHILE ( @i < 100 )
BEGIN
EXEC tempdbdemo.dbo.usp_temp_table ;
SET @i += 1 ;
END ;
The usp_temp_table
stored procedure creates a table in tempdb with three columns and a
unique clustered index on Column 1. The table is then populated with 10
rows. The usp_loop_temp_table stored procedure runs the usp_temp_table procedure 100 times.
To simulate multiple users trying to run the same procedure at the same time, I’m going to use a tool called OStress, which is part of a download called RML Utilities.
For the purpose of the demo I’m just going to use OStress very simply to run the usp_loop_temp_table
procedure using 300 connections. The aim is to simulate 300 people
running a stored procedure that recursively calls another stored
procedure 100 times.
OStress needs to be run from the command prompt:
C:\"Program Files\Microsoft Corporation"\RMLUtils\ostress -Schristianvaio\NTK12 -E
-Q"EXEC demo.dbo.usp_loop_temp_table;" -ooutput.txt -n300
Of course, christianvaio\NTK12 is my SQL Server instance name, so change it to your own if you’re following along.
While OStress is running, take a look at the sys.dm_os_waiting_tasks DMV using the following script, reproduced here with the kind permission of Robert Davis (http://www.sqlsoldier.com/wp/sqlserver/breakingdowntempdbcontentionpart2) :
WITH TASKS
AS (SELECT session_id,
wait_type,
wait_duration_ms,
blocking_session_id,
resource_description,
PageID = Cast(Right(resource_description, Len(resource_description)-
Charindex(':', resource_description, 3)) As Int)
From sys.dm_os_waiting_tasks
Where wait_type Like 'PAGE%LATCH_%'
And resource_description Like '2:%')
SELECT session_id,
wait_type,
wait_duration_ms,
blocking_session_id,
resource_description,
ResourceType = Case
When PageID = 1 Or PageID % 8088 = 0 Then 'Is PFS Page'
When PageID = 2 Or PageID % 511232 = 0 Then 'Is GAM Page'
When PageID = 3 Or (PageID − 1) % 511232 = 0 Then 'Is SGAM Page'
Else 'Is Not PFS, GAM, or SGAM page'
End
From Tasks ;
The script is filtered on all PAGELATCH
waits and shows you for each page whether or not it’s PFS, GAM, or
SGAM. Most of the time when you have contention, it will be on the
first allocation pages but this script is more thorough as it will
detect any of these pages throughout the file.
You should see results similar to those shown in Figure 2.
At the time this snapshot of sys.dm_os_waiting_tasks was taken, 291 tasks (from 300 connections) were waiting for a PAGELATCH, and you can see several examples of 2:1:1 (which is the PFS page), so there is evidence of allocation page contention.