You can think of tempdb as the
“scratch” database for SQL Server; it’s a temporary data store used by
both applications and internal operations. It is very similar to other
databases in that it has a data file and a log file and can be found in
SQL Server Management Studio, but it does have some unique
characteristics that affect how you use and manage it.
The first fact to note is that everyone using an
instance shares the same tempdb; you cannot have any more than one
within an instance of SQL Server but you can get detailed information
about who is doing what in tempdb using DMVs.
The following features and attributes should be considered when learning about, using, tuning, and troubleshooting tempdb:
- Nothing stored in tempdb persists after a restart because tempdb is
recreated every time SQL Server starts. This also has implications for
the recovery of tempdb — namely, it doesn’t need to be done. See the following sidebar.
- Tempdb is always set to “Simple” recovery mode, means that transaction log records for
committed transactions are marked for reuse after every checkpoint.
This means you don’t need to back up the transaction log for tempdb,
and in fact, you can’t back up tempdb at all.
- Tempdb can only have one filegroup (the PRIMARY filegroup); you can’t add more.
- Tempdb is used to store three types of objects: user objects, internal objects, and the version store.
TEMPDB HAS FEWER LOGGING OPERATIONS
When you change a value in a normal
database, both the old value and the new value are stored in the
transaction log. The old value is used in case you need to rollback the
transaction that made the change (undo), and the new value is used to
roll-forward the change during recovery (redo) if it hadn’t made it to
the data file before the restart.
You still need to be able to undo a
change in tempdb but you’ll never need to redo the change as everything
is thrown away on restart. Therefore, tempdb doesn’t store the redo
information, which can result in significant performance gains when
making many changes to big columns compared to a user database.
User Temporary Objects
All the code in this section uses the Ch8_1TempDBTempObjects.sql code file.
To store data temporarily you can use local
temporary tables, global temporary tables, or table variables, all of
which are stored in tempdb (you can’t change where they’re stored). A
local temporary table is defined by giving it a prefix of # and it is
scoped to the session in which you created it. This means no one can
see it; and when you disconnect, or your session is reset with
connection pooling, the table is dropped. The following example creates
a local temporary table, populates it with one row, and then selects
from it:
CREATE TABLE #TempTable ( ID INT, NAME CHAR(3) ) ;
INSERT INTO #TempTable ( ID, NAME )
VALUES ( 1, 'abc' ) ;
GO
SELECT *
FROM #TempTable ;
GO
DROP TABLE #TempTable ;
Global temporary tables can be seen by
all sessions connected to the server and are defined by a prefix of ##.
They are used in exactly the same way as local temporary tables, the
only difference being that everyone can see them. They are not used
very often because if you had a requirement for multiple users to use
the same table, you’re more likely to implement a normal table in a
user database, rather than a global temporary table. Here is exactly
the same code just shown but implemented as a global temporary table:
CREATE TABLE ##TempTable ( ID INT, NAME CHAR(3) ) ;
INSERT INTO ##TempTable ( ID, NAME )
VALUES ( 1, 'abc' ) ;
GO
SELECT *
FROM ##TempTable ;
GO
DROP TABLE ##TempTable ;
As you can see, the only difference is
the prefix; both local temporary tables and global temporary tables are
dropped when the session that created them is closed. This means it is
not possible to create a global temporary table in one session, close
the session, and then use it in another.
A table variable is used similarly to a local
temporary table. The differences are explored in the next section. Here
is the same sample again, this time implemented as a table variable:
DECLARE @TempTable TABLE ( ID INT, NAME CHAR(3) ) ;
INSERT INTO @TempTable ( ID, NAME )
VALUES ( 1, 'abc' ) ;
SELECT *
FROM @TempTable ;
The syntax for declaring a table
variable is slightly different from a temporary table; but a more
important difference is that table variables are scoped to the batch,
rather than the session. If you kept the GO batch delimiter as in the previous examples, then an “object does not exist” error would be raised for the last SELECT statement because the table variable would not exist in the scope of the statement.
Temp Tables vs. Table Variables
All the code in this section uses the Ch8_2TempTableAndTVStats.sql code file.
Having touched on the concept and scope of
temporary tables and table variables in the previous section, the
mechanism used to store temporary results usually boils down to the
differences in features between a temporary table (#table) and a table
variable.
Statistics
The major difference between temp
tables and table variables is that statistics are not created on table
variables. This has two major consequences, the first of which is that
the Query Optimizer uses a fixed estimation for the number of rows in a
table variable irrespective of the data it contains. Moreover, adding
or removing data doesn’t change the estimation.
To illustrate this, executing the code below and
looking at the properties of the table scan in the actual execution
plan will give you the properties shown in Figure 1.
To understand the example you need to first understand the Query
Optimizer, statistics, and execution plans.
DECLARE @TableVar TABLE ( c1 INT ) ;
INSERT INTO @TableVar
SELECT TOP 1000000 row_number( ) OVER ( ORDER BY t1.number ) AS N
FROM master..spt_values t1
CROSS JOIN master..spt_values t2 ;
SELECT COUNT(*)
FROM @TableVar ;
Note that the Query Optimizer based the plan on
an estimation of one row being returned, whereas 1 million rows were
actually returned when it was executed. Regardless of the number of
rows in the table variable, the Query Optimizer will always estimate
one row because it has no reliable statistics with which to generate a
better estimation, and this could cause a bad execution plan to be used.
You can do the same test but with a temporary table instead by executing this code:
CREATE TABLE #TempTable ( c1 INT ) ;
INSERT INTO #TempTable
SELECT TOP 1000000 row_number( ) OVER ( ORDER BY t1.number ) AS N
FROM master..spt_values t1
CROSS JOIN master..spt_values t2 ;
SELECT COUNT(*)
FROM #TempTable ;
The properties for the table scan in this scenario are shown in Figure 2, which indicates an accurate row estimate of 1000000.
Indexes
You can’t create indexes on table
variables although you can create constraints. This means that by
creating primary keys or unique constraints, you can have indexes (as
these are created to support constraints) on table variables.
Even if you have constraints, and therefore
indexes that will have statistics, the indexes will not be used when
the query is compiled because they won’t exist at compile time, nor
will they cause recompilations.
Schema Modifications
Schema modifications are possible on
temporary tables but not on table variables. Although schema
modifications are possible on temporary tables, avoid using them
because they cause recompilations of statements that use the tables.
Table 1 provides a brief summary of the differences between temporary tables and table variables.
TABLE 1: Temporary Tables versus Table Variables
|
TEMPORARY TABLES |
TABLE VARIABLES |
Statistics |
Yes |
No |
Indexes |
Yes |
Only with constraints |
Schema modifications |
Yes |
No |
Available in child routines including sp_executesql |
Yes |
No |
Use with INSERT INTO . . . EXEC |
Yes |
No |
In memory structures |
No |
No |
TABLE VARIABLES ARE NOT CREATED IN MEMORY
There is a common misconception that
table variables are in-memory structures and as such will perform
quicker than temporary tables. Thanks to a DMV called sys.dm_db_session_space_usage,
which shows tempdb usage by session, you can prove that’s not the case.
After restarting SQL Server to clear the DMV, run the following script
to confirm that your session_id returns 0 for user_objects_alloc_page_count:
SELECT session_id,
database_id,
user_objects_alloc_page_count
FROM sys.dm_db_session_space_usage
WHERE session_id > 50 ;
Now you can check how much space a
temporary table uses by running the following script to create a
temporary table with one column and populate it with one row:
CREATE TABLE #TempTable ( ID INT ) ;
INSERT INTO #TempTable ( ID )
VALUES ( 1 ) ;
GO
SELECT session_id,
database_id,
user_objects_alloc_page_count
FROM sys.dm_db_session_space_usage
WHERE session_id > 50 ;
The results on my server (shown in Figure 3) indicate that the table was allocated one page in tempdb.
Now run the same script but use a table variable this time:
DECLARE @TempTable TABLE ( ID INT ) ;
INSERT INTO @TempTable ( ID )
VALUES ( 1 ) ;
GO
SELECT session_id,
database_id,
user_objects_alloc_page_count
FROM sys.dm_db_session_space_usage
WHERE session_id > 50 ;
As shown in Figure 4, using the table variable caused another page to be allocated in tempdb, so table variables are not created in memory.
Table variables and temporary tables
are both likely to be cached, however, so in reality, unless your
server is memory constrained and you’re using particularly large
tables, you’ll be working with them in memory anyway.
Whether or not you use temporary tables or table
variables should be decided by thorough testing, but it’s best to lean
towards temporary tables as the default because there are far fewer
things that can go wrong.
I’ve seen customers develop code using table
variables because they were dealing with a small amount of rows, and it
was quicker than a temporary table, but a few years later there were
hundreds of thousands of rows in the table variable and performance was
terrible, so try and allow for some capacity planning when you make
your decision!