2. SQL Server XQuery in Action
SQL Server has a standards-based implementation of XQuery that directly supports XQuery functions on the xml data type by using five methods of the xml data type, as shown here:
xml.exist Uses XQuery input to return 0, 1, or NULL, depending on the result of the query. This method returns 0 if no elements match, 1 if there is a match, and NULL if there is no XML data on which to query. The xml.exist method is often used for query predicates.
xml.value Accepts an XQuery query that resolves to a single value as input and returns a SQL Server scalar type.
xml.query Accepts an XQuery query that resolves to multiple values as input and returns an xml data type stream as output.
xml.nodes
Accepts an XQuery query as input and returns a single-column rowset
from the XML document. In essence, this method shreds XML into multiple
smaller XML results.
xml.modify Allows you to insert, delete, or modify nodes or sequences of nodes in an xml data type instance using an XQuery data manipulation language (DML).
We
will discuss all of these methods shortly. But first, you’ll create
some sample data in a simple table that contains speakers at a software
developer conference and the corresponding classes they will teach.
Traditionally, you would normalize such data and have a one-to-many
relationship between a speakers table and a classes table. Taking an XML
approach instead, you will model this as one table with the speakers’
information and one XML column with the speakers’ classes. In the real
world, you might encounter this scenario when you have a speaker and his
or her classes represented in a series of one-to-many tables in a
back-office database. Then for the web database, you might “publish” a
database on a frequent time interval (such as a reporting database) or
transform normalized data and use the XML column for easy HTML display
with extensible stylesheet transformation (XSLT).
First, create a schema for the XML data, as shown in Example 2.
The schema defines the data types and required properties for
particular XML elements in the list of classes that will be maintained
for each speaker.
Example 2. Creating an XML schema definition for speaker classes.
USE master
GO
IF EXISTS(SELECT name FROM sys.databases WHERE name = 'SampleDB')
DROP DATABASE SampleDB
GO
CREATE DATABASE SampleDB
GO
USE SampleDB
GO
CREATE XML SCHEMA COLLECTION ClassesXSD AS '
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="class">
<xs:complexType>
<xs:attribute name="name" type="xs:string" use="required" />
</xs:complexType>
</xs:element>
<xs:element name="classes">
<xs:complexType>
<xs:sequence>
<xs:element ref="class" minOccurs="1" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="speakerBio" type="xs:string" use="required" />
</xs:complexType>
</xs:element>
</xs:schema>'
Next, create the Speaker table (and indexes), as shown in Example 3. Notice that the xml column, ClassesXML, uses the ClassesXSD XSD schema we just created in Example 2.
Example 3. Creating the Speaker table with the typed (XSD schema-based) indexed XML column ClassesXML.
CREATE TABLE Speaker(
SpeakerId int IDENTITY PRIMARY KEY,
SpeakerName varchar(50),
Country varchar(25),
ClassesXML xml(ClassesXSD) NOT NULL)
-- Create primary XML index
CREATE PRIMARY XML INDEX ix_speakers
ON Speaker(ClassesXML)
-- Create secondary structural (path) XML index
CREATE XML INDEX ix_speakers_path ON Speaker(ClassesXML)
USING XML INDEX ix_speakers FOR PATH
XQuery runs more efficiently when there is an XML
index on the XML column. As you learned earlier, an XML index works
only if there is a primary key constraint on the table (such as the SpeakerId primary key column in the Speaker table). The code in Example 3 creates a primary and then a structural (PATH) index because our examples will apply a lot of where restrictions on the values of particular elements. It’s also important to remember that XQuery
works more efficiently if it is strongly typed, so you should always
use a schema (XSD) on your XML column for the best performance. Without a
schema, the SQL Server XQuery engine assumes that everything is untyped
and simply treats it as string data.
You’re now ready to get data into the table by using some INSERT statements, as shown in Example 4. The final INSERT statement, ‘Bad Speaker’, will fail because it does not contain a <classes> element as required by the ClassesXSD schema. (Because XML is case sensitive, its <CLASSES> element is not a match for the <classes> element specified as required in the schema.)
Example 4. Populating the Speaker table with sample data.
INSERT INTO Speaker VALUES('Jeff Hay', 'USA', '
<classes speakerBio="Jeff has solid security experience from years of hacking">
<class name="Writing Secure Code for ASP .NET" />
<class name="Using XQuery to Manipulate XML Data in SQL Server 2012" />
<class name="SQL Server and Oracle Working Together" />
<class name="Protecting against SQL Injection Attacks" />
</classes>')
INSERT INTO Speaker VALUES('Holly Holt', 'Canada', '
<classes speakerBio="Holly is a Canadian-born database professional">
<class name="SQL Server Profiler" />
<class name="Advanced SQL Querying Techniques" />
<class name="SQL Server and Oracle Working Together" />
</classes>')
INSERT INTO Speaker VALUES('Ted Bremer', 'USA', '
<classes speakerBio="Ted specializes in client development">
<class name="Smart Client Stuff" />
<class name="More Smart Client Stuff" />
</classes>')
INSERT INTO Speaker VALUES('Bad Speaker', 'France', '
<CLASSES SPEAKERBIO="Jean has case-sensitivity issues">
<class name="SQL Server Index" />
<class name="SQL Precon" />
</CLASSES>')
Now that you have some data,
it’s time to start writing some XQuery expressions in T-SQL. To do this,
you will use the query-based methods of the xml data type inside a regular T-SQL query.
Having XML in the database is almost useless unless you can query the elements and attributes of the XML data natively. XQuery becomes very useful when you use it to search XML based on the values of a particular element or attribute. The xml.exist method accepts an XQuery query as input and returns 0, 1, or NULL, depending on the result of the query: 0 is returned if no elements match, 1 is returned if there is a match, and NULL is returned if there is no data to query on. For example, Example 5 shows how to test whether a particular node exists within an XML document.
Example 5. A simple xml.exist example.
DECLARE @SomeData xml = '
<classes>
<class name="SQL Server Index"/>
<class name="SQL Precon"/>
</classes>'
SELECT
@SomeData.exist('/classes') AS HasClasses,
@SomeData.exist('/dogs') AS HasDogs
This query produces the following output:
HasClasses HasDogs
---------- -------
1 0
You will most likely use the return value of xml.exist (0, 1, or NULL) as part of a WHERE
clause. This lets you run a T-SQL query and restrict the query on a
value of a particular XML element. For example, here is an XQuery
expression that finds every <class> element beneath <classes> with a name attribute containing the phrase “SQL Server”:
/classes/class/@name[contains(., "SQL Server ")]
Example 6 shows how you put this expression to work.
Example 6. Using xml.exist to test for an attribute value.
SELECT * FROM Speaker
WHERE
ClassesXML.exist('/classes/class/@name[contains(., "SQL Server")]') = 1
The results look like this:
SpeakerId SpeakerName Country ClassesXML
--------- ----------- ------- ----------
1 Jeff Hay USA <classes speakerBio="Jeff has solid security...
2 Holly Holt Canada <classes speakerBio="Holly is a Canadian-bor...
Jeff and Holly (but not Ted) each give one or more SQL Server classes. The XML returned in these results look like this for Jeff:
<classes speakerBio="Jeff has solid security experience based on years of hacking">
<class name="Writing Secure Code for ASP .NET" />
<class name="Using XQuery to Manipulate XML Data in SQL Server 2012" />
<class name="SQL Server and Oracle Working Together" />
<class name="Protecting against SQL Injection Attacks" />
</classes>
Example 7 shows a query similar to the previous one. This version demonstrates how to seamlessly integrate XQuery with ordinary filtering of relational columns, by simply building out the WHERE clause to further restrict by Country for USA only.
Example 7. Combining XQuery with relational column filtering.
SELECT * FROM Speaker
WHERE
ClassesXML.exist('/classes/class/@name[contains(., "SQL Server")]') = 1
AND Country = 'USA'
Executing this query returns
only Jeff. SQL Server will filter out the other two rows because Ted
does not have any SQL Server classes and Holly is from Canada.
The xml.value method takes an XQuery expression that resolves to a single value
and returns it, cast as the SQL Server data type you specify. You can
leverage this very powerful method to completely shield the internal XML
representation of your data, and expose ordinary scalar values with
ordinary SQL Server data types instead. Consider the query in Example 8.
Example 8. Using xml.value to represent XML data elements as scalar SQL Server data typed-columns
.SELECT
SpeakerName,
Country,
ClassesXML.value('/classes[1]/@speakerBio','varchar(max)') AS SpeakerBio,
ClassesXML.value('count(/classes/class)', 'int') AS SessionCount
FROM
Speaker
ORDER BY
ClassesXML.value('count(/classes/class)', 'int')
From
the output generated by this query, there is no indication that—behind
the scenes—the source for some of the output comes from an embedded XML
document, stored in an xml data type column, and then shredded with XQuery:
SpeakerName Country SpeakerBio SessionCount
----------- ------- -------------------------------------------------------- ------------
Ted Bremer USA Ted specializes in client development 2
Holly Holt Canada Holly is a Canadian-born database professional 3
Jeff Hay USA Jeff has solid security experience from years of hacking 4
The SpeakerName and Country columns came right out of the Speaker table. However, the SpeakerBio and SessionCount columns were each extracted from the ClassesXML column using xml.value with an XQuery expression and a SQL Server data type that the expression’s result was cast to. Because you are requesting a specific data type, the XQuery expression must resolve to a single value. That value can come from a node element’s inner text, attribute, or XQuery function, but it must be a single value. For SpeakerBio, the XQuery drills into the classes element for the speakerBio attribute, extracts its value, and casts it as a varchar(max) type. The XQuery for SessionCount invokes the count function to return the number of class elements nested beneath the classes element cast as an int. The same XQuery is used again in the ORDER BY clause, so that the results of the query themselves are sorted by a value derived from data embedded in XML content.
You can build views and TVFs
over queries such as this, and create an effective abstraction layer
over the way XML is stored internally in your database. This means you
can alter the XSD schemas and then adjust the XQuery expressions in your
views and TVFs accordingly, such that consumers remain unaffected.
Indeed, you could even transparently switch from XML storage to
traditional column storage and back again, without disturbing any
existing clients. SQL Server thus provides extremely flexible
abstraction in both directions, because you’ve seen the myriad of ways
to dynamically construct and serve XML from relational column data with
the various FOR XML options . This flexibility means you can choose just the
right degree of XML integration in your database that best suits your
needs—whether that involves persisting XML data, constructing XML data, or both.
The xml.query method accepts and executes an XQuery expression much like the xml.value method, but it always returns an xml data type result. So unlike xml.value,
the XQuery expression doesn’t need to resolve to a single value, and
can easily return multiple values as a subset of the source XML. But
furthermore, it can transform that source XML and produce entirely
different XML—even injecting values from other non-xml columns living the in same row as the xml column being queried. Example 9 demonstrates how this is achieved using FLWOR expressions and sql:column (a SQL Server XQuery extension).
Example 9. Using xml.query with FLWOR expressions and sql:column for XML transformations.
SELECT
SpeakerId,
ClassesXML.query('
let $c := count(/classes/class)
let $b := data(/classes[1]/@speakerBio)
return
<SpeakerInfo>
<Name>{sql:column("SpeakerName")}</Name>
<Country>{sql:column("Country")}</Country>
<Bio>{$b}</Bio>
<Sessions count="{$c}">
{
for $s in /classes/class
let $n := data($s/@name)
order by $n
return
<Session>{$n}</Session>
}
</Sessions>
</SpeakerInfo>
') AS SpeakerInfo
FROM
Speaker
The XML returned in these results looks like this for Jeff:
<SpeakerInfo>
<Name>Jeff Hay</Name>
<Country>USA</Country>
<Bio>Jeff has solid security experience from years of hacking</Bio>
<Sessions count="4">
<Session>Protecting against SQL Injection Attacks</Session>
<Session>SQL Server and Oracle Working Together</Session>
<Session>Using XQuery to Manipulate XML Data in SQL Server 2012</Session>
<Session>Writing Secure Code for ASP .NET</Session>
</Sessions>
</SpeakerInfo>
Let’s explain the code in detail. The XQuery expression in the xml.query method on the ClassesXML column begins with a FLWOR expression. The two let statements use XPath expressions to capture the speaker’s number of classes (using the count function) and bio text (using the data function), and stores the results into the variables $c and $b respectively. Then the return statement defines the shape of the XML to be constructed, starting with the root node’s <SpeakerInfo> element. Inside the root node, the <Name> and <Country> elements are returned, with values extracted from the SpeakerName and Country columns. These are values that are not present in the XML being parsed by xml.query, but are available as ordinary columns elsewhere in the same row, and are exposed using the special sql:column SQL Server extension to XQuery.
Next, the <Sessions> element is returned with a count attribute that returns the number of class elements beneath the source XML’s classes element. Within <Sessions>, a new (nested) FLWOR expression is used to iterate the speaker’s classes and build a sequence of <Session> elements. The for statement loops through the source XML’s classes element for each nested class element and stores it into the variable $s. The let statement then uses the data function to capture the string value inside the name attribute of the class element in $s and stores it into the variable $n. The inner FLWOR expression results (that is, the sequence of elements returned by the upcoming return statement) are sorted by name using the order by statement. Finally, the return statement generates a new <Session> element. The session name is rendered as the inner text of the <Session> element. This XQuery has essentially transformed the <Classes> and <Class name=“title”> structure of the source XML to a <Sessions> and <Session>title</Session> structure.
The sql:variable function is another very powerful SQL Server extension to XQuery. With it, you can easily parameterize your XQuery expressions using ordinary T-SQL parameters. This technique is demonstrated in Example 10.
Example 10. Using xml.query with sql:variable for parameterized transformations.
DECLARE @Category varchar(max) = 'SQL Server'
SELECT
SpeakerName,
Country,
ClassesXML.query('
<classes
category="{sql:variable("@Category")}"
speakerBio="{data(/classes[1]/@speakerBio)}">
{
for $c in /classes/class
where $c/@name[contains(., sql:variable("@Category"))]
return $c
}
</classes>') AS ClassesXML
FROM
Speaker
WHERE
ClassesXML.exist
('/classes/class/@name[contains(., sql:variable("@Category"))]') = 1
The results look like this:
SpeakerName Country ClassesXML
----------- ------- ----------
Jeff Hay USA <classes category="SQL Server" speakerBio="Jeff has solid e...
Holly Holt Canada <classes category="SQL Server" speakerBio="Holly is a Canad...
The XML returned in these results looks like this for Jeff:
<classes category="SQL Server"
speakerBio="Jeff has solid security experience from years of hacking">
<class name="Using XQuery to Manipulate XML Data in SQL Server 2012" />
<class name="SQL Server and Oracle Working Together" />
</classes>
In this example, the T-SQL @Category parameter is assigned the value SQL Server, and the sql:variable is then used in several places to reference @Category. The first reference adds a category attribute to the classes element. The second reference applies filtering against the name attribute using contains in the inner FLWOR expression’s where statement, and the last reference applies filtering at the resultset row level in the SELECT statement’s WHERE clause. Thus, only rows having SQL Server in the name of at least one class are returned in the resultset, and within those rows, only classes having SQL Server in their name are returned as elements in ClassesXML (all other non-SQL Server classes are filtered out).
Our last xml.query example demonstrates how to combine child elements into a delimited string value, as shown in Example 11.
Example 11. Using xml.query with CONVERT to combine child elements.
SELECT
SpeakerName,
Country,
CONVERT(varchar(max), ClassesXML.query('
for $s in /classes/class
let $n := data($s/@name)
let $p := concat($n, "|")
return $p')) AS SessionList
FROM
Speaker
The SessionList
column produced by this query contains a single pipe-delimited string
containing the names of all the classes given by the speaker:
SpeakerName Country SessionList
----------- ------- ----------------------------------------------------------------------
Jeff Hay USA Writing Secure Code for ASP .NET| Using XQuery to Manipulate XML Da...
Holly Holt Canada SQL Server Profiler| Advanced SQL Querying Techniques| SQL Server a...
Ted Bremer USA Smart Client Stuff| More Smart Client Stuff|
This XQuery expression in Example 11 simply iterates each class element, extracts the name
attribute, and concatenates it with a pipe symbol, appending each
result to build a single string. Although the elements are ultimately
combined to form a single value, they are still multiple values from an
XPath perspective, and so xml.value cannot be used. Instead, xml.query produces the concatenated string, and CONVERT is used to cast the result as a varchar(max) data type.
The W3C XQuery specification does not provide a way for you to modify XML data as you can modify relational table data using the INSERT, UPDATE, and DELETE keywords in T-SQL. So Microsoft has created its own XML data manipulation language, XML DML, which is included in its own XQuery implementation.
XML DML gives you three ways to manipulate the XML data of a column via the xml.modify method:
xml.modify(insert) Allows you to insert a node or sequence of nodes into the xml data type instance you are working with.
xml.modify(delete) Allows you to delete zero or more nodes that are the result of the output sequence of the XQuery expression you specify.
xml.modify(replace) Modifies the value of a single node.
The xml.modify(insert) method allows you to insert a node or sequence of nodes into the xml data type instance you are working with. You use the xml.modify method in conjunction with a T-SQL UPDATE statement and, if necessary, a T-SQL or XQuery where clause (or both). For example, the code in Example 12 adds another <class> element to Jeff’s <classes> element in ClassesXML.
Example 12. Using xml.modify to insert a new element.
UPDATE Speaker
SET ClassesXML.modify('
insert
<class name="Ranking and Windowing Functions in SQL Server" />
into
/classes[1]')
WHERE SpeakerId = 1
The xml.modify(delete) method deletes zero or more nodes based on the criteria you specify. For example, the code in Example 13 deletes the fourth <class> element from Jeff’s <classes> element in ClassesXML.
Example 13. Using xml.modify to delete an element.
UPDATE Speaker
SET ClassesXML.modify('delete /classes/class[4]')
WHERE SpeakerId = 1
Finally, the xml.modify(replace) method allows you to replace XML data with new information. For example, the code in Example 14 updates the name attribute in the third <class> element of Jeff’s <classes> element in ClassesXML.
Example 14. Using xml.modify to update an element.
UPDATE Speaker
SET ClassesXML.modify('
replace value of /classes[1]/class[3]/@name[1]
with "Getting SQL Server and Oracle to Work Together"')
WHERE SpeakerId = 1