Saturday, 6 August 2011

Splitting string to have the best performance for dynamic queries

Optimizing the performance of querying data from database requires the attention to not only the hardware but also the development of the software. In this article, I’ll show you one specific need on splitting strings in SQL 2005 and solutions to use them in the appropriate context for better performance.

1. Introduction
I need to build up a dynamic SQL command to query data from existing database tables. The input filtering condition is a string which contains a very long list of key ids separated by commas as delimiters. However, the EXECUTE or sp_executesql which is used to execute the dynamic query only supports the maximum length of 4000 characters. Therefore, we need to split the string of key ids and store into a temporary storage (temp table or variable table) to reduce the size of query string.
For instance, instead of
exec sp_executesql ‘SELECT * FROM BigDataTable WHERE ID in (1,2,3,4,5,6…..)’
The new statement should be
exec sp_executesql ‘SELECT * FROM BigDataTable data INNER JOIN #temp t ON data.id = t.keyid’
Moreover, we need to consider the performance of the query when we have a long list of key ids and large data tables (for ex: more than 100,000 records).
Above is the challenge which faced with in one of my projects and via this article I will share the solutions which I used to overcome. I try to keep the examples simple as much as possible for you to understand easily. I hope you will find this article helpful when you face with the real situation which is a lot more complicated.
2. Solution
There is no best option for all situations. Depending on the size of data, we can consider one of the options below:
· Manual parsing: The function is to manually loop through the string to parse it and return the ids as a row in a table variable.
· XML with temp table: Convert the delimited string to xml structure and select the nodes into an SQL temporary table
· XML with table variable: Convert the delimited string to xml structure and select the nodes into an SQL table variable
· DelimitedSlit8K: An Improved SQL 8K “CSV Splitter” Function which is described in http://www.sqlservercentral.com/articles/Tally+Table/72993/
Now, let’s take a look in each option in details:
2.1. Manual splitting:
CREATEFUNCTION [dbo].[SplitString]
(
@vcDelimitedString NVARCHAR(Max)
)
RETURNS @tblArray TABLE(Element VARCHAR(10))
AS
BEGIN
DECLARE @Index SMALLINT
DECLARE @Start SMALLINT
DECLARE @DelSize SMALLINT
DECLARE @vcDelimiter SMALLINT(100)
DECLARE @Value SMALLINT(100)
SET @vcDelimiter =','
SET @DelSize =LEN(@vcDelimiter)
WHILELEN(@vcDelimitedString)> 0
BEGIN
SET @Index =CHARINDEX(@vcDelimiter, @vcDelimitedString)
IF @Index = 0 -- No delimiter found or last item of the string.
BEGIN
INSERTINTO @tblArray VALUES(@vcDelimitedString)
BREAK
END
ELSE
BEGIN
SET @Value =SUBSTRING(@vcDelimitedString, 1,@Index - 1)
INSERTINTO @tblArray VALUES(@Value)
SET @Start = @Index + @DelSize
-- Remove the item has been parsed.
SET @vcDelimitedString =SUBSTRING(@vcDelimitedString, @Start ,LEN(@vcDelimitedString)- @Start + 1)
END
END
RETURN
2.2. Using XML
DECLARE
@Split char(1),
@X xml
SET @Split =','
DECLARE @tblArray TABLE(Element varchar(10))
SELECT @X =CONVERT(xml,''+REPLACE(@InputString,@Split,'')+'')
INSERTINTO @tblArray (Element)
SELECT T.c.value('.','varchar(10)')
FROM @X.nodes('/root/s') T(c)
In this option, we may change the table variable tblArray to temp table if the data is too big or join with a very big table. See the performance comparisons for more details.
2.3. Using DelimitedSplit8K function:
See DelimitedSplit8K function in http://www.sqlservercentral.com/articles/Tally+Table/72993/
CREATEFUNCTION dbo.DelimitedSplit8K
--===== Define I/O parameters
(@pString NVARCHAR(MAX), @pDelimiter CHAR(1))
RETURNSTABLEWITHSCHEMABINDINGAS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N)AS(
SELECT 1 UNION ALLSELECT 1 UNION ALLSELECT 1 UNION ALL
SELECT 1 UNION ALLSELECT 1 UNION ALLSELECT 1 UNION ALL
SELECT 1 UNION ALLSELECT 1 UNION ALLSELECT 1 UNION ALLSELECT 1
),--10E+1 or 10 rows
E2(N)AS(SELECT 1 FROM E1 a, E1 b),--10E+2 or 100 rows
E4(N)AS(SELECT 1 FROM E2 a, E2 b),--10E+4 or 10,000 rows max
cteTally(N)AS(--==== This provides the "zero base" and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT 0 UNION ALL
SELECTTOP(DATALENGTH(ISNULL(@pString,1)))ROW_NUMBER()OVER(ORDERBY(SELECTNULL))FROM E4
),
cteStart(N1)AS(--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1
FROM cteTally t
WHERE(SUBSTRING(@pString,t.N,1)= @pDelimiter OR t.N = 0)
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber =ROW_NUMBER()OVER(ORDERBY s.N1),
Element=SUBSTRING(@pString,s.N1,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000))
FROM cteStart s
;
3. Performance Comparison:
Below are the queries which I used to check the performance of each option on various sizes of data.
Manually splitting (SplitString function):
SELECT *
FROM dbo.BigDataTable data
INNER JOIN [dbo].[SplitString](@InputString) s
ON data.ID = s.Element
XML with temp table
DECLARE
@Split char(1),
@X xml
SET @Split = ','
DECLARE @tblArray TABLE (Element varchar(10))
SELECT @X = CONVERT(xml,'' + REPLACE(@InputString,@Split,'') + '')
SELECT T.c.value('.','varchar(10)') as Element
INTO #Temp
FROM @X.nodes('/root/s') T(c)
SELECT *
FROM dbo.BigDataTabledata
INNER JOIN #Temp t
ON t.Element = data.ID
XML with variable table
DECLARE
@Split char(1),
@X xml
SET @Split = ','
DECLARE @tblArray TABLE (Element varchar(10))
SELECT @X = CONVERT(xml,'' + REPLACE(@InputString,@Split,'') + '')
INSERT INTO @tblTemp (Element)
SELECT T.c.value('.','varchar(10)')
FROM @X.nodes('/root/s') T(c)
SELECT *
FROM dbo.BigDataTabledata
INNER JOIN @tblTemp t
ON t.Element = data.ID
DelimitedSplit8K
SELECT *
FROM dbo.BigDataTabledata
INNER JOIN DelimitedSplit8K(@InputString,',') s
ON data.ID = s.Element
And the following is the result in millisecond.
No of Records
Option 1
( Manually parse)
Option 2
( XML with Temp table)
Option 2
( XML with table variable)
Option 3 (DelimitedSplit8K)
100
40
321
40
52
500
185
443
174
238
1000
356
574
239
455
2000
991
887
984
964
10000
3931
2703
3329
975
In above options, in case the data table contains less than 2,000 records then the option of XML combined with table variable almost has the best performance. However, when size of the data is bigger the option of DelimitedSplit8K is the best.
Also, the option of temp table takes longer when the size of data is less than 2,000 records and it will be a bit faster in case of large data.
4. Conclusion:
Thus, I’ve shown the two challenges with building and executing the dynamic queries:
· The size of the SQL command
· The performance when the input filtering condition is long and the size of data is big.
I’ve also shown the 4 options which we can consider to have the best performance depending on the context when the query is executed. They are:
· Manual parsing
· XML with temp table
· XML with table variable
· XML with table variableDelimitedSlit8K
Even though this article uses MS SQL 2005 for demonstration the idea should be applicable to other DBMSes.

1 comment: