1. Introduction
I need to build up a dynamic SQL command to query data from existing database tables. The input filtering condition is a string containing a very long list of key ids separated by commas. However, EXECUTE and sp_executesql, which are used to execute the dynamic query, only support a maximum length of 4,000 characters. Therefore, we need to split the string of key ids and store the values in temporary storage (a temp table or table variable) to reduce the size of the query string.
For instance, instead of
exec sp_executesql N'SELECT * FROM BigDataTable WHERE ID IN (1,2,3,4,5,6, ...)'
The new statement should be
exec sp_executesql N'SELECT * FROM BigDataTable data INNER JOIN #temp t ON data.id = t.keyid'
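To make the overall flow concrete, here is a minimal end-to-end sketch. The @Ids variable is hypothetical, and the temp table is populated here with the SplitString function defined in section 2.1, though any of the split techniques below would work. Note that a #temp table created in the outer batch is visible inside sp_executesql:

DECLARE @Ids NVARCHAR(MAX)
SET @Ids = N'1,2,3,4,5,6'

-- Store the parsed key ids in a temp table instead of embedding them in the query text.
CREATE TABLE #temp (keyid INT)
INSERT INTO #temp (keyid)
SELECT Element FROM dbo.SplitString(@Ids)

-- The dynamic query now stays short, no matter how many ids were passed in.
EXEC sp_executesql N'SELECT data.* FROM BigDataTable data INNER JOIN #temp t ON data.id = t.keyid'

DROP TABLE #temp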
Moreover, we need to consider the performance of the query when the list of key ids is long and the data tables are large (for example, more than 100,000 records).
Above is the challenge I faced in one of my projects, and in this article I will share the solutions I used to overcome it. I have tried to keep the examples as simple as possible so that they are easy to follow. I hope you will find this article helpful when you face a real situation that is much more complicated.
2. Solution
There is no single best option for all situations. Depending on the size of the data, we can consider one of the options below:
· Manual parsing: a function manually loops through the string, parses it, and returns each id as a row in a table variable.
· XML with temp table: convert the delimited string to an XML structure and select the nodes into a SQL temporary table.
· XML with table variable: convert the delimited string to an XML structure and select the nodes into a SQL table variable.
· DelimitedSplit8K: an improved SQL 8K "CSV Splitter" function, described at http://www.sqlservercentral.com/articles/Tally+Table/72993/
Now, let's look at each option in detail:
2.1. Manual splitting:
CREATE FUNCTION [dbo].[SplitString] ( @vcDelimitedString NVARCHAR(MAX) )
RETURNS @tblArray TABLE (Element VARCHAR(10))
AS
BEGIN
    DECLARE @Index INT              -- position of the next delimiter
    DECLARE @Start INT              -- start of the remaining, unparsed part of the string
    DECLARE @DelSize SMALLINT       -- length of the delimiter
    DECLARE @vcDelimiter VARCHAR(100)
    DECLARE @Value VARCHAR(100)

    SET @vcDelimiter = ','
    SET @DelSize = LEN(@vcDelimiter)

    WHILE LEN(@vcDelimitedString) > 0
    BEGIN
        SET @Index = CHARINDEX(@vcDelimiter, @vcDelimitedString)
        IF @Index = 0 -- No delimiter found or last item of the string.
        BEGIN
            INSERT INTO @tblArray VALUES (@vcDelimitedString)
            BREAK
        END
        ELSE
        BEGIN
            SET @Value = SUBSTRING(@vcDelimitedString, 1, @Index - 1)
            INSERT INTO @tblArray VALUES (@Value)
            SET @Start = @Index + @DelSize
            -- Remove the item that has just been parsed.
            SET @vcDelimitedString = SUBSTRING(@vcDelimitedString, @Start, LEN(@vcDelimitedString) - @Start + 1)
        END
    END
    RETURN
END
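As a quick sanity check (with a hypothetical input list), the function can be called like any other table-valued function:

SELECT Element FROM dbo.SplitString(N'10,20,30')
-- Returns three rows: 10, 20, 30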
2.2. Using XML
DECLARE @Split char(1), @X xml
SET @Split = ','
DECLARE @tblArray TABLE (Element varchar(10))

-- @InputString holds the comma-delimited list of key ids.
-- Wrap every element in <s> tags so the whole list becomes a small XML document.
SELECT @X = CONVERT(xml, '<root><s>' + REPLACE(@InputString, @Split, '</s><s>') + '</s></root>')

INSERT INTO @tblArray (Element)
SELECT T.c.value('.', 'varchar(10)')
FROM @X.nodes('/root/s') T(c)
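To see what the conversion produces: for an input of '1,2,3', the REPLACE expression builds the XML <root><s>1</s><s>2</s><s>3</s></root>, and @X.nodes('/root/s') then returns one row per <s> element, which is what the INSERT reads from.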
In this option, we may change the table variable @tblArray to a temp table if the data is too big or if we join with a very big table; the temp table variant is shown in section 3. See the performance comparisons for more details.
2.3. Using DelimitedSplit8K function:
See DelimitedSplit8K function in http://www.sqlservercentral.com/articles/Tally+Table/72993/
CREATE FUNCTION dbo.DelimitedSplit8K
--===== Define I/O parameters
        (@pString NVARCHAR(MAX), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
     -- enough to cover VARCHAR(8000)
WITH E1(N) AS (
                SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
              ),                          --10E+1 or 10 rows
     E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
     E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (
                --==== This provides the "zero base" and limits the number of rows right up front
                --     for both a performance gain and prevention of accidental "overruns"
                SELECT 0 UNION ALL
                SELECT TOP (DATALENGTH(ISNULL(@pString, 1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
               ),
cteStart(N1) AS (
                --==== This returns N+1 (starting position of each "element" just once for each delimiter)
                SELECT t.N + 1
                FROM cteTally t
                WHERE (SUBSTRING(@pString, t.N, 1) = @pDelimiter OR t.N = 0)
               )
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY s.N1),
       Element    = SUBSTRING(@pString, s.N1, ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString, s.N1), 0) - s.N1, 8000))
FROM cteStart s
;
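A quick check of the splitter on its own (hypothetical input):

SELECT ItemNumber, Element
FROM dbo.DelimitedSplit8K(N'10,20,30', ',')
-- Returns (1, 10), (2, 20), (3, 30)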
3. Performance Comparison:
Below are the queries I used to check the performance of each option on various sizes of data.
Manual splitting (SplitString function):
SELECT *
FROM dbo.BigDataTable data
INNER JOIN [dbo].[SplitString](@InputString) s ON data.ID = s.Element
XML with temp table
DECLARE @Split char(1), @X xml
SET @Split = ','

SELECT @X = CONVERT(xml, '<root><s>' + REPLACE(@InputString, @Split, '</s><s>') + '</s></root>')

SELECT T.c.value('.', 'varchar(10)') AS Element
INTO #Temp
FROM @X.nodes('/root/s') T(c)

SELECT *
FROM dbo.BigDataTable data
INNER JOIN #Temp t ON t.Element = data.ID
XML with table variable
DECLARE @Split char(1), @X xml
SET @Split = ','
DECLARE @tblArray TABLE (Element varchar(10))

SELECT @X = CONVERT(xml, '<root><s>' + REPLACE(@InputString, @Split, '</s><s>') + '</s></root>')

INSERT INTO @tblArray (Element)
SELECT T.c.value('.', 'varchar(10)')
FROM @X.nodes('/root/s') T(c)

SELECT *
FROM dbo.BigDataTable data
INNER JOIN @tblArray t ON t.Element = data.ID
DelimitedSplit8K
SELECT *
FROM dbo.BigDataTable data
INNER JOIN dbo.DelimitedSplit8K(@InputString, ',') s ON data.ID = s.Element
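The timing harness itself is not shown in this article; a simple way to capture the elapsed milliseconds for each variant on SQL Server 2005 is a sketch like the following (SET STATISTICS TIME ON is an alternative that also reports CPU time):

DECLARE @start DATETIME
SET @start = GETDATE()

-- ... run one of the four test queries above ...

SELECT DATEDIFF(ms, @start, GETDATE()) AS ElapsedMilliseconds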
And the following are the results, in milliseconds.
No of Records | Option 1 (Manual parsing) | Option 2 (XML with temp table) | Option 3 (XML with table variable) | Option 4 (DelimitedSplit8K)
100 | 40 | 321 | 40 | 52
500 | 185 | 443 | 174 | 238
1000 | 356 | 574 | 239 | 455
2000 | 991 | 887 | 984 | 964
10000 | 3931 | 2703 | 3329 | 975
Looking at these results, when the data table contains fewer than 2,000 records, the XML with table variable option generally has the best performance. However, when the data grows larger, the DelimitedSplit8K option is the fastest.
Also, the temp table option takes longer when the data contains fewer than 2,000 records, but it becomes a bit faster with larger data.
4. Conclusion:
In summary, I have shown the two challenges of building and executing dynamic queries:
· The size of the SQL command
· The performance when the input filtering condition is long and the data set is large.
I have also shown the four options we can consider to get the best performance, depending on the context in which the query is executed. They are:
· Manual parsing
· XML with temp table
· XML with table variable
· DelimitedSplit8K
Even though this article uses MS SQL 2005 for demonstration, the idea should be applicable to other DBMSes.