Saturday 6 August 2011

Splitting strings for the best performance in dynamic queries

Optimizing the performance of querying data from a database requires attention not only to the hardware but also to how the software is written. In this article, I'll show you one specific need, splitting strings in SQL Server 2005, and the solutions to apply in the appropriate context for better performance.

1. Introduction
I needed to build a dynamic SQL command to query data from existing database tables. The input filtering condition is a string containing a very long list of key ids separated by commas. However, the dynamic statement, typically built in an NVARCHAR(4000) variable and passed to EXECUTE or sp_executesql, is limited to 4,000 characters. Therefore, we need to split the string of key ids and store the result in temporary storage (a temp table or table variable) to keep the query string short.
For instance, instead of:
exec sp_executesql N'SELECT * FROM BigDataTable WHERE ID IN (1,2,3,4,5,6…..)'
the new statement should be:
exec sp_executesql N'SELECT * FROM BigDataTable data INNER JOIN #temp t ON data.ID = t.keyid'
Moreover, we need to consider the performance of the query when we have a long list of key ids and large data tables (e.g., more than 100,000 records). A sketch of the overall pattern follows below.
This is the challenge I faced in one of my projects, and in this article I will share the solutions I used to overcome it. I have kept the examples as simple as possible so they are easy to follow; I hope you will find them helpful when you face a real situation that is a lot more complicated.
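To make the pattern concrete, below is a minimal sketch of the whole workflow. The names BigDataTable and dbo.SplitString (a splitter like the one defined in section 2.1) are illustrative. Note that a local temp table created before the call is visible inside sp_executesql, whereas a table variable would not be:
-- Assumes dbo.SplitString (section 2.1) exists and dbo.BigDataTable has an int ID column.
DECLARE @InputString NVARCHAR(MAX)
SET @InputString = N'1,2,3,4,5' -- in practice, a very long comma-delimited id list

-- 1. Split the id list into a temp table once.
SELECT Element AS keyid
INTO #temp
FROM dbo.SplitString(@InputString)

-- 2. The dynamic statement now stays short no matter how many ids there are.
--    #temp is visible inside sp_executesql because it was created in this session.
EXEC sp_executesql
    N'SELECT data.* FROM dbo.BigDataTable data INNER JOIN #temp t ON data.ID = t.keyid'

DROP TABLE #temp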
2. Solution
There is no single best option for all situations. Depending on the size of the data, we can consider one of the options below:
· Manual parsing: a function that manually loops through the string, parses it, and returns the ids as rows in a table variable.
· XML with temp table: convert the delimited string to an XML structure and select the nodes into a temporary table.
· XML with table variable: convert the delimited string to an XML structure and select the nodes into a table variable.
· DelimitedSplit8K: an improved SQL 8K "CSV Splitter" function, described at http://www.sqlservercentral.com/articles/Tally+Table/72993/
Now, let's take a look at each option in detail:
2.1. Manual splitting:
CREATE FUNCTION [dbo].[SplitString]
(
    @vcDelimitedString NVARCHAR(MAX)
)
RETURNS @tblArray TABLE (Element VARCHAR(10))
AS
BEGIN
    DECLARE @Index INT
    DECLARE @Start INT
    DECLARE @DelSize INT
    DECLARE @vcDelimiter NVARCHAR(100)
    DECLARE @Value NVARCHAR(100)
    SET @vcDelimiter = ','
    SET @DelSize = LEN(@vcDelimiter)
    WHILE LEN(@vcDelimitedString) > 0
    BEGIN
        SET @Index = CHARINDEX(@vcDelimiter, @vcDelimitedString)
        IF @Index = 0 -- No delimiter found: this is the last item of the string.
        BEGIN
            INSERT INTO @tblArray VALUES (@vcDelimitedString)
            BREAK
        END
        ELSE
        BEGIN
            SET @Value = SUBSTRING(@vcDelimitedString, 1, @Index - 1)
            INSERT INTO @tblArray VALUES (@Value)
            SET @Start = @Index + @DelSize
            -- Remove the item that has just been parsed.
            SET @vcDelimitedString = SUBSTRING(@vcDelimitedString, @Start, LEN(@vcDelimitedString) - @Start + 1)
        END
    END
    RETURN
END
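A quick smoke test of the function; the call below should return three rows (10, 20 and 30):
-- Each element of the delimited list comes back as one row.
SELECT Element FROM dbo.SplitString(N'10,20,30')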
2.2. Using XML
DECLARE
    @Split CHAR(1),
    @X XML
SET @Split = ','
DECLARE @tblArray TABLE (Element VARCHAR(10))
-- @InputString is assumed to hold the comma-delimited list of key ids.
SELECT @X = CONVERT(XML, '<root><s>' + REPLACE(@InputString, @Split, '</s><s>') + '</s></root>')
INSERT INTO @tblArray (Element)
SELECT T.c.value('.', 'varchar(10)')
FROM @X.nodes('/root/s') T(c)
In this option, we may change the table variable @tblArray to a temp table if the data is too big or is joined with a very big table. See the performance comparison for more details.
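One caveat with the XML approach: if the input string can contain XML-reserved characters such as & or <, the CONVERT to xml will raise an error. A plain list of numeric ids is safe, but for arbitrary text a simple workaround (a sketch, applied before the CONVERT above) is to escape the input first:
-- Escape XML-reserved characters before converting to xml.
-- Only needed for non-numeric input; '&' must be replaced first.
SET @InputString = REPLACE(REPLACE(REPLACE(@InputString, '&', '&amp;'), '<', '&lt;'), '>', '&gt;')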
2.3. Using DelimitedSplit8K function:
The complete DelimitedSplit8K function is described at http://www.sqlservercentral.com/articles/Tally+Table/72993/:
CREATE FUNCTION dbo.DelimitedSplit8K
--===== Define I/O parameters
        (@pString NVARCHAR(MAX), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
     -- enough to cover VARCHAR(8000)
WITH E1(N) AS (
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
                -- for both a performance gain and prevention of accidental "overruns"
    SELECT 0 UNION ALL
    SELECT TOP (DATALENGTH(ISNULL(@pString, 1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
    SELECT t.N + 1
    FROM cteTally t
    WHERE (SUBSTRING(@pString, t.N, 1) = @pDelimiter OR t.N = 0)
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY s.N1),
       Element    = SUBSTRING(@pString, s.N1, ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString, s.N1), 0) - s.N1, 8000))
FROM cteStart s
;
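Since DelimitedSplit8K is an inline table-valued function, it can be queried directly; for example, the call below should return three numbered rows:
-- Returns ItemNumber 1..3 with Elements '1', '2', '3'.
SELECT ItemNumber, Element
FROM dbo.DelimitedSplit8K(N'1,2,3', ',')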
3. Performance Comparison:
Below are the queries I used to check the performance of each option on various data sizes.
Manual splitting (SplitString function):
SELECT *
FROM dbo.BigDataTable data
INNER JOIN [dbo].[SplitString](@InputString) s
ON data.ID = s.Element
XML with temp table
DECLARE
    @Split CHAR(1),
    @X XML
SET @Split = ','
SELECT @X = CONVERT(XML, '<root><s>' + REPLACE(@InputString, @Split, '</s><s>') + '</s></root>')
SELECT T.c.value('.', 'varchar(10)') AS Element
INTO #Temp
FROM @X.nodes('/root/s') T(c)
SELECT *
FROM dbo.BigDataTable data
INNER JOIN #Temp t
    ON t.Element = data.ID
XML with table variable
DECLARE
    @Split CHAR(1),
    @X XML
SET @Split = ','
DECLARE @tblTemp TABLE (Element VARCHAR(10))
SELECT @X = CONVERT(XML, '<root><s>' + REPLACE(@InputString, @Split, '</s><s>') + '</s></root>')
INSERT INTO @tblTemp (Element)
SELECT T.c.value('.', 'varchar(10)')
FROM @X.nodes('/root/s') T(c)
SELECT *
FROM dbo.BigDataTable data
INNER JOIN @tblTemp t
    ON t.Element = data.ID
DelimitedSplit8K
SELECT *
FROM dbo.BigDataTable data
INNER JOIN dbo.DelimitedSplit8K(@InputString, ',') s
    ON data.ID = s.Element
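The article does not show how the timings were captured; one simple way to reproduce a similar measurement is to bracket each query with GETDATE() and take the difference in milliseconds (SET STATISTICS TIME ON is an alternative):
-- A minimal timing sketch; run one of the four queries above in the gap.
DECLARE @start DATETIME
SET @start = GETDATE()

-- ... run one of the four queries above ...

SELECT DATEDIFF(ms, @start, GETDATE()) AS ElapsedMs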
The following are the results in milliseconds.
No of Records | Option 1 (Manual parsing) | Option 2 (XML with temp table) | Option 3 (XML with table variable) | Option 4 (DelimitedSplit8K)
100 | 40 | 321 | 40 | 52
500 | 185 | 443 | 174 | 238
1000 | 356 | 574 | 239 | 455
2000 | 991 | 887 | 984 | 964
10000 | 3931 | 2703 | 3329 | 975
Among these options, when the data table contains fewer than 2,000 records, XML combined with a table variable almost always has the best performance. However, when the data grows bigger, DelimitedSplit8K is the best.
Also, the temp table option takes longer when the data has fewer than 2,000 records, but becomes a bit faster on large data.
4. Conclusion:
Thus, I've shown the two challenges with building and executing dynamic queries:
· The size of the SQL command
· The performance of the query when the input filter list is long and the data set is large.
I've also shown the four options we can consider for the best performance, depending on the context in which the query is executed. They are:
· Manual parsing
· XML with temp table
· XML with table variable
· DelimitedSplit8K
Even though this article uses MS SQL 2005 for demonstration, the idea should be applicable to other DBMSes.
