MS SQL Server Table Variable And Nested Select Gotcha (Bug?)

Posted by Brad Wood
Oct 28, 2009 05:54:00 UTC
My coworker found an interesting little SQL Server behavior today that was quite startling at first glance. It's like punching 2 + 2 into your calculator and having the result come back as 5. After pinpointing the cause of the behavior, it ALMOST makes sense... except for the fact that it doesn't make sense at all. It involves a randomized sub-select and table variables.

This is the part of the post where I pull out a simplified example that is rather contrived, but shows the point quite clearly. Consider the following code:
[code]-- Table variable with a single int column
DECLARE @test TABLE
	(testID int)

-- Seed it with the numbers 1 through 10
INSERT INTO @test (testID)
SELECT 1
union SELECT 2 union SELECT 3 union SELECT 4
union SELECT 5 union SELECT 6 union SELECT 7
union SELECT 8 union SELECT 9 union SELECT 10

-- Pull back whichever rows match a random TOP 5 of the same table
SELECT testID
FROM @test
WHERE testID IN ( SELECT TOP 5 testID
				FROM @test
				ORDER BY newID() )
[/code]
Ok, nothing too crazy going on here. Declare a table variable, throw 10 records in it, and select out the five that are returned by our randomized TOP 5 sub-select. Right? Wrong! Running this repeatedly returns a DIFFERENT number of records each time, anywhere between 1 and 10. If we run the sub-select by itself, it always returns 5 records, yet the main select seems to return whatever it feels like.

Well, what's going on here? The answer is found in the execution plan:

[Execution plan for the table variable query]

Here are the details for the circled portions:

[Operator details for the circled portions of the plan]

Notice how the table scan for the sub-select was processed 10 times -- that's once for every record in the main select. It returned a total of 100 rows. The sort operation also ran 10 times, and since the rows were randomly ordered, it chose a DIFFERENT top 5 every time. Of course there is some overlap, so the actual number of unique rows varies.

In case it isn't obvious yet, I'll point out that this behavior is also very bad for performance. Imagine if my table variable had 1,000 records in it. That table scan operation would have returned 1,000,000 records!

So, what's the fix? One way is to convert your table variable to a temp table like so:
[code]CREATE TABLE #test
	(testID int)

INSERT INTO #test (testID)
SELECT 1
union SELECT 2 union SELECT 3 union SELECT 4
union SELECT 5 union SELECT 6 union SELECT 7
union SELECT 8 union SELECT 9 union SELECT 10

SELECT testID
FROM #test
WHERE testID IN ( SELECT TOP 5 testID
				FROM #test
				ORDER BY newID() )

-- Clean up so the batch can be re-run in the same session
DROP TABLE #test
[/code]
That code will always return no more than 5 random results at a time. Let's take a peek at its execution plan:

[Execution plan for the temp table query]

As you can see, it is very similar. The main difference is that the lower right table scan only runs ONCE, as do the Compute Scalar (calculating the newID) and the sort. The difference here is the table spool that runs 10 times over the SAME 5 records. If the column in question is unique, my second solution is to keep the table variable, but simply make the column used in the join a PRIMARY KEY.
[code]DECLARE @test TABLE     
	(testID int PRIMARY KEY)   
		
INSERT INTO @test (testID)
SELECT 1
union SELECT 2 union SELECT 3 union SELECT 4
union SELECT 5 union SELECT 6 union SELECT 7
union SELECT 8 union SELECT 9 union SELECT 10
		
SELECT testID
FROM @test   
WHERE testID IN ( SELECT TOP 5 testID 
				FROM @test
				ORDER BY newID() )
[/code]
[Execution plan for the PRIMARY KEY version]

Notice that our table scan has gone away (it's about time) and has been replaced with a sleek Clustered Index Scan that only runs ONE time. In fact, the Compute Scalar and Sort only run once as well, returning only the 5 records we expected all along.

This is one of those things I can't quite explain. I can look at the plans and see what SQL Server is doing, but I'm not sure why, or whether it should be classified as a bug. I have run these tests on SQL Server 2000 and 2008 with the same results. I have also Googled quite a bit for an explanation of the behavior, but I can't find one. Perhaps if you know why, you can chime in. But for now, be very careful when using table variables in a sub-select.
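For completeness, a minimal sketch of one more possible workaround (untested against the plans above; the @picked table variable is introduced just for this example): materialize the random TOP 5 into a second table variable, so the ORDER BY newID() runs exactly once, at insert time, and the outer query never re-evaluates it:
[code]-- Sketch: draw the random TOP 5 once, into its own table variable,
-- then join against it. The newID() sort happens a single time here.
DECLARE @test TABLE
	(testID int)

INSERT INTO @test (testID)
SELECT 1
union SELECT 2 union SELECT 3 union SELECT 4
union SELECT 5 union SELECT 6 union SELECT 7
union SELECT 8 union SELECT 9 union SELECT 10

DECLARE @picked TABLE
	(testID int)

-- The random draw is frozen at INSERT time
INSERT INTO @picked (testID)
SELECT TOP 5 testID
FROM @test
ORDER BY newID()

SELECT t.testID
FROM @test t
INNER JOIN @picked p ON p.testID = t.testID
[/code]
Since the random draw is frozen into @picked before the final SELECT runs, the row count should be stable every time, at the cost of one extra insert.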

Andy Sandefer

I cannot explain this, but I've encountered it before. A couple of years ago I was reading an XML file into memory in a stored procedure, parsing it, and then working with the parsed data in other statements, ultimately updating the database with it. What I found was strange: if I performed the operations with a hard table (created up front, then dropped after the procedure was finished with the data), the program worked as expected. However, when I did the same operations with a temporary table, I got seemingly random sorting and unpredictable record counts. In the end, creating the true table and then dropping it did not cause any performance degradation, so that was the direction I took to resolve the issue.

Aaron Foote

Confusing, but I can explain. What is actually happening is that the code "SELECT TOP 5 testID FROM @test ORDER BY newID()" is being executed for EACH row in @test. SQL Server is looping over the table @test and getting a row. For each row, it checks whether that row is satisfied by the IN clause. It evaluates the IN clause and gets a TOP 5 random result. If the row from @test is in the IN clause, it will be in the result set; if not, it is excluded. When the next row is fetched from @test, SQL Server again evaluates the IN clause, getting a different set of 5 random results.

Cheers, Aaron
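
A quick way to watch the variance Aaron describes is to aggregate instead of listing rows; a minimal sketch against the same ten-row table (each execution of the batch should report a different count):
[code]-- Run this batch several times; matchedRows bounces around because
-- the IN clause's TOP 5 draw is re-run for each outer row.
DECLARE @test TABLE
	(testID int)

INSERT INTO @test (testID)
SELECT 1
union SELECT 2 union SELECT 3 union SELECT 4
union SELECT 5 union SELECT 6 union SELECT 7
union SELECT 8 union SELECT 9 union SELECT 10

SELECT COUNT(*) AS matchedRows
FROM @test
WHERE testID IN ( SELECT TOP 5 testID
				FROM @test
				ORDER BY newID() )
[/code]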

Aaron Foote

I think I should have read ALL of your post before I decided to dive into SSMS.

Brad Wood

@Aaron: It's all good. If you're like me, you love a good mystery and can't help but jump into "I gotta solve it" mode. :)

@Andy: Thanks for sharing. I bet that was a pain to troubleshoot.

Leigh

I came across a bug report mentioning similar strange/unexpected behavior with newID(). If you look at the comments from Jim/MS it seems the advice about scalar functions is: expect the unexpected.

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=350485

Brad Wood

@Leigh: That's really very interesting. Especially the part where Microsoft basically says, "Yes, it's a bug but we're not going to fix it because it works 99% of the time and performs faster".

Leigh

Yes, a bit disconcerting when that 1% of the time defies normal expectations in terms of results. Faster is better... but only when there is equal accuracy of information ;-)