Tuesday, 22 April 2014

Is DISTINCT faster than GROUP BY?

I have often seen a recommendation that using DISTINCT instead of GROUP BY is better from a performance perspective. DISTINCT is definitely a more readable alternative, but I wanted to test for myself as to what the truth was.
To compare the performance of the two, I wrote the following simple queries.
The query below helps in identifying the number of records that SQL Server will need to process. On my machine, when executed without the GROUP BY or the DISTINCT clause, the query generates records in excess of 380,000.
1.USE AdventureWorks2012;
2.GO
3.SELECT  COUNT(so1.object_id)
4.FROM    sys.objects AS so1
5.CROSS JOIN sys.objects AS so2 ;
6.GO
The queries below fetch the same information – one uses the GROUP BY whereas the other query uses a DISTINCT.
01.USE AdventureWorks2012;
02.GO
03.--Clean the buffers and procedure cache
04.DBCC DROPCLEANBUFFERS;
05.DBCC FREEPROCCACHE;
06.GO
07. 
08.--Evaluate GROUP BY
09.SELECT  so1.object_id,
10.so1.name
11.FROM    sys.objects AS so1
12.CROSS JOIN sys.objects AS so2
13.GROUP BY so1.object_id, so1.name ;
14.GO
15. 
16.--Clean the buffers and procedure cache
17.DBCC DROPCLEANBUFFERS;
18.DBCC FREEPROCCACHE;
19.GO
20. 
21. 
22.--Evaluate DISTINCT
23.SELECT  DISTINCT
24.so1.object_id,
25.so1.name
26.FROM    sys.objects AS so1
27.CROSS JOIN sys.objects AS so2 ;
28.GO
Before executing the two queries (one with the GROUP BY and the other with the DISTINCT), I had turned on the actual execution plan.
image
Upon studying the execution plan, the following observations can be made:
  1. When the SQL server database engine detected that the 2nd query did not have any aggregation function, it was smart enough to convert the GROUP BY into a DISTINCT when the query plan is generated
  2. There is no difference in the execution plan across both the queries
In conclusion,

DISTINCT is not faster than GROUP BY. In fact, Both DISTINCT and GROUP BY demonstrate equal performance impact.
When aggregations are not being used, DISTINCT is more readable when compared to GROUP BY.

No comments:

Post a Comment