I have often seen a recommendation that using DISTINCT instead of
GROUP BY is better from a performance perspective. DISTINCT is
definitely a more readable alternative, but I wanted to test for myself
as to what the truth was.
To compare the performance of the two, I wrote the following simple queries.
The query below helps in identifying the number of records that SQL Server will need to process. On my machine, when executed without the GROUP BY or the DISTINCT clause, the query generates records in excess of 380,000.
The queries below fetch the same information – one uses the GROUP BY whereas the other query uses a DISTINCT.
Before executing the two queries (one with the GROUP BY and the other
with the DISTINCT), I had turned on the actual execution plan.
Upon studying the execution plan, the following observations can be made:
To compare the performance of the two, I wrote the following simple queries.
The query below helps in identifying the number of records that SQL Server will need to process. On my machine, when executed without the GROUP BY or the DISTINCT clause, the query generates records in excess of 380,000.
1.
USE AdventureWorks2012;
2.
GO
3.
SELECT
COUNT
(so1.object_id)
4.
FROM
sys.objects
AS
so1
5.
CROSS
JOIN
sys.objects
AS
so2 ;
6.
GO
01.
USE AdventureWorks2012;
02.
GO
03.
--Clean the buffers and procedure cache
04.
DBCC DROPCLEANBUFFERS;
05.
DBCC FREEPROCCACHE;
06.
GO
07.
08.
--Evaluate GROUP BY
09.
SELECT
so1.object_id,
10.
so1.
name
11.
FROM
sys.objects
AS
so1
12.
CROSS
JOIN
sys.objects
AS
so2
13.
GROUP
BY
so1.object_id, so1.
name
;
14.
GO
15.
16.
--Clean the buffers and procedure cache
17.
DBCC DROPCLEANBUFFERS;
18.
DBCC FREEPROCCACHE;
19.
GO
20.
21.
22.
--Evaluate DISTINCT
23.
SELECT
DISTINCT
24.
so1.object_id,
25.
so1.
name
26.
FROM
sys.objects
AS
so1
27.
CROSS
JOIN
sys.objects
AS
so2 ;
28.
GO
Upon studying the execution plan, the following observations can be made:
- When the SQL server database engine detected that the 2nd query did not have any aggregation function, it was smart enough to convert the GROUP BY into a DISTINCT when the query plan is generated
- There is no difference in the execution plan across both the queries
DISTINCT is not faster than GROUP BY. In fact, Both DISTINCT and GROUP BY demonstrate equal performance impact.
When aggregations are not being used, DISTINCT is more readable when compared to GROUP BY.
No comments:
Post a Comment