在ITPUB上看了一个帖子,根据日期类型对每天的记录进行GROUP BY,帖子的地址如下:http://www.itpub.net/thread-1564295-1-1.html
这种包含全表扫描执行GROUP BY的语句是否还有优化的余地吗,事实上确实还有,因为对于处理日期类型,TO_CHAR并没有TRUNC高效。
下面看一个简单的例子:
SQL> CREATE TABLE T_DATE AS 2 SELECT ROWNUM ID, CREATED 3 FROM DBA_OBJECTS A, (SELECT 1 FROM DUAL CONNECT BY ROWNUM < 100) 4 WHERE ROWNUM <= 1000000; TABLE created. SQL> SELECT COUNT(*) FROM T_DATE; COUNT(*) ---------- 1000000 SQL> SET TIMING ON SQL> SELECT TO_CHAR(CREATED, 'YYYY-MM-DD'), COUNT(*) 2 FROM T_DATE 3 GROUP BY TO_CHAR(CREATED, 'YYYY-MM-DD'); TO_CHAR(CR COUNT(*) ---------- ---------- 2012-01-07 3600 2012-01-08 3750 2012-01-09 4650 2012-01-06 987925 2012-01-10 75 Elapsed: 00:00:00.46 SQL> SELECT TO_CHAR(CREATED, 'YYYY-MM-DD'), COUNT(*) 2 FROM T_DATE 3 GROUP BY TO_CHAR(CREATED, 'YYYY-MM-DD'); TO_CHAR(CR COUNT(*) ---------- ---------- 2012-01-07 3600 2012-01-08 3750 2012-01-09 4650 2012-01-06 987925 2012-01-10 75 Elapsed: 00:00:00.40 SQL> SELECT TO_CHAR(CREATED, 'YYYY-MM-DD'), COUNT(*) 2 FROM T_DATE 3 GROUP BY TO_CHAR(CREATED, 'YYYY-MM-DD'); TO_CHAR(CR COUNT(*) ---------- ---------- 2012-01-07 3600 2012-01-08 3750 2012-01-09 4650 2012-01-06 987925 2012-01-10 75 Elapsed: 00:00:00.39 SQL> SELECT TO_CHAR(CREATED, 'YYYY-MM-DD'), COUNT(*) 2 FROM T_DATE 3 GROUP BY TO_CHAR(CREATED, 'YYYY-MM-DD'); TO_CHAR(CR COUNT(*) ---------- ---------- 2012-01-07 3600 2012-01-08 3750 2012-01-09 4650 2012-01-06 987925 2012-01-10 75 Elapsed: 00:00:00.44 SQL> SELECT TO_CHAR(TRUNC(CREATED), 'YYYY-MM-DD'), COUNT(*) 2 FROM T_DATE 3 GROUP BY TRUNC(CREATED); TO_CHAR(TR COUNT(*) ---------- ---------- 2012-01-06 987925 2012-01-10 75 2012-01-08 3750 2012-01-07 3600 2012-01-09 4650 Elapsed: 00:00:00.36 SQL> SELECT TO_CHAR(TRUNC(CREATED), 'YYYY-MM-DD'), COUNT(*) 2 FROM T_DATE 3 GROUP BY TRUNC(CREATED); TO_CHAR(TR COUNT(*) ---------- ---------- 2012-01-10 75 2012-01-07 3600 2012-01-09 4650 2012-01-06 987925 2012-01-08 3750 Elapsed: 00:00:00.35 SQL> SELECT TO_CHAR(TRUNC(CREATED), 'YYYY-MM-DD'), COUNT(*) 2 FROM T_DATE 3 GROUP BY TRUNC(CREATED); TO_CHAR(TR COUNT(*) ---------- ---------- 2012-01-10 75 2012-01-07 3600 2012-01-09 4650 2012-01-06 987925 2012-01-08 3750 Elapsed: 00:00:00.36 SQL> SELECT TO_CHAR(TRUNC(CREATED), 'YYYY-MM-DD'), COUNT(*) 2 FROM T_DATE 3 GROUP BY TRUNC(CREATED); TO_CHAR(TR COUNT(*) ---------- ---------- 2012-01-10 75 2012-01-07 3600 2012-01-09 4650 2012-01-06 987925 2012-01-08 3750 Elapsed: 00:00:00.34 |
如果仅从执行计划和逻辑读上进行分析,两个SQL没有任何区别:
SQL> SET autot ON SQL> SELECT TO_CHAR(CREATED, 'YYYY-MM-DD'), COUNT(*) 2 FROM T_DATE 3 GROUP BY TO_CHAR(CREATED, 'YYYY-MM-DD'); TO_CHAR(CR COUNT(*) ---------- ---------- 2012-01-07 3600 2012-01-08 3750 2012-01-09 4650 2012-01-06 987925 2012-01-10 75 Elapsed: 00:00:00.43 Execution Plan ---------------------------------------------------------- Plan hash VALUE: 534547868 ----------------------------------------------------------------------------- | Id | Operation | Name | ROWS | Bytes | Cost (%CPU)| TIME | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1294K| 11M| 726 (6)| 00:00:09 | | 1 | HASH GROUP BY | | 1294K| 11M| 726 (6)| 00:00:09 | | 2 | TABLE ACCESS FULL| T_DATE | 1294K| 11M| 694 (1)| 00:00:09 | ----------------------------------------------------------------------------- Note ----- - dynamic sampling used FOR this statement (level=2) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 2490 consistent gets 2487 physical reads 0 redo SIZE 754 bytes sent via SQL*Net TO client 524 bytes received via SQL*Net FROM client 2 SQL*Net roundtrips TO/FROM client 0 sorts (memory) 0 sorts (disk) 5 ROWS processed SQL> SELECT TO_CHAR(TRUNC(CREATED), 'YYYY-MM-DD'), COUNT(*) 2 FROM T_DATE 3 GROUP BY TRUNC(CREATED); TO_CHAR(TR COUNT(*) ---------- ---------- 2012-01-10 75 2012-01-07 3600 2012-01-09 4650 2012-01-06 987925 2012-01-08 3750 Elapsed: 00:00:00.34 Execution Plan ---------------------------------------------------------- Plan hash VALUE: 534547868 ----------------------------------------------------------------------------- | Id | Operation | Name | ROWS | Bytes | Cost (%CPU)| TIME | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1294K| 11M| 726 (6)| 00:00:09 | | 1 | HASH GROUP BY | | 1294K| 11M| 726 (6)| 00:00:09 | | 2 | TABLE ACCESS FULL| T_DATE | 1294K| 11M| 694 (1)| 00:00:09 | ----------------------------------------------------------------------------- Note ----- - dynamic sampling used FOR this statement (level=2) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 2490 consistent gets 2487 physical reads 0 redo SIZE 761 bytes sent via SQL*Net TO client 524 bytes received via SQL*Net FROM client 2 SQL*Net roundtrips TO/FROM client 0 sorts (memory) 0 sorts (disk) 5 ROWS processed |
但是观察两个SQL的平均执行时间,会发现使用TRUNC方式比TO_CHAR有1/8的性能提升,对于执行计划完全相同的情况而言,这个比率已经很高了。
其实导致问题的原因在于DATE类型的存储,DATE由7个字节组成,分别为世纪、年、月、日、时、分、秒。对于TRUNC函数而言,只是简单的舍弃掉后面三个字节,因此效率最高,而TO_CHAR需要将内部的存储格式转化为字符格式,显然会消耗更多的资源。
两个SQL返回结果顺序的不同也说明了这一点,TRUNC函数进行HASH GROUP的是日期格式,而TO_CHAR函数进行HASH GROUP的是字符类型,导致了最终结果返回顺序的差异性。