Comparing table analysis time under different estimate_percent settings

Updated: 2024-10-28 00:17:40

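I recently ran into a problem: gathering statistics on a table took so long that downstream computations had to wait or were postponed. Changing estimate_percent => null to estimate_percent => dbms_stats.auto_sample_size cut the gathering time dramatically and resolved the problem.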

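In Oracle statistics gathering, the estimate_percent option is the percentage of rows to sample, with a valid range of [0.000001,100]; null means the statistics are computed from all rows, with no sampling. DBMS_STATS.AUTO_SAMPLE_SIZE is a relatively new sampling option that lets Oracle's dbms_stats decide for itself the best percentage of a segment to sample when gathering statistics: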

estimate_percent => dbms_stats.auto_sample_size

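To check what auto sampling actually did, look at the sample_size column of dba_tables. One point worth noting: in releases before 11g, Oracle typically picked a sample of about 5 to 20 percent under auto sampling, whereas from 11g onward AUTO_SAMPLE_SIZE reads all rows and approximates the number of distinct values with a one-pass hashing algorithm, so sample_size usually equals the row count (as in the test below). Remember that the better the statistics, the better the decisions the CBO makes.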
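The check described above can be sketched as a dictionary query. This is only a minimal sketch: it assumes the schema and table used in the test further down (stored in upper case as STAGE.TEMPP4PPRODUCT) and SELECT access to dba_tables, and the sampled_pct alias is purely illustrative.

```sql
-- Compare the number of rows sampled with the row count recorded by the last gather.
-- Assumption: the table is STAGE.TEMPP4PPRODUCT; substitute your own owner and table.
select owner,
       table_name,
       num_rows,
       sample_size,
       -- nullif avoids a divide-by-zero error on empty tables
       round(sample_size / nullif(num_rows, 0) * 100, 2) as sampled_pct,
       last_analyzed
  from dba_tables
 where owner = 'STAGE'
   and table_name = 'TEMPP4PPRODUCT';
```

A sampled_pct of 100 means every row was read during the last gather.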

Next, a test comparing estimate_percent=null with estimate_percent=dbms_stats.auto_sample_size:

SQL> set timing on

With estimate_percent=null:

SQL> alter system flush shared_pool;

SQL> exec dbms_stats.gather_table_stats('Stage','TempP4PProduct',cascade=>true,estimate_percent=>null,method_opt=>'for all columns size 1');

SQL> select owner,table_name,tablespace_name,sample_size from all_tables where table_name=upper('TempP4PProduct');

With estimate_percent=dbms_stats.auto_sample_size:

SQL> alter system flush shared_pool;

SQL> exec dbms_stats.gather_table_stats('Stage','TempP4PProduct',cascade=>true,estimate_percent=>dbms_stats.auto_sample_size,method_opt=>'for all columns size 1');

SQL> select owner,table_name,tablespace_name,sample_size from all_tables where table_name=upper('TempP4PProduct');

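The comparison shows that with estimate_percent=>null the gather took 61.437 seconds and SAMPLE_SIZE was 3691265, while with estimate_percent=>dbms_stats.auto_sample_size it took 9.781 seconds and SAMPLE_SIZE was again 3691265. The identical SAMPLE_SIZE suggests that the two sets of statistics are of similar quality, so the CBO should in most cases arrive at the same execution plans with either. The elapsed times, however, differ widely: 9.781/61.437 ≈ 0.1592, so the second approach saves roughly 84% of the time compared with the first.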
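The arithmetic behind the savings figure can be reproduced directly in SQL*Plus. The two constants are the elapsed times reported above; the column aliases are illustrative only.

```sql
-- Elapsed times (in seconds) taken from the two runs above.
select round(9.781 / 61.437, 4)             as time_ratio,  -- auto sample time / full compute time
       round((1 - 9.781 / 61.437) * 100, 1) as pct_saved    -- share of time saved, in percent
  from dual;
```

The query returns a ratio of about 0.1592 and a saving of about 84.1 percent.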

Finally, the syntax of DBMS_STATS.GATHER_TABLE_STATS is attached for future reference:

DBMS_STATS.GATHER_TABLE_STATS (
   ownname          VARCHAR2,
   tabname          VARCHAR2,
   partname         VARCHAR2,
   estimate_percent NUMBER,
   block_sample     BOOLEAN,
   method_opt       VARCHAR2,
   degree           NUMBER,
   granularity      VARCHAR2,
   cascade          BOOLEAN,
   stattab          VARCHAR2,
   statid           VARCHAR2,
   statown          VARCHAR2,
   no_invalidate    BOOLEAN,
   force            BOOLEAN);

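Parameters:

ownname: owner (schema) of the table to analyze.

tabname: name of the table to analyze.

partname: name of the partition; applies only to partitioned tables or partitioned indexes.

estimate_percent: percentage of rows to sample, range [0.000001,100]; null analyzes all rows without sampling. The constant DBMS_STATS.AUTO_SAMPLE_SIZE is the default and lets Oracle determine the best sample size.

block_sample: whether to use block sampling instead of row sampling.

method_opt: controls how histogram information is gathered. It accepts the following values:

for all columns: gather histograms on all columns.

for all indexed columns: gather histograms on all indexed columns.

for all hidden columns: gather histograms on hidden columns, that is, columns you cannot see.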

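for columns SIZE N | REPEAT | AUTO | SKEWONLY: gather histograms on the specified columns. N is the number of buckets, in the range [1,254]; REPEAT re-gathers only the histograms that were gathered last time; AUTO lets Oracle decide N; SKEWONLY keeps a histogram only for columns that have multiple end-points with the same value, which is what we define by "there is skew in the data".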
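The method_opt forms listed above can be sketched as follows. This is an illustration under assumptions, not a recommendation: the schema and table are the ones from the timing test, and PRODUCT_ID is a hypothetical column name that does not appear anywhere in the original test.

```sql
-- No histograms: one bucket per column (the setting used in the timing test).
exec dbms_stats.gather_table_stats('Stage','TempP4PProduct',method_opt=>'for all columns size 1');

-- Let Oracle decide which columns get histograms and how many buckets each one needs.
exec dbms_stats.gather_table_stats('Stage','TempP4PProduct',method_opt=>'for all columns size auto');

-- Keep histograms only on columns whose data is skewed.
exec dbms_stats.gather_table_stats('Stage','TempP4PProduct',method_opt=>'for all columns size skewonly');

-- Histogram with up to 254 buckets on a single column.
-- PRODUCT_ID is a hypothetical column name; replace it with a real column.
exec dbms_stats.gather_table_stats('Stage','TempP4PProduct',method_opt=>'for columns product_id size 254');
```

Each exec must stay on a single line in SQL*Plus, which is why the calls are not wrapped.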