Archive

Posts Tagged ‘dbms_stats’

Gather Statistics with DBMS_STATS by Jeff

February 20th, 2009 1 comment


Contents

  1. Overview
  2. Missing statistics
  3. Analyze vs. DBMS_STATS
  4. What gets collected?
  5. Where are the statistics stored?
  6. Compute statistics vs. Estimate statistics
  7. DBMS_STATS functions and variable definitions
  8. DBMS_STATS in action (Examples)
  9. Automated table monitoring and stale statistics gathering example
  10. How to determine if dictionary statistics are RDBMS-generated or user-defined

Overview

Oracle’s cost-based optimizer (COB) uses statistics to calculate the selectivity (the fraction of rows in a table that the SQL statement’s predicate chooses) of predicates and to estimate the “cost” of each execution plan. The COB will use the selectivity of a predicate to estimate the cost of a particular access method and to determin the optimal join order. Statistics are used to quantify the data distribution and storage characteristics of tables, columns, indexes and partitions. The COB uses these statistics to estimate how much I/O and memory are required to execute a SQL statement using a particular execution plan. Statistics are stored in the data dictionary, and they can be exported from one database and imported into another. Situations in where you would want to perform this, might be to transfer production statistics to a test system to simulate the real environment, even though the test system may only have small samples of the data.

In order to give the Oracle cost-based optimizer the most up-to-date information about schema objects (and the best chance for choosing a good execution plan) all application tables and indexes to be accessed must be analyzed. New statistics should be gathered on schema objects that are out of date. After loading or deleting large amounts of data would obviously change the number of rows. Other changes like updating a large amount of rows would not effect the number of rows, but may effect the average row length.

Statistics can be generated with the ANALYZE statement or with the package DBMS_STATS (introduced in Oracle8i). The DBMS_STATS package is great for DBA’s in managing database statistics only for use by the COB. The package itself allows the DBA to create, modify, view and delete statistics from a standard, well-defined set of package procedures. The statistics can be gathered on tables, indexes, columns, partitions and schemas, but note that it does not generate statistics for clusters.

DBMS_STATS provides a mechanism for you to view and modify optimizer statistics gathered for database objects.The statistics can reside in two different locations:

Only statistics stored in the dictionary itself have an impact on the cost-based optimizer.

When you generate statistics for a table, column, or index, if the data dictionary already contains statistics for the object, then Oracle updates the existing statistics. Oracle also invalidates any currently parsed SQL statements that access the object.

The next time such a statement executes, the optimizer automatically chooses a new execution plan based on the new statistics. Distributed statements issued on remote databases that access the analyzed objects use the new statistics the next time Oracle parses them.

When you associate a statistics type with a column or domain index, Oracle calls the statistics collection method in the statistics type if you analyze the column or domain index.

Missing statistics

When statistics do not exist on schema objects, the optimizer uses the following default values.

Tables
Statistic Default Value Used by Optimizer
Cardinality 100 rows
Avg. row len 20 bytes
No. of blocks 100
Remote cardinality 2000 rows
Remote average row length 100 bytes
Indexes
Statistic Default Value Used by Optimizer
Levels 1
Leaf blocks 25
Leaf blocks/key 1
Data blocks/key 1
Distinct keys 100
Clustering factor 800 (8*no. of blocks)

Analyze vs. DBMS_STATS

The following is a quick overview of the two.

What gets collected?

Table Statistics

Oracle collects the following statistics for a table. Statistics marked with an asterisk are always computed exactly. Table statistics, including the status of domain indexes, appear in the data dictionary views USER_TABLES, ALL_TABLES, and DBA_TABLES in the columns shown in parentheses.

Index Statistics

Oracle collects the following statistics for an index. Statistics marked with an asterisk are always computed exactly. For conventional indexes, the statistics appear in the data dictionary views USER_INDEXES, ALL_INDEXES, and DBA_INDEXES in the columns in parentheses.

Where are the statistics stored?

Statistics are stored into the Oracle Data Dictionary, in tables owned by SYS. Views are created on these tables to retrieve data more easily. These views are prefixed with DBA_ or ALL_ or USER_. For ease of reading, we will use DBA_% views, but ALL_% views or USER_% views could be used as well.

Conventions Used

- Statistics available only since 8.0.X rdbms release         : (*)
- Statistics available only since 8.1.X rdbms release         : (**)
- Statistics not available at partition or subpartition level : (G)
- Statistics not available at subpartition level              : (GP)

Table level statistics can be retrieved from:

  • DBA_ALL_TABLES – (8.X onwards)
  • DBA_OBJECT_TABLES – (8.X onwards
  • DBA_TABLES – (all versions)
  • DBA_TAB_PARTITIONS – (8.X onwards)
  • DBA_TAB_SUBPARTITIONS – (8.1 onwards) Columns to look at are:
      NUM_ROWS                         : Number of rows (always exact even when computed
                                         with ESTIMATE method)
      BLOCKS                           : Number of blocks which have been used even
                                         if they are empty due to delete statements
      EMPTY_BLOCKS                     : Number of empty blocks (these blocks have
                                         never been used)
      AVG_SPACE                        : Average amount of FREE space in bytes in blocks
                                         allocated to the table : Blocks + Empty Blocks
      CHAIN_CNT                        : Number of chained or migrated rows
      AVG_ROW_LEN                      : Average length of rows in bytes
      AVG_SPACE_FREELIST_BLOCKS (*)(G) : Average free space of blocks in the freelist
      NUM_FREELIST_BLOCKS       (*)(G) : Number of blocks in the freelist
      SAMPLE_SIZE                      : Sample defined in ESTIMATE method (0 if COMPUTE)
      LAST_ANALYZED                    : Timestamp of last analysis
      GLOBAL_STATS             (**)    : For partitioned tables, YES means statistics
                                         are collected for the TABLE as a whole
                                         NO means statistics are estimated from statistics
                                         on underlying table partitions or subpartitions 
      USER_STATS               (**)    : YES if statistics entered directly by the user

  • Index level statistics can be retrieved from:

  • DBA_INDEXES – (all versions )
  • DBA_IND_PARTITIONS – (8.X onwards)
  • DBA_IND_SUBPARTITIONS – (8.1 onwards ) Columns to look at are:
      BLEVEL                       : B*Tree level : depth of the index from its root
                                     block to its leaf blocks
      LEAF_BLOCKS                  : Number of leaf blocks
      DISTINCT_KEYS                : Number of distinct keys
      AVG_LEAF_BLOCKS_PER_KEY      : Average number of leaf blocks in which each
                                     distinct key appears (1 for a UNIQUE index)
      AVG_DATA_BLOCKS_PER_KEY      : Average number of data blocks in the table that
                                     are pointed to by a distinct key
      CLUSTERING_FACTOR            : - if near the number of blocks, then the table is
                                       ordered : index entries in a single leaf block
                                       tend to point to rows in same data block
                                     - if near the number of rows, the table is
                                       randomly ordered : index entries in a single
                                       leaf block are unlikely to point to rows in
                                       same data block
      SAMPLE_SIZE                  : Sample defined in ESTIMATE method (0 if COMPUTE)
      LAST_ANALYZED                : Timestamp of last analysis
      GLOBAL_STATS            (**) : For partitioned indexes, YES means statistics
                                     are collected for the INDEX as a whole
                                     NO means statistics are estimated from statistics
                                     on underlying index partitions or subpartitions
      USER_STATS              (**) : YES if statistics entered directly by the user
      PCT_DIRECT_ACCESS   (**)(GP) : For secondary indexes on IOTs, percentage of
                                     rows with VALID guess

  • Column level statistics can be retrieved from:

  • DBA_TAB_COLUMNS – (all versions)
  • DBA_TAB_COL_STATISTICS – (Version 8.X onwards)
  • DBA_PART_COL_STATISTICS – (Version 8.X onwards)
  • DBA_SUBPART_COL_STATISTICS – (Version 8.1 onwards) The last three views extract statistics data from DBA_TAB_COLUMNS. Columns to look at are:
      NUM_DISTINCT                 : Number of distinct values
      LOW_VALUE                    : Lowest value
      LOW_VALUE                    : Highest value
      DENSITY                      : Density
      NUM_NULLS                    : Number of columns having a NULL value
      AVG_COL_LEN                  : Average length in bytes
      NUM_BUCKETS                  : Number of buckets in histogram for the column
      SAMPLE_SIZE                  : Sample defined in ESTIMATE method (0 if COMPUTE)
      LAST_ANALYZED                : Timestamp of last analysis
      (**)GLOBAL_STATS             : For partitioned tables, YES means statistics
                                     are collected for the TABLE as a whole
                                     NO means statistics are estimated from statistics
                                     on underlying table partitions or subpartitions
      (**)USER_STATS               : YES if statistics entered directly by the user
    
    

  • Compute statistics vs. Estimate statistics

    Both computed and estimated statistics are used by the Oracle optimizer to choose the execution plan for SQL statements that access analyzed objects. These statistics may also be useful to application developers who write such statements. COMPUTE STATISTICS

    COMPUTE STATISTICS instructs Oracle to compute exact statistics about the analyzed object and store them in the data dictionary. When computing statistics, an entire object is scanned to gather data about the object. This data is used by Oracle to compute exact statistics about the object. Slight variances throughout the object are accounted for in these computed statistics. Because an entire object is scanned to gather information for computed statistics, the larger the size of an object, the more work that is required to gather the necessary information.

    To perform an exact computation, Oracle requires enough space to perform a scan and sort of the table. If there is not enough space in memory, then temporary space may be required. For estimations, Oracle requires enough space to perform a scan and sort of only the rows in the requested sample of the table. For indexes, computation does not take up as much time or space, so it is best to perform a full computation.

    Some statistics are always computed exactly, such as the number of data blocks currently containing data in a table or the depth of an index from its root block to its leaf blocks. 某些信息总被精确地统计,如表当前使用的block数、index的高度

    Use estimation for tables and clusters rather than computation, unless you need exact values. Because estimation rarely sorts, it is often much faster than computation, especially for large tables.

    ESTIMATE STATISTICS

    ESTIMATE STATISTICS instructs Oracle to estimate statistics about the analyzed object and stores them in the data dictionary. When estimating statistics, Oracle gathers representative information from portions of an object. This subset of information provides reasonable, estimated statistics about the object. The accuracy of estimated statistics depends upon how representative the sampling used by Oracle is. Only parts of an object are scanned to gather information for estimated statistics, so an object can be analyzed quickly. You can optionally specify the number or percentage of rows that Oracle should use in making the estimate.

    To estimate statistics, Oracle selects a random sample of data. You can specify the sampling percentage and whether sampling should be based on rows or blocks.

    estimate statistics会随机选择样本大小。你可以定义基于row数或block数样本的大小

    Notes on estimating statistics

    DBMS_STATS functions and variable definitions

    Most of the DBMS_STATS procedures include the three parameters statown, stattab, and statid. These parameters allow you to store statistics in your own tables (outside of the dictionary), which does not affect the optimizer. Therefore, you can maintain and experiment with sets of statistics. The stattab parameter specifies the name of a table in which to hold statistics, and it is assumed that it resides in the same schema as the object for which statistics are collected (unless the statown parameter is specified). Users may create multiple tables with different stattab identifiers to hold separate sets of statistics.

    Additionally, users can maintain different sets of statistics within a single stattab by using the statid parameter, which can help avoid cluttering the user’s schema.

    For all of the SET or GET procedures, if stattab is not provided (i.e., NULL), then the operation works directly on the dictionary statistics; therefore, users do not need to create these statistics tables if they only plan to modify the dictionary directly. However, if stattab is not NULL, then the SET or GET operation works on the specified user statistics table, and not the dictionary.

    Create Stats Table

    DBMS_STATS.CREATE_STAT_TABLE (
      ownname  VARCHAR2,
      stattab  VARCHAR2,
      tblspace VARCHAR2 DEFAULT NULL);

    Drop Stats Table

    DBMS_STATS.drop_stat_table (
      ownname VARCHAR2,
      stattab VARCHAR2);

    Gather Schema Stats

    DBMS_STATS.gather_schema_stats (
      ownname          VARCHAR2,
      estimate_percent NUMBER   DEFAULT NULL,
      block_sample     BOOLEAN  DEFAULT FALSE,
      method_opt       VARCHAR2 DEFAULT 'FOR ALL COLUMNS SIZE 1',
      degree           NUMBER   DEFAULT NULL,
      granularity      VARCHAR2 DEFAULT 'DEFAULT',
      cascade          BOOLEAN  DEFAULT FALSE,
      stattab          VARCHAR2 DEFAULT NULL,
      statid           VARCHAR2 DEFAULT NULL,
      options          VARCHAR2 DEFAULT 'GATHER',
      objlist     OUT  ObjectTab,
      statown          VARCHAR2 DEFAULT NULL);

    Export Schema Stats

    DBMS_STATS.export_schema_stats (
      ownname VARCHAR2,
      stattab VARCHAR2,
      statid  VARCHAR2 DEFAULT NULL,
      statown VARCHAR2 DEFAULT NULL);

    Import Schema Stats

    DBMS_STATS.import_schema_stats (
      ownname VARCHAR2,
      stattab VARCHAR2,
      statid  VARCHAR2 DEFAULT NULL,
      statown VARCHAR2 DEFAULT NULL);

    Delete Schema Stats

    DBMS_STATS.delete_schema_stats (
      ownname VARCHAR2,
      stattab VARCHAR2 DEFAULT NULL,
      statid  VARCHAR2 DEFAULT NULL,
      statown VARCHAR2 DEFAULT NULL);

    Set Table Stats 手工设置统计信息

    DBMS_STATS.set_table_stats (
      ownname  VARCHAR2,
      tabname  VARCHAR2,
      partname VARCHAR2 DEFAULT NULL,
      stattab  VARCHAR2 DEFAULT NULL,
      statid   VARCHAR2 DEFAULT NULL,
      numrows  NUMBER   DEFAULT NULL,
      numblks  NUMBER   DEFAULT NULL,
      avgrlen  NUMBER   DEFAULT NULL,
      flags    NUMBER   DEFAULT NULL,
      statown  VARCHAR2 DEFAULT NULL);

    Get Table Stats

    DBMS_STATS.get_table_stats (
      ownname     VARCHAR2,
      tabname     VARCHAR2,
      partname    VARCHAR2 DEFAULT NULL,
      stattab     VARCHAR2 DEFAULT NULL,
      statid      VARCHAR2 DEFAULT NULL,
      numrows OUT NUMBER,
      numblks OUT NUMBER,
      avgrlen OUT NUMBER,
      statown     VARCHAR2 DEFAULT NULL);

    Get Index Stats

    DBMS_STATS.GET_INDEX_STATS (
      ownname      VARCHAR2,
      indname      VARCHAR2,
      partname     VARCHAR2 DEFAULT NULL,
      stattab      VARCHAR2 DEFAULT NULL,
      statid       VARCHAR2 DEFAULT NULL,
      numrows  OUT NUMBER,
      numlblks OUT NUMBER,
      numdist  OUT NUMBER,
      avglblk  OUT NUMBER,
      avgdblk  OUT NUMBER,
      clstfct  OUT NUMBER,
      indlevel OUT NUMBER,
      statown      VARCHAR2 DEFAULT NULL);

    DBMS_STATS in action (Examples)

    Create Stats Table

    BEGIN
      DBMS_STATS.create_stat_table (
        ownname  => 'scott',
        stattab  => 'stats_table',
        tblspace => 'users');
    END;
    /

    Drop Stats Table

    BEGIN
      DBMS_STATS.drop_stat_table (
        ownname => 'scott',
        stattab => 'stats_table');
    END;
    /

    Gather Schema Stats to Data Dictionary

    BEGIN
      DBMS_STATS.gather_schema_stats (
        ownname          => 'scott',
        estimate_percent => null,
        block_sample     => false,
        method_opt       => 'FOR ALL COLUMNS SIZE 1',
        degree           => null,
        granularity      => 'ALL',
        cascade          => true,
        options          => 'GATHER');
    END;
    /

    Gather Schema Stats to Stats Table

    BEGIN
      DBMS_STATS.gather_schema_stats (
        ownname          => 'scott',
        estimate_percent => null,
        block_sample     => false,
        method_opt       => 'FOR ALL COLUMNS SIZE 1',
        degree           => null,
        granularity      => 'ALL',
        cascade          => true,
        stattab          => 'stats_table',
        statid           => 'TEST1',
        options          => 'GATHER',
        statown          => 'scott');
    END;
    /

    Export Schema Statistics from Data Dictionary to Stats Table

    BEGIN
      DBMS_STATS.export_schema_stats (
        ownname  => 'scott',
        stattab  => 'stats_table_backup',
        statid   => 'BACKUP_TEST1',
        statown  => 'scott');
    END;
    /

    Import Schema Statistics from Data Dictionary to Stats Table

    BEGIN
      DBMS_STATS.import_schema_stats (
        ownname  => 'scott',
        stattab  => 'stats_table',
        statid   => 'TEST1',
        statown  => 'scott');
    END;
    /

    Delete Schema Stats from Data Dictionary

    BEGIN
      DBMS_STATS.delete_schema_stats ('scott');
    END;
    /

    Delete Schema Stats from Stats Table

    BEGIN
      DBMS_STATS.delete_schema_stats (
        ownname  => 'scott',
        stattab  => 'stats_table_backup',
        statid   => 'BACKUP_TEST1',
        statown  => 'scott');
    END;
    /

    Set Table Stats (Manually) in the Data Dictionary

    BEGIN
      DBMS_STATS.set_table_stats (
        ownname  => 'scott',
        tabname  => 'emp',
        partname => null,
        numrows  => 650000000,
        numblks  => 53455443,
        avgrlen  => 212,
        flags    => null);
    END;
    /

    Set Table Stats (Manually) in the Stats Table

    BEGIN
      DBMS_STATS.set_table_stats (
        ownname  => 'scott',
        tabname  => 'emp',
        partname => null,
        stattab  => 'stats_table',
        statid   => 'TEST1',
        numrows  => 650000000,
        numblks  => 53455443,
        avgrlen  => 212,
        flags    => null,
        statown  => 'scott');
    END;
    /

    Get Table Statistics in SQL*Plus Variables

    SQL> variable NUMROWS number
    SQL> variable NUMBLKS number
    SQL> variable AVGRLEN number
    
    BEGIN
      DBMS_STATS.get_table_stats (
        'scott',
        'emp',
        NUMROWS=>:numrows,
        NUMBLKS=>:numblks,
        AVGRLEN=>:avgrlen);
    END;
    /
    
    PL/SQL procedure successfully completed.
    
    SQL> print NUMROWS NUMBLKS AVGRLEN
    
       NUMROWS
    ----------
          1000
    
       NUMBLKS
    ----------
            28
    
       AVGRLEN
    ----------
            92

    Get Index Statistics in SQL*Plus Variables

    SQL> variable NUMROWS number
    SQL> variable NUMLBLKS number
    SQL> variable NUMDIST number
    SQL> variable AVGLBLK number
    SQL> variable AVGDBLK number
    SQL> variable CLSTFCT number
    SQL> variable INDLEVEL number
    
    BEGIN
      DBMS_STATS.get_index_stats (
        'SCOTT',
        'EMP_PK',
        NUMROWS   => :NUMROWS,
        NUMLBLKS  => :NUMLBLKS,
        NUMDIST   => :NUMDIST,
        AVGLBLK   => :AVGLBLK,
        AVGDBLK   => :AVGDBLK,
        CLSTFCT   => :CLSTFCT,
        INDLEVEL  => :INDLEVEL);
     END;
    /
    
    PL/SQL procedure successfully completed.
    
    SQL> print NUMROWS NUMLBLKS NUMDIST AVGLBLK AVGDBLK CLSTFCT INDLEVEL
    
       NUMROWS
    ----------
          1000
    
      NUMLBLKS
    ----------
             3
    
       NUMDIST
    ----------
          1000
    
       AVGLBLK
    ----------
             1
    
       AVGDBLK
    ----------
             1
    
       CLSTFCT
    ----------
            15
    
      INDLEVEL
    ----------
             1

    Automated table monitoring and stale statistics gathering example

    You can automatically gather statistics or create lists of tables that have stale or no statistics.

    To automatically gather statistics, run the DBMS_STATS.GATHER_SCHEMA_STATS and DBMS_STATS.GATHER_DATABASE_STATS procedures with the OPTIONS and objlist parameters. Use the following values for the options parameter:

    The objlist parameter identifies an output parameter for the LIST STALE and LIST EMPTY options. The objlist parameter is of type DBMS_STATS.OBJECTTAB.

    Step 1 : Perform a quick analyze to load in base statistics

    BEGIN
    DBMS_STATS.GATHER_SCHEMA_STATS (
      ownname           => 'scott',
      estimate_percent  => null,              -- Small table, lets compute
      block_sample      => false,
      method_opt        => 'FOR ALL COLUMNS',
      degree            => null,              -- No parallelism used in this example
      granularity       => 'ALL',
      cascade           => true,              -- Make sure we include indexes
      options           => 'GATHER'           -- Gather mode
      );
    END;
    /
    
    PL/SQL procedure successfully completed.

    Step 2 : Examine the current statistics

    SELECT table_name, num_rows, blocks, avg_row_len
    FROM user_tables
    WHERE table_name='EMP';
    
    TABLE_NAME                       NUM_ROWS     BLOCKS AVG_ROW_LEN
    ------------------------------ ---------- ---------- -----------
    EMP                                  1500         28          92

    Step 3 : Turn on Automatic Monitoring

    Now turn on automatic monitoring for the emp table. This can be done using the alter table method. Starting with Oracle 9i, you can also perform this at the “schema”, and “entire database” level. I provide the syntax for all three methods below. Monitor only the EMP table.

    alter table emp monitoring;
    
    Table altered.

    Monitor all of the tables within Scott’s schema. (Oracle 9i and higher)

    BEGIN
      DBMS_STATS.alter_schema_tab_monitoring('scott', true);
    END;
    /
    
    PL/SQL procedure successfully completed.

    Monitor all of the tables within the database. (Oracle 9i and higher) Note: Although the option to collect statistics for SYS tables is available via ALTER_DATABASE_TAB_MONITORING, Oracle continues to recommend against this practice until the next major release after 9i Release 2. Also note that the ALTER_DATABASE_TAB_MONITORING procedure in the DBMS_STATS package only monitors tables; there is an ALTER INDEX…MONITORING statement which can be used to monitor indexes. Thanks to Nabil Nawaz for providing this and pointing out an error I made in the previous version of this article.

    BEGIN
      DBMS_STATS.alter_database_tab_monitoring (
        monitoring => true,
        sysobjs    => false);      -- Don't set to true, see note above.
    END;
    /
    
    PL/SQL procedure successfully completed.

    Step 4 : Verify that monitoring is turned on.

    Note: The results of the following query are from running the alter table ... statement on the emp table only.

    SELECT table_name, monitoring
    FROM user_tables
    ORDER BY monitoring;
    
    TABLE_NAME                     MONITORING
    ------------------------------ ----------
    DEPT                           NO
    EMP                            YES

    Step 5 : Delete some rows from the database.

    SQL> DELETE FROM emp WHERE rownum < 501;
    
    500 rows deleted.
    
    SQL> commit;
    
    Commit complete.

    Step 6 : Wait until the monitered data is flushed.

    Data can be flushed in several ways.

    OK, I’m impatient…

    exec dbms_stats.flush_database_monitoring_info;
    
    PL/SQL procedure successfully completed.

    Step 7 : Check for what it has collected.

    As user “scott”, check USER_TAB_MODIFICATIONS to see what it was collected.

    SELECT * FROM user_tab_modifications;
    
    TABLE_NAME PARTITION_NAME SUBPARTITION_NAME INSERTS UPDATES DELETES TIMESTAMP TRUNCATED
    ---------- -------------- ----------------- ------- ------- ------- --------- ---------
    EMP                                               0       0     500 18-SEP-02 NO

    Step 8 : Execute DBMS_STATS to gather stats on all “stale” tables.

    BEGIN
      DBMS_STATS.GATHER_SCHEMA_STATS(
        ownname           => 'scott',
        estimate_percent  => null,
        block_sample      => false,
        method_opt        => 'FOR ALL COLUMNS',
        degree            => null,
        granularity       => 'ALL',
        cascade           => true,
        options           => 'GATHER STALE');
    END;
    /
    
    PL/SQL procedure successfully completed.

    Step 9 : Verify that the table is no longer listed in USER_TAB_MODIFICATIONS.

    SQL> SELECT * FROM user_tab_modifications;
    
    no rows selected.

    Step 10 : Examine some of new statistics collected.

    SELECT table_name, num_rows, blocks, avg_row_len
    FROM user_tables where table_name='EMP';
    
    TABLE_NAME                       NUM_ROWS     BLOCKS AVG_ROW_LEN
    ------------------------------ ---------- ---------- -----------
    EMP                                  1000         28          92

    How to determine if dictionary statistics are RDBMS-generated or user-defined

    The following section explains how to determine if your dictionary statistics are RDBMS-generated or set by users through one of the DBMS_STATS.SET_xx_STATS procedures. This is crucial for development environments that are testing the performance of SQL statements with various sets of statistics. The DBA will need to know if the relying statistics are RDBMS-defined or user-defined.

    RDBMS-generated statistics are generated by the following:

    User generated statistics are only done through the use of the DBMS_STATS.SET_xx_STATS procedures The column USER_STATS from DBA_TABLES, ALL_TABLES, USER_TABLES displays:

    Categories: dbms_stats, oracle Tags:

    Automatic Statistics Gathering scheduler job

    February 20th, 2009 No comments

    HTML clipboard .style3 { color: #0000FF; } .style4 { color: #FF0000; }在Oracle10g中引入的优化器统计信息(Optimizer Statistics)自动收集,是一个看上去很不错的功能,但是在实际应用中却往往没有起到相应的效果,甚至在某些系统中我们会建议禁用这个功能。

    阐述一些该功能的相关知识点。

    1. Automatic Statistics Gathering是由Scheduler调度GATHER_STATS_JOB作业来完成的,在GATHER_STATS_JOB作业中则调用 DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC存储过程。

    2. 该作业在创建数据库的自动创建,并且设置为每天晚上10点到第二天早上6点和周六周日的全天为运行窗口期。在运行窗口期内,该作业都会运行,根据 stop_on_window_close属性来决定,如在窗口期结束以后,该作业如果还没有运行完毕,是继续运行还是结束运行。

    3. GATHER_DATABASE_STATS_JOB_PROC是内部的存储过程,基本上跟DBMS_STATS.GATHER_DATABASE_STATS的功能一样,但是有内部的优先顺序考虑,更新越多的表将会越优先收集统计信息

    4. 收集统计信息的表对象是,之前从来没有收集过的或者是更新的(包括insert,update,delete,truncate)记录数超过当前总记录数 10%的表。记录数的更改量由Oracle数据库自动监控,在初始化参数statistics_level设置为TYPICAL或者ALL时,自动监控即会生效。

    5. 在USER_TAB_MODIFICATIONS表中记录了所有被监控的表的数据量更改信息。该信息的更新将会稍微滞后于真实的修改,可以通过 DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO存储过程来立刻将更改的信息更新到 USER_TAB_MODIFICATIONS表中。对于更新之后再rollback的记录,仍然算为已经受影响的记录,Oracle不会在 rollback之后再去更新USER_TAB_MODIFICATIONS表。

    1. SQL> select * from user_tab_modifications where table_name=EMP;
    2. no rows selected
    3. SQL> select count(*) from emp;
    4. COUNT(*)
    5. ——–
    6. 14
    7. SQL> update emp set sal=sal+100;
    8. 14 rows updated.
    9. SQL> select * from user_tab_modifications where table_name=EMP;
    10. no rows selected
    11. SQL> exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO();
    12. PL/SQL procedure successfully completed.
    13. SQL> select inserts,updates,deletes from user_tab_modifications where table_name
    14. =EMP;
    15. INSERTS UPDATES DELETES
    16. ——–– ———- ———-
    17. 0 14 0
    18. SQL> rollback;
    19. Rollback complete.
    20. SQL> exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO();
    21. PL/SQL procedure successfully completed.
    22. SQL> select inserts,updates,deletes from user_tab_modifications where table_name
    23. =EMP;
    24. INSERTS UPDATES DELETES
    25. ——–– ———- ———-
    26. 0 14 0
    27. SQL>

    6. 在Oracle10g版本(包括最新的10.2.0.4)中没有已知的修改10%这个阀值的方法。但是在Oracle11g中则提供了SET_TABLE_PREFS等函数。

    以下命令将指定表的STALE默认值从10%改为5%,该值可以从新的dba_tab_stat_prefs数据字典中查询获得。

    1. –仅限于Oracle11g版本
    2. BEGIN
    3. DBMS_STATS.SET_TABLE_PREFS ( ownname =>KAMUS, tabname =>T1, pname =>STALE_PERCENT, pvalue =>5);
    4. END;
    5. /
    6. SQL> select * from dba_tab_stat_prefs;
    7. OWNER TABLE_NAME PREFERENCE_NAME PREFE
    8. ——–– ———- ——————– —–
    9. KAMUS T1 STALE_PERCENT 5

    7. 运行以下命令,可以禁用统计信息自动收集功能。

    1. BEGIN
    2. DBMS_SCHEDULER.DISABLE(GATHER_STATS_JOB);
    3. END;
    4. /
    Categories: dbms_stats, oracle Tags:

    10g dbms_stats

    February 20th, 2009 No comments

    1. 使用取样来做statistics的收集

    如果没有设置取样,那么收集统计数据需要全表扫描和整个数据库表的排序。取样最小化了收集统计数据的所需资源。

    Estimate_percent参数用来设置取样值。

    Oracle公司建议设置estimate_percent参数为dbms_stats.auto_sample_size来最大化性能并获得所需的统计数据的准确度。

    Auto_sample_sizeoracle判断用于好的统计数据所需的最好的取样尺寸,根据对象的统计数据的属性。

    例如

    EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS('OE',DBMS_STATS.AUTO_SAMPLE_SIZE);

    2. 并发统计数据的收集

    使用DEGREE来指定dbms_stats的并发度。并发统计数据的收集可以和sample一同使用。Oracle建议设置DEGREE参数为dbms_stats.auto_degree

    这就是让oracle根据对象的尺寸和并发有关的init.ora参数设置来选择一个合适的并发度。


    3.在分区对象上的统计数据

    对于分区表和分区索引,DBMS_STATS能为每个分区收集分别的统计数据,能为整个表或索引收集全局统计数据。被收集的分区统计数据的类型在GRANULARITY参数来指定。


    4Column统计数据和直方图

    当在表上收集统计数据的时候,DBMS_STATS收集表的字段的数据分布的信息。

    关于数据分布的最基本的信息就是字段中的最大值和最小值。


    但是,如果字段中的数据是skewed,统计数据的level对于优化器的需求就不足了。对于skewed数据分布,直方图也能被创建,用来作为column统计数据来描述指定字段的数据分布。

    直方图使用method_opt来指定。Oracle建议设定method_opfor all columns size auto。有了这设定,oracle自动判断哪一个字段需要直方图,每个直方图的bucket的数量。


    5.判断陈旧的统计数据

    为了判断数据库对象是否需要新的统计数据,oracle提供一个表监控工具。当statistics_level=typicalall时,监控被开启。监控追踪表自从上次统计数据被收集以来,表被insert,update,delete的近似值,还有表是否被truncate。表的变化信息可以查看user_tab_modifications视图。随着数据修改,当oracle移植信息到视图的时候,可能会有几分钟的迟延。DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO过程来立即反映在内存中显著的监控信息。

    options参数被设置为gather stalegather auto Gather_database_statsgather_schema_stats过程使用旧的统计数据来收集新的统计数据。如果监控表的已经被修改了超过10%,那么这些统计数据被认为是旧的,需要再次被收集。


    6.用户自定义统计数据

    你能创建用户定义的优化器统计数据来支持用户定义的索引和函数。当你把统计数据类型和columndomain索引联系起来时,oracle 以统计数据类型来call统计数据收集方法,无论何时统计数据被收集。

    你在创建一个基于函数的索引之后,收集表上的新字段的统计数据,允许oracle来为表达式收集字段统计数据相等信息。通过带有method_opt=’for all hidden columns’call收集统计数据的过程

    Videos, Slideshows and Podcasts by Cincopa Wordpress Plugin