The Blog from the DBA Classroom

By: Joel Goodman

Archive for July, 2011

Add Bulk to your Parallel Updates

Posted by Joel Goodman on 01/07/2011


 

Oracle Database server 11gR2 added several new Parallel Execution features. I had several questions on a recent Exadata administration course concerning parallelism, and an interesting discussion arose about these new features. Although parallel execution is useful in all Oracle Database Data Warehouse implementations, and often in batch systems, it is particularly relevant to Exadata. In Exadata, “smart” storage operations, which offload filtering, column projections and some join overheads to the cells may be done for:

  • Full Table Scans
  • Fast Full Index Scans

But these “smart” scans, only occur if a “direct path read” access method is done. This occurs in the following situations:

  • On a Parallel Query
  • On a Serial Query – when the size of the table exceeds the “_small_table_threshold” hidden parameter
  • On a Serial Query – when “_serial_direct_read” hidden parameter is set to “always” regardless of table size

Note: both these parameters are session modifiable.

In addition for Exadata, “direct path insert” is required when loading rows into Hybrid Columnar Compressed (HCC) tables, in order to get the HCC compression done into multi-block compression units. Direct path insert may occur when:

  • When inserting in parallel
  • When doing “Create table as select” (CTAS) in parallel
  • When inserting serially using the “append” hint

Since Exadata is used for Data Warehouses, Parallel Execution is likely to be used very often, both for queries of large tables and for bulk inserts and updates of existing tables.

The new features of Parallel Execution in Oracle 11gR2 are:

  1. Automatic Degree of Parallelism – to set Degree of Parallelism automatically for a statement
  2. Parallel Statement Queuing – to queue a statement if inadequate execution slave processes are not available
  3. In Memory Parallel Updates – to utilise the RAC buffer caches for Parallel queries as an alternative to cache fusion overheads
  4. Parallel Bulk Updates – to allow updates to be done in sections which an succeed or fail independently of other sections

In this post I will focus on Parallel Bulk updates which:

  • Are useful for bulk updates of large data volumes
  • Avoids the “All or Nothing” approach of traditional Parallel DML
  • Groups sets of rows into smaller sized chunks
  • Chunk updates run by a scheduler job which commits on completion
  • Chunking is based on one of three possible techniques:
    • Using ROWID ranges of rows in the queried table
    • Ranges of values in a numeric column of  the queried table
    • Results of User provided SQL statements

Parallel Bulk update is done using the DBMS_PARALLEL_EXECUTE package, which permits the chunking of rows using any of the 3 methods listed. Then the updates are done with several benefits:

  • Only one chunk of rows need be locked at one time, rather than having the entire table locked as in normal PDML
  • Chunks commit independently of of one another
  • Failure of one chunk update to commit, does not cause all the updates to roll back
  • Less Undo space is required since the chunks may commit their updates independently
  • Performance may benefit from the combination of reduced simultaneous locking, and reduced undo consumption
  • Avoids having a huge rollback of a nearly complete bulk update due to failure near the end
  • Failed chunks may be retried after correcting the problem causing the failure

Here is the DBMS_PARALLEL_EXECUTE package described in Oracle 11.2.0.2:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL>  desc dbms_parallel_execute
PROCEDURE ADM_DROP_CHUNKS
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_OWNER                     VARCHAR2                IN
 TASK_NAME                      VARCHAR2                IN
PROCEDURE ADM_DROP_TASK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_OWNER                     VARCHAR2                IN
 TASK_NAME                      VARCHAR2                IN
PROCEDURE ADM_STOP_TASK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_OWNER                     VARCHAR2                IN
 TASK_NAME                      VARCHAR2                IN
FUNCTION ADM_TASK_STATUS RETURNS NUMBER
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_OWNER                     VARCHAR2                IN
 TASK_NAME                      VARCHAR2                IN
PROCEDURE CREATE_CHUNKS_BY_NUMBER_COL
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 TABLE_OWNER                    VARCHAR2                IN
 TABLE_NAME                     VARCHAR2                IN
 TABLE_COLUMN                   VARCHAR2                IN
 CHUNK_SIZE                     NUMBER                  IN
PROCEDURE CREATE_CHUNKS_BY_ROWID
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 TABLE_OWNER                    VARCHAR2                IN
 TABLE_NAME                     VARCHAR2                IN
 BY_ROW                         BOOLEAN                 IN
 CHUNK_SIZE                     NUMBER                  IN
PROCEDURE CREATE_CHUNKS_BY_SQL
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 SQL_STMT                       CLOB                    IN
 BY_ROWID                       BOOLEAN                 IN
PROCEDURE CREATE_TASK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 COMMENT                        VARCHAR2                IN     DEFAULT
PROCEDURE DROP_CHUNKS
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
PROCEDURE DROP_TASK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
FUNCTION GENERATE_TASK_NAME RETURNS VARCHAR2
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 PREFIX                         VARCHAR2                IN     DEFAULT
PROCEDURE GET_NUMBER_COL_CHUNK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 CHUNK_ID                       NUMBER                  OUT
 START_ID                       NUMBER                  OUT
 END_ID                         NUMBER                  OUT
 ANY_ROWS                       BOOLEAN                 OUT
PROCEDURE GET_ROWID_CHUNK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 CHUNK_ID                       NUMBER                  OUT
 START_ROWID                    ROWID                   OUT
 END_ROWID                      ROWID                   OUT
 ANY_ROWS                       BOOLEAN                 OUT
PROCEDURE PURGE_PROCESSED_CHUNKS
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
PROCEDURE RESUME_TASK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 SQL_STMT                       CLOB                    IN
 LANGUAGE_FLAG                  NUMBER                  IN
 EDITION                        VARCHAR2                IN     DEFAULT
 APPLY_CROSSEDITION_TRIGGER     VARCHAR2                IN     DEFAULT
 FIRE_APPLY_TRIGGER             BOOLEAN                 IN     DEFAULT
 PARALLEL_LEVEL                 NUMBER                  IN     DEFAULT
 JOB_CLASS                      VARCHAR2                IN     DEFAULT
 FORCE                          BOOLEAN                 IN     DEFAULT
PROCEDURE RESUME_TASK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 FORCE                          BOOLEAN                 IN     DEFAULT
PROCEDURE RUN_INTERNAL_WORKER
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 JOB_NAME                       VARCHAR2                IN
PROCEDURE RUN_TASK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 SQL_STMT                       CLOB                    IN
 LANGUAGE_FLAG                  NUMBER                  IN
 EDITION                        VARCHAR2                IN     DEFAULT
 APPLY_CROSSEDITION_TRIGGER     VARCHAR2                IN     DEFAULT
 FIRE_APPLY_TRIGGER             BOOLEAN                 IN     DEFAULT
 PARALLEL_LEVEL                 NUMBER                  IN     DEFAULT
 JOB_CLASS                      VARCHAR2                IN     DEFAULT
PROCEDURE SET_CHUNK_STATUS
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 CHUNK_ID                       NUMBER                  IN
 STATUS                         NUMBER                  IN
 ERR_NUM                        NUMBER                  IN     DEFAULT
 ERR_MSG                        VARCHAR2                IN     DEFAULT
PROCEDURE STOP_TASK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
FUNCTION TASK_STATUS RETURNS NUMBER
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN

To divide the table into chucks, one of three procedures may be used:

  1. DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID
  2. DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_NUMBER_COL
  3. DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_SQL

Note: use of this package attempts to create jobs, so using this requires the CREATE_JOB system privilege.

Using ROWID based chunks

To chunk by rowid,  requires the CHUNK_SIZE parameter to be passed in addition to the table owner and table name. The CHUNK_SIZE is the rough number of rows or blocks in each chunk and therefore in each separately committed transaction. The boolean parameter BY_ROW determines of CHUNK_SIZE refers to rows or blocks.

Using numeric column value based chunks

To chunk by numeric column, requires the TABLE_COLUMN parameter, which names the numeric column used to determine the chunks and the CHUNK_SIZE parameter. For this method, the minimum and maximum values in the column are obtained from the Data Dictionary, and the table is then chunked evenly by CHUNK_SIZE. For popular values, there could be more than one chunk, and conversely a chunk could contain multiple values.

Using user provided SQL based chunks

To chunk using SQL requires that the user executing the package have execute privileges for the DBMS_SQL package, as this is called by the DBMS_PARALLEL_EXECUTE package to execute the SQL statement used to do the chunking. This method requires the SQL_STATEMENT parameter containing the text of the statement whose output is used to chunk the data, either by rowids or by numeric columns. It uses the BY_ROWID parameter. If this is  TRUE, then the SQL statement must have columns called START_ID and END_ID of type ROWID. If BY_ROWID is false, then then the SQL statement must have columns called START_ID and END_ID of type NUMBER.

Using the package

The package is generally used as follows:

  1. Call CREATE_TASK to establish the task providing a name
  2. Chunk the table using one of the three methods described above
  3. Call RUN_TASK, providing the SQL used to update the table
  4. Call DROP_TASK when complete

Some other procedures in the package allow the following operations:

  • Stopping a task and its job slaves using STOP_TASK
  • Resuming a stopped task and its job slaves using RESUME TASK
  • Managing and inquiring about a task’s or chunk’s status using TASK_STATUS or SET_CHUNK_STATUS
  • the ADM procedures are for DBAs and permit stopping and dropping tasks of other users, dropping chunks of other users and checking the task status of other users.

Note: to use RUN_TASK or RESUME_TASK also requires execute privileges on DBMS_SQL.

For a full list of all the procedures see the documentation here.The doc also contains some useful examples using the HR schema.

My Oracle support note 1066555.1 is also a useful reference.
Dictionary views used for this feature are:

  • DBA_PARALLEL_EXECUTE_TASKS
  • DBA_PARALLEL_EXECUTE_CHUNKS
  • USER_PARALLEL_EXECUTE_TASKS
  • USER_PARALLEL_EXECUTE_CHUNKS

Here are the DBA views:

SQL> desc dba_parallel_execute_tasks

 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 TASK_OWNER                                NOT NULL VARCHAR2(30)
 TASK_NAME                                 NOT NULL VARCHAR2(128)
 CHUNK_TYPE                                         VARCHAR2(12)
 STATUS                                             VARCHAR2(19)
 TABLE_OWNER                                        VARCHAR2(30)
 TABLE_NAME                                         VARCHAR2(30)
 NUMBER_COLUMN                                      VARCHAR2(30)
 TASK_COMMENT                                       VARCHAR2(4000)
 JOB_PREFIX                                         VARCHAR2(30)
 SQL_STMT                                           CLOB
 LANGUAGE_FLAG                                      NUMBER
 EDITION                                            VARCHAR2(30)
 APPLY_CROSSEDITION_TRIGGER                         VARCHAR2(30)
 FIRE_APPLY_TRIGGER                                 VARCHAR2(10)
 PARALLEL_LEVEL                                     NUMBER
 JOB_CLASS                                          VARCHAR2(30)

SQL> desc dba_parallel_execute_chunks
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 CHUNK_ID                                  NOT NULL NUMBER
 TASK_OWNER                                NOT NULL VARCHAR2(30)
 TASK_NAME                                 NOT NULL VARCHAR2(128)
 STATUS                                             VARCHAR2(20)
 START_ROWID                                        ROWID
 END_ROWID                                          ROWID
 START_ID                                           NUMBER
 END_ID                                             NUMBER
 JOB_NAME                                           VARCHAR2(30)
 START_TS                                           TIMESTAMP(6)
 END_TS                                             TIMESTAMP(6)
 ERROR_CODE                                         NUMBER
 ERROR_MESSAGE                                      VARCHAR2(4000)


Have a go at reducing the bulk of your updates!
Joel
July 2011



Advertisements

Posted in Oracle | 2 Comments »