Oracle Database server 11gR2 added several new Parallel Execution features. I had several questions on a recent Exadata administration course concerning parallelism, and an interesting discussion arose about these new features. Although parallel execution is useful in all Oracle Database Data Warehouse implementations, and often in batch systems, it is particularly relevant to Exadata. In Exadata, “smart” storage operations, which offload filtering, column projections and some join overheads to the cells may be done for:
- Full Table Scans
- Fast Full Index Scans
But these “smart” scans, only occur if a “direct path read” access method is done. This occurs in the following situations:
- On a Parallel Query
- On a Serial Query – when the size of the table exceeds the “_small_table_threshold” hidden parameter
- On a Serial Query – when “_serial_direct_read” hidden parameter is set to “always” regardless of table size
Note: both these parameters are session modifiable.
In addition for Exadata, “direct path insert” is required when loading rows into Hybrid Columnar Compressed (HCC) tables, in order to get the HCC compression done into multi-block compression units. Direct path insert may occur when:
- When inserting in parallel
- When doing “Create table as select” (CTAS) in parallel
- When inserting serially using the “append” hint
Since Exadata is used for Data Warehouses, Parallel Execution is likely to be used very often, both for queries of large tables and for bulk inserts and updates of existing tables.
The new features of Parallel Execution in Oracle 11gR2 are:
- Automatic Degree of Parallelism – to set Degree of Parallelism automatically for a statement
- Parallel Statement Queuing – to queue a statement if inadequate execution slave processes are not available
- In Memory Parallel Updates – to utilise the RAC buffer caches for Parallel queries as an alternative to cache fusion overheads
- Parallel Bulk Updates – to allow updates to be done in sections which an succeed or fail independently of other sections
In this post I will focus on Parallel Bulk updates which:
-
Are useful for bulk updates of large data volumes
-
Avoids the “All or Nothing” approach of traditional Parallel DML
- Groups sets of rows into smaller sized chunks
-
Chunk updates run by a scheduler job which commits on completion
-
Chunking is based on one of three possible techniques:
- Using ROWID ranges of rows in the queried table
- Ranges of values in a numeric column of the queried table
- Results of User provided SQL statements
Parallel Bulk update is done using the DBMS_PARALLEL_EXECUTE package, which permits the chunking of rows using any of the 3 methods listed. Then the updates are done with several benefits:
- Only one chunk of rows need be locked at one time, rather than having the entire table locked as in normal PDML
- Chunks commit independently of of one another
- Failure of one chunk update to commit, does not cause all the updates to roll back
- Less Undo space is required since the chunks may commit their updates independently
- Performance may benefit from the combination of reduced simultaneous locking, and reduced undo consumption
- Avoids having a huge rollback of a nearly complete bulk update due to failure near the end
- Failed chunks may be retried after correcting the problem causing the failure
Here is the DBMS_PARALLEL_EXECUTE package described in Oracle 11.2.0.2:
SQL> select * from v$version; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production PL/SQL Release 11.2.0.2.0 - Production CORE 11.2.0.2.0 Production TNS for Linux: Version 11.2.0.2.0 - Production NLSRTL Version 11.2.0.2.0 - Production SQL> desc dbms_parallel_execute PROCEDURE ADM_DROP_CHUNKS Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_OWNER VARCHAR2 IN TASK_NAME VARCHAR2 IN PROCEDURE ADM_DROP_TASK Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_OWNER VARCHAR2 IN TASK_NAME VARCHAR2 IN PROCEDURE ADM_STOP_TASK Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_OWNER VARCHAR2 IN TASK_NAME VARCHAR2 IN FUNCTION ADM_TASK_STATUS RETURNS NUMBER Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_OWNER VARCHAR2 IN TASK_NAME VARCHAR2 IN PROCEDURE CREATE_CHUNKS_BY_NUMBER_COL Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_NAME VARCHAR2 IN TABLE_OWNER VARCHAR2 IN TABLE_NAME VARCHAR2 IN TABLE_COLUMN VARCHAR2 IN CHUNK_SIZE NUMBER IN PROCEDURE CREATE_CHUNKS_BY_ROWID Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_NAME VARCHAR2 IN TABLE_OWNER VARCHAR2 IN TABLE_NAME VARCHAR2 IN BY_ROW BOOLEAN IN CHUNK_SIZE NUMBER IN PROCEDURE CREATE_CHUNKS_BY_SQL Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_NAME VARCHAR2 IN SQL_STMT CLOB IN BY_ROWID BOOLEAN IN PROCEDURE CREATE_TASK Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_NAME VARCHAR2 IN COMMENT VARCHAR2 IN DEFAULT PROCEDURE DROP_CHUNKS Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_NAME VARCHAR2 IN PROCEDURE DROP_TASK Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_NAME VARCHAR2 IN FUNCTION GENERATE_TASK_NAME RETURNS VARCHAR2 Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- PREFIX VARCHAR2 IN DEFAULT PROCEDURE GET_NUMBER_COL_CHUNK Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_NAME VARCHAR2 IN CHUNK_ID NUMBER OUT START_ID NUMBER OUT END_ID NUMBER OUT ANY_ROWS BOOLEAN OUT PROCEDURE GET_ROWID_CHUNK Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_NAME VARCHAR2 IN CHUNK_ID NUMBER OUT START_ROWID ROWID OUT END_ROWID ROWID OUT ANY_ROWS BOOLEAN OUT PROCEDURE PURGE_PROCESSED_CHUNKS Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_NAME VARCHAR2 IN PROCEDURE RESUME_TASK Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_NAME VARCHAR2 IN SQL_STMT CLOB IN LANGUAGE_FLAG NUMBER IN EDITION VARCHAR2 IN DEFAULT APPLY_CROSSEDITION_TRIGGER VARCHAR2 IN DEFAULT FIRE_APPLY_TRIGGER BOOLEAN IN DEFAULT PARALLEL_LEVEL NUMBER IN DEFAULT JOB_CLASS VARCHAR2 IN DEFAULT FORCE BOOLEAN IN DEFAULT PROCEDURE RESUME_TASK Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_NAME VARCHAR2 IN FORCE BOOLEAN IN DEFAULT PROCEDURE RUN_INTERNAL_WORKER Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_NAME VARCHAR2 IN JOB_NAME VARCHAR2 IN PROCEDURE RUN_TASK Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_NAME VARCHAR2 IN SQL_STMT CLOB IN LANGUAGE_FLAG NUMBER IN EDITION VARCHAR2 IN DEFAULT APPLY_CROSSEDITION_TRIGGER VARCHAR2 IN DEFAULT FIRE_APPLY_TRIGGER BOOLEAN IN DEFAULT PARALLEL_LEVEL NUMBER IN DEFAULT JOB_CLASS VARCHAR2 IN DEFAULT PROCEDURE SET_CHUNK_STATUS Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_NAME VARCHAR2 IN CHUNK_ID NUMBER IN STATUS NUMBER IN ERR_NUM NUMBER IN DEFAULT ERR_MSG VARCHAR2 IN DEFAULT PROCEDURE STOP_TASK Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_NAME VARCHAR2 IN FUNCTION TASK_STATUS RETURNS NUMBER Argument Name Type In/Out Default? ------------------------------ ----------------------- ------ -------- TASK_NAME VARCHAR2 IN
To divide the table into chucks, one of three procedures may be used:
- DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID
- DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_NUMBER_COL
- DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_SQL
Note: use of this package attempts to create jobs, so using this requires the CREATE_JOB system privilege.
Using ROWID based chunks
To chunk by rowid, requires the CHUNK_SIZE parameter to be passed in addition to the table owner and table name. The CHUNK_SIZE is the rough number of rows or blocks in each chunk and therefore in each separately committed transaction. The boolean parameter BY_ROW determines of CHUNK_SIZE refers to rows or blocks.
Using numeric column value based chunks
To chunk by numeric column, requires the TABLE_COLUMN parameter, which names the numeric column used to determine the chunks and the CHUNK_SIZE parameter. For this method, the minimum and maximum values in the column are obtained from the Data Dictionary, and the table is then chunked evenly by CHUNK_SIZE. For popular values, there could be more than one chunk, and conversely a chunk could contain multiple values.
Using user provided SQL based chunks
To chunk using SQL requires that the user executing the package have execute privileges for the DBMS_SQL package, as this is called by the DBMS_PARALLEL_EXECUTE package to execute the SQL statement used to do the chunking. This method requires the SQL_STATEMENT parameter containing the text of the statement whose output is used to chunk the data, either by rowids or by numeric columns. It uses the BY_ROWID parameter. If this is TRUE, then the SQL statement must have columns called START_ID and END_ID of type ROWID. If BY_ROWID is false, then then the SQL statement must have columns called START_ID and END_ID of type NUMBER.
Using the package
The package is generally used as follows:
- Call CREATE_TASK to establish the task providing a name
- Chunk the table using one of the three methods described above
- Call RUN_TASK, providing the SQL used to update the table
- Call DROP_TASK when complete
Some other procedures in the package allow the following operations:
- Stopping a task and its job slaves using STOP_TASK
- Resuming a stopped task and its job slaves using RESUME TASK
- Managing and inquiring about a task’s or chunk’s status using TASK_STATUS or SET_CHUNK_STATUS
- the ADM procedures are for DBAs and permit stopping and dropping tasks of other users, dropping chunks of other users and checking the task status of other users.
Note: to use RUN_TASK or RESUME_TASK also requires execute privileges on DBMS_SQL.
For a full list of all the procedures see the documentation here.The doc also contains some useful examples using the HR schema.
My Oracle support note 1066555.1 is also a useful reference.
Dictionary views used for this feature are:
- DBA_PARALLEL_EXECUTE_TASKS
- DBA_PARALLEL_EXECUTE_CHUNKS
- USER_PARALLEL_EXECUTE_TASKS
- USER_PARALLEL_EXECUTE_CHUNKS
Here are the DBA views:
SQL> desc dba_parallel_execute_tasks Name Null? Type ----------------------------------------- -------- ---------------------------- TASK_OWNER NOT NULL VARCHAR2(30) TASK_NAME NOT NULL VARCHAR2(128) CHUNK_TYPE VARCHAR2(12) STATUS VARCHAR2(19) TABLE_OWNER VARCHAR2(30) TABLE_NAME VARCHAR2(30) NUMBER_COLUMN VARCHAR2(30) TASK_COMMENT VARCHAR2(4000) JOB_PREFIX VARCHAR2(30) SQL_STMT CLOB LANGUAGE_FLAG NUMBER EDITION VARCHAR2(30) APPLY_CROSSEDITION_TRIGGER VARCHAR2(30) FIRE_APPLY_TRIGGER VARCHAR2(10) PARALLEL_LEVEL NUMBER JOB_CLASS VARCHAR2(30) SQL> desc dba_parallel_execute_chunks Name Null? Type ----------------------------------------- -------- ---------------------------- CHUNK_ID NOT NULL NUMBER TASK_OWNER NOT NULL VARCHAR2(30) TASK_NAME NOT NULL VARCHAR2(128) STATUS VARCHAR2(20) START_ROWID ROWID END_ROWID ROWID START_ID NUMBER END_ID NUMBER JOB_NAME VARCHAR2(30) START_TS TIMESTAMP(6) END_TS TIMESTAMP(6) ERROR_CODE NUMBER ERROR_MESSAGE VARCHAR2(4000)
Have a go at reducing the bulk of your updates!
Joel
July 2011