On a recent Oracle Certified Masters (OCM) prep workshop, a question arose concerning the purpose and use of the RMAN Snapshot control file. This in turn led to some interesting exploration of a change in Oracle 11g R2 beginning in 126.96.36.199.
To understand this change it is first a good idea to review the snapshot controlfile which in turn requires an understanding of RMAN Resync operations.
RMAN Resync Operations
RMAN performs Resync operations when the RMAN Client is connected to a target and an RMAN catalog. There are also two ways for Resync to be invoked:
- Manual Resync – using the RESYNC CATALOG command;
- Automatic Resync – this may be partial or full as explained below.
There are two types of Resync:
- Full Resync
- Partial Resync
To understand when these are done, consider that the controlfile contains certain records that are able to be overwritten based on the CONTROLFILE_RECORD_KEEP_TIME parameter. These are records in the controlfile, that contain details of archivelogs produced and which may be overwritten once those details are copied to the catalogue. But many of the other controlfile record types are not overwritten as they represent persistent configuration details. These records include but are not restricted to things such as:
- Database Record
- Redo Thread Records
- Log File Records
- Checkpoint Progress Records
- Tablespace Records
- Filename Records
- RMAN Persistent Configuration Records
Details of all the record types, including the number of records per type is available in the V$CONTROLFILE_RECORD_SECTION view.
When changes occur only to the records that may be overwritten, then a Partial Resync is performed automatically the next time that the RMAN client connects to the target and the catalogue and does any of the following:
A Full Resync occurs in the following cases:
- When changes occur to any non-overwriteable section of the controlfile and any of the three commands listed above is issued
- When a database is first registered in the catalogue using the REGISTER DATABASE command
- When a RESYNC command is issued from the RMAN Prompt
Snapshot Controlfile Locations
Prior to Oracle 188.8.131.52, The Snapshot Controlfile was used only during a Full Resync to reduce the duration of the CF ENQUEUE held exclusively on the controlfile. The enqueue prevents updates to any of the control file records, because there are dependencies between records in some sections and others. For example, there are tablespace records and file records, and during a Full Resync, the consistency between these two record types must be maintained until all the controlfile records are resync’ed to the catalogue. This prevents Oracle processes making changes to anything in the controlfile that might cause the records to become inconsistent.
But the controlfile is frequently updated in a busy system, so instead of holding the CF ENQUEUE for the duration of the resync, the controlfile is essentially copied to the “Snapshot” controlfile location in one go whilst holding the CF ENQUEUE, and then it is dequeued. The catalogue is then resync’ed from the “Snapshot”.
By default, the “Snapshot” is written to the $ORACLE_HOME/dbs directory which may be seen by doing an RMAN SHOW ALL command. But this may be configured to another location if required.
In RAC for example, the directory for the snapshots, must be a valid path name on any node in the cluster which has an instance of the database, to guarantee that the snapshot may be created regardless of which instance is serving the RMAN client. But here is where the change occurs.
Prior to 184.108.40.206, the Snapshot Controlfile location did NOT require shared storage, only that the path was valid from all nodes as stated above. From 220.127.116.11, it must be in shared storage when using RAC. If not then one might see the following:
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-03009: failure of Control File and SPFILE Autobackup command on
ORA_DISK_1 channel at 31/08/2011 13:34:07
ORA-00245: control file backup operation failed
What is going on? The answer is that the Snapshot Controlfile is being used for ordinary controlfile backups from 18.104.22.168 for:
- Control file backups taken manually from SQL*PLUS
- Control file auto backups taken by the Oracle server when configured from the RMAN client
For example here is what happens if a control file backup is attempted in RAC in 22.214.171.124 when not using shared storage:
SQL> alter database backup controlfile to ‘/u01/app/oracle/product/126.96.36.199/dbhome_i/dbs/ctlbackup.ctl’ ;
ERROR at line 1:
ORA-00245: control file backup operation failed
The $ORACLE_HOME/dbs directory is normally local in RAC, since the entire ORACLE_HOME is on local storage. This facilitates rolling patches, but from 188.8.131.52 the dbs directory is no longer useful for the snapshots.
The reason for this change is that control file backups traditionally used the CF ENQUEUE to lock the controlfile for the duration of the backup but in some high workload systems this was a performance bottleneck. The solution, was to take the controlfile backup from the snapshot and avoid holding the CF ENQUEUE whilst doing it. But that required maintaining the snapshot as changes occur to the controlfile. Since changes could occur from any instance of the database, the snapshot must be on shared storage, just like the proper control file is.
Look at the OERR output for the error:
> oerr ora 00245
00245, 00000, “control file backup failed; target is likely on a local file system”
// *Cause: Failed to create a control file backup because some process
// signaled an error during backup creation. This is likely caused
// by the backup target being on a local file system so it could not
// be accessed by other instances. It can also be caused by other
// I/O errors to the backup target. Any process of any instance that
// starts a read/write control file transaction must have access
// to the backup control file during backup creation.
// *Action: Check alert files of all instances for further information.
When upgrading to 184.108.40.206 under RAC, make sure to avoid this problem by using RMAN to configure the Snapshot Controlfile to a shared location:
RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO ‘<a_shared_location>/snapcf_?.f’;