The Blog from the DBA Classroom

By: Joel Goodman

Great Speakers – and in Stereo Too!

Posted by Joel Goodman on 14/05/2012


It has been my pleasure to be involved for the past 14 years with the UKOUG as an Oracle Liaison to the DBA oriented technical SIGs, as a speaker at the SIGs and at the Annual Conference.

This is one of the best if not the best Oracle conferences in the world, organised by a dedicated team of conference specialists, with many years of expertise in both technology and conference organisation.

It features well known experts in the latest database, O/S and storage technology and a culture of sharing information and ideas. Many of the bloggers and authors that you may have read of whom you have heard, will be there.

If you have expertise, then there will be others willing to listen and share. There is no better way to cut to the key issues, than by meeting other experts who can also share with you.

There is also a great atmosphere, with social evening programmes and entertainment.

So consider submitting one or more abstracts, while time remains to do so.

To register go to the UKOUG Conference 2012 Call for Papers page.

I hope you can make it!!

Joel

London

May 2012

Posted in Oracle | Leave a Comment »

Smooth Sailing at the OUGN

Posted by Joel Goodman on 22/03/2012


I have spent the past two days at the Norwegian Oracle User Group (OUGN) along with many  colleagues and friends from the Oracle world in both EMEA and further afield. I’ve been honoured to have been invited for the past three years and am joined this year by two distinguished Oracle University trainers with whom I regularly collaborate on the EMEA OU Subject Matter Experts team: Harald van Breederode and Uwe Hesse.

Several other people have blogged about the various events but the unique aspect of this event is the venue:

Held on the Colour Magic Cruise Ferry sailing from Oslo to Kiel and then back over two days makes this conference unlike any other in which I have participated.

This year we had a land day on Wednesday in Olso first followed by the cruise portion of the conference.

Uwe will speak on “Data Guard Real Time Query”; a feature that is part of Active Data Guard on Friday.

Harald has already presented two talks:

  • Server Side Smart Flash cache on Wednesday in Oslo
  • Adaptive Cursor Sharing today on board the Colour Magic

Harald will finish tomorrow with a talk on Client Connectivity in a Data Guard environment.

 

I presented two talks already:

  • Database Machine Overview and Features on Wednesday in Oslo
  • Database Machine Monitoring using Enterprise Manager and Command line on board the Colour Magic

I will finish tomorrow with a talk on Automatic Parallelism from 11gR2.

 

It is a  well run and friendly conference and the organisers are to be congratulated for the way they do this year after year.

 

Joel

 

Somewhere at Sea

22/03/2012

 

 

 

Posted in Oracle | Leave a Comment »

Weblogic JTA

Posted by Joel Goodman on 20/03/2012


Are you an Oracle WebLogic Server Administrator?

Would you like to help define the depth and scope of future Oracle WebLogic Server training and certification?

Oracle is conducting this online Job Task survey prior to the next major Oracle WebLogic Server product release to ensure that the tasks Oracle WebLogic Server Administrators consider important are prominent in the design of future WebLogic Server Administrator curriculum and certification offerings.

Don’t miss this chance to do what DBAs did last year, to help define the curriculum and certification for the Oracle Database. Join with other Oracle experts to take this online survey and tell us what tasks are important to you.

Survey is here: http://www.surveymonkey.com/s/7XKX983

 

Joel

 

 

Posted in Oracle | Leave a Comment »

Don’t be a Crash Test Dummy

Posted by Joel Goodman on 28/02/2012


During a recent Oracle 11g Tuning Course in London, a discussion of performance problems related to locking and resource serialisation arose during the lessons dealing with contention. I was then asked by a student to summarise the types of serialisation that occur and to describe the worst types that I find at customers. I replied that “in-doubt” locks on tables involved in distributed transactions have created the worst cases that I have seen. I will focus on this type of lock later in this post. But first, the answer to the first question:

There are different types of serialisation mechanisms for different purposes:

1. Latches – used to protect low level memory structures or access to certain sections of code. Latches are held for very short durations and have been described very nicely in numerous blogs. A good in depth discussion may be found here in a white paper by David Gornshtein and Boris Tamarkin. Latches are typically held for microseconds and have no longer have any documented parameters to control or administer them. Monitoring may be done using GUI tools or by using dynamic performance views such as:

  1. V$LATCH
  2. V$LATCHNAME
  3. V$LATCH_PARENT
  4. V$LATCH_CHILDREN
  5. V$LATCHHOLDER
  6. V$LATCHNAME
  7. V$LATCH_MISSES

2. Mutexes- which are used from 10.2.0.2 onwards for library cache management as they are faster and use less memory than do latches. Mutexes, like latches are typically held for short durations and have no documented controlling parameters, They may be monitored using dynamic performance views such as:

  1. V$MUTEX_SLEEP
  2. V$MUTEX_SLEEP_HISTORY

3. ENQUEUES – used to control access to resources for a longer period of time. ENQUEUES (also called locks) are obtained using enqueues. Unlike mutexes or latches, they may be held for long periods by the owning session, possibly until the session commits. A good presentation on locks, latches and mutexes by my colleague Tanel Poder may be found here.  DBAs can control this potential contention problem by having sessions that hold locks sniped after a period of time, setting IDLE_TIME in the older implementation with Oracle Kernel profiles, or using the Database Resource Manager, and setting timeouts differentially for sessions that don’t currently block other sessions and those that do.  Locks may be monitored using dynamic performance views such as:

  1. V$LOCK
  2. V$LOCK_TYPE
  3. V$LOCK_ELEMENT
  4. V$LOCKED_OBJECT
  5. V$LOCK_ACTIVITY
  6. V$LOCKS_WITH_COLLISIONS

4. User Defined Locks – also  known as UL lock types, these locks are created and managed using the DBMS_LOCK package. While it is beyond the scope of this post, to discuss the details, one unusual aspect of user managed locks is that they may persist across commits. This may be done by setting the RELEASE_ON_COMMIT boolean parameter of the DBMS_LOCK.REQUEST function to FALSE. This may lead to contention problems introduced by this level and duration of locking. DBAs ideally should know the purpose of these locks, and their likely duration, in order to monitor effectively. Monitoring of these locks is done using the same view as for ENQUEUES listed above.

5.  “In Doubt” locks on tables – these occur during the two phase commit process that occur when distributed transaction over database links are being committed.  A good discussion of the two-phased commit processing may be found in the Oracle documentation. What is unusual here is that an “in doubt” transaction lock remains on the table, returning an error on reads AND writes,  until the “in doubt” state is resolved.

When a network, node or instance failure prevents a distributed transaction from completing a “two phase commit” process that is already underway, one or more of the sites involved in the coordination of the commit, may not know the disposition at other sites. If a site is in the “prepare” state, then it must wait to be told whether to commit or to roll back  updates to any local objects involved in the distributed transaction. Until this decision is made, the objects remain locked. But since it is not known whether or not rows involved in the transaction will be committed, the lock is for both updates AND reads!! This is quite a serious lock as tables in this state may not even be read. Attempting to query a table in such a state results in an

ora-01591 : lock held by in-doubt distributed transaction x.yy.zzzz

error, where x.yy.zzzz represents the Global Transaction Identifier of the distributed transaction.

Normally if an instance or node failure occurs, it is resolved quickly, when connectivity is re-established amongst the database instances involved in the distributed transaction. This process is managed using the following components:

  1. The RECO background process, also known as the recoverer process
  2. The DBA_2PC_PENDING Data Dictionary view – based on sys.pending_trans$ base table
  3. The DBA_2PC_NEIGHBORS  Data Dictionary view – based on sys.pss1$ base table

RECO, on each instance will handshake with the other RECO processes and a decision will be made on any “in doubt” resources, based on whether the transaction did indeed commit or roll back and once the affected databases perform these actions, the affected tables are unlocked.

But if there is a delay in re-establishing connectivity, tables may remain locked for long periods of time; minutes, hours or more, because RECO can not make decisions if one or more of the instances is not available. If one of the sites has gone down due to a hardware problem for example.  and requires a resolution time longer than is acceptable, then DBAs may need to resolve this manually by performing either a force commit or a force rollback on the global transaction for the surviving instances and databases.

Oracle provides a way to practice these manual “in doubt” resolutions on test systems by using two separate but related features:

  • Disabling the RECO process
  • Performing 2 phase commit “crash tests

Disabling the RECO process

This is done by using the following statement logged as SYS, SYSTEM or with an appropriate role.

alter system disable distributed recovery;

The RECO process remains up, but does not participate in any automated “in doubt” resolution. By doing this, a DBA may simulate numerous failures using the crash tests, without having to shut down any instances.

 

The Crash Tests

These tests are performed by using specially worded comments in the COMMIT statement, which support comments for the purpose of general documentation in the code. There are many crash tests that are possible, and DBAs are better off if all the scenarios that might apply are tested.

The commit statement for crash tests has the following form:

commit comment ‘ORA-2PC-CRASH-TEST-<nn>’ ;

where nn is the number of the crash test.

To understand the tests, note that the participants in Oracle two phased commit processing may play one of a number of roles in the proceedings:

1. Global Transaction Coordinator – This is the database instance that issues the commit statement and to which the client or middle tier is connected.

2. Local Transaction Coordinator – This is a database instance that issues prepare, commit or rollback statements to database instances not connected directly to the Global Transaction Coordinator. This occurs if one or more of the databases that are directly connected to by the Global Transaction Coordinator,  connect onwards in a “daisy chain” fashion to other remote databases.

3. Commit Point Site - This is the instance that has the highest value for the COMMIT_POINT_STRENGTH parameter. The database that is the commit point site, is never asked to prepare. All the other sites are asked to prepare and if they do successfully, then a commit is sent to the commit point site. Usually, this should be the site where most of the updates are done as it saves preparing, when there is a possibility of rolling back, as preparing causes redo log buffer flush to be done. If there are only two databases, and the commit point strengths are equal, then the remote site is asked to prepare and the local site acts as both the global transaction coordinator and the commit point site.

Note also that the global transaction coordinator only has visibility of remote sites that are directly connected via database links. If there is a daisy-chain, then a local transaction coordinator is also used. For example if  database instance A connects to database instances B and C, and database instance B in turn connects to database instance D, then database instance D can not be the commit point site.

Assuming that database A is the global transaction coordinator, database B is the commit point site, database D is “daisy chained” from database B, and database instance A sends a prepare to database instance B.  Database instance B must do the following:

  • Send a prepare request to database instance D
  • Get a positive reply to the prepare
  • Issue a commit to database instance D
  • Get a positive reply for the commit
  • Commit its own resources
  • Return a positive result for the commit to database A

If database B is not the commit point site, then it gets a prepare and later on a commit from the global transaction coordinator.

Here is what it must do for the prepare:

  • Send a prepare request to database instance D
  • Get a positive reply to the prepare
  • Prepare its own resources
  • Return a positive result for the prepare to database A

And here is what it must do for the commit:

  • Send a commit request to database instance D
  • Get a positive reply to the commit
  • Commit its own resources
  • Return a positive result for the commit to database A

Note the following too

  • If any site fails to prepare then all sites will be instructed to roll back by the global or the local coordinator.
  • Once a transaction is either committed or rolled back, then a forget message is propagated to all the sites so that they may remove the appropriate rows from sys.pending_trans$ and sys.pss1$

Now we can look at the crash tests with their associated number to be placed in the comment:

  1. Crash commit point site after collect
  2. Crash non-commit point site after collect
  3. Crash non-commit point site before prepare
  4. Crash non-commit point site after prepare
  5. Crash commit point site before commit
  6. Crash commit point site after commit
  7. Crash non-commit point site before commit
  8. Crash non-commit point site after commit
  9. Crash commit point site before forget
  10. Crash non-commit point site before forget

To make this a bit simpler I have a demo that I use in the classroom. The script and the spooled output may be downloaded.

Have a go and don’t be a crash test dummy!

Joel

February 2012

Posted in Oracle | Tagged: , , , , | 2 Comments »

Mounting Failures With Your Failgroups?

Posted by Joel Goodman on 16/01/2012


Last year I wrote a post about mirroring in ASM in the context of Exadata which discussed aspects of ASM mirroring, including failgroups, and the ability of ASM to cope with different failure scenarios. The post dealt with cases such as:

  • Single disk failure in a single cell
  • Multiple disk failure in a single cell
  • Single cell failure
  • Overlapping disk failures in multiple cells

It also covered the management of free space in ASM.

But the assumption was that the diskgroup(s) were already mounted, when the failures occurred. If an ASM instance attempts to mount a diskgroup however, then things are very different.

I have been working closely with members of the Oracle Advanced Customer Support Services team over the past 18 months, to train EMEA staff in Grid Infrastructure, RAC, Exadata and Enterprise Linux, to help skill up for supporting the Oracle Database Machine.  Recently, this collaboration resulted in a joint project between Oracle University and ACS to triage some problems at an EMEA customer.

The customer’s configuration used Grid Infrastructure for a standalone server on a two node Solaris stretch cluster. This was done to utilise ASM on each node, to support a live environment on one node, and a test environment on the other node in a different location.  But the test node was also part of the HA strategy, in case of problems on the live node. Oracle Restart was not used, so essentially, it was two separate nodes each with Oracle and ASM.

The ASM disks, were provided by SAN LUNs, with arrays in both the live and test locations. Each array was connected to both the live and test servers.

The live system diskgroups each had two failgroups to support ASM normal redundancy

  • First failgroup on LUNs provided by the storage array at the live location
  • Second failgroup on LUNs provided by the storage array at the test location

Normally, the ASM instance on the live system would have two diskgroups mounted, each with two failgroups. But to handle live location failures, scripts were written, that would:

  • Shut the test database instances on the test server
  • Unmount the test database diskgroups in the ASM instance on the test server
  • Mount the live database diskgroups in the ASM instance on the test server
  • Start the live database instance on the test server

In effect, this was a cold failover of the database. But without Grid Infrastructure for a cluster and clustered ASM, the diskgroups could only be mounted by a single ASM instance at one time. Thus the live disk groups could be mounted on the test server, only if not currently mounted on the live one.

Test were done by the customer, demonstrating that the scripts worked properly when the live database node failed. Eventually after all testing was completed the system for which this database was created, became operational.

I was called in, partly to triage a failure of the same recovery scripts that had worked during testing, and also to do a health check on the database and ASM environments. What I discovered from the ASM and database logs, was that one of the failgroups, on the storage array at the live data centre, had gone offline. ASM continued to read mirror copies from the other failgroup,  as it is designed to do, whenever I/O for primary copies of ASM allocation units (AUs) on the offline failgroup were done. Eventually, the OS on the live server failed, possibly related to access to swap space or other system files on the array.

When the failover to the test system was invoked and the scripts executed:

  • The ASM instance failed to mount the live database diskgroups
  • Startup of the live database instance on the test server failed as it could not access the database files stored in the ASM diskgroups

An attempt was eventually made to fail back to the live server, once the live system OS was back up, but the same problem arose. When I examined the scripts, and the logs, it showed that the MOUNT command for the ASM diskgroups was missing the FORCE option. This is what caused the mount failures in both cases.

Mirrored ASM diskgroups, may survive the loss of ASM disks  or even a complete failgroup, when already mounted, but to mount a mirrored ASM disk group requires that each AU have at least two copies discovered for normal redundancy diskgroups. If one of more ASM disks, is not accessible, then the diskgroup will not mount, because the stated level of redundancy will not have been met. The FORCE option overcomes this problem, by requesting that the diskgroup be mounted, without providing the stated level of redundancy, should a subsequent failure of  an ASM disk occur.

Contributing to the problem, was the lack of testing for storage array failure. The tests had only been done on database node failure. No tests were done for storage array failure, storage network failure or for failure of any of the components upon which the storage depends. This was why the failure to mount the diskgroups in this situation, was not discovered during testing.

An alternative approach to the software architecture, would have been to use Grid Infrastructure for a cluster, rather than Grid Infrastructure for a standalone server. This would have meant the following:

  • The live database diskgroups could be mounted simultaneously from the ASM instances on both the live and test servers, because clustered ASM can do this.
  • No need to write custom scripts to mount the diskgroups on the test server would be needed.
  • The Grid Infrastructure could be used to control the “cold failover” of the non-RAC database instance from the live server to the test server
  • Since the diskgroups could be mounted already they would be accessible and the live database could start up normally on the test server

It was an interesting bit of troubleshooting, and served to re-enforce my classroom message about testing all failure scenarios when planning for High Availability. Instead of suffering mounting failures, aim for success.

Joel

1/2012

Posted in Oracle | Tagged: , , , , | 1 Comment »

Maintaining your Cell’s Image

Posted by Joel Goodman on 14/10/2011


I have been teaching the Oracle Exadata Database Machine Administration course for 2 years now and recently the one day seminar on Database Machine (DBM) monitoring. These two will be merged into a newly upgraded course soon that covers both topics and also new material on monitoring Infiniband switches, Cisco Switches, KVM hardware, patching and much more.

In the classroom, Oracle use a virtualised image of an X2-2 Quarter rack on each delegate desktop running under OVM. But the virtual machines for the Exadata storage servers also known as cells have some differences when compared to real storage servers:

  • The 12 Physical disks and associated LUNs are virtualised as files, rather than being device files as they are in a real cell
  • the 16 Flash FDOMs and associated LUNs are virtualised as files, rather than being device files as they are in a real cell
  • The sizes of these 28 files are smaller than the devices in a real cell
  • These is no virtualised Infiniband network. All the virtual machines in the virtual quarter rack use a virtual network under Oracle virtual machine (OVM)
  • There are no Infiniband switches, Cisco switches, KVM hardware, Power Distribution Units, or ILOMs to administer or monitor.
  • The root, boot and Oracle base file systems are not mirrored in different partitions on the first two hard disks of the cell. They are contained in directories on the virtual disk used by the virtual machine.

The classroom setup however is very good for the purpose of the course:

  • The Cell Software is the same as the software in real cells
  • The virtualised quarter rack permits delegates to lean how to administer for Exadata All the following:
    • Grid Infrastructure
    • ASM Disk Groups
    • Cell Discovery via the /etc/oracle/cell/network-config/cellip.ora and /etc/oracle/cell/network-config/cellinit.ora files
    • Cell administration using cellcli and dcli
    • Database I/O Resource management
    • Cell I/O Resource Management
    • Cell Security using Realms
    • Flash device configuration for Flash Cache
    • Flash device configuration for flash based disk groups
    • Flash device configuration for Smart Flash Logging
  • The Monitoring of the following is also available in the virtualised setup:
    • Cell Smart Scans
    • Use of Hybrid Columnar Compression (HCC)
    • The effect of Storage Indexes on I/O
    • The effect of Flash Cache on I/O
    • Interpreting cell oriented statistics
    • Interpreting execution plans and statement execution for Smart Storage operations
    • Monitoring the effect of I/O resource management

One of the common questions I get in the course concerns the configuration of the software on the cells. Cells contain “images” consisting of the following components:

  • Enterprise Linux OS kernel
  • The Root file system
  • The Oracle Software
  • Device Drivers
  • Firmware Golden Copies

But in the virtualised classroom setup it is not possible to demonstrate this, so I will use a real cell to show how this is configured.

The first two Hard Disks on each cell contain various partitions that are not present on the other ten Hard Disks which have no partition table.

Lets start by finding the device files associated with the 12 physical disks on the cell. We can use the cellcli utility for this:

CellCLI> list lun attributes diskType ,deviceName, lunSize,isSystemLun where disktype = HardDisk
         HardDisk        /dev/sda        557.861328125G   TRUE
         HardDisk        /dev/sdb        557.861328125G   TRUE
         HardDisk        /dev/sdc        557.861328125G   FALSE
         HardDisk        /dev/sdd        557.861328125G   FALSE
         HardDisk        /dev/sde        557.861328125G   FALSE
         HardDisk        /dev/sdf        557.861328125G   FALSE
         HardDisk        /dev/sdg        557.861328125G  FALSE
         HardDisk        /dev/sdh        557.861328125G  FALSE
         HardDisk        /dev/sdi        557.861328125G   FALSE
         HardDisk        /dev/sdj        557.861328125G   FALSE
         HardDisk        /dev/sdk        557.861328125G  FALSE
         HardDisk        /dev/sdl        557.861328125G   FALSE

We can see the 12 device files. /dev/sda and /dev/sdb correspond to disks 0 and 1 containing the “image” of  the cells. The “IsSystemLun” attribute shows that these two devices contain the image. This cell has 600G disks as the “lunSize” attribute indicates.

Here are the device files:

[root@dmorlcel05 ~]# ls -als /dev/sd[a-l]
0 brw-r—– 1 root disk 8,   0 Aug 27 02:22 /dev/sda
0 brw-r—– 1 root disk 8,  16 Aug 27 02:22 /dev/sdb
0 brw-r—– 1 root disk 8,  32 Oct 14 06:07 /dev/sdc
0 brw-r—– 1 root disk 8,  48 Oct 14 06:07 /dev/sdd
0 brw-r—– 1 root disk 8,  64 Oct 14 06:07 /dev/sde
0 brw-r—– 1 root disk 8,  80 Oct 14 06:05 /dev/sdf
0 brw-r—– 1 root disk 8,  96 Oct 14 06:07 /dev/sdg
0 brw-r—– 1 root disk 8, 112 Oct 14 06:07 /dev/sdh
0 brw-r—– 1 root disk 8, 128 Oct 14 06:07 /dev/sdi
0 brw-r—– 1 root disk 8, 144 Oct 14 06:07 /dev/sdj
0 brw-r—– 1 root disk 8, 160 Oct 14 06:07 /dev/sdk
0 brw-r—– 1 root disk 8, 176 Oct 14 06:07 /dev/sdl

Now lets look at some details of  these devices. First lets see the partition tables of the first two devices:

[root@dmorlcel05 grub]# fdisk -l /dev/sd[a-b]

Disk /dev/sda: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          15      120456   fd  Linux raid autodetect
/dev/sda2              16          16        8032+  83  Linux
/dev/sda3              17       69039   554427247+  83  Linux
/dev/sda4           69040       72824    30403012+   f  W95 Ext’d (LBA)
/dev/sda5           69040       70344    10482381   fd  Linux raid autodetect
/dev/sda6           70345       71649    10482381   fd  Linux raid autodetect
/dev/sda7           71650       71910     2096451   fd  Linux raid autodetect
/dev/sda8           71911       72171     2096451   fd  Linux raid autodetect
/dev/sda9           72172       72432     2096451   fd  Linux raid autodetect
/dev/sda10          72433       72521      714861   fd  Linux raid autodetect
/dev/sda11          72522       72824     2433816   fd  Linux raid autodetect

Disk /dev/sdb: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          15      120456   fd  Linux raid autodetect
/dev/sdb2              16          16        8032+  83  Linux
/dev/sdb3              17       69039   554427247+  83  Linux
/dev/sdb4           69040       72824    30403012+   f  W95 Ext’d (LBA)
/dev/sdb5           69040       70344    10482381   fd  Linux raid autodetect
/dev/sdb6           70345       71649    10482381   fd  Linux raid autodetect
/dev/sdb7           71650       71910     2096451   fd  Linux raid autodetect
/dev/sdb8           71911       72171     2096451   fd  Linux raid autodetect
/dev/sdb9           72172       72432     2096451   fd  Linux raid autodetect
/dev/sdb10          72433       72521      714861   fd  Linux raid autodetect
/dev/sdb11          72522       72824     2433816   fd  Linux raid autodetect

/dev/sda and /dev/sdb each contain the same partitions with the same sizes. But lets look at the other 10 disks and see what they look like:

[root@dmorlcel05 ~]# fdisk -l /dev/sd[c-l]

Disk /dev/sdc: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdc doesn’t contain a valid partition table

Disk /dev/sdd: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdd doesn’t contain a valid partition table

Disk /dev/sde: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sde doesn’t contain a valid partition table

Disk /dev/sdf: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdf doesn’t contain a valid partition table

Disk /dev/sdg: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdg doesn’t contain a valid partition table

Disk /dev/sdh: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdh doesn’t contain a valid partition table

Disk /dev/sdi: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdi doesn’t contain a valid partition table

Disk /dev/sdj: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdj doesn’t contain a valid partition table

Disk /dev/sdk: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdk doesn’t contain a valid partition table

Disk /dev/sdl: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdl doesn’t contain a valid partition table

We can clearly see that disks 0 and 1 contain 11 partitions but that disks 2 through 11 do not. These partitions are what makes the first two disks on each cell into “system luns“. The total size used for this set of system partitions is seen by looking at the celldisks:

CellCLI> list celldisk attributes name,deviceName,devicePartition,diskType,size where diskType = HardDisk
         CD_00_dmorlcel05        /dev/sda        /dev/sda3       HardDisk        528.734375G
         CD_01_dmorlcel05        /dev/sdb        /dev/sdb3       HardDisk        528.734375G
         CD_02_dmorlcel05        /dev/sdc        /dev/sdc        HardDisk        557.859375G
         CD_03_dmorlcel05        /dev/sdd        /dev/sdd        HardDisk        557.859375G
         CD_04_dmorlcel05        /dev/sde        /dev/sde        HardDisk        557.859375G
         CD_05_dmorlcel05        /dev/sdf        /dev/sdf        HardDisk        557.859375G
         CD_06_dmorlcel05        /dev/sdg        /dev/sdg        HardDisk        557.859375G
         CD_07_dmorlcel05        /dev/sdh        /dev/sdh        HardDisk        557.859375G
         CD_08_dmorlcel05        /dev/sdi        /dev/sdi        HardDisk        557.859375G
         CD_09_dmorlcel05        /dev/sdj        /dev/sdj        HardDisk        557.859375G
         CD_10_dmorlcel05        /dev/sdk        /dev/sdk        HardDisk        557.859375G
         CD_11_dmorlcel05        /dev/sdl        /dev/sdl        HardDisk        557.859375G

We can see that about 29 gig is used for the system area on the first two hard disks. We can also see that the “devicePartition” attribute for the first two celldisks is the third partition of  the device but that for the other 10 disks, it corresponds to the device name. But now lets see how these partitions are really used. First lets try to see what is mounted:

[root@dmorlcel05 ~]# mount
/dev/md6 on / type ext3 (rw,usrquota,grpquota)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/md8 on /opt/oracle type ext3 (rw,nodev)
/dev/md4 on /boot type ext3 (rw,nodev)
/dev/md11 on /var/log/oracle type ext3 (rw,nodev)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
//10.141.138.97/monitor on /home/monitor type cifs (rw,mand)
/dev/sdm1 on /bla type ext3 (rw)
/dev/sdm1 on /mnt/usb type ext3 (rw)

The mounts show no evidence of partitions on /dev/sda or /dev/sdb. Instead we see various references to /dev/md partitions. We can also see this in /etc/fstab:

[root@dmorlcel05 ~]# cat /etc/fstab
/dev/md6           /                       ext3    defaults,usrquota,grpquota        1 1
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
/dev/md2              swap                    swap    defaults        0 0
/dev/md8                /opt/oracle             ext3    defaults,nodev  1 1
/dev/md4                /boot                   ext3    defaults,nodev  1 1
/dev/md11               /var/log/oracle         ext3    defaults,nodev  1 1
//10.141.138.97/monitor     /home/monitor           cifs    user=monitor,password=monitor,uid=root,gid=root 0 0

We can see the following:

  • the swap partition is /dev/md2
  • the boot file system is mounted on /dev/md4
  • the root file system on /dev/md6
  • the oracle cell software base is mounted on /dev/md8
  • the oracle cell software log file system is mounted on /dev/md11

Also we already know the following:

  • /dev/sda3 and /dev/sdb3 are used for the celldisks on the first two luns. We saw this in the celldisk listing above. There is no corresponding metadevice.
  • /dev/sda4 and /dev/sdb4 are just extended partitions.
  • /dev/sda1 and /dev/sdb1 are bootable partitions from the fdisk output we saw earlier

To understand the use of the mounted devices, note that many of the partitions of the first two disks are managed with software raid mirroring using mdadm. Lets first see all the metadevices:

[root@dmorlcel05 grub]# ls -als /dev/md*
0 brw-r—– 1 root disk 9,  0 Aug 27 02:24 /dev/md0
0 brw-r—– 1 root disk 9,  1 Aug 27 02:22 /dev/md1
0 brw-r—– 1 root disk 9, 11 Aug 27 02:24 /dev/md11
0 brw-r—– 1 root disk 9,  2 Aug 27 02:22 /dev/md2
0 brw-r—– 1 root disk 9,  4 Aug 27 02:24 /dev/md4
0 brw-r—– 1 root disk 9,  5 Aug 27 02:27 /dev/md5
0 brw-r—– 1 root disk 9,  6 Aug 27 02:27 /dev/md6
0 brw-r—– 1 root disk 9,  7 Aug 27 02:27 /dev/md7
0 brw-r—– 1 root disk 9,  8 Aug 27 02:27 /dev/md8

We can see that there are metadevices for most of the partitions but not for:

  • /dev/sda3 which is used for the first celldisk
  • /dev/sdb3 which is used for the second celldisk

Lets see what the software raid setup is like using mdadm first with /dev/md2 which we saw earlier was used for the swap partition. First lets confirm this using swapon.

[root@dmorlcel05 grub]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/md2                                partition       2096376 0       -1

Now lets see what /dev/md2 is:

root@dmorlcel05 grub]# mdadm --misc -D /dev/md2
/dev/md2:
        Version : 0.90
  Creation Time : Thu Feb  4 06:03:17 2010
     Raid Level : raid1
     Array Size : 2096384 (2047.59 MiB 2146.70 MB)
  Used Dev Size : 2096384 (2047.59 MiB 2146.70 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Sun Oct  9 04:22:45 2011
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 3db204f1:d7a6a84b:cd086034:be3fb671
         Events : 0.58

    Number   Major   Minor   RaidDevice State
       0       8        9        0      active sync   /dev/sda9
       1       8       25        1      active sync   /dev/sdb9

We can see that /dev/md2 is mirroring across partitions /dev/sda9 and /dev/sdb9.

Lets see about some other partitions, firstly the boot file system on /dev/md4:

[root@dmorlcel05 grub]# mdadm --misc -D /dev/md4
/dev/md4:
        Version : 0.90
  Creation Time : Thu Feb  4 06:03:28 2010
     Raid Level : raid1
     Array Size : 120384 (117.58 MiB 123.27 MB)
  Used Dev Size : 120384 (117.58 MiB 123.27 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 4
    Persistence : Superblock is persistent

    Update Time : Fri Oct 14 06:44:52 2011
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : e325de40:410dd5d7:f477be17:f36a61aa
         Events : 0.64

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1

We can see that the boot file system is also mirrored across /dev/sda1 and /dev/sdb1. Next is the root file system:

[root@dmorlcel05 grub]# mdadm --misc -D /dev/md6
 /dev/md6:
         Version : 0.90
   Creation Time : Thu Feb  4 06:03:41 2010
      Raid Level : raid1
      Array Size : 10482304 (10.00 GiB 10.73 GB)
   Used Dev Size : 10482304 (10.00 GiB 10.73 GB)
    Raid Devices : 2
   Total Devices : 2
 Preferred Minor : 6
     Persistence : Superblock is persistent
    Update Time : Fri Oct 14 07:15:20 2011
           State : clean
  Active Devices : 2
 Working Devices : 2
  Failed Devices : 0
   Spare Devices : 0
           UUID : 73867382:a65ee3ec:8bc6831d:a67e037c
          Events : 0.6
    Number   Major   Minor   RaidDevice State
        0       8        6        0      active sync   /dev/sda6
        1       8       22        1      active sync   /dev/sdb6

Next the Oracle base file system on /dev/md8:

[root@dmorlcel05 grub]# mdadm --misc -D /dev/md8
/dev/md8:
        Version : 0.90
  Creation Time : Thu Feb  4 06:04:15 2010
     Raid Level : raid1
     Array Size : 2096384 (2047.59 MiB 2146.70 MB)
  Used Dev Size : 2096384 (2047.59 MiB 2146.70 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 8
    Persistence : Superblock is persistent

    Update Time : Fri Oct 14 07:16:41 2011
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 353f0c30:88a291fa:36b88fb4:d274f195
         Events : 0.60

    Number   Major   Minor   RaidDevice State
       0       8        8        0      active sync   /dev/sda8
       1       8       24        1      active sync   /dev/sdb8

Finally we have the log filesystem:

[root@dmorlcel05 grub]# mdadm --misc -D /dev/md11
/dev/md11:
        Version : 0.90
  Creation Time : Tue Aug 24 15:55:27 2010
     Raid Level : raid1
     Array Size : 2433728 (2.32 GiB 2.49 GB)
  Used Dev Size : 2433728 (2.32 GiB 2.49 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 11
    Persistence : Superblock is persistent

    Update Time : Fri Oct 14 07:19:15 2011
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : f04ecbf4:58fdc681:0e1668d1:9049b7b4
         Events : 0.66

    Number   Major   Minor   RaidDevice State
       0       8       11        0      active sync   /dev/sda11
       1       8       27        1      active sync   /dev/sdb11 

We can see that the root, boot, cell software, cell log and swap partitions are mirrored onto partitions of the first
two disks using mdadm. But this leaves some partitions on each of the disks unaccounted for. For example we see no evidence of  /dev/md5 or /dev/md7 and the partitions over which they might be created.

To understand this we must see how cell get imaged. A  cell contains both the current image and the previous image of root and software file systems as well as different kernels. When a new image is “patched” onto the cell, the “inactive” partitions on /dev/sda and /dev/sdb get updated with the new image and updates are made to /etc/fstab and to the grub.conf file for booting the new images kernel. When the cell is rebooted, the new image is the active image and the previous active image becomes the inactive one. We can explore this using some utilities used on the database machine.

[root@dmorlcel05 grub]# imageinfo -all

Kernel version: 2.6.18-194.3.1.0.4.el5 #1 SMP Sat Feb 19 03:38:37 EST 2011 x86_64
Cell version: OSS_11.2.0.3.0_LINUX.X64_110429.1
Cell rpm version: cell-11.2.2.3.1_LINUX.X64_110429.1-1

Active image version: 11.2.2.3.1.110429.1
Active image created: 2011-04-29 21:20:48 -0700
Active image activated: 2011-05-09 16:26:31 -0400
Active image type: production
Active image status: success
Active internal version:
Active image label: OSS_11.2.0.3.0_LINUX.X64_110429.1
Active node type: STORAGE
Active system partition on device: /dev/md6
Active software partition on device: /dev/md8

In partition rollback: Impossible

Cell boot usb partition: /dev/sdm1
Cell boot usb version: 11.2.2.3.1.110429.1

rmdir: /mnt/dev/md5: Directory not empty
Inactive image version: 11.2.1.2.6
Inactive image created: 2010-05-11 13:37:31 -0700
Inactive image activated: 2010-10-05 02:34:01 -0400
Inactive image type: production
Inactive image status: success
Inactive internal version: 999999
Inactive image label: OSS_11.2.1.2.6_LINUX.X64_100511
Inactive node type: STORAGE
Inactive system partition on device: /dev/md5
Inactive software partition on device: /dev/md7

Boot area has rollback archive for the version: 11.2.1.2.6
Rollback to the inactive partitions: Possible

Here we can see that /dev/md5 and /dev/md7 are “inactive. Each of these metadevices is mirrored across partitions on /dev/sda and /dev/sdb. Note that there is also a utility to show all the images that have been on the cell:

[root@dmorlcel05 grub]# imagehistory -a
Version                              : 11.2.1.2.1
Image activation date                : 2010-02-04 15:00:53 -0800
Imaging mode                         : fresh
Imaging status                       : success
Imaging type                         : image
Image creation date                  : 2010-02-01 02:52:14 -0800
Image type                           : production
Image node                           : STORAGE
Internal version                     : 100131
Internal label                       : OSS_11.2.1.2.1_LINUX.X64_100131
Internal cell rpm version            : undefined
Internal cell software version       : undefined
Image id                             : 1265021534

Version                              : 11.2.1.2.3
Image activation date                : 2010-04-30 12:12:04 -0400
Imaging mode                         : patch
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2010-03-05 11:35:15 -0800
Image type                           : production
Image node                           : STORAGE
Internal version                     : 999999
Internal label                       : OSS_11.2.1.2.3_LINUX.X64_100305
Internal cell rpm version            : undefined
Internal cell software version       : undefined
Image id                             : 1267817715

Version                              : 11.2.1.2.4
Image activation date                : 2010-04-30 12:21:54 -0400
Imaging mode                         : patch
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2010-04-13 21:53:04 -0700
Image type                           : production
Image node                           : STORAGE
Internal version                     : 999999
Internal label                       : OSS_11.2.1.2.4_LINUX.X64_100413
Internal cell rpm version            : undefined
Internal cell software version       : undefined
Image id                             : 1271220784

Version                              : 11.2.1.3.1
Image activation date                : 2010-08-24 18:13:07 -0400
Imaging mode                         : out of partition upgrade
Upgrade logs                         : /var/log/cellos/patchlogs/1286251665.ex_part_rollback.11.2.1.2.4_11.2.1.3.1/1286251665.ex_part_rollback.11.2.1.2.4_11.2.1.3.1.tar.gz
Imaging status                       : success
Imaging type                         : out of partition
Image creation date                  : 2010-08-18 21:52:15 -0700
Image type                           : production
Image node                           : STORAGE
Internal version                     : 100818.1
Internal label                       : OSS_11.2.1.3.1_LINUX.X64_100818.1
Internal cell rpm version            : undefined
Internal cell software version       : undefined
Image id                             : 1282193535

Version                              : 11.2.1.2.4
Image activation date                : 2010-10-05 00:24:08 -0400
Imaging mode                         : out of partition rollback
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2010-04-13 21:53:04 -0700
Image type                           : production
Image node                           : STORAGE
Internal version                     : 999999
Internal label                       : OSS_11.2.1.2.4_LINUX.X64_100413
Internal cell rpm version            : undefined
Internal cell software version       : undefined
Image id                             : 1271220784

Version                              : 11.2.1.2.6
Image activation date                : 2010-10-05 02:34:01 -0400
Imaging mode                         : patch
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2010-05-11 13:37:31 -0700
Image type                           : production
Image node                           : STORAGE
Internal version                     : 999999
Internal label                       : OSS_11.2.1.2.6_LINUX.X64_100511
Internal cell rpm version            : undefined
Internal cell software version       : undefined
Image id                             : 1273610251

Version                              : 11.2.2.3.1.110429.1
Image activation date                : 2011-05-09 16:26:31 -0400
Imaging mode                         : out of partition upgrade
Imaging status                       : success
Imaging type                         : out of partition
Image creation date                  : 2011-04-29 21:20:48 -0700
Image type                           : production
Image node                           : STORAGE
Internal version                     : undefined
Internal label                       : OSS_11.2.0.3.0_LINUX.X64_110429.1
Internal cell rpm version            : OSS_11.2.2.3.1_LINUX.X64_110429.1
Internal cell software version       : OSS_11.2.0.3.0_LINUX.X64_110429.1
Image id                             : 1304137248

Note: the imageinfo and imagehistory may also be used on the database nodes in a database machine but the output is somewhat different for imageinfo. Here is imageinfo on a database server node ni the database machine:

[root@dmorldb03 ~]# imageinfo -all

Kernel version: 2.6.18-194.3.1.0.4.el5 #1 SMP Sat Feb 19 03:38:37 EST 2011 x86_64
Image version: 11.2.2.3.5.110815
Image created: 2011-08-16 13:35:02 -0700
Image activated: 2011-10-08 17:28:56 -0400
Image image type: production
Image status: success
Internal version:
Image label: OSS_11.2.2.3.5_LINUX.X64_110815
Node type: COMPUTE
System partition on device: /dev/sda1

And here is the imagehistory output on a database node:

[root@dmorldb03 ~]# imagehistory -a
Version                              : 11.2.1.2.1
Image activation date                : 2010-02-03 22:59:09 -0800
Imaging mode                         : fresh
Imaging status                       : success
Imaging type                         : image
Image creation date                  : 2010-02-01 02:53:01 -0800
Image type                           : production
Image node                           : COMPUTE
Internal version                     : 100131
Internal label                       : OSS_11.2.1.2.1_LINUX.X64_100131
Image id                             : 1265021581

Version                              : 11.2.1.2.3
Image activation date                : 2010-04-30 14:29:15 -0400
Imaging mode                         : patch
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2010-03-05 11:38:07 -0800
Image type                           : production
Image node                           : COMPUTE
Internal version                     : 999999
Internal label                       : OSS_11.2.1.2.3_LINUX.X64_100305
Image id                             : 1267817887

Version                              : 11.2.1.3.1
Image activation date                : 2010-08-24 18:50:13 -0400
Imaging mode                         : patch
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2010-08-18 21:49:31 -0700
Image type                           : production
Image node                           : COMPUTE
Internal version                     : 100818.1
Internal label                       : OSS_11.2.1.3.1_LINUX.X64_100818.1
Image id                             : 1282193371

Version                              : 11.2.2.3.1.110429.1
Image activation date                : 2011-05-09 17:58:47 -0400
Imaging mode                         : patch
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2011-04-29 21:14:31 -0700
Image type                           : production
Image node                           : COMPUTE
Internal version                     : undefined
Internal label                       : OSS_11.2.0.3.0_LINUX.X64_110429.1
Image id                             : 1304136871

Version                              : 11.2.2.3.5.110815
Image activation date                : 2011-10-08 16:15:41 -0400
Imaging mode                         : patch
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2011-08-16 13:35:02 -0700
Image type                           : production
Image node                           : COMPUTE
Internal version                     : undefined
Internal label                       : OSS_11.2.2.3.5_LINUX.X64_110815
Image id                             : 1313526902

Version                              : 11.2.2.3.1.110429.1
Image activation date                : 2011-10-08 17:01:49 -0400
Imaging mode                         : rollback
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2011-04-29 21:14:31 -0700
Image type                           : production
Image node                           : COMPUTE
Internal version                     : undefined
Internal label                       : OSS_11.2.0.3.0_LINUX.X64_110429.1
Image id                             : 1304136871

Version                              : 11.2.2.3.5.110815
Image activation date                : 2011-10-08 17:28:56 -0400
Imaging mode                         : patch
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2011-08-16 13:35:02 -0700
Image type                           : production
Image node                           : COMPUTE
Internal version                     : undefined
Internal label                       : OSS_11.2.2.3.5_LINUX.X64_110815
Image id                             : 1313526902

I hope this helps in the understanding of how cells maintain their images!

Joel

October 2011


Posted in Oracle | 8 Comments »

Train Your DBAs – From DBATRAIN

Posted by Joel Goodman on 12/10/2011


Last chance to contribute to the footprint of the new Oracle DBA training curriculum and certification track footprint.

The Job Task Analysis project survey closes at end of the month and will be used to help define the structure and content of the Oracle University curriculum for the next release. This covers the core DBA training curriculum, advanced topics such as Grid Infrastructure, RAC, Data Guard, Partitioning, Exadata Database Machine, Performance Management and much more.

There will of course be a certification track for OCA and OCP based on the core curriculum and several Oracle Certified Expert (OCE) exams based on advanced topics, and of course the Oracle Certified Master (OCM) exam.

I will be meeting with Harald van Breederode, Uwe Hesse and Branislav Valny from the Oracle University EMEA SME team, and with the DBA curriculum Managers from Oracle University to help create the blueprint for the next generation of training and certification paths.

Share your views by taking  this online survey and tell us what tasks are important to you.

Your input will assure that Oracle University courses and certification tracks, reflect the evolving skill set and job requirements of today’s DBA 2.0

Joel

October 2011

Posted in Oracle | Leave a Comment »

Calling All DBAs to Survey Their Tasks

Posted by Joel Goodman on 09/09/2011


You may have been to an Oracle DBA class, or done your certification. Perhaps you have considered doing so.

Either way, you can contribute to the definition of  the depth and scope of future Oracle Database training courses and Oracle certification exams?

Share your views by taking  this online survey and tell us what tasks are important to you.

Your input will assure that Oracle University courses and certification tracks, reflect the evolving skill set and job requirements of today’s DBA 2.0

Joel

September 2011

Posted in Oracle | Leave a Comment »

Control How You Share Your Snaps

Posted by Joel Goodman on 01/09/2011


On a recent Oracle Certified Masters (OCM) prep workshop, a question arose concerning the purpose and use of the RMAN Snapshot control file. This in turn led to some interesting exploration of a change in Oracle 11g R2 beginning in 11.2.0.2.

To understand this change it is first a good idea to review the snapshot controlfile which in turn requires an understanding of RMAN Resync operations.

 

RMAN Resync Operations
RMAN performs Resync operations when the RMAN Client is connected to a target and an RMAN catalog. There are also two ways for Resync to be invoked:

  • Manual Resync – using the RESYNC CATALOG command;
  • Automatic Resync – this may be partial or full as explained below.

There are two types of Resync:

  • Full Resync
  • Partial Resync

To understand when these are done, consider that the controlfile contains certain records that are able to be overwritten based on the CONTROLFILE_RECORD_KEEP_TIME parameter. These are records in the controlfile, that contain details of archivelogs produced and which may be overwritten once those details are copied to the catalogue. But many of the other controlfile record types are not overwritten as they represent persistent configuration details. These records include but are not restricted to things such as:

  • Database Record
  • Redo Thread Records
  • Log File Records
  • Checkpoint Progress Records
  • Tablespace Records
  • Filename Records
  • RMAN Persistent Configuration Records

Details of all the record types, including the number of records per type is available in the V$CONTROLFILE_RECORD_SECTION view.

When changes occur only to the records that may be overwritten, then a Partial Resync is performed automatically the next time that the RMAN client connects to the target and the catalogue and does any of the following:

  • Backup
  • Restore
  • Recover

A Full Resync occurs in the following cases:

  • When changes occur to any non-overwriteable section of the controlfile and any of the three commands listed above is issued
  • When a database is first registered in the catalogue using the REGISTER DATABASE command
  • When a RESYNC command is issued from the RMAN Prompt

Snapshot Controlfile Locations

Prior to Oracle 11.2.0.2, The Snapshot Controlfile was used only during a Full Resync to reduce the duration of the CF ENQUEUE held exclusively on the controlfile. The enqueue prevents updates to any of the control file records, because there are dependencies between records in some sections and others. For example, there are tablespace records and file records, and during a Full Resync, the consistency between these two record types must be maintained until all the controlfile records are resync’ed to the catalogue. This prevents Oracle processes making changes to anything in the controlfile that might cause the records to become inconsistent.

But the controlfile is frequently updated in a busy system, so instead of holding the CF ENQUEUE for the duration of the resync, the controlfile is essentially copied to the “Snapshot” controlfile location in one go whilst holding the CF ENQUEUE, and then it is dequeued. The catalogue is then resync’ed from the “Snapshot”.

By default, the “Snapshot” is written to the $ORACLE_HOME/dbs directory which may be seen by doing an RMAN SHOW ALL command. But this may be configured to another location if required.

In RAC for example, the directory for the snapshots, must be a valid path name on any node in the cluster which has an instance of the database, to guarantee that the snapshot may be created regardless of which instance is serving the RMAN client. But here is where the change occurs.

Prior to 11.2.0.2, the Snapshot Controlfile location did NOT require shared storage, only that the path was valid from all nodes as stated above. From 11.2.0.2, it must be in shared storage when using RAC. If not then one might see the following:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of Control File and SPFILE Autobackup command on
ORA_DISK_1 channel at 31/08/2011 13:34:07
ORA-00245: control file backup operation failed

What is going on?  The answer is that the Snapshot Controlfile is being used for ordinary controlfile backups from 11.2.0.2 for:

  • Control file backups taken manually from SQL*PLUS
  • Control file auto backups taken by the Oracle server when configured from the RMAN client

For example here is what happens if a control file backup is attempted in RAC in 11.2.0.2 when not using shared storage:

SQL> alter database backup controlfile to ‘/u01/app/oracle/product/11.2.0.2/dbhome_i/dbs/ctlbackup.ctl’ ;
ERROR at line 1:
ORA-00245: control file backup operation failed

The $ORACLE_HOME/dbs directory is normally local in RAC, since the entire ORACLE_HOME is on local storage. This facilitates rolling patches, but from 11.2.0.2 the dbs directory is no longer useful for the snapshots.

The reason for this change is that control file backups traditionally used the CF ENQUEUE to lock the controlfile for the duration of the backup but in some high workload systems this was a performance bottleneck. The solution, was to take the controlfile backup from the snapshot and avoid holding the CF ENQUEUE whilst doing it. But that required maintaining the snapshot as changes occur to the controlfile. Since changes could occur from any instance of the database, the snapshot must be on shared storage, just like the proper control file is.

Look at the OERR output for the error:

> oerr ora 00245

00245, 00000, “control file backup failed; target is likely on a local file system”
// *Cause: Failed to create a control file backup because some process
// signaled an error during backup creation. This is likely caused
// by the backup target being on a local file system so it could not
// be accessed by other instances. It can also be caused by other
// I/O errors to the backup target. Any process of any instance that
// starts a read/write control file transaction must have access
// to the backup control file during backup creation.
// *Action: Check alert files of all instances for further information.

When upgrading to 11.2.0.2 under RAC, make sure to avoid this problem by using RMAN to configure the Snapshot Controlfile to a shared location:

RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO ‘<a_shared_location>/snapcf_?.f’;

Joel

01/09/2011

Posted in Oracle | 1 Comment »

Add Bulk to your Parallel Updates

Posted by Joel Goodman on 01/07/2011


 

Oracle Database server 11gR2 added several new Parallel Execution features. I had several questions on a recent Exadata administration course concerning parallelism, and an interesting discussion arose about these new features. Although parallel execution is useful in all Oracle Database Data Warehouse implementations, and often in batch systems, it is particularly relevant to Exadata. In Exadata, “smart” storage operations, which offload filtering, column projections and some join overheads to the cells may be done for:

  • Full Table Scans
  • Fast Full Index Scans

But these “smart” scans, only occur if a “direct path read” access method is done. This occurs in the following situations:

  • On a Parallel Query
  • On a Serial Query – when the size of the table exceeds the “_small_table_threshold” hidden parameter
  • On a Serial Query – when “_serial_direct_read” hidden parameter is set to “always” regardless of table size

Note: both these parameters are session modifiable.

In addition for Exadata, “direct path insert” is required when loading rows into Hybrid Columnar Compressed (HCC) tables, in order to get the HCC compression done into multi-block compression units. Direct path insert may occur when:

  • When inserting in parallel
  • When doing “Create table as select” (CTAS) in parallel
  • When inserting serially using the “append” hint

Since Exadata is used for Data Warehouses, Parallel Execution is likely to be used very often, both for queries of large tables and for bulk inserts and updates of existing tables.

The new features of Parallel Execution in Oracle 11gR2 are:

  1. Automatic Degree of Parallelism – to set Degree of Parallelism automatically for a statement
  2. Parallel Statement Queuing – to queue a statement if inadequate execution slave processes are not available
  3. In Memory Parallel Updates – to utilise the RAC buffer caches for Parallel queries as an alternative to cache fusion overheads
  4. Parallel Bulk Updates – to allow updates to be done in sections which an succeed or fail independently of other sections

In this post I will focus on Parallel Bulk updates which:

  • Are useful for bulk updates of large data volumes
  • Avoids the “All or Nothing” approach of traditional Parallel DML
  • Groups sets of rows into smaller sized chunks
  • Chunk updates run by a scheduler job which commits on completion
  • Chunking is based on one of three possible techniques:
    • Using ROWID ranges of rows in the queried table
    • Ranges of values in a numeric column of  the queried table
    • Results of User provided SQL statements

Parallel Bulk update is done using the DBMS_PARALLEL_EXECUTE package, which permits the chunking of rows using any of the 3 methods listed. Then the updates are done with several benefits:

  • Only one chunk of rows need be locked at one time, rather than having the entire table locked as in normal PDML
  • Chunks commit independently of of one another
  • Failure of one chunk update to commit, does not cause all the updates to roll back
  • Less Undo space is required since the chunks may commit their updates independently
  • Performance may benefit from the combination of reduced simultaneous locking, and reduced undo consumption
  • Avoids having a huge rollback of a nearly complete bulk update due to failure near the end
  • Failed chunks may be retried after correcting the problem causing the failure

Here is the DBMS_PARALLEL_EXECUTE package described in Oracle 11.2.0.2:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL>  desc dbms_parallel_execute
PROCEDURE ADM_DROP_CHUNKS
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_OWNER                     VARCHAR2                IN
 TASK_NAME                      VARCHAR2                IN
PROCEDURE ADM_DROP_TASK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_OWNER                     VARCHAR2                IN
 TASK_NAME                      VARCHAR2                IN
PROCEDURE ADM_STOP_TASK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_OWNER                     VARCHAR2                IN
 TASK_NAME                      VARCHAR2                IN
FUNCTION ADM_TASK_STATUS RETURNS NUMBER
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_OWNER                     VARCHAR2                IN
 TASK_NAME                      VARCHAR2                IN
PROCEDURE CREATE_CHUNKS_BY_NUMBER_COL
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 TABLE_OWNER                    VARCHAR2                IN
 TABLE_NAME                     VARCHAR2                IN
 TABLE_COLUMN                   VARCHAR2                IN
 CHUNK_SIZE                     NUMBER                  IN
PROCEDURE CREATE_CHUNKS_BY_ROWID
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 TABLE_OWNER                    VARCHAR2                IN
 TABLE_NAME                     VARCHAR2                IN
 BY_ROW                         BOOLEAN                 IN
 CHUNK_SIZE                     NUMBER                  IN
PROCEDURE CREATE_CHUNKS_BY_SQL
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 SQL_STMT                       CLOB                    IN
 BY_ROWID                       BOOLEAN                 IN
PROCEDURE CREATE_TASK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 COMMENT                        VARCHAR2                IN     DEFAULT
PROCEDURE DROP_CHUNKS
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
PROCEDURE DROP_TASK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
FUNCTION GENERATE_TASK_NAME RETURNS VARCHAR2
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 PREFIX                         VARCHAR2                IN     DEFAULT
PROCEDURE GET_NUMBER_COL_CHUNK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 CHUNK_ID                       NUMBER                  OUT
 START_ID                       NUMBER                  OUT
 END_ID                         NUMBER                  OUT
 ANY_ROWS                       BOOLEAN                 OUT
PROCEDURE GET_ROWID_CHUNK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 CHUNK_ID                       NUMBER                  OUT
 START_ROWID                    ROWID                   OUT
 END_ROWID                      ROWID                   OUT
 ANY_ROWS                       BOOLEAN                 OUT
PROCEDURE PURGE_PROCESSED_CHUNKS
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
PROCEDURE RESUME_TASK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 SQL_STMT                       CLOB                    IN
 LANGUAGE_FLAG                  NUMBER                  IN
 EDITION                        VARCHAR2                IN     DEFAULT
 APPLY_CROSSEDITION_TRIGGER     VARCHAR2                IN     DEFAULT
 FIRE_APPLY_TRIGGER             BOOLEAN                 IN     DEFAULT
 PARALLEL_LEVEL                 NUMBER                  IN     DEFAULT
 JOB_CLASS                      VARCHAR2                IN     DEFAULT
 FORCE                          BOOLEAN                 IN     DEFAULT
PROCEDURE RESUME_TASK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 FORCE                          BOOLEAN                 IN     DEFAULT
PROCEDURE RUN_INTERNAL_WORKER
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 JOB_NAME                       VARCHAR2                IN
PROCEDURE RUN_TASK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 SQL_STMT                       CLOB                    IN
 LANGUAGE_FLAG                  NUMBER                  IN
 EDITION                        VARCHAR2                IN     DEFAULT
 APPLY_CROSSEDITION_TRIGGER     VARCHAR2                IN     DEFAULT
 FIRE_APPLY_TRIGGER             BOOLEAN                 IN     DEFAULT
 PARALLEL_LEVEL                 NUMBER                  IN     DEFAULT
 JOB_CLASS                      VARCHAR2                IN     DEFAULT
PROCEDURE SET_CHUNK_STATUS
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
 CHUNK_ID                       NUMBER                  IN
 STATUS                         NUMBER                  IN
 ERR_NUM                        NUMBER                  IN     DEFAULT
 ERR_MSG                        VARCHAR2                IN     DEFAULT
PROCEDURE STOP_TASK
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN
FUNCTION TASK_STATUS RETURNS NUMBER
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 TASK_NAME                      VARCHAR2                IN

To divide the table into chucks, one of three procedures may be used:

  1. DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID
  2. DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_NUMBER_COL
  3. DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_SQL

Note: use of this package attempts to create jobs, so using this requires the CREATE_JOB system privilege.

Using ROWID based chunks

To chunk by rowid,  requires the CHUNK_SIZE parameter to be passed in addition to the table owner and table name. The CHUNK_SIZE is the rough number of rows or blocks in each chunk and therefore in each separately committed transaction. The boolean parameter BY_ROW determines of CHUNK_SIZE refers to rows or blocks.

Using numeric column value based chunks

To chunk by numeric column, requires the TABLE_COLUMN parameter, which names the numeric column used to determine the chunks and the CHUNK_SIZE parameter. For this method, the minimum and maximum values in the column are obtained from the Data Dictionary, and the table is then chunked evenly by CHUNK_SIZE. For popular values, there could be more than one chunk, and conversely a chunk could contain multiple values.

Using user provided SQL based chunks

To chunk using SQL requires that the user executing the package have execute privileges for the DBMS_SQL package, as this is called by the DBMS_PARALLEL_EXECUTE package to execute the SQL statement used to do the chunking. This method requires the SQL_STATEMENT parameter containing the text of the statement whose output is used to chunk the data, either by rowids or by numeric columns. It uses the BY_ROWID parameter. If this is  TRUE, then the SQL statement must have columns called START_ID and END_ID of type ROWID. If BY_ROWID is false, then then the SQL statement must have columns called START_ID and END_ID of type NUMBER.

Using the package

The package is generally used as follows:

  1. Call CREATE_TASK to establish the task providing a name
  2. Chunk the table using one of the three methods described above
  3. Call RUN_TASK, providing the SQL used to update the table
  4. Call DROP_TASK when complete

Some other procedures in the package allow the following operations:

  • Stopping a task and its job slaves using STOP_TASK
  • Resuming a stopped task and its job slaves using RESUME TASK
  • Managing and inquiring about a task’s or chunk’s status using TASK_STATUS or SET_CHUNK_STATUS
  • the ADM procedures are for DBAs and permit stopping and dropping tasks of other users, dropping chunks of other users and checking the task status of other users.

Note: to use RUN_TASK or RESUME_TASK also requires execute privileges on DBMS_SQL.

For a full list of all the procedures see the documentation here.The doc also contains some useful examples using the HR schema.

My Oracle support note 1066555.1 is also a useful reference.
Dictionary views used for this feature are:

  • DBA_PARALLEL_EXECUTE_TASKS
  • DBA_PARALLEL_EXECUTE_CHUNKS
  • USER_PARALLEL_EXECUTE_TASKS
  • USER_PARALLEL_EXECUTE_CHUNKS

Here are the DBA views:

SQL> desc dba_parallel_execute_tasks

 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 TASK_OWNER                                NOT NULL VARCHAR2(30)
 TASK_NAME                                 NOT NULL VARCHAR2(128)
 CHUNK_TYPE                                         VARCHAR2(12)
 STATUS                                             VARCHAR2(19)
 TABLE_OWNER                                        VARCHAR2(30)
 TABLE_NAME                                         VARCHAR2(30)
 NUMBER_COLUMN                                      VARCHAR2(30)
 TASK_COMMENT                                       VARCHAR2(4000)
 JOB_PREFIX                                         VARCHAR2(30)
 SQL_STMT                                           CLOB
 LANGUAGE_FLAG                                      NUMBER
 EDITION                                            VARCHAR2(30)
 APPLY_CROSSEDITION_TRIGGER                         VARCHAR2(30)
 FIRE_APPLY_TRIGGER                                 VARCHAR2(10)
 PARALLEL_LEVEL                                     NUMBER
 JOB_CLASS                                          VARCHAR2(30)

SQL> desc dba_parallel_execute_chunks
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 CHUNK_ID                                  NOT NULL NUMBER
 TASK_OWNER                                NOT NULL VARCHAR2(30)
 TASK_NAME                                 NOT NULL VARCHAR2(128)
 STATUS                                             VARCHAR2(20)
 START_ROWID                                        ROWID
 END_ROWID                                          ROWID
 START_ID                                           NUMBER
 END_ID                                             NUMBER
 JOB_NAME                                           VARCHAR2(30)
 START_TS                                           TIMESTAMP(6)
 END_TS                                             TIMESTAMP(6)
 ERROR_CODE                                         NUMBER
 ERROR_MESSAGE                                      VARCHAR2(4000)


Have a go at reducing the bulk of your updates!
Joel
July 2011



Posted in Oracle | 2 Comments »

 
Follow

Get every new post delivered to your Inbox.

Join 60 other followers