The Blog from the DBA Classroom

By: Joel Goodman

Recovered Yet??

Posted by Joel Goodman on 20/04/2009


In my post on Shared Sessions and Shared Servers I mentioned the new “Bits and Pieces” format at the UKOUG Technical SIGs which invited SIG delegates to speak spontaneously for as little as five minutes to share useful tips and experiences with their colleagues.

This was a nice addition to the SIG and consistent with my DBA classroom, where delegates are invited to contribute on my courses, if they can, to everyone’s benefit.

I spoke about a few topics on that day, each for a short time, and the one that generated the most lively debate centred on “Backup and Recovery” strategy for distributed systems.

It began when I asked who used DB links, which most delegates did. The next point was to distinguish between “Remote Transactions” and “Distributed Transactions” in order to see who might need to revisit their strategy.

In a “Remote Transaction” the local session updates one or more objects in a single remote database over a DB link. When the local database instance issues a “Commit”, a “Single Phase Commit” request is passed over the DB link to the remote database, which either commits or rolls back depending on its own internal processing and then returns control to the calling instance.

In “Distributed Transactions”, two or more databases have objects updated, necessitating a “Two-Phase Commit” which is controlled from the database instance to which the client or middle tier is connected, also known as the “Global Coordinator”. The “Two-Phase Commit” protocol guarantees that all participating instances will either commit or roll back, with no mixed outcome. It is not my intention to cover the details of Oracle Database Two-Phase Commit here, as this is a large topic in itself which I plan to deliver later this year in a MasterClass format.
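Without going into Oracle's implementation, the all-or-nothing guarantee can be illustrated with a toy sketch in Python. This is not how the database does it internally; the `Participant` class, its `can_commit` flag and the database names are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    """Toy stand-in for one database taking part in the transaction."""
    name: str
    can_commit: bool = True   # hypothetical flag: will the prepare succeed?
    state: str = "active"

    def prepare(self) -> bool:
        # Phase one: promise to commit if asked, or refuse.
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled back"


def two_phase_commit(participants: List[Participant]) -> str:
    """Played by the coordinator: everyone commits or everyone rolls back."""
    # Phase one: ask every participant to prepare and collect the votes.
    votes = [p.prepare() for p in participants]
    # Phase two: commit only if every single vote was yes.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "rolled back"
```

If one of three hypothetical participants is created with `can_commit=False`, `two_phase_commit` leaves all three in the "rolled back" state, which is the "no mixed outcome" property described above.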

The “Backup and Recovery” strategy issue relates to situations where one of the participating databases has undergone a Database Point in Time Recovery (DBPITR) for any reason. This may have been caused by corruption, by incorrect input to a batch run, or by loss of redo logs discovered during normal recovery; the cause does not really matter. If DBPITR is done to one of the databases, then the other databases that participate in the distributed transactions must be recovered back to the same global SCN in order for the entire distributed environment to be consistent once again.
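As a rough sketch of the bookkeeping involved, the following Python function lists what each participating database still needs once one of them has had DBPITR performed. It performs no recovery itself; the function name, the database names and the SCN value are assumptions made up for the example.

```python
from typing import Dict, List


def plan_coordinated_recovery(
    participants: List[str], recovered_db: str, target_scn: int
) -> Dict[str, str]:
    """Map each participating database to the action it still needs."""
    if recovered_db not in participants:
        raise ValueError(f"{recovered_db} is not a participant")
    plan = {}
    for db in participants:
        if db == recovered_db:
            # This database has already been taken back in time.
            plan[db] = f"already recovered to SCN {target_scn}"
        else:
            # Every other participant must be brought back to the same SCN.
            plan[db] = f"needs DBPITR to SCN {target_scn}"
    return plan


# Hypothetical three-database system where only one has been recovered.
plan = plan_coordinated_recovery(["london", "paris", "tokyo"], "london", 1200)
for db, action in plan.items():
    print(db, "->", action)
```

The point of the sketch is simply that the plan is never empty for the other participants: a DBPITR on one database always creates recovery work on the rest.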

I also told the group of a delegate on a course I delivered about four or five years ago whose jaw dropped when I mentioned this distributed systems backup and recovery situation. She was a DBA for a large bank which, four months prior, had performed DBPITR on one of three databases involved in an international distributed system but not on the other two. She spent much of the breaks and lunch for the next two or three days on her mobile, in meetings to decide what should be done.

I asked the User Group how many of the 70 or so people using DB links and distributed transactions had performed DBPITR in the past year. Only a couple of DBAs claimed to have done so. I then asked how many knew how to perform this recovery, and most were not sure or did not reply (it is an SCN-based DBPITR for the other participating databases).

Furthermore, one of the DBAs had a related issue where the database did not use links but stored some data on another system using AQ, and the AQ data was not backed out when the database had a DBPITR performed.

So the discussion then turned to listing the configurations and scenarios that we as a group could think of where coordinated recovery is needed after a database has had DBPITR performed on it.

Here is the list we created of situations involving database transactions which include at least some updates done in some other environment outside the database instance to which the client connects:

1. Distributed Transactions Using DB Links

2. Systems updating data using UTL_FILE

3. Systems using AQ or Streams Queuing

4. Systems creating Data Pump External Tables

5. Systems making “extproc” callouts to programmes that update files

6. Systems using Bfile Lobs

7. Systems using XA for open Distributed Transactions

To me the best part of these 10 minutes was not my original idea of warning about recovery when using DB links, but rather that the group as a whole then built upon it, making for an interesting discussion and expanding the original idea. It certainly convinced many DBAs that they ought to revisit their “Backup and Recovery” strategy. If you have any of these configurations, then you might wish to do so as well.

Please comment if you have other scenarios to add to the list or if you have had a problem caused by the lack of a comprehensive recovery strategy.

2 Responses to “Recovered Yet??”

  1. Cool site, love the info.

  2. coskan said

    Thank you for pointing out this interesting but very important issue. I hope you will support this with a scenario/demonstration at a UKOUG meeting.
