The Blog from the DBA Classroom

By: Joel Goodman

Archive for the ‘Oracle’ Category

Oracle Queries and Posts

Speaking on Ships can be Testing

Posted by Joel Goodman on 10/05/2013


1. The Ship

I took some time off working on the development of several Oracle Certification Exams to participate in the Norwegian Oracle User Group conference. Many other speakers and attendees have described the wonderful environment of the conference, comprising a land day in Oslo, followed by two days aboard the Colour Magic Cruise Ferry from Oslo to Kiel in northern Germany and back.

I had one talk on each day:

  1. Parallel Execution Master Class  – three hours on the first morning in the land day. Many people attended to “fill in the cracks” or Parallel execution and were also quite keen to understand the 11gR2 changes that permit automatic parallel execution. This talk covered:
    1. Parallel Execution Administration
    2. Parallel Execution Plans
    3. Parallel Query
    4. Parallel DML
    5. Parallel DDL
    6. Monitoring and Tuning Parallel Execution
  2. Global Resource Management in RAC – this was a conventional one hour talk on first day of cruise about the management and coordination of global resources such as:
    1. Global Enqueues
    2. Global Locks (Instance Locks)
    3. Global Buffer Cache
    4. Global Result cache
  3. Partitioning Master Class – this was a  two hour Master class on day two of the cruise for people with or without experience of partitioning and covered:
    1. All Table Partitioning Methods
    2. All Index Partitioning Methods
    3. Partition Maintenance Operations
    4. Partition Pruning (or Elimination)
    5. Partitioned Execution Plans

All the talks were well attended for the English Language stream, and I plan to do these at future conferences.

While at  the conference I made some new friends and enjoyed the company of  colleagues from past conferences too.

  1. Philippe Fierens – who attended my talks, discussed technology and philosophy at dinner and joined our group for drinks each night
  2. Jan Karremans – who exemplified the dedicated IT professional. Jan overcame getting motion sickness at sea to be at the conference and to share time with us at dinner and drinks
  3. Iloon Ellen-Wollf - from Oracle Support. I attended her talk on Oracle Public Cloud overview and learned a bit about how this offering works. I was pleased to see Oracle Support providing speakers to a User Group conference as this is not that common in my experience in speaking at such conferences.

2. The Exams

I have been working on various certification exam projects for Oracle in the past 9 months, either as an exam reviewer, or for some topics as one of the exam developers. some of the projects are:

  1. Certification projects for the next release of the Oracle Database Server.
  2. Development of the next version of Oracle Linux certification exams based on Linux 6
  3. Development of a certification exam for Oracle Database Cloud Admin
  4. Migrating the Oracle Certified Master Exam environment for Oracle 11g  to version 11..2.0.3
  5. Training the MySQL Developer Exam writing team in Oracle Item writing standards
  6. Training the MySQL Administration Exam writing team in Oracle Item writing standards
  7. Training the Solaris Exam writing team in Oracle Item writing

Exam development involves many steps from design of the blueprint for the exam right through to the exam going live, but the Subject Matter Experts who create exams must all be trained in the Oracle Item writing standards. This is a soft skill separate from the technical area of expertise. I have worked for many years with Julia Johnson, Global Director of Oracle Certification Exam development to create training guides, instructional videos and mentoring sessions, all with the aim of improving the standards of exam item writing.

One of the crucial stages in the process is the Beta test for any exam. So if you are interested in participating in the Beta programme for any Oracle certification, that is in Beta, please let me know and we can contact you closer to the time.

Joel

May 2013

Posted in Oracle | Leave a Comment »

Can Supporting an Exadata Database Machine Make You Certifiable?

Posted by Joel Goodman on 28/02/2013


One of my roles in Oracle is to help develop Oracle Certification exams, and I have been extremely busy in recent months. This work is a global activity, and is in addition to teaching courses and seminars for Oracle University in EMEA and to speaking at user groups and special interest group meetings.

The role entails:

  • Collaborating with Oracle Database or Linux Subject matter experts to help write exam items for Oracle Certified Expert  (OCE) certifications
  • Reviewing entire exams written by the core technology exam development team for Oracle Certified Associate (OCA) or Oracle Certified Professional (OCP) certifications
  • Teaching exam item writing workshops to other exam development teams with Oracle, including the Solaris and MySQL item writing teams.
  • Reviewing existing exams to assess where improvements can be made by exam developers for future projects.

The exam development projects in which I have participated in recent months are:

  1. Oracle Database OCP upgrade exam for the next release of the Oracle Database server for existing OCPs
  2. Oracle Database OCA exam for the next release release of the Oracle Database server for new OCA candidates
  3. Bronze exam for certain markets, based on the 2 day DBA overview course
  4. Oracle 11g SQL Tuning OCE exam
  5. Oracle Enterprise Manager Cloud Control  Cloud Administration exam
    • For Oracle Infrastructure as a Service (IaaS) Cloud Administrators -
    • For Oracle Platform as a Service (PaaS) Cloud Administrators.  This includes Database as a Service (DBaaS) and Middleware as a Service (MWaaS)
  6. Oracle Exadata Database Machine X3 Administration Oracle Certified Expert exam

Whilst all these exams have their interesting aspects, the focus of this post is the Database Machine OCE exam.

The Database Machine Support Skillset Challenge

Much has been written by myself and others about the different skills involved in provisioning a Database Machine and then maintaining and monitoring the various components. This skill list can be briefly summarised as follows:

  1. Hardware provisioning, installation and maintenance – A Database Machine requires floorspace, adequate ventilation and power supply
  2. Network Administration – A Database Machine must have its Ethernet switch and some other components connected to the corporate LAN. There may also be a requirement to connect Media servers to the Database Machine InfiniBand network.
  3. Operating System Administration  – The Database nodes, also referred to as “Compute” nodes, require O/S administration including backups and patching
  4. Exadata Storage Administration – The Storage nodes or “Cells” need administering and monitoring of Celldisks, Griddisks, Flashcache and other resources
  5. Grid Infrastructure Administration – The Clusterware and Oracle ASM must be administered and monitored> ASM in particular is crucial for monitoring the mirrored disk groups, because Exadata uses ASMs host based mirroring. Mirroring is not done in the Cells.
  6. Oracle Database Administration – this includes Single Instance Oracle, RAC and RAC One Node as all three are possible on a Database Machine

The challenge for an organisation is figuring out how to support a Database Machine (or any other “Engineered System”). There are three models that I have encountered amongst the attendees to the Oracle Database Machine Admin course over the past three to four years:

  1. The Silo Model – This involves having DBAs look after any Oracle products. So DBAs must skill up on all the skills listed above. The advantage is that the support organisation has all the skills in the one team and each DBA can handle support requirements in several areas, but this requires training to expand the traditional DBA skill set.
  2. The Team Model – This involves creating a new team which amongst all the members has the required skills, but where each member is responsible for their own area of expertise. This model requires less training as the members have the required skills already, but requires a change to the organisation.
  3. The Virtual Team Model – This involves creating a virtual team which amongst all the members has the required skills, but where each member is responsible for their own area of expertise. This model requires less training as the members have the required skills already, and requires no real change to existing organisational structures, apart from agreeing on the terms of reference for the virtual team.  This is the most common model I have encountered.

The Database Machine Certification Blueprint Challenge

The challenge for developing the Oracle Exadata Database Machine X3 Administration exam was to address the skill set profile of the typical administrator in light of the  preponderance of the Virtual Team Model. Since few people would be likely to have all the skills to support the Database Machine by themselves, we focused the exam content blueprint on the main skills that one uses in day to day administration. This is predominantly a DBA-centric skill set.

Here is a general outline of the skills and how we treated them for the purposes of the exam:

  • In-depth knowledge of Exadata Storage Server Administration. This is the key component of the Database Machine. The Storage is very DBA-centric, as it is defines specifically for Oracle Databases and ASM, and is not a general purpose storage solution
  • In-depth knowledge of ASM features relating specifically to Exadata based disk groups. This includes but is not restricted to ASM mirroring, Exadata-specific disk group parameters, and Exadata specific monitoring.
  • In Depth knowledge of Clusterware features relating specifically to Exadata resource management and availability. This includes but is not restricted to I/O fencing in for  Exadata based ASM disk groups, partitioning the Exadata storage grid to support multiple database clusters and maintaining storage availability to the storage during patching.
  • In Depth knowledge of Oracle Database administration relating to the use of Exadata based ASM disk groups. This includes but is not limited to Monitoring SQL execution plans when objects reside in Exadata bases ASM disk groups, Exadata specific wait events and Exadata specific statistics.
  • Overview knowledge of network administration. This includes network overview, general IP address and Ethernet connectivity requirements, very general knowledge of the InfiniBand switch topography and port requirements.
  • Overview knowledge of Operating System maintenance requirements. This includes but is not limited to differentiating between the storage nodes and compute nodes, knowing what is and what is not permitted regarding O/S patching and administration, and maintaining high availability during patching.
  • Overview of The Database Machine Hardware components. This includes but is not limited to rack sizes, disk options. site planning, installation and support.
  • Concept Only Overview of Monitoring the Database Machine with Enterprise Manager.  Since both Grid Control and Cloud Control are available for monitoring, the exam focuses only on concepts including but not restricted to as SNMP traps, plug-ins and target discovery

For a complete list of possible exam topics visit the Certification page for the Database Machine Exam and click on the “Exam Topics” tab.

The Exam Development and Quality Assurance Challenge

The Exam was written by a team Oracle University Subject Matter Experts who were all experienced in the product. The review and Beta Test involved Experts from Oracle ACS, with field experience, and other SMEs who were not involved in exam item writing.

If you work with the Exadata Database Machine, preparing for this certification will help you gain a mastery of the skills most often used on a day to day basis as well as a good working knowledge of all the other skill areas listed above.

Joel

London February 2013

Posted in Oracle | 3 Comments »

UKOUG Conference 2012 and Oracle Certification – Are they on Your Agenda?

Posted by Joel Goodman on 14/09/2012


The UKOUG launched the 2102 Server Technology conference agenda yesterday evening, at an informal gathering at the Rack and Tenter pub near the Oracle London City Office. I had a chance to see the conference agenda for the first time since the conference planning conference that I attended in July in Reading.

I met with other presenters, some Oracle colleagues and various others, some of whom came as guests of the invitees, but who also had interesting things to add to the evenings proceedings. Good food, some drinks and even time to debate the finer points of comedy and comedians, and the complexities of Rugby laws for people who dont know the game. As I am an amateur, county level rugby referee, this was not a bad topic for an Oracle meeting!
This is especially so, given that the guest speaker at this year’s conference will be an Ex-England rugby captain.

Also had some time to catch up with friends and colleagues from the user group and as usual, the organisers have done a fine job in creating a top quality agenda for this year.

New in the agenda is the inclusion of “Engineered Systems” as a separate category for talks and round table sessions to cover the various Oracle engineered solutions:

  • Oracle Exadata Database Machine
  • Oracle Database Appliance
  • Oracle Exalogic Elastic Cloud
  • Oracle Sparc Super Cluster
  • Oracle Exalytics In-Memory Machine
  • Oracle Big Data Appliance
  • Sun ZFS Storage Appliance

At this year’s conference I will be involved in the following activities:

  1. Presenting a talk on Oracle Partitioning New Features
  2. Presenting a talk on RAC Global Resource Management
  3. Chairing the Data Guard Round Table Discussion Group
  4. Chairing the Living with Engineered Systems Round Table Discussion Group
  5. Chairing the Grid Infrastructure and RAC Round Table Discussion Group

I have spent the last four months working various certification developments projects and hence had no time to post new articles. Part of my role in Oracle is that of Global team leader for DBA Certification Exam Development and in that capacity I:

  • Write large parts of the Oracle Certified Expert (OCE) DBA exams
  • Review all DBA Oracle Certified Associate (OCA) exams
  • Review all DBA Oracle Certified Professional (OCP) exams
  • Define the content of the DBA Oracle Certified Master (OCM) exams
  • Design the OCM Delivery system used in Oracle University Classrooms

Part of this involved New feature Exams for upgrading OCP when a new release of Oracle is available and I’ve been looking at that this summer.

Earlier this summer I recorded three Oracle Certification Exam Prep Seminars which help exam candidates review the exam objectives, and assess their own skills gap as an aid to revising for the exam. You can preview them using the following links

There will also be a segment on the Oracle Certified Master Exam covering:

  • Preparing for the exam
  • How the exam is conducted
  • How to handle the pressure of time
  • What happens after the exam

This part is still in post production editing at the time of writing this post.

I hope to see some friends and colleagues at this year’s conference and also hope that people planning on maintaining their certification or attaining new ones find the prep seminars useful.

 

Joel

September 2012

Posted in Oracle | 1 Comment »

Great Speakers – and in Stereo Too!

Posted by Joel Goodman on 14/05/2012


It has been my pleasure to be involved for the past 14 years with the UKOUG as an Oracle Liaison to the DBA oriented technical SIGs, as a speaker at the SIGs and at the Annual Conference.

This is one of the best if not the best Oracle conferences in the world, organised by a dedicated team of conference specialists, with many years of expertise in both technology and conference organisation.

It features well known experts in the latest database, O/S and storage technology and a culture of sharing information and ideas. Many of the bloggers and authors that you may have read of whom you have heard, will be there.

If you have expertise, then there will be others willing to listen and share. There is no better way to cut to the key issues, than by meeting other experts who can also share with you.

There is also a great atmosphere, with social evening programmes and entertainment.

So consider submitting one or more abstracts, while time remains to do so.

To register go to the UKOUG Conference 2012 Call for Papers page.

I hope you can make it!!

Joel

London

May 2012

Posted in Oracle | Leave a Comment »

Smooth Sailing at the OUGN

Posted by Joel Goodman on 22/03/2012


I have spent the past two days at the Norwegian Oracle User Group (OUGN) along with many  colleagues and friends from the Oracle world in both EMEA and further afield. I’ve been honoured to have been invited for the past three years and am joined this year by two distinguished Oracle University trainers with whom I regularly collaborate on the EMEA OU Subject Matter Experts team: Harald van Breederode and Uwe Hesse.

Several other people have blogged about the various events but the unique aspect of this event is the venue:

Held on the Colour Magic Cruise Ferry sailing from Oslo to Kiel and then back over two days makes this conference unlike any other in which I have participated.

This year we had a land day on Wednesday in Olso first followed by the cruise portion of the conference.

Uwe will speak on “Data Guard Real Time Query”; a feature that is part of Active Data Guard on Friday.

Harald has already presented two talks:

  • Server Side Smart Flash cache on Wednesday in Oslo
  • Adaptive Cursor Sharing today on board the Colour Magic

Harald will finish tomorrow with a talk on Client Connectivity in a Data Guard environment.

 

I presented two talks already:

  • Database Machine Overview and Features on Wednesday in Oslo
  • Database Machine Monitoring using Enterprise Manager and Command line on board the Colour Magic

I will finish tomorrow with a talk on Automatic Parallelism from 11gR2.

 

It is a  well run and friendly conference and the organisers are to be congratulated for the way they do this year after year.

 

Joel

 

Somewhere at Sea

22/03/2012

 

 

 

Posted in Oracle | Leave a Comment »

Weblogic JTA

Posted by Joel Goodman on 20/03/2012


Are you an Oracle WebLogic Server Administrator?

Would you like to help define the depth and scope of future Oracle WebLogic Server training and certification?

Oracle is conducting this online Job Task survey prior to the next major Oracle WebLogic Server product release to ensure that the tasks Oracle WebLogic Server Administrators consider important are prominent in the design of future WebLogic Server Administrator curriculum and certification offerings.

Don’t miss this chance to do what DBAs did last year, to help define the curriculum and certification for the Oracle Database. Join with other Oracle experts to take this online survey and tell us what tasks are important to you.

Survey is here: http://www.surveymonkey.com/s/7XKX983

 

Joel

 

 

Posted in Oracle | Leave a Comment »

Don’t be a Crash Test Dummy

Posted by Joel Goodman on 28/02/2012


During a recent Oracle 11g Tuning Course in London, a discussion of performance problems related to locking and resource serialisation arose during the lessons dealing with contention. I was then asked by a student to summarise the types of serialisation that occur and to describe the worst types that I find at customers. I replied that “in-doubt” locks on tables involved in distributed transactions have created the worst cases that I have seen. I will focus on this type of lock later in this post. But first, the answer to the first question:

There are different types of serialisation mechanisms for different purposes:

1. Latches – used to protect low level memory structures or access to certain sections of code. Latches are held for very short durations and have been described very nicely in numerous blogs. A good in depth discussion may be found here in a white paper by David Gornshtein and Boris Tamarkin. Latches are typically held for microseconds and have no longer have any documented parameters to control or administer them. Monitoring may be done using GUI tools or by using dynamic performance views such as:

  1. V$LATCH
  2. V$LATCHNAME
  3. V$LATCH_PARENT
  4. V$LATCH_CHILDREN
  5. V$LATCHHOLDER
  6. V$LATCHNAME
  7. V$LATCH_MISSES

2. Mutexes- which are used from 10.2.0.2 onwards for library cache management as they are faster and use less memory than do latches. Mutexes, like latches are typically held for short durations and have no documented controlling parameters, They may be monitored using dynamic performance views such as:

  1. V$MUTEX_SLEEP
  2. V$MUTEX_SLEEP_HISTORY

3. ENQUEUES – used to control access to resources for a longer period of time. ENQUEUES (also called locks) are obtained using enqueues. Unlike mutexes or latches, they may be held for long periods by the owning session, possibly until the session commits. A good presentation on locks, latches and mutexes by my colleague Tanel Poder may be found here.  DBAs can control this potential contention problem by having sessions that hold locks sniped after a period of time, setting IDLE_TIME in the older implementation with Oracle Kernel profiles, or using the Database Resource Manager, and setting timeouts differentially for sessions that don’t currently block other sessions and those that do.  Locks may be monitored using dynamic performance views such as:

  1. V$LOCK
  2. V$LOCK_TYPE
  3. V$LOCK_ELEMENT
  4. V$LOCKED_OBJECT
  5. V$LOCK_ACTIVITY
  6. V$LOCKS_WITH_COLLISIONS

4. User Defined Locks – also  known as UL lock types, these locks are created and managed using the DBMS_LOCK package. While it is beyond the scope of this post, to discuss the details, one unusual aspect of user managed locks is that they may persist across commits. This may be done by setting the RELEASE_ON_COMMIT boolean parameter of the DBMS_LOCK.REQUEST function to FALSE. This may lead to contention problems introduced by this level and duration of locking. DBAs ideally should know the purpose of these locks, and their likely duration, in order to monitor effectively. Monitoring of these locks is done using the same view as for ENQUEUES listed above.

5.  “In Doubt” locks on tables – these occur during the two phase commit process that occur when distributed transaction over database links are being committed.  A good discussion of the two-phased commit processing may be found in the Oracle documentation. What is unusual here is that an “in doubt” transaction lock remains on the table, returning an error on reads AND writes,  until the “in doubt” state is resolved.

When a network, node or instance failure prevents a distributed transaction from completing a “two phase commit” process that is already underway, one or more of the sites involved in the coordination of the commit, may not know the disposition at other sites. If a site is in the “prepare” state, then it must wait to be told whether to commit or to roll back  updates to any local objects involved in the distributed transaction. Until this decision is made, the objects remain locked. But since it is not known whether or not rows involved in the transaction will be committed, the lock is for both updates AND reads!! This is quite a serious lock as tables in this state may not even be read. Attempting to query a table in such a state results in an

ora-01591 : lock held by in-doubt distributed transaction x.yy.zzzz

error, where x.yy.zzzz represents the Global Transaction Identifier of the distributed transaction.

Normally if an instance or node failure occurs, it is resolved quickly, when connectivity is re-established amongst the database instances involved in the distributed transaction. This process is managed using the following components:

  1. The RECO background process, also known as the recoverer process
  2. The DBA_2PC_PENDING Data Dictionary view – based on sys.pending_trans$ base table
  3. The DBA_2PC_NEIGHBORS  Data Dictionary view – based on sys.pss1$ base table

RECO, on each instance will handshake with the other RECO processes and a decision will be made on any “in doubt” resources, based on whether the transaction did indeed commit or roll back and once the affected databases perform these actions, the affected tables are unlocked.

But if there is a delay in re-establishing connectivity, tables may remain locked for long periods of time; minutes, hours or more, because RECO can not make decisions if one or more of the instances is not available. If one of the sites has gone down due to a hardware problem for example.  and requires a resolution time longer than is acceptable, then DBAs may need to resolve this manually by performing either a force commit or a force rollback on the global transaction for the surviving instances and databases.

Oracle provides a way to practice these manual “in doubt” resolutions on test systems by using two separate but related features:

  • Disabling the RECO process
  • Performing 2 phase commit “crash tests

Disabling the RECO process

This is done by using the following statement logged as SYS, SYSTEM or with an appropriate role.

alter system disable distributed recovery;

The RECO process remains up, but does not participate in any automated “in doubt” resolution. By doing this, a DBA may simulate numerous failures using the crash tests, without having to shut down any instances.

 

The Crash Tests

These tests are performed by using specially worded comments in the COMMIT statement, which support comments for the purpose of general documentation in the code. There are many crash tests that are possible, and DBAs are better off if all the scenarios that might apply are tested.

The commit statement for crash tests has the following form:

commit comment ‘ORA-2PC-CRASH-TEST-<nn>’ ;

where nn is the number of the crash test.

To understand the tests, note that the participants in Oracle two phased commit processing may play one of a number of roles in the proceedings:

1. Global Transaction Coordinator – This is the database instance that issues the commit statement and to which the client or middle tier is connected.

2. Local Transaction Coordinator – This is a database instance that issues prepare, commit or rollback statements to database instances not connected directly to the Global Transaction Coordinator. This occurs if one or more of the databases that are directly connected to by the Global Transaction Coordinator,  connect onwards in a “daisy chain” fashion to other remote databases.

3. Commit Point Site - This is the instance that has the highest value for the COMMIT_POINT_STRENGTH parameter. The database that is the commit point site, is never asked to prepare. All the other sites are asked to prepare and if they do successfully, then a commit is sent to the commit point site. Usually, this should be the site where most of the updates are done as it saves preparing, when there is a possibility of rolling back, as preparing causes redo log buffer flush to be done. If there are only two databases, and the commit point strengths are equal, then the remote site is asked to prepare and the local site acts as both the global transaction coordinator and the commit point site.

Note also that the global transaction coordinator only has visibility of remote sites that are directly connected via database links. If there is a daisy-chain, then a local transaction coordinator is also used. For example if  database instance A connects to database instances B and C, and database instance B in turn connects to database instance D, then database instance D can not be the commit point site.

Assuming that database A is the global transaction coordinator, database B is the commit point site, database D is “daisy chained” from database B, and database instance A sends a prepare to database instance B.  Database instance B must do the following:

  • Send a prepare request to database instance D
  • Get a positive reply to the prepare
  • Issue a commit to database instance D
  • Get a positive reply for the commit
  • Commit its own resources
  • Return a positive result for the commit to database A

If database B is not the commit point site, then it gets a prepare and later on a commit from the global transaction coordinator.

Here is what it must do for the prepare:

  • Send a prepare request to database instance D
  • Get a positive reply to the prepare
  • Prepare its own resources
  • Return a positive result for the prepare to database A

And here is what it must do for the commit:

  • Send a commit request to database instance D
  • Get a positive reply to the commit
  • Commit its own resources
  • Return a positive result for the commit to database A

Note the following too

  • If any site fails to prepare then all sites will be instructed to roll back by the global or the local coordinator.
  • Once a transaction is either committed or rolled back, then a forget message is propagated to all the sites so that they may remove the appropriate rows from sys.pending_trans$ and sys.pss1$

Now we can look at the crash tests with their associated number to be placed in the comment:

  1. Crash commit point site after collect
  2. Crash non-commit point site after collect
  3. Crash non-commit point site before prepare
  4. Crash non-commit point site after prepare
  5. Crash commit point site before commit
  6. Crash commit point site after commit
  7. Crash non-commit point site before commit
  8. Crash non-commit point site after commit
  9. Crash commit point site before forget
  10. Crash non-commit point site before forget

To make this a bit simpler I have a demo that I use in the classroom. The script and the spooled output may be downloaded.

Have a go and don’t be a crash test dummy!

Joel

February 2012

Posted in Oracle | Tagged: , , , , | 2 Comments »

Mounting Failures With Your Failgroups?

Posted by Joel Goodman on 16/01/2012


Last year I wrote a post about mirroring in ASM in the context of Exadata which discussed aspects of ASM mirroring, including failgroups, and the ability of ASM to cope with different failure scenarios. The post dealt with cases such as:

  • Single disk failure in a single cell
  • Multiple disk failure in a single cell
  • Single cell failure
  • Overlapping disk failures in multiple cells

It also covered the management of free space in ASM.

But the assumption was that the diskgroup(s) were already mounted, when the failures occurred. If an ASM instance attempts to mount a diskgroup however, then things are very different.

I have been working closely with members of the Oracle Advanced Customer Support Services team over the past 18 months, to train EMEA staff in Grid Infrastructure, RAC, Exadata and Enterprise Linux, to help skill up for supporting the Oracle Database Machine.  Recently, this collaboration resulted in a joint project between Oracle University and ACS to triage some problems at an EMEA customer.

The customer’s configuration used Grid Infrastructure for a standalone server on a two node Solaris stretch cluster. This was done to utilise ASM on each node, to support a live environment on one node, and a test environment on the other node in a different location.  But the test node was also part of the HA strategy, in case of problems on the live node. Oracle Restart was not used, so essentially, it was two separate nodes each with Oracle and ASM.

The ASM disks, were provided by SAN LUNs, with arrays in both the live and test locations. Each array was connected to both the live and test servers.

The live system diskgroups each had two failgroups to support ASM normal redundancy

  • First failgroup on LUNs provided by the storage array at the live location
  • Second failgroup on LUNs provided by the storage array at the test location

Normally, the ASM instance on the live system would have two diskgroups mounted, each with two failgroups. But to handle live location failures, scripts were written, that would:

  • Shut the test database instances on the test server
  • Unmount the test database diskgroups in the ASM instance on the test server
  • Mount the live database diskgroups in the ASM instance on the test server
  • Start the live database instance on the test server

In effect, this was a cold failover of the database. But without Grid Infrastructure for a cluster and clustered ASM, the diskgroups could only be mounted by a single ASM instance at one time. Thus the live disk groups could be mounted on the test server, only if not currently mounted on the live one.

Test were done by the customer, demonstrating that the scripts worked properly when the live database node failed. Eventually after all testing was completed the system for which this database was created, became operational.

I was called in, partly to triage a failure of the same recovery scripts that had worked during testing, and also to do a health check on the database and ASM environments. What I discovered from the ASM and database logs, was that one of the failgroups, on the storage array at the live data centre, had gone offline. ASM continued to read mirror copies from the other failgroup,  as it is designed to do, whenever I/O for primary copies of ASM allocation units (AUs) on the offline failgroup were done. Eventually, the OS on the live server failed, possibly related to access to swap space or other system files on the array.

When the failover to the test system was invoked and the scripts executed:

  • The ASM instance failed to mount the live database diskgroups
  • Startup of the live database instance on the test server failed as it could not access the database files stored in the ASM diskgroups

An attempt was eventually made to fail back to the live server, once the live system OS was back up, but the same problem arose. When I examined the scripts, and the logs, it showed that the MOUNT command for the ASM diskgroups was missing the FORCE option. This is what caused the mount failures in both cases.

Mirrored ASM diskgroups, may survive the loss of ASM disks  or even a complete failgroup, when already mounted, but to mount a mirrored ASM disk group requires that each AU have at least two copies discovered for normal redundancy diskgroups. If one of more ASM disks, is not accessible, then the diskgroup will not mount, because the stated level of redundancy will not have been met. The FORCE option overcomes this problem, by requesting that the diskgroup be mounted, without providing the stated level of redundancy, should a subsequent failure of  an ASM disk occur.

Contributing to the problem, was the lack of testing for storage array failure. The tests had only been done on database node failure. No tests were done for storage array failure, storage network failure or for failure of any of the components upon which the storage depends. This was why the failure to mount the diskgroups in this situation, was not discovered during testing.

An alternative approach to the software architecture, would have been to use Grid Infrastructure for a cluster, rather than Grid Infrastructure for a standalone server. This would have meant the following:

  • The live database diskgroups could be mounted simultaneously from the ASM instances on both the live and test servers, because clustered ASM can do this.
  • No need to write custom scripts to mount the diskgroups on the test server would be needed.
  • The Grid Infrastructure could be used to control the “cold failover” of the non-RAC database instance from the live server to the test server
  • Since the diskgroups could be mounted already they would be accessible and the live database could start up normally on the test server

It was an interesting bit of troubleshooting, and served to re-enforce my classroom message about testing all failure scenarios when planning for High Availability. Instead of suffering mounting failures, aim for success.

Joel

1/2012

Posted in Oracle | Tagged: , , , , | 1 Comment »

Maintaining your Cell’s Image

Posted by Joel Goodman on 14/10/2011


I have been teaching the Oracle Exadata Database Machine Administration course for 2 years now and recently the one day seminar on Database Machine (DBM) monitoring. These two will be merged into a newly upgraded course soon that covers both topics and also new material on monitoring Infiniband switches, Cisco Switches, KVM hardware, patching and much more.

In the classroom, Oracle use a virtualised image of an X2-2 Quarter rack on each delegate desktop running under OVM. But the virtual machines for the Exadata storage servers also known as cells have some differences when compared to real storage servers:

  • The 12 Physical disks and associated LUNs are virtualised as files, rather than being device files as they are in a real cell
  • the 16 Flash FDOMs and associated LUNs are virtualised as files, rather than being device files as they are in a real cell
  • The sizes of these 28 files are smaller than the devices in a real cell
  • These is no virtualised Infiniband network. All the virtual machines in the virtual quarter rack use a virtual network under Oracle virtual machine (OVM)
  • There are no Infiniband switches, Cisco switches, KVM hardware, Power Distribution Units, or ILOMs to administer or monitor.
  • The root, boot and Oracle base file systems are not mirrored in different partitions on the first two hard disks of the cell. They are contained in directories on the virtual disk used by the virtual machine.

The classroom setup however is very good for the purpose of the course:

  • The Cell Software is the same as the software in real cells
  • The virtualised quarter rack permits delegates to lean how to administer for Exadata All the following:
    • Grid Infrastructure
    • ASM Disk Groups
    • Cell Discovery via the /etc/oracle/cell/network-config/cellip.ora and /etc/oracle/cell/network-config/cellinit.ora files
    • Cell administration using cellcli and dcli
    • Database I/O Resource management
    • Cell I/O Resource Management
    • Cell Security using Realms
    • Flash device configuration for Flash Cache
    • Flash device configuration for flash based disk groups
    • Flash device configuration for Smart Flash Logging
  • The Monitoring of the following is also available in the virtualised setup:
    • Cell Smart Scans
    • Use of Hybrid Columnar Compression (HCC)
    • The effect of Storage Indexes on I/O
    • The effect of Flash Cache on I/O
    • Interpreting cell oriented statistics
    • Interpreting execution plans and statement execution for Smart Storage operations
    • Monitoring the effect of I/O resource management

One of the common questions I get in the course concerns the configuration of the software on the cells. Cells contain “images” consisting of the following components:

  • Enterprise Linux OS kernel
  • The Root file system
  • The Oracle Software
  • Device Drivers
  • Firmware Golden Copies

But in the virtualised classroom setup it is not possible to demonstrate this, so I will use a real cell to show how this is configured.

The first two Hard Disks on each cell contain various partitions that are not present on the other ten Hard Disks which have no partition table.

Lets start by finding the device files associated with the 12 physical disks on the cell. We can use the cellcli utility for this:

CellCLI> list lun attributes diskType ,deviceName, lunSize,isSystemLun where disktype = HardDisk
         HardDisk        /dev/sda        557.861328125G   TRUE
         HardDisk        /dev/sdb        557.861328125G   TRUE
         HardDisk        /dev/sdc        557.861328125G   FALSE
         HardDisk        /dev/sdd        557.861328125G   FALSE
         HardDisk        /dev/sde        557.861328125G   FALSE
         HardDisk        /dev/sdf        557.861328125G   FALSE
         HardDisk        /dev/sdg        557.861328125G  FALSE
         HardDisk        /dev/sdh        557.861328125G  FALSE
         HardDisk        /dev/sdi        557.861328125G   FALSE
         HardDisk        /dev/sdj        557.861328125G   FALSE
         HardDisk        /dev/sdk        557.861328125G  FALSE
         HardDisk        /dev/sdl        557.861328125G   FALSE

We can see the 12 device files. /dev/sda and /dev/sdb correspond to disks 0 and 1 containing the “image” of  the cells. The “IsSystemLun” attribute shows that these two devices contain the image. This cell has 600G disks as the “lunSize” attribute indicates.

Here are the device files:

[root@dmorlcel05 ~]# ls -als /dev/sd[a-l]
0 brw-r—– 1 root disk 8,   0 Aug 27 02:22 /dev/sda
0 brw-r—– 1 root disk 8,  16 Aug 27 02:22 /dev/sdb
0 brw-r—– 1 root disk 8,  32 Oct 14 06:07 /dev/sdc
0 brw-r—– 1 root disk 8,  48 Oct 14 06:07 /dev/sdd
0 brw-r—– 1 root disk 8,  64 Oct 14 06:07 /dev/sde
0 brw-r—– 1 root disk 8,  80 Oct 14 06:05 /dev/sdf
0 brw-r—– 1 root disk 8,  96 Oct 14 06:07 /dev/sdg
0 brw-r—– 1 root disk 8, 112 Oct 14 06:07 /dev/sdh
0 brw-r—– 1 root disk 8, 128 Oct 14 06:07 /dev/sdi
0 brw-r—– 1 root disk 8, 144 Oct 14 06:07 /dev/sdj
0 brw-r—– 1 root disk 8, 160 Oct 14 06:07 /dev/sdk
0 brw-r—– 1 root disk 8, 176 Oct 14 06:07 /dev/sdl

Now lets look at some details of  these devices. First lets see the partition tables of the first two devices:

[root@dmorlcel05 grub]# fdisk -l /dev/sd[a-b]

Disk /dev/sda: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          15      120456   fd  Linux raid autodetect
/dev/sda2              16          16        8032+  83  Linux
/dev/sda3              17       69039   554427247+  83  Linux
/dev/sda4           69040       72824    30403012+   f  W95 Ext’d (LBA)
/dev/sda5           69040       70344    10482381   fd  Linux raid autodetect
/dev/sda6           70345       71649    10482381   fd  Linux raid autodetect
/dev/sda7           71650       71910     2096451   fd  Linux raid autodetect
/dev/sda8           71911       72171     2096451   fd  Linux raid autodetect
/dev/sda9           72172       72432     2096451   fd  Linux raid autodetect
/dev/sda10          72433       72521      714861   fd  Linux raid autodetect
/dev/sda11          72522       72824     2433816   fd  Linux raid autodetect

Disk /dev/sdb: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          15      120456   fd  Linux raid autodetect
/dev/sdb2              16          16        8032+  83  Linux
/dev/sdb3              17       69039   554427247+  83  Linux
/dev/sdb4           69040       72824    30403012+   f  W95 Ext’d (LBA)
/dev/sdb5           69040       70344    10482381   fd  Linux raid autodetect
/dev/sdb6           70345       71649    10482381   fd  Linux raid autodetect
/dev/sdb7           71650       71910     2096451   fd  Linux raid autodetect
/dev/sdb8           71911       72171     2096451   fd  Linux raid autodetect
/dev/sdb9           72172       72432     2096451   fd  Linux raid autodetect
/dev/sdb10          72433       72521      714861   fd  Linux raid autodetect
/dev/sdb11          72522       72824     2433816   fd  Linux raid autodetect

/dev/sda and /dev/sdb each contain the same partitions with the same sizes. But lets look at the other 10 disks and see what they look like:

[root@dmorlcel05 ~]# fdisk -l /dev/sd[c-l]

Disk /dev/sdc: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdc doesn’t contain a valid partition table

Disk /dev/sdd: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdd doesn’t contain a valid partition table

Disk /dev/sde: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sde doesn’t contain a valid partition table

Disk /dev/sdf: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdf doesn’t contain a valid partition table

Disk /dev/sdg: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdg doesn’t contain a valid partition table

Disk /dev/sdh: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdh doesn’t contain a valid partition table

Disk /dev/sdi: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdi doesn’t contain a valid partition table

Disk /dev/sdj: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdj doesn’t contain a valid partition table

Disk /dev/sdk: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdk doesn’t contain a valid partition table

Disk /dev/sdl: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdl doesn’t contain a valid partition table

We can clearly see that disks 0 and 1 contain 11 partitions but that disks 2 through 11 do not. These partitions are what makes the first two disks on each cell into “system luns“. The total size used for this set of system partitions is seen by looking at the celldisks:

CellCLI> list celldisk attributes name,deviceName,devicePartition,diskType,size where diskType = HardDisk
         CD_00_dmorlcel05        /dev/sda        /dev/sda3       HardDisk        528.734375G
         CD_01_dmorlcel05        /dev/sdb        /dev/sdb3       HardDisk        528.734375G
         CD_02_dmorlcel05        /dev/sdc        /dev/sdc        HardDisk        557.859375G
         CD_03_dmorlcel05        /dev/sdd        /dev/sdd        HardDisk        557.859375G
         CD_04_dmorlcel05        /dev/sde        /dev/sde        HardDisk        557.859375G
         CD_05_dmorlcel05        /dev/sdf        /dev/sdf        HardDisk        557.859375G
         CD_06_dmorlcel05        /dev/sdg        /dev/sdg        HardDisk        557.859375G
         CD_07_dmorlcel05        /dev/sdh        /dev/sdh        HardDisk        557.859375G
         CD_08_dmorlcel05        /dev/sdi        /dev/sdi        HardDisk        557.859375G
         CD_09_dmorlcel05        /dev/sdj        /dev/sdj        HardDisk        557.859375G
         CD_10_dmorlcel05        /dev/sdk        /dev/sdk        HardDisk        557.859375G
         CD_11_dmorlcel05        /dev/sdl        /dev/sdl        HardDisk        557.859375G

We can see that about 29 gig is used for the system area on the first two hard disks. We can also see that the “devicePartition” attribute for the first two celldisks is the third partition of  the device but that for the other 10 disks, it corresponds to the device name. But now lets see how these partitions are really used. First lets try to see what is mounted:

[root@dmorlcel05 ~]# mount
/dev/md6 on / type ext3 (rw,usrquota,grpquota)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/md8 on /opt/oracle type ext3 (rw,nodev)
/dev/md4 on /boot type ext3 (rw,nodev)
/dev/md11 on /var/log/oracle type ext3 (rw,nodev)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
//10.141.138.97/monitor on /home/monitor type cifs (rw,mand)
/dev/sdm1 on /bla type ext3 (rw)
/dev/sdm1 on /mnt/usb type ext3 (rw)

The mounts show no evidence of partitions on /dev/sda or /dev/sdb. Instead we see various references to /dev/md partitions. We can also see this in /etc/fstab:

[root@dmorlcel05 ~]# cat /etc/fstab
/dev/md6           /                       ext3    defaults,usrquota,grpquota        1 1
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
/dev/md2              swap                    swap    defaults        0 0
/dev/md8                /opt/oracle             ext3    defaults,nodev  1 1
/dev/md4                /boot                   ext3    defaults,nodev  1 1
/dev/md11               /var/log/oracle         ext3    defaults,nodev  1 1
//10.141.138.97/monitor     /home/monitor           cifs    user=monitor,password=monitor,uid=root,gid=root 0 0

We can see the following:

  • the swap partition is /dev/md2
  • the boot file system is mounted on /dev/md4
  • the root file system on /dev/md6
  • the oracle cell software base is mounted on /dev/md8
  • the oracle cell software log file system is mounted on /dev/md11

Also we already know the following:

  • /dev/sda3 and /dev/sdb3 are used for the celldisks on the first two luns. We saw this in the celldisk listing above. There is no corresponding metadevice.
  • /dev/sda4 and /dev/sdb4 are just extended partitions.
  • /dev/sda1 and /dev/sdb1 are bootable partitions from the fdisk output we saw earlier

To understand the use of the mounted devices, note that many of the partitions of the first two disks are managed with software raid mirroring using mdadm. Lets first see all the metadevices:

[root@dmorlcel05 grub]# ls -als /dev/md*
0 brw-r—– 1 root disk 9,  0 Aug 27 02:24 /dev/md0
0 brw-r—– 1 root disk 9,  1 Aug 27 02:22 /dev/md1
0 brw-r—– 1 root disk 9, 11 Aug 27 02:24 /dev/md11
0 brw-r—– 1 root disk 9,  2 Aug 27 02:22 /dev/md2
0 brw-r—– 1 root disk 9,  4 Aug 27 02:24 /dev/md4
0 brw-r—– 1 root disk 9,  5 Aug 27 02:27 /dev/md5
0 brw-r—– 1 root disk 9,  6 Aug 27 02:27 /dev/md6
0 brw-r—– 1 root disk 9,  7 Aug 27 02:27 /dev/md7
0 brw-r—– 1 root disk 9,  8 Aug 27 02:27 /dev/md8

We can see that there are metadevices for most of the partitions but not for:

  • /dev/sda3 which is used for the first celldisk
  • /dev/sdb3 which is used for the second celldisk

Lets see what the software raid setup is like using mdadm first with /dev/md2 which we saw earlier was used for the swap partition. First lets confirm this using swapon.

[root@dmorlcel05 grub]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/md2                                partition       2096376 0       -1

Now lets see what /dev/md2 is:

root@dmorlcel05 grub]# mdadm --misc -D /dev/md2
/dev/md2:
        Version : 0.90
  Creation Time : Thu Feb  4 06:03:17 2010
     Raid Level : raid1
     Array Size : 2096384 (2047.59 MiB 2146.70 MB)
  Used Dev Size : 2096384 (2047.59 MiB 2146.70 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Sun Oct  9 04:22:45 2011
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 3db204f1:d7a6a84b:cd086034:be3fb671
         Events : 0.58

    Number   Major   Minor   RaidDevice State
       0       8        9        0      active sync   /dev/sda9
       1       8       25        1      active sync   /dev/sdb9

We can see that /dev/md2 is mirroring across partitions /dev/sda9 and /dev/sdb9.

Lets see about some other partitions, firstly the boot file system on /dev/md4:

[root@dmorlcel05 grub]# mdadm --misc -D /dev/md4
/dev/md4:
        Version : 0.90
  Creation Time : Thu Feb  4 06:03:28 2010
     Raid Level : raid1
     Array Size : 120384 (117.58 MiB 123.27 MB)
  Used Dev Size : 120384 (117.58 MiB 123.27 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 4
    Persistence : Superblock is persistent

    Update Time : Fri Oct 14 06:44:52 2011
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : e325de40:410dd5d7:f477be17:f36a61aa
         Events : 0.64

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1

We can see that the boot file system is also mirrored across /dev/sda1 and /dev/sdb1. Next is the root file system:

[root@dmorlcel05 grub]# mdadm --misc -D /dev/md6
 /dev/md6:
         Version : 0.90
   Creation Time : Thu Feb  4 06:03:41 2010
      Raid Level : raid1
      Array Size : 10482304 (10.00 GiB 10.73 GB)
   Used Dev Size : 10482304 (10.00 GiB 10.73 GB)
    Raid Devices : 2
   Total Devices : 2
 Preferred Minor : 6
     Persistence : Superblock is persistent
    Update Time : Fri Oct 14 07:15:20 2011
           State : clean
  Active Devices : 2
 Working Devices : 2
  Failed Devices : 0
   Spare Devices : 0
           UUID : 73867382:a65ee3ec:8bc6831d:a67e037c
          Events : 0.6
    Number   Major   Minor   RaidDevice State
        0       8        6        0      active sync   /dev/sda6
        1       8       22        1      active sync   /dev/sdb6

Next the Oracle base file system on /dev/md8:

[root@dmorlcel05 grub]# mdadm --misc -D /dev/md8
/dev/md8:
        Version : 0.90
  Creation Time : Thu Feb  4 06:04:15 2010
     Raid Level : raid1
     Array Size : 2096384 (2047.59 MiB 2146.70 MB)
  Used Dev Size : 2096384 (2047.59 MiB 2146.70 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 8
    Persistence : Superblock is persistent

    Update Time : Fri Oct 14 07:16:41 2011
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 353f0c30:88a291fa:36b88fb4:d274f195
         Events : 0.60

    Number   Major   Minor   RaidDevice State
       0       8        8        0      active sync   /dev/sda8
       1       8       24        1      active sync   /dev/sdb8

Finally we have the log filesystem:

[root@dmorlcel05 grub]# mdadm --misc -D /dev/md11
/dev/md11:
        Version : 0.90
  Creation Time : Tue Aug 24 15:55:27 2010
     Raid Level : raid1
     Array Size : 2433728 (2.32 GiB 2.49 GB)
  Used Dev Size : 2433728 (2.32 GiB 2.49 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 11
    Persistence : Superblock is persistent

    Update Time : Fri Oct 14 07:19:15 2011
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : f04ecbf4:58fdc681:0e1668d1:9049b7b4
         Events : 0.66

    Number   Major   Minor   RaidDevice State
       0       8       11        0      active sync   /dev/sda11
       1       8       27        1      active sync   /dev/sdb11 

We can see that the root, boot, cell software, cell log and swap partitions are mirrored onto partitions of the first
two disks using mdadm. But this leaves some partitions on each of the disks unaccounted for. For example we see no evidence of  /dev/md5 or /dev/md7 and the partitions over which they might be created.

To understand this we must see how cell get imaged. A  cell contains both the current image and the previous image of root and software file systems as well as different kernels. When a new image is “patched” onto the cell, the “inactive” partitions on /dev/sda and /dev/sdb get updated with the new image and updates are made to /etc/fstab and to the grub.conf file for booting the new images kernel. When the cell is rebooted, the new image is the active image and the previous active image becomes the inactive one. We can explore this using some utilities used on the database machine.

[root@dmorlcel05 grub]# imageinfo -all

Kernel version: 2.6.18-194.3.1.0.4.el5 #1 SMP Sat Feb 19 03:38:37 EST 2011 x86_64
Cell version: OSS_11.2.0.3.0_LINUX.X64_110429.1
Cell rpm version: cell-11.2.2.3.1_LINUX.X64_110429.1-1

Active image version: 11.2.2.3.1.110429.1
Active image created: 2011-04-29 21:20:48 -0700
Active image activated: 2011-05-09 16:26:31 -0400
Active image type: production
Active image status: success
Active internal version:
Active image label: OSS_11.2.0.3.0_LINUX.X64_110429.1
Active node type: STORAGE
Active system partition on device: /dev/md6
Active software partition on device: /dev/md8

In partition rollback: Impossible

Cell boot usb partition: /dev/sdm1
Cell boot usb version: 11.2.2.3.1.110429.1

rmdir: /mnt/dev/md5: Directory not empty
Inactive image version: 11.2.1.2.6
Inactive image created: 2010-05-11 13:37:31 -0700
Inactive image activated: 2010-10-05 02:34:01 -0400
Inactive image type: production
Inactive image status: success
Inactive internal version: 999999
Inactive image label: OSS_11.2.1.2.6_LINUX.X64_100511
Inactive node type: STORAGE
Inactive system partition on device: /dev/md5
Inactive software partition on device: /dev/md7

Boot area has rollback archive for the version: 11.2.1.2.6
Rollback to the inactive partitions: Possible

Here we can see that /dev/md5 and /dev/md7 are “inactive. Each of these metadevices is mirrored across partitions on /dev/sda and /dev/sdb. Note that there is also a utility to show all the images that have been on the cell:

[root@dmorlcel05 grub]# imagehistory -a
Version                              : 11.2.1.2.1
Image activation date                : 2010-02-04 15:00:53 -0800
Imaging mode                         : fresh
Imaging status                       : success
Imaging type                         : image
Image creation date                  : 2010-02-01 02:52:14 -0800
Image type                           : production
Image node                           : STORAGE
Internal version                     : 100131
Internal label                       : OSS_11.2.1.2.1_LINUX.X64_100131
Internal cell rpm version            : undefined
Internal cell software version       : undefined
Image id                             : 1265021534

Version                              : 11.2.1.2.3
Image activation date                : 2010-04-30 12:12:04 -0400
Imaging mode                         : patch
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2010-03-05 11:35:15 -0800
Image type                           : production
Image node                           : STORAGE
Internal version                     : 999999
Internal label                       : OSS_11.2.1.2.3_LINUX.X64_100305
Internal cell rpm version            : undefined
Internal cell software version       : undefined
Image id                             : 1267817715

Version                              : 11.2.1.2.4
Image activation date                : 2010-04-30 12:21:54 -0400
Imaging mode                         : patch
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2010-04-13 21:53:04 -0700
Image type                           : production
Image node                           : STORAGE
Internal version                     : 999999
Internal label                       : OSS_11.2.1.2.4_LINUX.X64_100413
Internal cell rpm version            : undefined
Internal cell software version       : undefined
Image id                             : 1271220784

Version                              : 11.2.1.3.1
Image activation date                : 2010-08-24 18:13:07 -0400
Imaging mode                         : out of partition upgrade
Upgrade logs                         : /var/log/cellos/patchlogs/1286251665.ex_part_rollback.11.2.1.2.4_11.2.1.3.1/1286251665.ex_part_rollback.11.2.1.2.4_11.2.1.3.1.tar.gz
Imaging status                       : success
Imaging type                         : out of partition
Image creation date                  : 2010-08-18 21:52:15 -0700
Image type                           : production
Image node                           : STORAGE
Internal version                     : 100818.1
Internal label                       : OSS_11.2.1.3.1_LINUX.X64_100818.1
Internal cell rpm version            : undefined
Internal cell software version       : undefined
Image id                             : 1282193535

Version                              : 11.2.1.2.4
Image activation date                : 2010-10-05 00:24:08 -0400
Imaging mode                         : out of partition rollback
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2010-04-13 21:53:04 -0700
Image type                           : production
Image node                           : STORAGE
Internal version                     : 999999
Internal label                       : OSS_11.2.1.2.4_LINUX.X64_100413
Internal cell rpm version            : undefined
Internal cell software version       : undefined
Image id                             : 1271220784

Version                              : 11.2.1.2.6
Image activation date                : 2010-10-05 02:34:01 -0400
Imaging mode                         : patch
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2010-05-11 13:37:31 -0700
Image type                           : production
Image node                           : STORAGE
Internal version                     : 999999
Internal label                       : OSS_11.2.1.2.6_LINUX.X64_100511
Internal cell rpm version            : undefined
Internal cell software version       : undefined
Image id                             : 1273610251

Version                              : 11.2.2.3.1.110429.1
Image activation date                : 2011-05-09 16:26:31 -0400
Imaging mode                         : out of partition upgrade
Imaging status                       : success
Imaging type                         : out of partition
Image creation date                  : 2011-04-29 21:20:48 -0700
Image type                           : production
Image node                           : STORAGE
Internal version                     : undefined
Internal label                       : OSS_11.2.0.3.0_LINUX.X64_110429.1
Internal cell rpm version            : OSS_11.2.2.3.1_LINUX.X64_110429.1
Internal cell software version       : OSS_11.2.0.3.0_LINUX.X64_110429.1
Image id                             : 1304137248

Note: the imageinfo and imagehistory may also be used on the database nodes in a database machine but the output is somewhat different for imageinfo. Here is imageinfo on a database server node ni the database machine:

[root@dmorldb03 ~]# imageinfo -all

Kernel version: 2.6.18-194.3.1.0.4.el5 #1 SMP Sat Feb 19 03:38:37 EST 2011 x86_64
Image version: 11.2.2.3.5.110815
Image created: 2011-08-16 13:35:02 -0700
Image activated: 2011-10-08 17:28:56 -0400
Image image type: production
Image status: success
Internal version:
Image label: OSS_11.2.2.3.5_LINUX.X64_110815
Node type: COMPUTE
System partition on device: /dev/sda1

And here is the imagehistory output on a database node:

[root@dmorldb03 ~]# imagehistory -a
Version                              : 11.2.1.2.1
Image activation date                : 2010-02-03 22:59:09 -0800
Imaging mode                         : fresh
Imaging status                       : success
Imaging type                         : image
Image creation date                  : 2010-02-01 02:53:01 -0800
Image type                           : production
Image node                           : COMPUTE
Internal version                     : 100131
Internal label                       : OSS_11.2.1.2.1_LINUX.X64_100131
Image id                             : 1265021581

Version                              : 11.2.1.2.3
Image activation date                : 2010-04-30 14:29:15 -0400
Imaging mode                         : patch
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2010-03-05 11:38:07 -0800
Image type                           : production
Image node                           : COMPUTE
Internal version                     : 999999
Internal label                       : OSS_11.2.1.2.3_LINUX.X64_100305
Image id                             : 1267817887

Version                              : 11.2.1.3.1
Image activation date                : 2010-08-24 18:50:13 -0400
Imaging mode                         : patch
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2010-08-18 21:49:31 -0700
Image type                           : production
Image node                           : COMPUTE
Internal version                     : 100818.1
Internal label                       : OSS_11.2.1.3.1_LINUX.X64_100818.1
Image id                             : 1282193371

Version                              : 11.2.2.3.1.110429.1
Image activation date                : 2011-05-09 17:58:47 -0400
Imaging mode                         : patch
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2011-04-29 21:14:31 -0700
Image type                           : production
Image node                           : COMPUTE
Internal version                     : undefined
Internal label                       : OSS_11.2.0.3.0_LINUX.X64_110429.1
Image id                             : 1304136871

Version                              : 11.2.2.3.5.110815
Image activation date                : 2011-10-08 16:15:41 -0400
Imaging mode                         : patch
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2011-08-16 13:35:02 -0700
Image type                           : production
Image node                           : COMPUTE
Internal version                     : undefined
Internal label                       : OSS_11.2.2.3.5_LINUX.X64_110815
Image id                             : 1313526902

Version                              : 11.2.2.3.1.110429.1
Image activation date                : 2011-10-08 17:01:49 -0400
Imaging mode                         : rollback
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2011-04-29 21:14:31 -0700
Image type                           : production
Image node                           : COMPUTE
Internal version                     : undefined
Internal label                       : OSS_11.2.0.3.0_LINUX.X64_110429.1
Image id                             : 1304136871

Version                              : 11.2.2.3.5.110815
Image activation date                : 2011-10-08 17:28:56 -0400
Imaging mode                         : patch
Imaging status                       : success
Imaging type                         : in partition
Image creation date                  : 2011-08-16 13:35:02 -0700
Image type                           : production
Image node                           : COMPUTE
Internal version                     : undefined
Internal label                       : OSS_11.2.2.3.5_LINUX.X64_110815
Image id                             : 1313526902

I hope this helps in the understanding of how cells maintain their images!

Joel

October 2011


Posted in Oracle | 8 Comments »

Train Your DBAs – From DBATRAIN

Posted by Joel Goodman on 12/10/2011


Last chance to contribute to the footprint of the new Oracle DBA training curriculum and certification track footprint.

The Job Task Analysis project survey closes at end of the month and will be used to help define the structure and content of the Oracle University curriculum for the next release. This covers the core DBA training curriculum, advanced topics such as Grid Infrastructure, RAC, Data Guard, Partitioning, Exadata Database Machine, Performance Management and much more.

There will of course be a certification track for OCA and OCP based on the core curriculum and several Oracle Certified Expert (OCE) exams based on advanced topics, and of course the Oracle Certified Master (OCM) exam.

I will be meeting with Harald van Breederode, Uwe Hesse and Branislav Valny from the Oracle University EMEA SME team, and with the DBA curriculum Managers from Oracle University to help create the blueprint for the next generation of training and certification paths.

Share your views by taking  this online survey and tell us what tasks are important to you.

Your input will assure that Oracle University courses and certification tracks, reflect the evolving skill set and job requirements of today’s DBA 2.0

Joel

October 2011

Posted in Oracle | Leave a Comment »

 
Follow

Get every new post delivered to your Inbox.

Join 78 other followers