Mirror Mirror on the Exadata….
Posted by Joel Goodman on 28/03/2011
In past posts, I have discussed several aspects of the Oracle Exadata Database Machine, most recently the methods used by Advanced Customer Services (ACS) to configure the network and software. ASM must be used with Exadata, and because Exadata does not do array-based mirroring, the administrator must use ASM redundancy.
The feedback I get in the classroom and when presenting at user group conferences on either ASM or Exadata is that customers don’t use ASM mirroring, because their ASM disks are LUNs on storage arrays and the mirroring is done by the storage system. Most ASM diskgroups therefore use external redundancy, where by default each ASM disk is considered a separate failgroup.
With Exadata, ASM automatically groups griddisks discovered on the same Exadata storage server into a single failgroup. For example, there are 14 storage servers (also known as cells) in an X2-2 or X2-8 full rack, and each cell would be considered a separate failgroup by ASM, assuming that at least one griddisk was discovered by ASM on each cell. No failgroup clauses are required when creating the diskgroup. The path names for the grid disks discovered in the cells have the following format:
o/<cell ip address>/<grid disk name>
These path names may be seen in one of the following ways:
- By querying the PATH column from V$ASM_DISK
- By using the LSDSK command from the ASMCMD tool
- By using Enterprise Manager ASM pages.
Here is an example of using V$ASM_DISK:
SQL> select path from v$asm_disk where path like 'o/%/%';

PATH
------------------------------------------------------------------------------
o/192.168.10.20/data_CD_04_mycell1
o/192.168.10.20/data_CD_05_mycell1
o/192.168.10.21/reco_CD_01_mycell2
o/192.168.10.21/reco_CD_02_mycell2
Note that the IP address in the path name is the address of the cell on the InfiniBand storage network within the database machine. The grid disk name is chosen by the administrator when creating the grid disk on an Exadata cell disk. Alternatively, the name may be system generated, by prefixing the name of the cell disk on which the grid disk resides with an administrator-specified prefix. Since all grid disks on the same cell share the same IP address in their path name, it is easy for ASM to group the grid disks into failgroups.
Unlike storage arrays providing SAN-based LUNs, the cells do not mirror. To provide redundancy, the ASM diskgroups must be created using normal or high redundancy so that ASM performs the mirroring. When the mirroring is done by ASM, there are some important implications for administering diskgroups, relating to the following failure scenarios and topics:
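The grouping idea can be illustrated with a short Python sketch. It is only a model of the reasoning above, not how ASM itself is implemented: the function name group_by_cell is hypothetical, and the paths are the ones from the query output shown earlier.

```python
from collections import defaultdict


def group_by_cell(paths):
    """Group grid disk paths of the form o/<cell ip>/<grid disk name> by cell IP."""
    groups = defaultdict(list)
    for path in paths:
        # Split into at most three parts: the "o" prefix, the cell IP, the disk name.
        prefix, cell_ip, disk_name = path.split("/", 2)
        if prefix != "o":
            raise ValueError(f"not an Exadata grid disk path: {path}")
        groups[cell_ip].append(disk_name)
    return dict(groups)


paths = [
    "o/192.168.10.20/data_CD_04_mycell1",
    "o/192.168.10.20/data_CD_05_mycell1",
    "o/192.168.10.21/reco_CD_01_mycell2",
    "o/192.168.10.21/reco_CD_02_mycell2",
]

failgroups = group_by_cell(paths)
for cell_ip, disks in failgroups.items():
    print(cell_ip, disks)
```

Each key in the result corresponds to one cell, and so to one candidate failgroup, in this simplified model.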
- Single disk failure in a single cell
- Multiple disk failure in a single cell
- Single cell failure
- Overlapping disk failures in multiple cells
- Pro-Active Disk Quarantine
- Required mirror free space to accommodate failures
1. Loss of a single disk in a cell
Loss of a disk means that any griddisk on that physical device will go offline within the diskgroup containing that griddisk. ASM normally controls the disposition of the offline disk based on the disk repair time attribute of the diskgroup, which defaults to 3.6 hours. If the disk is not placed back online within this time, it is dropped and the failgroup to which it belongs is rebalanced. This involves copying all primary or secondary ASM allocation units from other failgroups to the failgroup that has lost a disk, and also re-balancing the content of the failgroup so that the ASM disks have roughly the same fullness percentage.
The delay specified by disk repair time is an 11g ASM feature to help avoid a re-balance operation at disk drop and then another, once the disk is repaired and added back to the failgroup.
2. Multiple disk failures in a single cell
This is similar to the loss of a single disk, except that there are more allocation units to recover, and less space remaining in the failgroup in which to copy the missing allocation units from other failgroups.
3. Single cell failure
In this case, the failgroup is entirely offline, and the missing allocation units must be copied from other failgroups to space on surviving failgroups.
4. Overlapping disk failures in multiple cells
In this case, it is possible that the second disk failure may occur before the allocation units lost in the first disk failure have been successfully copied back to the failgroup suffering the first disk failure. If any allocation unit lost due to the first disk failure has its secondary copy on the disk lost in the second failure, then both mirror copies will have been lost, and data loss will have occurred. If the diskgroup uses high redundancy instead of normal redundancy, then three copies of each allocation unit will exist rather than two, and the chance of losing all copies of any one AU is reduced, at the cost of an extra 50% overhead in disk space.
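The overlapping failure case can be made concrete with a small Python sketch. The layout below is entirely hypothetical (cell names, disk names and AU placement are made up for illustration), and the function lost_allocation_units is not an ASM interface. It simply checks which AUs have every copy on a failed disk.

```python
def lost_allocation_units(au_copies, failed_disks):
    """Return the AUs whose every copy sits on a failed disk."""
    failed = set(failed_disks)
    return sorted(au for au, disks in au_copies.items() if set(disks) <= failed)


# Hypothetical placement: each disk is a (cell, disk) pair, and the copies
# of an AU always live in different cells (failgroups).
normal_redundancy = {
    "AU1": [("cell1", "d1"), ("cell2", "d1")],
    "AU2": [("cell1", "d1"), ("cell3", "d2")],
    "AU3": [("cell2", "d1"), ("cell3", "d2")],
}
high_redundancy = {
    "AU1": [("cell1", "d1"), ("cell2", "d1"), ("cell3", "d2")],
    "AU2": [("cell1", "d1"), ("cell2", "d1"), ("cell3", "d2")],
    "AU3": [("cell1", "d1"), ("cell2", "d1"), ("cell3", "d2")],
}

# Two overlapping disk failures in different cells.
failures = [("cell1", "d1"), ("cell2", "d1")]

lost_normal = lost_allocation_units(normal_redundancy, failures)
lost_high = lost_allocation_units(high_redundancy, failures)
print("normal redundancy, lost AUs:", lost_normal)
print("high redundancy, lost AUs:", lost_high)
```

With two copies, the one AU mirrored across exactly the two failed disks is lost. With three copies, the same two failures leave one surviving copy of every AU.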
5. Pro-Active Disk Quarantine
The disk repair time feature mentioned above was introduced as a diskgroup attribute in Oracle 11g to cut possible re-balance overheads, but it increases the risk that all copies of certain allocation units may be lost if overlapping disk failures on different cells occur. Delaying the disk drop increases the chance of another disk failure occurring before the lost AUs are copied.
Patch 220.127.116.11.1 introduced Pro-Active Disk Quarantine, which overrides the disk repair time property of a diskgroup when the diskgroup is based on Exadata griddisks. This forces the griddisk to be dropped from the diskgroup immediately, so that copying and re-balancing can be performed as soon as possible. This shortens the window during which a disk failure in another failgroup could destroy the only remaining copy of an AU that has not yet been remirrored.
This means that ASM diskgroups now react differently to the loss of an ASM disk, if Exadata is used, compared to the behaviour when the ASM disks are based on other storage technologies.
6. Required Mirror Free Space to Accommodate Disk Failures
As mentioned earlier, many ASM administrators have little or no experience with ASM mirroring, because they use External Redundancy for their diskgroups and ASM generally maintains one copy for each AU in this case. Tracking free space within a diskgroup in such a case is simple. When ASM does the mirroring, use the following details from the view V$ASM_DISKGROUP to examine the redundancy, offline state and free space requirements for mirror recovery:
- TOTAL_MB indicates the total space in the diskgroup
- FREE_MB indicates the total free space in the diskgroup
- REQUIRED_MIRROR_FREE_MB indicates free space required in the diskgroup to restore redundancy by copying allocation units as described above.
- USABLE_FILE_MB indicates how much of the FREE_MB may be safely used whilst leaving enough free space for mirror copy recovery in the case of disk failure
- TYPE indicates the redundancy attribute of the diskgroup.
- OFFLINE_DISKS indicates how many disks are offline in the diskgroup
Note: the LSDG command in the ASMCMD utility will provide the same information.
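To show how these columns relate, here is a minimal Python sketch of the arithmetic. It assumes the relationship, as I understand it from the ASM documentation, that USABLE_FILE_MB is FREE_MB less REQUIRED_MIRROR_FREE_MB, divided by the number of copies ASM keeps (1 for EXTERN, 2 for NORMAL, 3 for HIGH). The figures used are made up for illustration.

```python
# Number of copies ASM keeps per AU for each value of the TYPE column.
MIRROR_COPIES = {"EXTERN": 1, "NORMAL": 2, "HIGH": 3}


def usable_file_mb(free_mb, required_mirror_free_mb, redundancy_type):
    """Estimate the space safely usable for files, in MB of file data.

    Assumed relationship: (FREE_MB - REQUIRED_MIRROR_FREE_MB) / copies.
    A negative result means there is not enough free space to restore
    redundancy after a failure.
    """
    copies = MIRROR_COPIES[redundancy_type]
    return (free_mb - required_mirror_free_mb) / copies


# Hypothetical figures, not taken from a real system.
print(usable_file_mb(1_000_000, 400_000, "NORMAL"))
print(usable_file_mb(1_000_000, 400_000, "HIGH"))
print(usable_file_mb(100_000, 400_000, "NORMAL"))
```

The last call returns a negative value, which is the situation to watch for: the diskgroup still has free space, but not enough to re-mirror after a failure.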
The crucial aspect of administering diskgroups when ASM mirrors the AUs is having enough free space to re-copy the AUs when a loss occurs. This is not a concern when external redundancy is used, but it is for normal and high redundancy, and it requires that the ASM administrator be aware of the free space needs.
For the database machine, the implications depend on the level of redundancy and also on the type of rack. For example, a quarter rack has only three cells and therefore three failgroups. The worst case scenario would be loss of an entire cell whereby the two surviving cells would require enough free space to recover all the missing AU copies from the lost cell in order to restore redundancy. On a half rack, which contains seven cells, the overhead of the required free space is spread across six surviving cells rather than just two, and on a full rack, which contains fourteen cells, there are thirteen survivors in the case of the loss of a complete cell and failgroup.
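A rough Python sketch of this arithmetic follows. It assumes equally sized, evenly filled cells and the loss of exactly one cell. The function names are hypothetical, and the result is a simplified model rather than the exact figure ASM reports in REQUIRED_MIRROR_FREE_MB.

```python
def extra_share_per_survivor(cells):
    """Fraction of one cell's contents that each surviving cell must absorb."""
    if cells < 2:
        raise ValueError("need at least two cells to survive the loss of one")
    return 1 / (cells - 1)


def reserve_fraction(cells):
    """Fraction of total diskgroup capacity equal to one whole cell."""
    if cells < 1:
        raise ValueError("need at least one cell")
    return 1 / cells


for rack, cells in [("quarter", 3), ("half", 7), ("full", 14)]:
    share = extra_share_per_survivor(cells)
    reserve = reserve_fraction(cells)
    print(f"{rack} rack: {cells} cells, each survivor absorbs {share:.1%} "
          f"of a cell, one cell is {reserve:.1%} of total capacity")
```

Under these assumptions, each of the two survivors in a quarter rack must absorb half a cell's worth of data, whereas each of the thirteen survivors in a full rack absorbs under a tenth.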
Most ASM administrators who have attended the Oracle Database Machine and Exadata seminar or course have realised that they must review their existing ASM skills and add ASM redundancy management to their skillset.