On a 10.2.0.3 RAC system, an ASM instance hit an ORA-600 error and crashed.
The detailed error messages from the alert log are:
Tue Nov 9 10:47:59 2010
NOTE: reconfiguration OF GROUP 4/0x654fe2a9 (DATA), FULL=1
NOTE: disk validation pending FOR GROUP 4/0x654fe2a9 (DATA)
ERROR: GROUP 4/0x654fe2a9 (DATA): could NOT validate disk 25
SUCCESS: validated disks FOR 4/0x654fe2a9 (DATA)
NOTE: PST refresh pending FOR GROUP 4/0x654fe2a9 (DATA)
NOTE: PST UPDATE: grp = 4, dsk = 25, mode = 0x4
Tue Nov 9 10:48:03 2010
ERROR: too many offline disks IN PST (grp 4)
Tue Nov 9 10:48:03 2010
NOTE: PST NOT enabling heartbeating (grp 4): GROUP dismounted
Tue Nov 9 10:48:03 2010
SUCCESS: refreshed PST FOR 4/0x654fe2a9 (DATA)
ERROR: ORA-15040 thrown IN RBAL FOR GROUP NUMBER 4
Tue Nov 9 10:48:03 2010
Errors IN file /u01/app/oracle/admin/+ASM/bdump/+asm2_rbal_7330.trc:
ORA-15040: diskgroup IS incomplete
ORA-15066: offlining disk "" may RESULT IN a DATA loss
ORA-15042: ASM disk "25" IS missing
Tue Nov 9 10:48:05 2010
Errors IN file /u01/app/oracle/admin/+ASM/bdump/+asm2_ckpt_7321.trc:
ORA-00600: internal error code, arguments: [kfcbCloseCIC10], [4], [25], [7], [], [], [], []
Tue Nov 9 10:48:06 2010
Errors IN file /u01/app/oracle/admin/+ASM/bdump/+asm2_ckpt_7321.trc:
ORA-00600: internal error code, arguments: [kfcbCloseCIC10], [4], [25], [7], [], [], [], []
Tue Nov 9 10:48:06 2010
CKPT: terminating instance due TO error 469
Tue Nov 9 10:48:06 2010
Trace dumping IS performing id=[cdmp_20101109104806]
Tue Nov 9 10:48:08 2010
Shutting down instance (abort)
License high water mark = 7
Tue Nov 9 10:48:11 2010
Instance TERMINATED BY CKPT, pid = 7321
Tue Nov 9 10:48:13 2010
Instance TERMINATED BY USER, pid = 25914
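According to the MOS note Bug 8374703 - RAC ASM crash after disconnect in storage interconnect [ID 8374703.8], losing the storage connection can cause split-brain among the ASM instances. The problem here does not match the note's description exactly, but the ORA-600 error was clearly triggered by the preceding failure to find an ASM disk: disk 25 of group 4 could not be validated, the diskgroup was dismounted, and the ORA-600 [kfcbCloseCIC10] arguments [4] and [25] correspond to that same group and disk number.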
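To make the ordering of events easier to see, here is a minimal Python sketch that pulls the ORA- codes out of an alert-log excerpt together with the timestamp they appear under. The log lines are copied from the excerpt above; the names `ALERT_LOG` and `ora_timeline` are hypothetical helpers introduced only for this illustration, not part of any Oracle tooling.

```python
import re

# A few lines copied from the alert log above (other lines omitted for brevity).
ALERT_LOG = """\
Tue Nov 9 10:47:59 2010
ERROR: GROUP 4/0x654fe2a9 (DATA): could NOT validate disk 25
Tue Nov 9 10:48:03 2010
ERROR: too many offline disks IN PST (grp 4)
ORA-15040: diskgroup IS incomplete
ORA-15042: ASM disk "25" IS missing
Tue Nov 9 10:48:05 2010
ORA-00600: internal error code, arguments: [kfcbCloseCIC10], [4], [25], [7], [], [], [], []
"""

# Alert-log timestamp lines look like "Tue Nov 9 10:48:03 2010".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$")
# Only match error lines that start with an ORA- code, not mentions inside other messages.
ORA_CODE = re.compile(r"^ORA-(\d{5}):")


def ora_timeline(text):
    """Return (timestamp, ORA code) pairs in the order they appear in the log."""
    events = []
    current = None  # most recent timestamp line seen so far
    for raw in text.splitlines():
        line = raw.strip()
        if TIMESTAMP.match(line):
            current = line
            continue
        match = ORA_CODE.match(line)
        if match:
            events.append((current, "ORA-" + match.group(1)))
    return events


if __name__ == "__main__":
    for stamp, code in ora_timeline(ALERT_LOG):
        print(stamp, code)
```

Run against this excerpt, the disk-related errors (ORA-15040, ORA-15042) show up under the 10:48:03 timestamp, ahead of the ORA-00600 at 10:48:05, which is consistent with the missing disk being the trigger.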
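Oracle finally resolved this issue in 11.2.0.1; in fact, the ASM architecture in 11.2 changed significantly compared with 11.1 and earlier releases. The fix for Bug 8374703 itself was only delivered on 11.1, so there is no fix available for 10g. Judging from the error messages, though, this error should occur only occasionally on 10g, and it is directly related to a disk failure.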