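A customer's 10.2.0.4 RAC database hit a network failure that crashed the instance, accompanied by a large number of ORA-27300 errors.
The detailed error messages from the alert log are: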
Wed Nov 21 16:37:36 2012
Errors in file /u01/oracle/app/admin/orcl/udump/orcl2_ora_29173.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 10.0.1.2 not found. Check output from ifconfig command
Wed Nov 21 16:37:36 2012
Errors in file /u01/oracle/app/admin/orcl/udump/orcl2_ora_29198.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 10.0.1.2 not found. Check output from ifconfig command
Wed Nov 21 16:37:56 2012
Trace dumping is performing id=[cdmp_20121121163746]
Wed Nov 21 16:38:00 2012
ospid 28424: network interface with IP address 10.0.1.2 no longer operational
requested interface 10.0.1.2 not found. Check output from ifconfig command
Wed Nov 21 16:38:07 2012
Error: KGXGN aborts the instance (6)
Wed Nov 21 16:38:07 2012
Errors in file /u01/oracle/app/admin/orcl/bdump/orcl2_lmon_28422.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON: terminating instance due to error 29702
Wed Nov 21 16:38:07 2012
Errors in file /u01/oracle/app/admin/orcl/bdump/orcl2_lms1_28430.trc:
ORA-29702: error occurred in Cluster Group Service operation
Wed Nov 21 16:38:07 2012
Errors in file /u01/oracle/app/admin/orcl/bdump/orcl2_lms3_28438.trc:
ORA-29702: error occurred in Cluster Group Service operation
. . .
Wed Nov 21 16:38:09 2012
Errors in file /u01/oracle/app/admin/orcl/bdump/orcl2_j000_28635.trc:
ORA-29702: error occurred in Cluster Group Service operation
ORA-29702: error occurred in Cluster Group Service operation
Wed Nov 21 16:38:09 2012
Errors in file /u01/oracle/app/admin/orcl/bdump/orcl2_mman_28450.trc:
ORA-29702: error occurred in Cluster Group Service operation
Wed Nov 21 16:38:09 2012
Errors in file /u01/oracle/app/admin/orcl/bdump/orcl2_asmb_28496.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Wed Nov 21 16:38:10 2012
Errors in file /u01/oracle/app/admin/orcl/bdump/orcl2_pmon_28416.trc:
ORA-29702: error occurred in Cluster Group Service operation
Wed Nov 21 16:38:10 2012
Errors in file /u01/oracle/app/admin/orcl/bdump/orcl2_smon_28462.trc:
ORA-29702: error occurred in Cluster Group Service operation
Wed Nov 21 17:25:50 2012
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 eth1 10.0.1.0 configured from OCR for use as a cluster interconnect
Interface type 1 eth0 172.18.19.0 configured from OCR for use as a public interface
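Clearly, the problem that brought the RAC node down originated in the operating system or hardware layer. The ORA-27504 error was raised because of the OS-related errors ORA-27300, ORA-27301, ORA-27302 and ORA-27303. These errors state explicitly that the private network interface address 10.0.1.2 could not be found, and they point to the output of the operating system's ifconfig command, where that address was no longer listed as expected.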
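When an alert log contains many of these entries, it can help to summarize which interface addresses were reported missing and how often. The following is a minimal Python sketch, not an Oracle-provided tool; the function name `missing_interfaces` is hypothetical, and the pattern assumes the ORA-27303 detail text has the same wording as in the excerpt above.

```python
import re
from collections import Counter

# Pattern for the ORA-27303 detail text seen in the alert log above.
# Matching is case-insensitive because pasted log excerpts are sometimes
# re-cased by formatting tools.
_MISSING_IF = re.compile(
    r"requested interface\s+(\d{1,3}(?:\.\d{1,3}){3})\s+not found",
    re.IGNORECASE,
)


def missing_interfaces(log_text):
    """Count how often each IPv4 address is reported as 'not found'."""
    return Counter(_MISSING_IF.findall(log_text))
```

Reading the alert log into a string and passing it to `missing_interfaces` gives a per-address count, which makes it easy to see whether only the interconnect address is affected.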
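Oracle's network heartbeat depends on the private network, so once the interconnect interface disappeared, it is hardly surprising that the database node crashed.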
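This should not be regarded as an Oracle bug: the error messages Oracle produced already identify the cause clearly. The key to resolving the problem is to find out why the network interface failed at the operating system level.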
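As a starting point for that OS-level investigation, the check that the error message suggests (looking for the address in the ifconfig output) can be scripted. This is a rough Python sketch under stated assumptions: the helper names `configured_ipv4` and `interface_present` are hypothetical, and the pattern only covers the two common Linux ifconfig layouts (`inet addr:10.0.1.2` and `inet 10.0.1.2`); other platforms may format the output differently.

```python
import re

# Matches IPv4 addresses in both common ifconfig layouts:
#   "inet addr:10.0.1.2  Bcast:..."   (older net-tools)
#   "inet 10.0.1.2  netmask ..."      (newer net-tools)
# "inet6" lines are skipped because "inet" must be followed by whitespace.
_INET = re.compile(r"\binet\s+(?:addr:)?(\d{1,3}(?:\.\d{1,3}){3})")


def configured_ipv4(ifconfig_output):
    """Return the set of IPv4 addresses found in ifconfig output text."""
    return set(_INET.findall(ifconfig_output))


def interface_present(ifconfig_output, ip):
    """True if the given address appears among the configured addresses."""
    return ip in configured_ipv4(ifconfig_output)
```

The text to check could come from running `ifconfig -a` on the affected node, for example through `subprocess.run(["ifconfig", "-a"], capture_output=True, text=True).stdout`. Running the check periodically and logging the result with a timestamp may help correlate an interface outage with OS or hardware events.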