Instance Crash Caused by Loss of the Private Network Interface

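A customer's 10.2.0.4 RAC database hit a network fault that crashed one instance and produced a large number of ORA-27300 errors.
The detailed error messages were: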

Wed Nov 21 16:37:36 2012
Errors in file /u01/oracle/app/admin/orcl/udump/orcl2_ora_29173.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 10.0.1.2 not found. Check output from ifconfig command
Wed Nov 21 16:37:36 2012
Errors in file /u01/oracle/app/admin/orcl/udump/orcl2_ora_29198.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 10.0.1.2 not found. Check output from ifconfig command
Wed Nov 21 16:37:56 2012
Trace dumping is performing id=[cdmp_20121121163746]
Wed Nov 21 16:38:00 2012
ospid 28424: network interface with IP address 10.0.1.2 no longer operational
requested interface 10.0.1.2 not found. Check output from ifconfig command
Wed Nov 21 16:38:07 2012
Error: KGXGN aborts the instance (6)
Wed Nov 21 16:38:07 2012
Errors in file /u01/oracle/app/admin/orcl/bdump/orcl2_lmon_28422.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON: terminating instance due to error 29702
Wed Nov 21 16:38:07 2012
Errors in file /u01/oracle/app/admin/orcl/bdump/orcl2_lms1_28430.trc:
ORA-29702: error occurred in Cluster Group Service operation
Wed Nov 21 16:38:07 2012
Errors in file /u01/oracle/app/admin/orcl/bdump/orcl2_lms3_28438.trc:
ORA-29702: error occurred in Cluster Group Service operation
.
.
.
Wed Nov 21 16:38:09 2012
Errors in file /u01/oracle/app/admin/orcl/bdump/orcl2_j000_28635.trc:
ORA-29702: error occurred in Cluster Group Service operation
ORA-29702: error occurred in Cluster Group Service operation
Wed Nov 21 16:38:09 2012
Errors in file /u01/oracle/app/admin/orcl/bdump/orcl2_mman_28450.trc:
ORA-29702: error occurred in Cluster Group Service operation
Wed Nov 21 16:38:09 2012
Errors in file /u01/oracle/app/admin/orcl/bdump/orcl2_asmb_28496.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Wed Nov 21 16:38:10 2012
Errors in file /u01/oracle/app/admin/orcl/bdump/orcl2_pmon_28416.trc:
ORA-29702: error occurred in Cluster Group Service operation
Wed Nov 21 16:38:10 2012
Errors in file /u01/oracle/app/admin/orcl/bdump/orcl2_smon_28462.trc:
ORA-29702: error occurred in Cluster Group Service operation
Wed Nov 21 17:25:50 2012
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 eth1 10.0.1.0 configured from OCR for use as a cluster interconnect
Interface type 1 eth0 172.18.19.0 configured from OCR for use as  a public interface

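Clearly, the problem that brought the RAC node down originated in the operating system or hardware layer. The ORA-27504 error was caused by the OS-dependent errors ORA-27300, ORA-27301, ORA-27302 and ORA-27303. These state explicitly that the address of the private network interface (10.0.1.2) could not be found and point to abnormal output from the operating system's ifconfig command.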
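In a longer alert log this error chain can be buried among hundreds of ORA-29702 lines. The following is a minimal Python sketch, not an Oracle-supplied tool. It assumes the alert log (in 10g typically alert_<SID>.log in the bdump directory seen in the trace paths) has already been read into a string. It uses regular expressions modeled on the message texts shown above to count ORA- codes, collect the addresses Oracle reports as missing, and list the interfaces registered in the OCR as the cluster interconnect. The function and pattern names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical patterns modeled on the message texts in the alert log above.
# re.IGNORECASE keeps them matching even if a copy of the log has odd casing.
MISSING_RE = re.compile(
    r"requested interface (\d+\.\d+\.\d+\.\d+) not found", re.IGNORECASE)
DOWN_RE = re.compile(
    r"network interface with IP address (\d+\.\d+\.\d+\.\d+) no longer operational",
    re.IGNORECASE)
OCR_RE = re.compile(
    r"Interface type 1 (\S+) (\d+\.\d+\.\d+\.\d+) configured from OCR "
    r"for use as\s+a cluster interconnect", re.IGNORECASE)
ORA_RE = re.compile(r"ORA-(\d{5})")


def summarize_interconnect_errors(log_text: str) -> dict:
    """Summarize interconnect-related evidence found in alert-log text."""
    # Addresses Oracle reports as missing or no longer operational.
    missing = set(MISSING_RE.findall(log_text)) | set(DOWN_RE.findall(log_text))
    # Interfaces the instance was told (via the OCR) to use as the interconnect.
    interconnects = {name: subnet for name, subnet in OCR_RE.findall(log_text)}
    # How often each ORA- code appears, to see which errors dominate.
    ora_counts = dict(Counter(ORA_RE.findall(log_text)))
    return {
        "missing": sorted(missing),
        "interconnects": interconnects,
        "ora_counts": ora_counts,
    }
```

Applied to the excerpt above, it reports 10.0.1.2 under "missing" and eth1 (10.0.1.0) under "interconnects". The address that vanished therefore appears to sit on the interface the OCR designates as the private interconnect.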
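Oracle's network heartbeat depends on the private network, so once the private interface failed, the node crash was only to be expected. New sessions could no longer create their IPC context (ORA-27504), and shortly afterwards the instance was aborted (KGXGN aborts the instance, with LMON terminating on ORA-29702).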
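This should not be counted as an Oracle bug, because the error messages Oracle produced already identify the cause clearly. The key to solving the problem is finding out why the network interface failed at the operating system level.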
