ORA-600(kjcsombd:2) Error

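Another error on a 9208 (9.2.0.8) RAC. In fact, this error is closely related to the one described in the previous article: at the same moment the other node went down and reported that error, this node raised this ORA-600.
ORA-600(kjccgmb:1) error: https://yangtingkun.net/?p=245
The detailed error messages on this node: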

Thu Oct 13 18:13:10 2011
IPC Send timeout detected. Sender ospid 1228900
Thu Oct 13 18:13:12 2011
Communications reconfiguration: instance 1
Thu Oct 13 18:13:12 2011
Trace dumping is performing id=[cdmp_20111013181312]
Thu Oct 13 18:13:17 2011
IPC Send timeout detected. Sender ospid 770198
Thu Oct 13 18:13:19 2011
Evicting instance 2 from cluster
Thu Oct 13 18:13:22 2011
IPC Send timeout detected. Sender ospid 1032208
Thu Oct 13 18:13:31 2011
IPC Send timeout detected. Sender ospid 1302720
Thu Oct 13 18:13:37 2011
IPC Send timeout detected. Sender ospid 438420
Thu Oct 13 18:13:39 2011
Waiting for instances to leave:
2 
Thu Oct 13 18:13:47 2011
IPC Send timeout detected. Sender ospid 1474810
Thu Oct 13 18:13:59 2011
Waiting for instances to leave:
2 
.
.
.
Thu Oct 13 18:17:22 2011
IPC Send timeout detected. Sender ospid 876652
Thu Oct 13 18:17:24 2011
IPC Send timeout detected. Sender ospid 1654878
Thu Oct 13 18:17:27 2011
IPC Send timeout detected. Sender ospid 1425476
Thu Oct 13 18:17:27 2011
IPC Send timeout detected. Sender ospid 970920
Thu Oct 13 18:17:39 2011
Waiting for instances to leave:
2 
Thu Oct 13 18:17:59 2011
Waiting for instances to leave:
2 
Thu Oct 13 18:18:19 2011
Waiting for instances to leave:
2 
Thu Oct 13 18:18:29 2011
Errors in file /u01/product/admin/RAC/udump/rac1_ora_1032208.trc:
ORA-00600: internal error code, arguments: [kjcsombd:2], [], [], [], [], [], [], []
ORA-03113: end-of-file on communication channel
Thu Oct 13 18:18:37 2011
Errors in file /u01/product/admin/RAC/udump/rac1_ora_1032208.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [kjcsombd:2], [], [], [], [], [], [], []
ORA-03113: end-of-file on communication channel
Thu Oct 13 18:18:38 2011
Trace dumping is performing id=[cdmp_20111013181838]

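The alert log already points at the failing session. The trace file rac1_ora_1032208.trc belongs to ospid 1032208, which is also one of the senders reported by "IPC Send timeout detected" at 18:13:22, shortly after the eviction of instance 2 began. Below is a minimal Python sketch of that cross-check. It assumes the alert log text is available as a plain string. The shortened excerpt, the regexes and the helper name `correlate` are illustrative and are not part of any Oracle tool.

```python
import re

# Shortened excerpt of the RAC1 alert log quoted above (illustrative subset).
ALERT_EXCERPT = """\
Thu Oct 13 18:13:10 2011
IPC Send timeout detected. Sender ospid 1228900
Thu Oct 13 18:13:17 2011
IPC Send timeout detected. Sender ospid 770198
Thu Oct 13 18:13:19 2011
Evicting instance 2 from cluster
Thu Oct 13 18:13:22 2011
IPC Send timeout detected. Sender ospid 1032208
Thu Oct 13 18:18:29 2011
Errors in file /u01/product/admin/RAC/udump/rac1_ora_1032208.trc:
ORA-00600: internal error code, arguments: [kjcsombd:2], [], [], [], [], [], [], []
ORA-03113: end-of-file on communication channel
"""

# Patterns written for the message formats seen in this log; matched
# case-insensitively so they tolerate differences in how the log was copied.
IPC_TIMEOUT = re.compile(r"IPC Send timeout detected\. Sender ospid (\d+)", re.IGNORECASE)
TRACE_FILE = re.compile(r"Errors in file \S+_ora_(\d+)\.trc:", re.IGNORECASE)


def correlate(alert_text: str) -> set[str]:
    """Return ospids that both hit an IPC send timeout and wrote an error trace."""
    timed_out = set(IPC_TIMEOUT.findall(alert_text))
    errored = set(TRACE_FILE.findall(alert_text))
    return timed_out & errored


print(correlate(ALERT_EXCERPT))  # prints {'1032208'}
```

On a full alert log the same idea narrows dozens of "IPC Send timeout" lines down to the sessions that actually failed.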
This ORA-600 error kept recurring until the other instance was started again. The corresponding detailed trace:

/u01/product/admin/RAC/udump/rac1_ora_1032208.trc
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.8.0 - Production
ORACLE_HOME = /u01/product/oracle/9.2.0
System name: AIX
Node name: p55a1
Release: 3
Version: 5
Machine: 0001D007D600
Instance name: RAC1
Redo thread mounted by this instance: 1
Oracle process number: 237
Unix process pid: 1032208, image: oracle@p55a1 (TNS V1-V3)
*** SESSION ID:(275.64909) 2011-10-13 18:13:22.043
SKGXPCTX: 0x102c4988 ctx
admono 0x3d9a665b admport:
SSKGXPT 0x102c4c44 flags  active network 0
info for network 0
 socket no 7  IP 172.16.12.254  UDP 57496
 HACMP network_id 0 sflags SSKGXPT_WRITESSKGXPT_UP
context timestamp 0xe5b469
 no ports
    sconno     accono   ertt  state   seq#   sent  async   sync rtrans   acks
0x5c9264d4 0x07c1249a     32      3  33535    772    772      0    296    771
slot 6 rqh=11035df18
seq=33534 len=424 accno=0x7c1249a start TS=0xe102f0 rt TS=0xe5b7c7 X CNT=297
0x5c9264d5 0x60c4351d     32      3  34041   1278   1278      0      0   1278
0x5c9264d6 0x4201cb1f     32      3  32770      7      7      0      0      7
       ach     accono     sconno      admno  state   seq#    rcv rtrans   acks
Submitting synchronized dump request [268435460]
KCL: caught error 3113 during cr lock op
*** 2011-10-13 18:18:29.055
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kjcsombd:2], [], [], [], [], [], [], []
ORA-03113: end-of-file on communication channel
Current SQL information unavailable - no session.
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp+0148          bl       ksedst               102974684 ?
ksfdmp+0018          bl       01FD3FC8             
kgerinv+00e8         bl       _ptrgl               
kgeanmfe+0048        bcl      kglsim_unpin_simhp+  000000200 ? 000000000 ?
                              001c                 700000000017CD8 ? 000000000 ?
kjcsombdi+0974       bl       kgeanmfe             110006288 ? 110357A28 ?
                                                   102A73228 ? 000000000 ?
                                                   10DC9D3665FC00 ?
                                                   8A2477269B180 ?
                                                   12E0BE826D694B2F ?
                                                   000000077 ?
kjcsombd+00a4        bl       kjcsombdi            BADC0FFEE0DDF00D ?
                                                   BADC0FFEE0DDF00D ?
kjpsod+0fbc          bl       kjcsombd             70000034CDEECD8 ? 2000004F8 ?
kssdch_stage+02b8    bl       _ptrgl               
kssdch+0014          bl       kssdch_stage         BADC0FFEE0DDF00D ?
                                                   BADC0FFEE0DDF00D ?
                                                   BADC0FFEE0DDF00D ?
ksudlp+0380          bl       kssdch               7000003796230D0 ? 200000002 ?
opidcl+020c          bl       01FD4824             
opidrv+045c          bl       opidcl               11000D060 ? 0101FAED0 ?
sou2o+0028           bl       opidrv               3C0C000000 ? 4A0142C60 ?
                                                   FFFFFFFFFFFF990 ?
main+0138            bl       01FD39E0             
__start+0098         bl       main                 000000000 ? 000000000 ?
--------------------- Binary Stack Dump ---------------------

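The ORA-600 argument names the function that raised it (kjcsombd), and the call stack shows how it was reached. In this trace the innermost frame comes first: ksedmp, which printed "internal or fatal error", sits at the top, and main and __start sit at the bottom. Reading the stack bottom-up therefore gives the call order. Below is a small Python sketch that pulls the function names out of the stack text. It assumes the "name+offset / call type / entry" column layout shown above, and the helper name `frames` is hypothetical.

```python
import re

# Frames copied from the call stack in rac1_ora_1032208.trc (partial).
STACK = """\
ksedmp+0148          bl       ksedst               102974684 ?
ksfdmp+0018          bl       01FD3FC8
kgerinv+00e8         bl       _ptrgl
kgeanmfe+0048        bcl      kglsim_unpin_simhp+  000000200 ? 000000000 ?
                              001c                 700000000017CD8 ? 000000000 ?
kjcsombdi+0974       bl       kgeanmfe             110006288 ? 110357A28 ?
kjcsombd+00a4        bl       kjcsombdi            BADC0FFEE0DDF00D ?
kjpsod+0fbc          bl       kjcsombd             70000034CDEECD8 ? 2000004F8 ?
kssdch_stage+02b8    bl       _ptrgl
"""

# A frame line starts in column 0 with "function+hexoffset" followed by whitespace.
# Lines starting with whitespace continue the previous frame's columns and are skipped.
FRAME = re.compile(r"^(\w+)\+[0-9a-fA-F]+\s")


def frames(stack_text: str) -> list[str]:
    """Return function names in trace order (innermost frame first)."""
    names = []
    for line in stack_text.splitlines():
        m = FRAME.match(line)
        if m:
            names.append(m.group(1))
    return names


# Reverse to read the path in execution order: caller -> callee.
print(" -> ".join(reversed(frames(STACK))))
```

Read this way, kjcsombd is entered from kjpsod, and the frames above kjcsombdi appear to be the generic error-raising and dump path that produced this trace.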
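The trace file makes the cause easy to see: the session needed a consistent-read (CR) block from the remote instance's cache, and the request failed with an ORA-3113 communication error (the KCL line reporting error 3113 during a CR lock operation).
This is clearly tied directly to the shutdown of the other node. Taken together with the ORA-600 error on that node, I suspect that communication between the two nodes broke down at the moment of the shutdown, which in turn raised a different ORA-600 error on each node.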
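The suspicion above rests on timing. The failing session's IPC send timed out within seconds of the eviction of instance 2 starting, and the ORA-600 followed about five minutes later. Below is a minimal Python sketch that turns alert-log timestamps into offsets so they can be lined up against the other node's alert log. It assumes both nodes' clocks are synchronized and that timestamps use English month and day names (C locale). The event labels and the helper `offsets` are illustrative.

```python
from datetime import datetime

# Alert log timestamp format, e.g. "Thu Oct 13 18:13:19 2011".
ALERT_TS = "%a %b %d %H:%M:%S %Y"

# Events copied from the RAC1 alert log above; labels are illustrative.
EVENTS = {
    "eviction of instance 2 logged": "Thu Oct 13 18:13:19 2011",
    "IPC send timeout, ospid 1032208": "Thu Oct 13 18:13:22 2011",
    "first ORA-600 [kjcsombd:2]": "Thu Oct 13 18:18:29 2011",
}


def offsets(events: dict[str, str]) -> dict[str, int]:
    """Seconds elapsed from the earliest event to each event."""
    parsed = {name: datetime.strptime(ts, ALERT_TS) for name, ts in events.items()}
    start = min(parsed.values())
    return {name: int((t - start).total_seconds()) for name, t in parsed.items()}


for name, secs in offsets(EVENTS).items():
    print(f"+{secs:4d}s  {name}")
```

Feeding in the corresponding events from the other node's alert log (for example, its own ORA-600) would show whether the two failures really coincide.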
