ORA-600(kglobpn-bad-pga)错误

这个错误比较罕见,无论是GOOGLE还是METALINK都找不到任何的相关信息。
数据库版本是11.2.0.2 RAC环境,这个错误之所以这么罕见,也和11.2的版本太新,以致于很多问题都还没有暴露出来有关。
错误信息为:

2012-03-29 04:06:02.353000 +08:00
Errors IN file /oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_18355.trc (incident=144401):
ORA-00600: internal error code, arguments: [kglobpn-bad-pga], [], [], [], [], [], [], [], [], [], [], []
Incident details IN: /oracle/diag/rdbms/orcl/orcl1/incident/incdir_144401/orcl1_ora_18355_i144401.trc
2012-03-29 04:10:12.017000 +08:00
Dumping diagnostic DATA IN directory=[cdmp_20120329041012], requested BY (instance=1, osid=18355), summary=[incident=144401].
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
2012-03-29 04:10:13.750000 +08:00
Sweep [inc][144401]: completed
Sweep [inc2][144401]: completed

详细的TRACE信息为:

*** 2012-03-29 04:06:02.385
*** SESSION ID:(106.45) 2012-03-29 04:06:02.385
*** CLIENT ID:() 2012-03-29 04:06:02.385
*** SERVICE NAME:(orcl) 2012-03-29 04:06:02.385
*** MODULE NAME:(PL/SQL Developer) 2012-03-29 04:06:02.385
*** ACTION NAME:(Main SESSION) 2012-03-29 04:06:02.385
Dump continued FROM file: /oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_18355.trc
ORA-00600: internal error code, arguments: [kglobpn-bad-pga], [], [], [], [], [], [], [], [], [], [], []
========= Dump FOR incident 144401 (ORA 600 [kglobpn-bad-pga]) ========
*** 2012-03-29 04:06:02.426
dbkedDefDump(): Starting incident DEFAULT dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=8m7jgqrwrtyj1) -----
CREATE OR REPLACE TRIGGER N_TRIGGER
  AFTER INSERT OR UPDATE ON T
  FOR EACH ROW
DECLARE
.
.
.
END;
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
ksedst1()+96         CALL     skdstdst()           FFFFFFFF7FFEED60 ?
                                                   1006B0C80 ? 000000000 ?
                                                   00000000A ? 000000001 ?
                                                   10BD95B00 ?
ksedst()+60          CALL     ksedst1()            000000000 ? 000000001 ?
                                                   00010C212 ? 00010C000 ?
                                                   10C20A000 ? 00010C20A ?
dbkedDefDump()+2032  CALL     ksedst()             000000000 ? 10B25B000 ?
                                                   10B25B2B0 ? 10C212000 ?
                                                   00010B000 ? 00010C212 ?
dbgexPhaseII()+1800  PTR_CALL dbkedDefDump()       000000003 ? 000000002 ?
                                                   10A6EC2C8 ? 0000014B0 ?
                                                   10C20A000 ? 000000003 ?
dbgexProcessError()  CALL     dbgexPhaseII()       10C3B4650 ?
+1248                                              FFFFFFFF7BE3A578 ?
                                                   FFFFFFFF7FFF3AB8 ?
                                                   0018E0000 ? 10A6E35B8 ?
                                                   000001C00 ?
dbgePostErrorKGE()+  CALL     dbgeExecuteForError  10AE0C3FD ?
1320                          ()                   FFFFFFFFFEC0B62D ?
                                                   001050000 ?
                                                   FFFFFFFF7FFF6268 ?
                                                   001060000 ? 000000028 ?
dbkePostKGE_kgsf()+  CALL     dbgePostErrorKGE()   10C20AC90 ? 000000000 ?
44                                                 FFFFFFFF7BE3A578 ?
                                                   000000000 ? 000000258 ?
                                                   00010C000 ?
kgerinv_internal()+  CALL     kgeadse()            10C20AC90 ?
72                                                 FFFFFFFF7D1042C0 ?
                                                   000000258 ? 000002868 ?
                                                   10A6E4000 ? 00010A6E4 ?
kgerinv()+40         CALL     kgerinv_internal()   10C20AC90 ? 004EA2360 ?
                                                   10BDED4D0 ? 000000258 ?
                                                   000000000 ? 000000000 ?
kgeasnmierr()+28     CALL     kgerinv()            10C20AC90 ?
                                                   FFFFFFFF7D1042C0 ?
                                                   10BDED4D0 ? 000000000 ?
                                                   FFFFFFFF7FFF7630 ?
                                                   000001400 ?
kglobpn()+2972       CALL     kgeasnmierr()        10C20AC90 ?
                                                   FFFFFFFF7D1042C0 ?
                                                   10BDED4D0 ? 000000000 ?
                                                   0000060A6 ? 00000005B ?
kglpim()+444         CALL     kglobpn()            FFFFFFFFFFBEC700 ?
                                                   10C20AC90 ? 0000020BE ?
                                                   000000000 ? CF6AF45C8 ?
                                                   D163B01D0 ?
kktcrt()+1204        CALL     kglpim()             000001154 ?
                                                   FFFFFFFF7FFF7EC8 ?
                                                   D163B01D0 ? 000000002 ?
                                                   000000006 ? 00000004E ?
opiexe()+24660       CALL     kktcrt()             000006000 ?
                                                   FFFFFFFF7BC19328 ?
                                                   D1772AAE0 ? 10C20A950 ?
                                                   FFFFFFFF7BC190C0 ?
                                                   000000002 ?
opiosq0()+6416       CALL     opiexe()             000000004 ? 000000000 ?
                                                   FFFFFFFF7FFFA43C ?
                                                   00010A6E3 ? 10A6E3000 ?
                                                   FFFFFFFF7FFF9E10 ?
kpooprx()+232        CALL     opiosq0()            00000004A ? 00000000E ?
                                                   FFFFFFFF7FFFA600 ?
                                                   0000000A4 ? 00010A6E3 ?
                                                   10C20AC90 ?
kpoal8()+3884        CALL     kpooprx()            FFFFFFFF7FFFDF3C ?
                                                   FFFFFFFF7DFD0660 ?
                                                   0000015EC ? 0000000A4 ?
                                                   00010C000 ? 0000000A4 ?
opiodr()+1428        PTR_CALL kpoal8()             00000005E ? 00000001C ?
                                                   FFFFFFFF7FFFDF38 ?
                                                   00010C000 ? 10C20A000 ?
                                                   000001648 ?
ttcpip()+1056        PTR_CALL opiodr()             00010A795 ? 00000001C ?
                                                   103EAD460 ? 00010A400 ?
                                                   000001400 ? 10C20A000 ?
opitsk()+1528        CALL     ttcpip()             000000000 ? 10A6C7694 ?
                                                   10C20AC90 ?
                                                   FFFFFFFF7FFFDF38 ?
                                                   FFFFFFFF7FFFC980 ?
                                                   10C221848 ?
opiino()+1000        CALL     opitsk()             10A6C7694 ? 10C226C98 ?
                                                   10C221654 ? 10C21F958 ?
                                                   000000000 ? 10C20A950 ?
opiodr()+1428        PTR_CALL opiino()             00010C000 ? 10C2216D0 ?
                                                   10C2216D0 ? 000380000 ?
                                                   0000000E9 ?
                                                   FFFFFFFF7FFFF890 ?
opidrv()+1100        CALL     opiodr()             10C221000 ? 000000004 ?
                                                   1035DD740 ? 00010C000 ?
                                                   000001400 ? 10C20A000 ?
sou2o()+92           CALL     opidrv()             00000003C ? 000000004 ?
                                                   FFFFFFFF7FFFF890 ?
                                                   0001EB250 ?
                                                   FFFFFFFF7C945110 ?
                                                   FFFFFFFF7FFFFC98 ?
opimai_real()+304    CALL     sou2o()              FFFFFFFF7FFFF868 ?
                                                   00000003C ? 000000004 ?
                                                   FFFFFFFF7FFFF890 ?
                                                   00010C000 ? 00010B800 ?
ssthrdmain()+320     PTR_CALL opimai_real()        000000000 ?
                                                   FFFFFFFF7FFFFB38 ?
                                                   FFFFFFFF7F900768 ?
                                                   00010B800 ? 000000001 ?
                                                   000000002 ?
main()+308           CALL     ssthrdmain()         00010C000 ? 000000002 ?
                                                   00044D000 ? 100644B40 ?
                                                   10C230000 ? 00010C230 ?
_start()+380         CALL     main()               000000002 ?
                                                   FFFFFFFF7FFFFC48 ?
                                                   000000000 ?
                                                   FFFFFFFF7FFFFB48 ?
                                                   FFFFFFFF7FFFFC58 ?
                                                   FFFFFFFF7DB00200 ?
--------------------- Binary Stack Dump ---------------------

显然这是利用PL/SQL Developer在创建一个触发器,由于触发器的内容很长,这里将其省略了。
虽然没有MOS的佐证,但是不难判断,问题处在PL/SQL Developer在编译触发器时与Oracle的配合上。虽然不能确定问题导致的原因就是Developer,但是这个错误的产生肯定与工具有关,因为很多罕见的错误都是工具引入的,而Develper引入的ORA-7445和ORA-600错误我这个几乎不用工具的人都已经碰到很多个了,难免这个问题不是Developer与最新版的ORACLE不兼容所致。

ORA-7445(ksuklms)错误

客户的11.2.0.2 RAC环境出现了这个ORA-7445[ksuklms]错误。
错误信息为:

2012-05-10 16:25:38.561000 +08:00
Exception [TYPE: SIGSEGV, Address NOT mapped TO object] [ADDR:0x163A] [PC:0x100A554A8, ksuklms()+392] [flags: 0x0, COUNT: 1]
Errors IN file /app/diag/rdbms/orcl/orcl1/trace/orcl1_ora_28387.trc (incident=449642):
ORA-07445: exception encountered: core dump [ksuklms()+392] [SIGSEGV] [ADDR:0x163A] [PC:0x100A554A8] [Address NOT mapped TO object] []
Incident details IN: /app/diag/rdbms/orcl/orcl1/incident/incdir_449642/orcl1_ora_28387_i449642.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
2012-05-10 16:25:40.465000 +08:00
Dumping diagnostic DATA IN directory=[cdmp_20120510162540], requested BY (instance=1, osid=28387), summary=[incident=449642].
2012-05-10 16:25:42.061000 +08:00
Sweep [inc][449642]: completed
Sweep [inc2][449642]: completed

详细错误信息为:

*** 2012-05-10 16:25:38.800
*** SESSION ID:(917.28879) 2012-05-10 16:25:38.800
*** CLIENT ID:() 2012-05-10 16:25:38.800
*** SERVICE NAME:(SYS$USERS) 2012-05-10 16:25:38.800
*** MODULE NAME:(sqlplus@ecsyhdb1 (TNS V1-V3)) 2012-05-10 16:25:38.800
*** ACTION NAME:() 2012-05-10 16:25:38.800
Dump continued FROM file: /app/diag/rdbms/orcl/orcl1/trace/orcl1_ora_28387.trc
ORA-07445: exception encountered: core dump [ksuklms()+392] [SIGSEGV] [ADDR:0x163A] [PC:0x100A554A8] [Address NOT mapped TO object] []
========= Dump FOR incident 449642 (ORA 7445 [ksuklms()+392]) ========
----- Beginning of Customized Incident Dump(s) -----
Exception [TYPE: SIGSEGV, Address NOT mapped TO object] [ADDR:0x163A] [PC:0x100A554A8, ksuklms()+392] [flags: 0x0, COUNT: 1]
Registers:
----------
%i0: 0xffffffff7fff72e4 %i1: 0xffffffff7fff72d6 %i2: 0xffffffff7fff72ee
%i3: 0xffffffff7fff72e4 %i4: 0x0000000000000000 %i5: 0xffffffff7a6068f8
%i6: 0xffffffff7fff6a21 %fp: 0xffffffff7fff7220 %i7: 0x0000000100a54020
%l0: 0xffffffff7fff721c %l1: 0x000000038000b840 %l2: 0x00000000000019d8
%l3: 0x0000000000001670 %l4: 0x000000000000163a %l5: 0x0000000000000000
%l6: 0xffffffff7fff72d6 %l7: 0x00000000000019e0
%o0: 0x0000000000000000 %o1: 0x0000000000008343 %o2: 0x0000000000000b4a
%o3: 0xffffffff7fff72e8 %o4: 0x000000000000001a %o5: 0x0000000000000000
%o6: 0xffffffff7fff68c1 %sp: 0xffffffff7fff70c0 %o7: 0x0000000100a5546c
%g1: 0x0000000be0d2fbd8 %g2: 0x0000000000008342 %g3: 0x0000000000008342
%g4: 0x0000000000041a10 %g5: 0x0000000000000000 %g6: 0x0000000000000000
%g7: 0xffffffff7c100200
%pc: 0x0000000100a554a8 %npc: 0x0000000100a554ac %y: 0x0000000000000000
Stack info:
----------
ss_sp: 0xffffffff7e000000 ss_size: 0x0000000002000000 ss_flags: 0
Swap entries = 1 
path=/dev/md/dsk/d1, SIZE=75500789760, free=75500789760, LENGTH=147462480
*** 2012-05-10 16:25:38.814
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x3, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=55z15wr7xssjk) -----
ALTER system KILL SESSION '33603,2890'
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
ksedst1()+96         CALL     skdstdst()           FFFFFFFF7A5FD640 ?
                                                   000000000 ? 000000000 ?
                                                   00000000A ? 000000001 ?
                                                   10BD552E0 ?
ksedst()+60          CALL     ksedst1()            000000001 ? 000000001 ?
                                                   00010C1D1 ? 00010C000 ?
                                                   10C1CA000 ? 00010C1CA ?
dbkedDefDump()+2032  CALL     ksedst()             000000001 ? 10B21A000 ?
                                                   10B21AA90 ? 10C1D2000 ?
                                                   00010B000 ? 00010C1D2 ?
ssexhd()+2196        CALL     ksedmp()             000000003 ? 000000003 ?
                                                   000000B4A ? 00038000B ?
                                                   000380000 ? 000000003 ?
__sighndlr()+12      PTR_CALL ssexhd()             10C1CE000 ? BE8D30A10 ?
                                                   10B86D110 ?
                                                   FFFFFFFF7A601EF0 ?
                                                   00010C1C9 ? 0000003E8 ?
call_user_handler()  CALL     __sighndlr()         00000000B ?
+992                                               FFFFFFFF7A601EF0 ?
                                                   FFFFFFFF7A601C10 ?
                                                   1046F1AA0 ? 000000000 ?
                                                   00000000A ?
ksuklms()+392        PTR_CALL 0000000000000000     FFFFFFFF7C100200 ?
                                                   FFFFFFFF7C100200 ?
                                                   FFFFFFFF7A601C10 ?
                                                   000000009 ? 000000000 ?
                                                   000000000 ?
ksukil()+480         CALL     ksuklms()            FFFFFFFF7FFF72E4 ?
                                                   FFFFFFFF7FFF72D6 ?
                                                   FFFFFFFF7FFF72EE ?
                                                   FFFFFFFF7FFF72E4 ?
                                                   000000000 ?
                                                   FFFFFFFF7A6068F8 ?
kkyasy()+4988        CALL     ksukil()             000000000 ? 000000001 ?
                                                   AE4BE0C42 ? AE4BE0B08 ?
                                                   10529ACB0 ? 000000B4A ?
kksExecutorclmand()  CALL     kkyasy()             000000001 ?
+2244                                              FFFFFFFF7AA56F28 ?
                                                   07FFFFFFF ? 000000001 ?
                                                   000000000 ? 07FFFFFFF ?
opiexe()+13404       CALL     kksExecutorclmand()  FFFFFFFF7AA56F28 ?
                                                   00010C1E3 ? 000000004 ?
                                                   BF0F4AB10 ? 10C1CE900 ?
                                                   10C1CE4C8 ?
kpoal8()+2368        CALL     opiexe()             000000049 ? 000000003 ?
                                                   FFFFFFFF7FFFA91C ?
                                                   000000000 ? 000000000 ?
                                                   0BFFFFFFF ?
opiodr()+1428        PTR_CALL kpoal8()             00000005E ? 00000001C ?
                                                   FFFFFFFF7FFFDDD8 ?
                                                   00010C000 ? 10C1CA000 ?
                                                   000001648 ?
ttcpip()+1056        PTR_CALL opiodr()             00010A755 ? 00000001C ?
                                                   103E6CC40 ? 00010A400 ?
                                                   000001400 ? 10C1CA000 ?
opitsk()+1528        CALL     ttcpip()             000000000 ? 10A686E74 ?
                                                   10C1CA3E0 ?
                                                   FFFFFFFF7FFFDDD8 ?
                                                   FFFFFFFF7FFFC820 ?
                                                   10C1E0F98 ?
opiino()+1000        CALL     opitsk()             10A686E74 ? 10C1E63E8 ?
                                                   10C1E0DA4 ? 10C1DF0A8 ?
                                                   000000000 ? 10C1CA0A0 ?
opiodr()+1428        PTR_CALL opiino()             000002270 ? 10C1E0E20 ?
                                                   00010C000 ? 000380000 ?
                                                   0000000FC ?
                                                   FFFFFFFF7FFFF730 ?
opidrv()+1100        CALL     opiodr()             10C1E0000 ? 000000004 ?
                                                   10359CF20 ? 00010C000 ?
                                                   000001400 ? 10C1CA000 ?
sou2o()+92           CALL     opidrv()             00000003C ? 000000004 ?
                                                   FFFFFFFF7FFFF730 ?
                                                   0001EA190 ?
                                                   FFFFFFFF7AF42F10 ?
                                                   10C3D42B0 ?
opimai_real()+304    CALL     sou2o()              FFFFFFFF7FFFF708 ?
                                                   00000003C ? 000000004 ?
                                                   FFFFFFFF7FFFF730 ?
                                                   00010C000 ? 00010B800 ?
ssthrdmain()+320     PTR_CALL opimai_real()        000000000 ?
                                                   FFFFFFFF7FFFF9D8 ?
                                                   FFFFFFFF7DF3AEB8 ?
                                                   00010B800 ? 000000001 ?
                                                   000000002 ?
main()+308           CALL     ssthrdmain()         00010C000 ? 000000002 ?
                                                   00044D000 ? 100604320 ?
                                                   10C1EF000 ? 00010C1EF ?
_start()+380         CALL     main()               000000002 ?
                                                   FFFFFFFF7FFFFAE8 ?
                                                   000000000 ?
                                                   FFFFFFFF7FFFF9E8 ?
                                                   FFFFFFFF7FFFFAF8 ?
                                                   FFFFFFFF7C100200 ?
--------------------- Binary Stack Dump ---------------------

这个错误是10.2.0.4开始引入的,Oracle在10.2.0.5中已经fixed了这个问题,没想到在11.2.0.2中,这个问题仍然出现。在10.2.0.4中,在RAC环境下杀掉一个会话可能导致节点的CRASH,但是11.2中,虽然出现了同样的错误,但是数据库实例并未CRASH。该问题的描述可以参考文档:Bug 7038750 – Dump (ksuklms) / instance crash [ID 7038750.8]。
在11.2中可以简单的忽略这个问题,而10.2.0.4环境如果碰到这个错误,除了将数据库升级到10.2.0.5或10.2.0.4.1以外,还可以在初始化参数文件中添加event:‘10422 trace name context forever, level 1’来避免这个错误造成实例的CRASH。

ORA-600(kjbrrefp:key)和ORA-600(kjbmprlst:shadow)错误

这两个错误是由同一个BUG导致的。
数据库环境11.2.0.2 RAC for Solaris sparc,错误信息如下:

2012-01-29 06:15:10.168000 +08:00
Errors IN file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_81.trc (incident=384590):
ORA-00600: internal error code, arguments: [kjbrref:pkey], [332269], [202], [137064], [0], [], [], [], [], [], [], []
Incident details IN: /app/diag/rdbms/orcl/orcl1/incident/incdir_384590/orcl1_lms3_81_i384590.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
2012-01-29 06:15:11.923000 +08:00
Dumping diagnostic DATA IN directory=[cdmp_20120129061511], requested BY (instance=1, osid=81 (LMS3)), summary=[incident=384590].
Sweep [inc][384590]: completed
Sweep [inc2][384590]: completed
2012-01-29 06:15:17.289000 +08:00
Errors IN file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_81.trc:
ORA-00600: internal error code, arguments: [kjbrref:pkey], [332269], [202], [137064], [0], [], [], [], [], [], [], []
LMS3 (ospid: 81): terminating the instance due TO error 484
2012-01-29 06:15:20.910000 +08:00
ORA-1092 : opitsk aborting process
2012-01-29 06:15:22.384000 +08:00
.
.
.
2012-04-17 04:26:44.373000 +08:00
Errors IN file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms1_8678.trc (incident=432578):
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
Incident details IN: /app/diag/rdbms/orcl/orcl1/incident/incdir_432578/orcl1_lms1_8678_i432578.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
2012-04-17 04:26:45.864000 +08:00
Dumping diagnostic DATA IN directory=[cdmp_20120417042645], requested BY (instance=1, osid=8678 (LMS1)), summary=[incident=432578].
Errors IN file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms1_8678.trc:
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
2012-04-17 04:26:47.359000 +08:00
Sweep [inc][432578]: completed
Sweep [inc2][432578]: completed
2012-04-17 04:26:53.095000 +08:00
Errors IN file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms1_8678.trc:
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 8678): terminating the instance due TO error 484
2012-04-17 04:26:56.593000 +08:00
ORA-1092 : opitsk aborting process
2012-04-17 04:26:58.088000 +08:00
Instance TERMINATED BY LMS1, pid = 8678

可以看到,无论是kjbrref:pkey错误的出现还是kjbmprlst:shadow错误的出现,都直接导致了实例的CRASH。可以说这两个错误都是非常严重的问题。而且二者都发生在LMSn进程上。

*** 2012-01-29 06:15:10.194
*** SESSION ID:(1009.1) 2012-01-29 06:15:10.194
*** CLIENT ID:() 2012-01-29 06:15:10.194
*** SERVICE NAME:(SYS$BACKGROUND) 2012-01-29 06:15:10.194
*** MODULE NAME:() 2012-01-29 06:15:10.194
*** ACTION NAME:() 2012-01-29 06:15:10.194
Dump continued FROM file: /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_81.trc
ORA-00600: internal error code, arguments: [kjbrref:pkey], [332269], [202], [137064], [0], [], [], [], [], [], [], []
========= Dump FOR incident 384590 (ORA 600 [kjbrref:pkey]) ========
----- Beginning of Customized Incident Dump(s) -----
 GCS RESOURCE 0xb92d0cfa0 hashq [0xbb35eddc8,0xc0f9b1f60] name[0x511ed.ca] pkey 136931.0
   GRANT 0xb94a7e8f8 cvt 0x0 send 0x0@1,0 WRITE 0x0,0@65536
   flag 0x2 mdrole 0x1 mode 1 scan 0.0 ROLE LOCAL
   disk: 0x0000.00000000 WRITE: 0x0000.00000000 cnt 0x0 hist 0x0
   xid 0x0000.000.00000000 sid 3 pkwait 0s rmacks 0
   refpcnt 0 weak: 0x0000.00000000
   pkey 136931.0
   hv 91 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 12, dom 0]
   kjga st 0x4, step 0.35.0, cinc 18, rmno 6345, flags 0x20
   lb 16384, hb 32767, myb 16957, drmb 16957, apifrz 1
   GCS SHADOW 0xb94a7e8f8,626 resp[0xb92d0cfa0,0x511ed.ca] pkey 136931.0
     GRANT 1 cvt 0 mdrole 0x1 st 0x100 lst 0x40 GRANTQ rl LOCAL
     master 1 owner 2 sid 3 remote[0x68fde3ef0,11] hist 0x10c30086180431f
     history 0x1f.0x6.0x1.0xc.0x6.0x1.0xc.0x6.0x1.0x0.
     cflag 0x0 sender 0 flags 0x0 replay# 0 abast 0x0.x0.1 dbmap 0x0
     disk: 0x0000.00000000 WRITE request: 0x0000.00000000
     pi scn: 0x0000.00000000 sq[0xb92d0cfd0,0xb92d0cfd0]
     msgseq 0x1 updseq 0x0 reqids[11,0,0] infop 0x0 lockseq x67d9
   GCS SHADOW END
 GCS RESOURCE END
----- End of Customized Incident Dump(s) -----
*** 2012-01-29 06:15:10.261
dbkedDefDump(): Starting incident DEFAULT dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
CURRENT SQL information unavailable - no cursor.
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
ksedst1()+96         CALL     skdstdst()           FFFFFFFF7FFF4C00 ?
                                                   100670460 ? 000000000 ?
                                                   00000000A ? 000000001 ?
                                                   10BD552E0 ?
ksedst()+60          CALL     ksedst1()            000000000 ? 000000001 ?
                                                   00010C1D1 ? 00010C000 ?
                                                   10C1CA000 ? 00010C1CA ?
dbkedDefDump()+2032  CALL     ksedst()             000000000 ? 10B21A000 ?
                                                   10B21AA90 ? 10C1D2000 ?
                                                   00010B000 ? 00010C1D2 ?
dbgexPhaseII()+1800  PTR_CALL dbkedDefDump()       000000003 ? 000000002 ?
                                                   10A6ABAA8 ? 0000014B0 ?
                                                   10C1C9000 ? 000000003 ?
dbgexExplicitEndInc  CALL     dbgexPhaseII()       10C373D30 ?
()+728                                             FFFFFFFF7A634920 ?
                                                   FFFFFFFF7FFF8FDC ?
                                                   0018E0001 ? 10A6A2D98 ?
                                                   000001C00 ?
dbgeEndDDEInvocatio  CALL     dbgexExplicitEndInc  10A6A2C50 ?
nImpl()+704                   ()                   FFFFFFFF7A634920 ?
                                                   FFFFFFFF7FFF8F28 ?
                                                   FFFFFFFF7FFFC620 ?
                                                   000000000 ?
                                                   FFFFFFFFFE4E26A0 ?
kjbrref()+1496       CALL     dbgeEndDDEInvocatio  10C373D30 ? 001B1D800 ?
                              n()                  FFFFFFFFFEC0AF31 ?
                                                   FFFFFFFF7FFFC620 ?
                                                   000002868 ? 0018E0001 ?
kjblreplay()+7380    CALL     kjbrref()            000002868 ? 10C1CA3E0 ?
                                                   000021768 ? A681AFA10 ?
                                                   B92D0CFA0 ? C0F96F920 ?
kjbldrmrpst()+4864   CALL     kjblreplay()         000000000 ? 000000001 ?
                                                   10C1CA0A0 ? BDA03C9B8 ?
                                                   000000000 ? 10C1E8890 ?
kjmprcfgsync()+1424  CALL     kjbldrmrpst()        A681AFA10 ? 000000001 ?

另一个trace文件:

*** 2012-04-17 04:26:44.389
*** SESSION ID:(673.1) 2012-04-17 04:26:44.389
*** CLIENT ID:() 2012-04-17 04:26:44.389
*** SERVICE NAME:(SYS$BACKGROUND) 2012-04-17 04:26:44.389
*** MODULE NAME:() 2012-04-17 04:26:44.389
*** ACTION NAME:() 2012-04-17 04:26:44.389
Dump continued FROM file: /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms1_8678.trc
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
========= Dump FOR incident 432578 (ORA 600 [kjbmprlst:shadow]) ========
----- Beginning of Customized Incident Dump(s) -----
 FUSION MSG 0xffffffff79c40b80,39 FROM 2 spnum 14 ver[38,11161] ln 144 sq[2,8]
        REPLAY 1 [0x103699.c7, 151132.0] c[0x7e7bd3240,55] [0x494e,x38]
        GRANT 2 CONVERT 0 ROLE x0
        pi [0x0.0x0] flags 0x0 state 0x100
        disk scn 0x0.0 writereq scn 0x0.0 rreqid x0
        msgRM# 11161 bkt# 18131 drmbkt# 18131
    pkey 151132.0 undo 0 stat 5 masters[32768, 2->32768] reminc 38 RM# 11152
 flg x0 TYPE x0 afftime x8517cf38
 nreplays BY lms 0 = 4046 
 nreplays BY lms 1 = 4105 
 nreplays BY lms 2 = 4176 
 nreplays BY lms 3 = 4214 
 nreplays BY lms 4 = 4158 
 nreplays BY lms 5 = 4162 
   hv 125 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 36, dom 0]
   kjga st 0x4, step 0.36.0, cinc 38, rmno 11161, flags 0x20
   lb 16384, hb 32767, myb 18131, drmb 18131, apifrz 1
 FUSION MSG DUMP END
 GCS RESOURCE 0xbb93a40e8 hashq [0xba8f40298,0xc27d16700] name[0x103699.c7] pkey 151008.0
   GRANT 0xb99d64f38 cvt 0x0 send 0x0@1,0 WRITE 0x0,0@65536
   flag 0x2 mdrole 0x1 mode 1 scan 0.0 ROLE LOCAL
   disk: 0x0000.00000000 WRITE: 0x0000.00000000 cnt 0x0 hist 0x0
   xid 0x0000.000.00000000 sid 1 pkwait 0s rmacks 0
   refpcnt 0 weak: 0x0000.00000000
   pkey 151008.0
   hv 125 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 36, dom 0]
   kjga st 0x4, step 0.36.0, cinc 38, rmno 11161, flags 0x20
   lb 16384, hb 32767, myb 18131, drmb 18131, apifrz 1
   GCS SHADOW 0xb99d64f38,42 resp[0xbb93a40e8,0x103699.c7] pkey 151008.0
     GRANT 1 cvt 0 mdrole 0x1 st 0x100 lst 0x40 GRANTQ rl LOCAL
     master 1 owner 2 sid 1 remote[0x85fed2220,13] hist 0xb93e302087234c9f
     history 0x1f.0x19.0xd.0x39.0x8.0x4.0xc.0x1f.0x39.0x1.
     cflag 0x0 sender 0 flags 0x0 replay# 0 abast 0x0.x0.1 dbmap 0x0
     disk: 0x0000.00000000 WRITE request: 0x0000.00000000
     pi scn: 0x0000.00000000 sq[0xbb93a4118,0xbb93a4118]
     msgseq 0x1 updseq 0x0 reqids[13,0,0] infop 0x0 lockseq xf0d1
   GCS SHADOW END
 GCS RESOURCE END
----- End of Customized Incident Dump(s) -----
*** 2012-04-17 04:26:44.478
dbkedDefDump(): Starting incident DEFAULT dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
CURRENT SQL information unavailable - no cursor.
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
ksedst1()+96         CALL     skdstdst()           FFFFFFFF7FFF4D20 ?
                                                   100670460 ? 000000000 ?
                                                   00000000A ? 000000001 ?
                                                   10BD552E0 ?
ksedst()+60          CALL     ksedst1()            000000000 ? 000000001 ?
                                                   00010C1D1 ? 00010C000 ?
                                                   10C1CA000 ? 00010C1CA ?
dbkedDefDump()+2032  CALL     ksedst()             000000000 ? 10B21A000 ?
                                                   10B21AA90 ? 10C1D2000 ?
                                                   00010B000 ? 00010C1D2 ?
dbgexPhaseII()+1800  PTR_CALL dbkedDefDump()       000000003 ? 000000002 ?
                                                   10A6ABAA8 ? 0000014B0 ?
                                                   10C1C9000 ? 000000003 ?
dbgexExplicitEndInc  CALL     dbgexPhaseII()       10C373D30 ?
()+728                                             FFFFFFFF7A634920 ?
                                                   FFFFFFFF7FFF90FC ?
                                                   0018E0001 ? 10A6A2D98 ?
                                                   000001C00 ?
dbgeEndDDEInvocatio  CALL     dbgexExplicitEndInc  10A6A2C50 ?
nImpl()+704                   ()                   FFFFFFFF7A634920 ?
                                                   FFFFFFFF7FFF9048 ?
                                                   FFFFFFFF7FFFC740 ?
                                                   000000000 ?
                                                   FFFFFFFFFE4E26A0 ?
kjbmprlst()+13504    CALL     dbgeEndDDEInvocatio  10C373D30 ? 001B1D800 ?
                              n()                  FFFFFFFFFEC0AF31 ?
                                                   FFFFFFFF7FFFC740 ?
                                                   0013F5000 ? 0018E0001 ?
kjmxmpm()+796        PTR_CALL kjbmprlst()          101782000 ? 00010C1CA ?
                                                   10C1EA000 ? 10C1CA000 ?
                                                   10A6A3000 ? 10A6A3000 ?
kjmpbmsg()+4584      CALL     kjmxmpm()            00010A400 ? 000000000 ?
                                                   0852DA2C5 ? 00010C000 ?
                                                   10A7EE000 ? BE22AF0C0 ?
kjmsm()+11308        CALL     kjmpbmsg()           00010A400 ? 00000009C ?
                                                   00010C000 ? 10A7EE000 ?
                                                   000000001 ? 000000027 ?
ksbrdp()+1236        PTR_CALL kjmsm()              000001888 ? 25916872D1 ?
                                                   000002000 ? 000000000 ?
                                                   00000024B ? 000001000 ?
opirip()+1008        CALL     ksbrdp()             10BB56000 ? BD8C0B680 ?
                                                   000000001 ? 000001400 ?
                                                   00010B800 ? 10AC212D8 ?
opidrv()+780         CALL     opirip()             10A6A3000 ? 380013D50 ?
                                                   000380002 ? 3800055C0 ?
                                                   380002000 ? 00010C000 ?
sou2o()+92           CALL     opidrv()             000000032 ? 000000004 ?
                                                   FFFFFFFF7FFFF780 ?
                                                   0001EA190 ?
                                                   FFFFFFFF7AF42F10 ?
                                                   FFFFFFFF7FFFFBB8 ?
opimai_real()+516    CALL     sou2o()              FFFFFFFF7FFFF758 ?

可以看到,两个TRACE文件也非常接近,而且连报错的前几个堆栈函数的名称都完全一样。
查询MOS,确认为Bug 12834027 ORA-600 [kjbmprlst:shadow] / ORA-600 [kjbrasr:pkey] with RAC read mostly locking,这个问题在最新的11.2.0.3.1PSU中被FIXED,除了打补丁之外,还可以考虑通过隐含参数”_gc_read_mostly_locking”=FALSE来禁止READ-MOSTLY OBJECT LOCKING。此外,禁止DRM也可以避免该错误的产生。

ORA-600(qerrmOStart2)错误

客户11.2.0.2 RAC for Solaris 10出现ORA-600错误。
错误信息为:

Fri May 11 14:22:45 2012
Errors IN file /oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_25552.trc (incident=194913):
ORA-00600: 内部错误代码, 参数: [qerrmOStart2], [1740], [ORA-01740: 标识符中缺失双引号
], [], [], [], [], [], [], [], [], []
Incident details IN: /oracle/diag/rdbms/orcl/orcl2/incident/incdir_194913/orcl2_ora_25552_i194913.trc
Fri May 11 14:23:29 2012
Fri May 11 14:23:29 2012
Dumping diagnostic DATA IN directory=[cdmp_20120511142329], requested BY (instance=2, osid=25552), summary=[incident=194913].USE ADRCI OR Support Workbench t
o package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
Exception [TYPE: SIGSEGV, Stack Overflow] [ADDR:0x8] [PC:0x10611C144, qerrmOdcl()+36] [flags: 0x0, COUNT: 1]
Errors IN file /oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_25552.trc (incident=194914):
ORA-07445: 出现异常错误: 核心转储 [qerrmOdcl()+36] [SIGSEGV] [ADDR:0x8] [PC:0x10611C144] [Stack Overflow] []
ORA-00600: 内部错误代码, 参数: [qerrmOStart2], [1740], [ORA-01740: 标识符中缺失双引号
], [], [], [], [], [], [], [], [], []
Incident details IN: /oracle/diag/rdbms/orcl/orcl2/incident/incdir_194914/orcl2_ora_25552_i194914.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
Fri May 11 14:23:30 2012
Sweep [inc][194913]: completed
Sweep [inc][194914]: completed
Sweep [inc2][194913]: completed

虽然这个操作在MOS没有相关描述,但是根据错误信息分析,这个ORA-600只是标识作用,真正的错误是随后参数中的ORA-1740错误。而这个错误信息更像是一个语句分析的错误。

$ more /oracle/diag/rdbms/orcl/orcl2/incident/incdir_194913/orcl2_ora_25552_i194913.trc
Dump file /oracle/diag/rdbms/orcl/orcl2/incident/incdir_194913/orcl2_ora_25552_i194913.trc
Oracle DATABASE 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
WITH the Partitioning, REAL Application Clusters, OLAP, DATA Mining
AND REAL Application Testing options
ORACLE_HOME = /oracle/product/11.2.0/dbhome_1
System name:    SunOS
Node name:      orcl2
Release:        5.10
Version:        Generic_147440-02
Machine:        sun4u
Instance name: orcl2
Redo thread mounted BY this instance: 2
Oracle process NUMBER: 364
Unix process pid: 25552, image: oracle@orcl2
 
 
*** 2012-05-11 14:22:45.579
*** SESSION ID:(1590.33807) 2012-05-11 14:22:45.579
*** CLIENT ID:() 2012-05-11 14:22:45.579
*** SERVICE NAME:(orcl) 2012-05-11 14:22:45.579
*** MODULE NAME:(PL/SQL Developer) 2012-05-11 14:22:45.579
*** ACTION NAME:(SQL Window - NEW) 2012-05-11 14:22:45.579
 
Dump continued FROM file: /oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_25552.trc
ORA-00600: 内部错误代码, 参数: [qerrmOStart2], [1740], [ORA-01740: 标识符中缺失双引号
], [], [], [], [], [], [], [], [], []
 
========= Dump FOR incident 194913 (ORA 600 [qerrmOStart2]) ========
 
*** 2012-05-11 14:22:45.606
dbkedDefDump(): Starting incident DEFAULT dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=2z1m6fp3p7qjd) -----
SELECT o_no,
       c_state,
       d_state,
       D_CONTENT,
       O_TIME,
       o_id,
       O_NAME,
       l_name,
       l_order
 FROM
(SELECT b.o_no,a.c_state,'' d_state, a.D_CONTENT,a.O_TIME,a.o_id,a.O_NAME,
       l_name,l_order,
       ROW_NUMBER() OVER(PARTITION BY a.O_ID ORDER BY a.O_TIME DESC) RN
  FROM T_O_D@DB_A A, T_B_O@DB_A B,t_b_o_l@db_a c
 WHERE A.O_ID = B.O_ID
   AND a.o_id = c.o_id
   AND A.O_TIME >= TRUNC(SYSDATE - 1)
   AND A.O_TIME < TRUNC(SYSDATE)
   AND NOT EXISTS (SELECT 1
          FROM T_B_O_E@DB_A O
         WHERE O.O_ID = A.O
           AND O.O_TIME >= TRUNC(SYSDATE - 1)
           AND O.O_TIME < TRUNC(SYSDATE))
    UNION ALL
SELECT b.o_no,a.c_state, 'TK',a.s_remark,a.O_TIME,a.o_id,a.O_NAME,
       l_name,l_order,
       ROW_NUMBER() OVER(PARTITION BY a.O_ID ORDER BY a.O_TIME DESC) RN
  FROM T_B_O_E@DB_A A, T_B_O@DB_A B,t_b_o_l@db_a c
 WHERE A.O_ID = B.O_ID
   AND a.o_id = c.o_id
   AND A.O_TIME >= TRUNC(SYSDATE - 1)
   AND A.O_TIME < TRUNC(SYSDATE)
   AND b.r_time >= TRUNC(SYSDATE - 1)
   AND b.r_time < TRUNC(SYSDATE)
   AND b.p_state = '2'
   AND a.e_type = '14'
    UNION ALL
SELECT b.o_no,a.c_state,'TD', a.s_remark,a.O_TIME,a.o_id,a.O_NAME,
       l_name,l_order,
       ROW_NUMBER() OVER(PARTITION BY a.O_ID ORDER BY a.O_TIME DESC) RN
  FROM T_B_O_E@DB_A  A, T_B_O@DB_A B,t_b_o_l@db_a c
 WHERE A.O_ID = B.O_ID
   AND a.o_id = c.o_id
   AND A.O_TIME >= TRUNC(SYSDATE - 1)
   AND A.O_TIME < TRUNC(SYSDATE)
   AND b.c_tag = '1'
   AND a.e_type = '04'
UNION ALL
SELECT to_char(P_ID), STATE, DECODE(STATE, '1', 'S', '2', 'ZDTD'), NULL,
       U_TIME, NULL, NULL, NULL, NULL, 1
  FROM T_B_B_I@DB_A A
 WHERE STATE IN ('1', '2')
   AND SCODE = 'EMAL'
   AND U_TIME >= TRUNC(SYSDATE - 1)
   AND U_TIME < TRUNC(SYSDATE))
   WHERE rn = 1
 
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
ksedst1()+96         CALL     skdstdst()           FFFFFFFF7FFEEED0 ?
                                                   1006B0C80 ? 000000000 ?
                                                   00000000A ? 000000001 ?
                                                   10BD95B00 ?
ksedst()+60          CALL     ksedst1()            000000000 ? 000000001 ?
                                                   00010C212 ? 00010C000 ?
                                                   10C20A000 ? 00010C20A ?
dbkedDefDump()+2032  CALL     ksedst()             000000000 ? 10B25B000 ?
                                                   10B25B2B0 ? 10C212000 ?
                                                   00010B000 ? 00010C212 ?
dbgexPhaseII()+1800  PTR_CALL dbkedDefDump()       000000003 ? 000000002 ?
                                                   10A6EC2C8 ? 0000014B0 ?
                                                   10C20A000 ? 000000003 ?
dbgexProcessError()  CALL     dbgexPhaseII()       10C3B4650 ?
+1248                                              FFFFFFFF7BE3A578 ?
                                                   FFFFFFFF7FFF3C28 ?
                                                   0018E0000 ? 10A6E35B8 ?
                                                   000001C00 ?
dbgePostErrorKGE()+  CALL     dbgeExecuteForError  10AE0C3FD ?
1320                          ()                   FFFFFFFFFEC0B62D ?
                                                   001050000 ?
                                                   FFFFFFFF7FFF63D8 ?
                                                   001060000 ? 000000028 ?
dbkePostKGE_kgsf()+  CALL     dbgePostErrorKGE()   10C20AC90 ? 000000000 ?
44                                                 FFFFFFFF7BE3A578 ?
                                                   000000000 ? 000000258 ?
                                                   00010C000 ?
kgerinv_internal()+  CALL     kgeadse()            10C20AC90 ?
72                                                 FFFFFFFF7BC22F20 ?
                                                   000000258 ? 000002868 ?
                                                   10A6E4000 ? 00010A6E4 ?
kgerinv()+40         CALL     kgerinv_internal()   10C20AC90 ? 004EA2360 ?
                                                   10BC2F628 ? 000000258 ?
                                                   000000000 ? 000000002 ?
kgesinv()+20         CALL     kgerinv()            10C20AC90 ?
                                                   FFFFFFFF7BC22F20 ?
                                                   10BC2F628 ? 000000002 ?
                                                   FFFFFFFF7FFF7840 ?
                                                   000001400 ?
ksesin()+92          CALL     kgesinv()            10C20AC90 ?
                                                   FFFFFFFF7BC22F20 ?
                                                   10BC2F628 ? 000000002 ?
                                                   FFFFFFFF7FFF7840 ?
                                                   00010C212 ?
OCIKSIN()+412        CALL     ksesin()             10BC2F628 ? 10C212000 ?
                                                   00010C000 ? 00010C000 ?
                                                   00010C20A ? 00010C212 ?
qerrmOStart()+516    CALL     OCIKSIN()            FFFFFFFFFFFFFFFF ?
                                                   0000006CC ? 10BC2F628 ?
                                                   00010BA9D ? 10BA9D890 ?
                                                   FFFFFFFF7FFF786C ?
qerrmStart()+1528    CALL     qerrmOStart()        10E603CD58 ?
                                                   FFFFFFFF7BC68170 ?
                                                   000000000 ?
                                                   FFFFFFFF7DF91088 ?
                                                   000106000 ?
                                                   FFFFFFFF7BC67F2C ?
selexe0()+976        PTR_CALL qerrmStart()         FFFFFFFF7BC68170 ?
                                                   000000003 ? 10E603CD58 ?
                                                   FFFFFFFF7BC75250 ?
                                                   000000001 ?
                                                   FFFFFFFF7DFBE5C8 ?
opiexe()+11664       CALL     selexe0()            FFFFFFFF7BC367C8 ?
                                                   10C21FAE0 ?
                                                   FFFFFFFF7BC41C90 ?
                                                   000000000 ? 10E60B2EE0 ?
                                                   10C21F000 ?
kpoal8()+2368        CALL     opiexe()             000000049 ? 000000003 ?
                                                   FFFFFFFF7FFF9E8C ?
                                                   000000000 ? 000000000 ?
                                                   0BFFFFFFF ?
opiodr()+1428        PTR_CALL kpoal8()             00000005E ? 00000001C ?
                                                   FFFFFFFF7FFFD348 ?
                                                   00010C000 ? 10C20A000 ?
                                                   000001648 ?
ttcpip()+1056        PTR_CALL opiodr()             00010A795 ? 00000001C ?
                                                   103EAD460 ? 00010A400 ?
                                                   000001400 ? 10C20A000 ?
opitsk()+1528        CALL     ttcpip()             000000000 ? 10A6C7694 ?
                                                   10C20AC90 ?
                                                   FFFFFFFF7FFFD348 ?
                                                   FFFFFFFF7FFFBD90 ?
                                                   10C221848 ?
opiino()+1000        CALL     opitsk()             10A6C7694 ? 10C226C98 ?
                                                   10C221654 ? 10C21F958 ?
                                                   000000000 ? 10C20A950 ?
opiodr()+1428        PTR_CALL opiino()             00010C000 ? 10C2216D0 ?
                                                   10C2216D0 ? 000380000 ?
                                                   0000000EB ?
                                                   FFFFFFFF7FFFECA0 ?
opidrv()+1100        CALL     opiodr()             10C221000 ? 000000004 ?
                                                   1035DD740 ? 00010C000 ?
                                                   000001400 ? 10C20A000 ?
sou2o()+92           CALL     opidrv()             00000003C ? 000000004 ?

显然这是PL/DEVELOPER执行或解析语句的时候引发的问题,本质上是一个普通的编译错误,但是Oracle将其作为内部ORA-600错误报了出来。
对于这个错误可以简单的忽略。

ORA-600(qkaffsindex5)错误

客户11.2.0.2 RAC for Solaris10上的错误。
错误信息为:

2012-05-04 22:00:04.768000 +08:00
BEGIN automatic SQL Tuning Advisor run FOR special tuning task "SYS_AUTO_SQL_TUNING_TASK"
2012-05-04 22:00:17.279000 +08:00
Errors IN file /oracle/diag/rdbms/orcl/orcl1/trace/orcl1_j002_5730.trc (incident=231810):
ORA-00600: internal error code, arguments: [qkaffsindex5], [], [], [], [], [], [], [], [], [], [], []
Incident details IN: /oracle/diag/rdbms/orcl/orcl1/incident/incdir_231810/orcl1_j002_5730_i231810.trc
2012-05-04 22:01:38.428000 +08:00
Dumping diagnostic DATA IN directory=[cdmp_20120504220138], requested BY (instance=1, osid=5730 (J002)), summary=[incident=231810].
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
Sweep [inc][231810]: completed
Sweep [inc2][231810]: completed

查询详细信息:

$ more /oracle/diag/rdbms/orcl/orcl1/incident/incdir_231810/orcl1_j002_5730_i231810.trc
Dump file /oracle/diag/rdbms/orcl/orcl1/incident/incdir_231810/orcl1_j002_5730_i231810.trc
Oracle DATABASE 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
WITH the Partitioning, REAL Application Clusters, OLAP, DATA Mining
AND REAL Application Testing options
ORACLE_HOME = /oracle/product/11.2.0/dbhome_1
System name:    SunOS
Node name:      racdb1
Release:        5.10
Version:        Generic_142900-14
Machine:        sun4u
Instance name: orcl1
Redo thread mounted BY this instance: 1
Oracle process NUMBER: 129
Unix process pid: 5730, image: oracle@racdb1 (J002)
 
 
*** 2012-05-04 22:00:17.299
*** SESSION ID:(25.62649) 2012-05-04 22:00:17.299
*** CLIENT ID:() 2012-05-04 22:00:17.299
*** SERVICE NAME:(SYS$USERS) 2012-05-04 22:00:17.299
*** MODULE NAME:(DBMS_SCHEDULER) 2012-05-04 22:00:17.299
*** ACTION NAME:(ORA$AT_SQ_SQL_SW_6463) 2012-05-04 22:00:17.299
 
Dump continued FROM file: /oracle/diag/rdbms/orcl/orcl1/trace/orcl1_j002_5730.trc
ORA-00600: internal error code, arguments: [qkaffsindex5], [], [], [], [], [], [], [], [], [], [], []
 
========= Dump FOR incident 231810 (ORA 600 [qkaffsindex5]) ========
 
*** 2012-05-04 22:00:17.319
dbkedDefDump(): Starting incident DEFAULT dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=c9nhsv0f2b021) -----
/* SQL Analyze(25,1) */ SELECT MENU_ID,MENU_NAME,PROV_CODE FROM VA_MENU WHERE STATUS = :1  AND (PROV_CODE = '098' OR PROV_CODE = :2 ) ORDER BY PROV_CODE DESC
, SEQ_NUM
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
  object      line  object
  handle    NUMBER  name
9e289c2e0     11816  package body SYS.DBMS_SQLTUNE_INTERNAL
9f7693938         7  SYS.WRI$_ADV_SQLTUNE
9e86e5c88       587  package body SYS.PRVT_ADVISOR
9e86e5c88      2655  package body SYS.PRVT_ADVISOR
5a34f0858       241  package body SYS.DBMS_ADVISOR
9e5ef9668       821  package body SYS.DBMS_SQLTUNE
9e8456960         4  anonymous block
 
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
ksedst1()+96         CALL     skdstdst()           FFFFFFFF7FFD27F0 ?
                                                   100670460 ? 000000000 ?
                                                   00000000A ? 000000001 ?
                                                   10BD552E0 ?
ksedst()+60          CALL     ksedst1()            000000000 ? 000000001 ?
                                                   00010C1D1 ? 00010C000 ?
                                                   10C1CA000 ? 00010C1CA ?
dbkedDefDump()+2032  CALL     ksedst()             000000000 ? 10B21A000 ?
                                                   10B21AA90 ? 10C1D2000 ?
                                                   00010B000 ? 00010C1D2 ?
dbgexPhaseII()+1800  PTR_CALL dbkedDefDump()       000000003 ? 000000002 ?
                                                   10A6ABAA8 ? 0000014B0 ?
                                                   10C1C9000 ? 000000003 ?
dbgexProcessError()  CALL     dbgexPhaseII()       10C373D30 ?
+1248                                              FFFFFFFF7A632830 ?
                                                   FFFFFFFF7FFD7548 ?
                                                   0018E0000 ? 10A6A2D98 ?
                                                   000001C00 ?
dbgePostErrorKGE()+  CALL     dbgeExecuteForError  10ADCBBDD ?
1320                          ()                   FFFFFFFFFEC0B62D ?
                                                   001050000 ?
                                                   FFFFFFFF7FFD9CF8 ?
                                                   001060000 ? 000000028 ?
dbkePostKGE_kgsf()+  CALL     dbgePostErrorKGE()   10C1CA3E0 ? 000000000 ?
44                                                 FFFFFFFF7A632830 ?
                                                   000000000 ? 000000258 ?
                                                   00010C000 ?
kgerinv_internal()+  CALL     kgeadse()            10C1CA3E0 ?
72                                                 FFFFFFFF7A63ADC0 ?
                                                   000000258 ? 000002868 ?
                                                   10A6A3000 ? 00010A6A3 ?
kgerinv()+40         CALL     kgerinv_internal()   10C1CA3E0 ? 004EA2360 ?
                                                   10B77E7B0 ? 000000258 ?
                                                   000000000 ? 000000000 ?
kgeasnmierr()+28     CALL     kgerinv()            10C1CA3E0 ?
                                                   FFFFFFFF7A63ADC0 ?
                                                   10B77E7B0 ? 000000000 ?
                                                   FFFFFFFF7FFDB0C0 ?
                                                   000001400 ?
qkaffsindex()+7648   CALL     kgeasnmierr()        10C1CA3E0 ?
                                                   FFFFFFFF7A63ADC0 ?
                                                   10B77E7B0 ? 000000000 ?
                                                   10C1CA000 ? 00010C1D1 ?
qkatab()+4060        CALL     qkaffsindex()        FFFFFFFF7A03ACB8 ?

错误发生在SQL_TUNING的过程中,那么多半是Oracle的bug。查询MOS,果然发现文档Bug 12869386 : DBMS_SQLTUNE.EXECUTE_TUNING_TASK REPORTS ORA-600 [QKAFFSINDEX5]记录了这个问题,不过Oracle目前虽然确认了这个bug,但是还没有提供明确的解决这个问题的方案。
好在问题发生在SQLTUNE功能上,即使失败对于数据库的运行也没有影响。

ORA-7445(dbgrlWriteAlertDetail_int)和ORA-4030导致实例崩溃

客户11.2.0.2 RAC for Solaris 10 sparc单实例出现ORA-7445和ORA-4030操作,导致实例崩溃。
这个错误比较严重:

2012-05-04 02:02:26.403000 +08:00
Archived Log entry 949 added FOR thread 1 SEQUENCE 518 ID 0x70a64e83 dest 1:
Archived Log entry 950 added FOR thread 1 SEQUENCE 519 ID 0x70a64e83 dest 1:
Thread 1 advanced TO log SEQUENCE 521 (after internal thread enable)
Thread 2 opened at log SEQUENCE 441
CURRENT log# 4 seq# 441 mem# 0: /orcldata1/orcl/redo04.log
CURRENT log# 4 seq# 441 mem# 1: /orcldata2/orcl/redo04.log
Successful OPEN OF redo thread 2
MTTR advisory IS disabled because FAST_START_MTTR_TARGET IS NOT SET
SMON: enabling cache recovery
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
2012-05-04 02:02:27.570000 +08:00
[374] Successfully onlined Undo Tablespace 5.
Undo initialization finished serial:0 START:903149023 END:903149480 diff:457 (4 seconds)
Verifying file header compatibility FOR 11g tablespace encryption..
Verifying 11g file header compatibility FOR tablespace encryption completed
Redo thread 1 internally disabled at seq 521 (CKPT)
SMON: enabling tx recovery
DATABASE Characterset IS ZHS16GBK
Archived Log entry 951 added FOR thread 1 SEQUENCE 520 ID 0x70a64e83 dest 1:
ARC3: Archiving disabled thread 1 SEQUENCE 521
Archived Log entry 952 added FOR thread 1 SEQUENCE 521 ID 0x70a64e83 dest 1:
No Resource Manager plan active
minact-scn: Inst 2 IS now the master inc#:2 mmon proc-id:326 STATUS:0x7
minact-scn STATUS: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
2012-05-04 02:02:29.806000 +08:00
Starting background process GTX0
GTX0 started WITH pid=72, OS id=660
Starting background process RCBG
RCBG started WITH pid=73, OS id=662
replication_dependency_tracking turned off (no async multimaster replication found)
Thread 2 advanced TO log SEQUENCE 442 (LGWR switch)
CURRENT log# 3 seq# 442 mem# 0: /orcldata1/orcl/redo03.log
CURRENT log# 3 seq# 442 mem# 1: /orcldata2/orcl/redo03.log
Archived Log entry 953 added FOR thread 2 SEQUENCE 441 ID 0x70a64e83 dest 1:
2012-05-04 02:02:31.733000 +08:00
Starting background process QMNC
2012-05-04 02:02:37.268000 +08:00
Exception [TYPE: SIGSEGV, Address NOT mapped TO object] [ADDR:0xFFFFFFFF7FFF5EF8] [PC:0x108068724, dbgrlWriteAlertDetail_int()+132] [flags: 0x0, COUNT: 1]
Exception [TYPE: SIGSEGV, Address NOT mapped TO object] [ADDR:0xFFFFFFFF7FFF5FF4] [PC:0xFFFFFFFF7BD00F34, _memset()+52] [flags: 0x0, COUNT: 1]
2012-05-04 02:02:38.594000 +08:00
Errors IN file /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_m000_666.trc (incident=156594):
ORA-07445: exception encountered: core dump [dbgrlWriteAlertDetail_int()+132] [SIGSEGV] [ADDR:0xFFFFFFFF7FFF5EF8] [PC:0x108068724] [Address NOT mapped TO obj
ect] []
ORA-04030: OUT OF process memory WHEN trying TO allocate 67108896 bytes (qesmmCheckPgaL,qesmmCheckPgaLimit:mem)
Incident details IN: /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/incident/incdir_156594/orcl2_m000_666_i156594.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
Errors IN file /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_psp0_266.trc (incident=156026):
ORA-07445: exception encountered: core dump [_memset()+52] [SIGSEGV] [ADDR:0xFFFFFFFF7FFF5FF4] [PC:0xFFFFFFFF7BD00F34] [Address NOT mapped TO object] []
Incident details IN: /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/incident/incdir_156026/orcl2_psp0_266_i156026.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
2012-05-04 02:02:49.732000 +08:00
Exception [TYPE: SIGSEGV, Address NOT mapped TO object] [ADDR:0xFFFFFFFF7FFF2000] [PC:0x107D04724, dbgemdGetCallStackWFlag()+100] [flags: 0x0, COUNT: 1]
Errors IN file /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_dia0_282.trc:
ORA-07445: exception encountered: core dump [dbgemdGetCallStackWFlag()+100] [SIGSEGV] [ADDR:0xFFFFFFFF7FFF2000] [PC:0x107D04724] [Address NOT mapped TO objec
t] []
ORA-04030: OUT OF process memory WHEN trying TO allocate 816 bytes (ksdhngmemctx_h,ksdhng:enod)
2012-05-04 02:02:55.200000 +08:00
Exception [TYPE: SIGSEGV, Address NOT mapped TO object] [ADDR:0xFFFFFFFF7FFF2000] [PC:0x107D04724, dbgemdGetCallStackWFlag()+100] [flags: 0x0, COUNT: 1]
Errors IN file /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_diag_274.trc:
ORA-07445: exception encountered: core dump [dbgemdGetCallStackWFlag()+100] [SIGSEGV] [ADDR:0xFFFFFFFF7FFF2000] [PC:0x107D04724] [Address NOT mapped TO objec
t] []
ORA-04030: OUT OF process memory WHEN trying TO allocate 32128 bytes (pga heap,grpsvc msg)
2012-05-04 02:02:58.724000 +08:00
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
2012-05-04 02:04:01.919000 +08:00
PMON (ospid: 264): terminating the instance due TO error 490
2012-05-04 02:04:05.129000 +08:00
Instance TERMINATED BY PMON, pid = 264

数据库的启动操作还没完成,就碰到ORA-7445 [dbgrlWriteAlertDetail_int]错误,随后是ORA-04030错误,接着是ORA-07445[_memset]错误,最后是ORA-7445[dbgemdGetCallStackWFlag]错误。而这一系列的错误出现,最终导致了PMON结束了数据库实例。
从错误信息上看,和内存分配有关,但是数据库刚启动,怎么会连67M的内存都无法分配呢,查询MOS发现,原来是SWAP空间耗尽导致的,详细描述可以参考Instance crash ORA-7445 [_memset()+120] and ORA-4030 (QERHJ hash-joi,kllcqas:kllsltba) [ID 1071033.1]
检查系统的日志信息message:

May 4 02:02:14 orcl2 Had[5187]: [ID 702911 daemon.notice] VCS CRITICAL V-16-1-50086 Swap usage ON orcl2 IS 97%
May 4 02:02:33 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap SPACE TO grow stack FOR pid 666 (oracle)
May 4 02:02:33 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap SPACE TO grow stack FOR pid 674 (oracle)
May 4 02:02:33 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap SPACE TO grow stack FOR pid 668 (oracle)
May 4 02:02:34 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap SPACE TO grow stack FOR pid 266 (oracle)
May 4 02:02:35 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap SPACE TO grow stack FOR pid 666 (oracle)
May 4 02:02:35 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap SPACE TO grow stack FOR pid 668 (oracle)
May 4 02:02:35 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap SPACE TO grow stack FOR pid 674 (oracle)
May 4 02:02:35 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap SPACE TO grow stack FOR pid 266 (oracle)
May 4 02:02:36 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap SPACE TO grow stack FOR pid 666 (oracle)
May 4 02:02:36 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap SPACE TO grow stack FOR pid 674 (oracle)
May 4 02:02:36 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap SPACE TO grow stack FOR pid 668 (oracle)
May 4 02:02:36 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap SPACE TO grow stack FOR pid 266 (oracle)
May 4 02:02:37 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap SPACE TO grow stack FOR pid 666 (oracle)
May 4 02:02:37 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap SPACE TO grow stack FOR pid 668 (oracle)

果然找到大量的SWAP空间不足的告警,对于Solaris系统而言,清理/tmp空间,然后重启数据库,问题不再出现。

ORA-600(ksnpost:ksnigb)错误

客户10.2.0.5 RAC for HP-UX ia64出现ORA-600(ksnpost:ksnigb)的错误。
错误信息如下:

Wed Apr 25 16:32:21 UTC 2012
Errors IN file /oracle/app/admin/orcl/bdump/orcl2_ora_17014.trc:
ORA-00600: Message 600 NOT found; No message file FOR product=RDBMS, facility=ORA; arguments: [ksnpost:ksnigb]
Wed Apr 25 16:32:55 UTC 2012
Trace dumping IS performing id=[cdmp_20120425163255]
Wed Apr 25 16:32:56 UTC 2012
Errors IN file /oracle/app/admin/orcl/bdump/orcl2_ora_5511.trc:
ORA-12012: 自动执行作业 8938 出错
ORA-12801: 并行查询服务器 ORA-12801: 并行查询服务器 P122, instance orcldb2:orcl2 (2) 中发出错误信号
ORA-00600: Message 600 NOT found; No message file FOR product=RDBMS, facility=ORA; arguments: [ksnpost:ksnigb]
ORA-06512: 在 "SYS.DBMS_STATS", line 21275
中发出错误信号
Wed Apr 25 16:35:41 UTC 2012
Errors IN file /oracle/app/admin/orcl/bdump/orcl2_ora_19966.trc:
ORA-27041: unable TO OPEN file
HPUX-ia64 Error: 23: File TABLE overflow
Additional information: 44
Additional information: 65598
Additional information: 2734688
Wed Apr 25 16:35:41 UTC 2012
Errors IN file /oracle/app/admin/orcl/bdump/orcl2_ora_19966.trc:
ORA-27041: unable TO OPEN file
HPUX-ia64 Error: 23: File TABLE overflow
Additional information: 44
Additional information: 65598
Additional information: 2734688
Wed Apr 25 16:35:41 UTC 2012
Errors IN file /oracle/app/admin/orcl/bdump/orcl2_ora_19966.trc:
ORA-27041: unable TO OPEN file
HPUX-ia64 Error: 23: File TABLE overflow
Additional information: 44
Additional information: 65598
Additional information: 2734688

详细TRACE信息为:

orcldb2>more /oracle/app/admin/orcl/bdump/orcl2_ora_17014.trc
Dump file /oracle/app/admin/orcl/bdump/orcl2_ora_17014.trc
Oracle DATABASE 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
WITH the Partitioning, REAL Application Clusters, OLAP, DATA Mining
AND REAL Application Testing options
ORACLE_HOME = /oracle/app/product/10.2.0/db_1
System name:    HP-UX
Node name:      orcldb2
Release:        B.11.31
Version:        U
Machine:        ia64
Instance name: orcl2
Redo thread mounted BY this instance: 2
Oracle process NUMBER: 0
Unix process pid: 17014, image: oracle@orcldb2
Ioctl ASYNC_CONFIG error, errno = 1
*** SERVICE NAME:(SYS$USERS) 2012-04-25 16:32:21.472
*** SESSION ID:(377.37550) 2012-04-25 16:32:21.472
*** 2012-04-25 16:32:21.472
ksedmp: internal OR fatal error
ORA-00600: Message 600 NOT found; No message file FOR product=RDBMS, facility=ORA; arguments: [ksnpost:ksnigb]
CURRENT SQL statement FOR this SESSION:
SELECT /*+ parallel_index(t,"IDX_PRODUCT_ID",96) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_expand index_ffs(t,"
IDX_PRODUCT_ID") */ COUNT(*) AS nrw,COUNT(DISTINCT sys_op_lbid(82769,'L',t.rowid)) AS nlb,COUNT(DISTINCT "PRODUCT_ID") AS ndk,sys_op_countchg(substrb(t.rowid
,1,15),1) AS clf FROM "ORCL_UNICOM"."MESSAGE_LOG" sample block (   .4377057550,1)  t WHERE ("PRODUCT_ID" IS NOT NULL) AND (TBL$OR$IDX$PART$NUM("ORCL_UNICOM"."M
ESSAGE_LOG",0,4,0,"ROWID") = :objn)
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
40000000040333a0     CALL     4000000004035b60     000000000 ? 000000001 ?
4000000004032b40     CALL     4000000004033360     000000000 ?
                                                   C000000000000D20 ?
                                                   4000000004032B40 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ?
400000000407abb0     CALL     40000000040322c0     000000003 ?
                                                   9FFFFFFFFFFEFC10 ?
                                                   60000000000BA268 ?
                                                   9FFFFFFFFFFF01E0 ?
                                                   C000000000000999 ?
                                                   400000000407ABB0 ?
40000000098c33d0     CALL     400000000407ab40     9FFFFFFFFFFF0770 ?
                                                   000000003 ?
                                                   9FFFFFFFFFFF01F0 ?
                                                   60000000000BA268 ?
                                                   C000000000000612 ?
                                                   40000000098C33D0 ?
40000000098c41c0     CALL     40000000098c32a0     60000000000318D0 ?
                                                   4000000001ADC4E0 ?
                                                   6000000000032988 ?
                                                   4000000001ADC4E0 ?
                                                   9FFFFFFFFFFF07A0 ?
4000000004030400     CALL     40000000098c4180     60000000000318D0 ?
                                                   9FFFFFFFFD364638 ?
                                                   9FFFFFFFFD364648 ?
                                                   6000000000032D00 ?
                                                   9FFFFFFFFFFF07A0 ?
4000000002071430     CALL     4000000004030350     4000000000F67410 ?
                                                   000000000 ?
                                                   60000000000C6C98 ?
40000000030c7ce0     CALL     40000000020713d0     C000001FF2838160 ?
                                                   60000000000BA268 ?
                                                   C000000000000797 ?
                                                   40000000030C7CE0 ?
                                                   9FFFFFFFFFFF07DC ?
                                                   9FFFFFFFFFFF07E0 ?
                                                   9FFFFFFFFFFF07D0 ?
                                                   000000001 ?
40000000030c8940     CALL     40000000030c7a50     C0000020351ED5A0 ?

查询MOS,发现和问题文档Get Various ORA-600 Error Followed By “HPUX-ia64 Error: 24: Too many open files” [ID 1061351.1]描述得比较接近。虽然文档描述的问题是HPUX-ia64 Error: 24,而当前的错误是HPUX-ia64 Error: 23: File table overflow,但是二者的本质是一样的。
从错误信息上看65598已经超过了设置值65536,针对这个问题,可以通过调整内核参数的方式进行解决,可以参考文档How to Increase the Number of Open Files on HP-UX [ID 1264834.1]。
而对于当前的情况,问题发生在并行执行上,并行度开到96显然过大,将其降至并行度16应该就可以避免该错误。

在线索引重建被取消导致大量ORA-600(kghstack_underflow_internal_3)错误

客户在比较繁忙的时刻执行了索引的REBUILD操作,导致大量会话被锁,最终对ONLINE REBUILD操作执行了取消操作,引发了一系列更严重的ORA-600错误。
登录数据库后,发现两个节点上ORACLE_BASE所在目录都已经100%占用。数据库无法正常通过/ AS SYSDBA方式登录。
查询告警日志发现大量的ORA-600和ORA-7445错误:

Tue May 08 21:20:26 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_1555.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] []
Tue May 08 21:20:27 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_3891.trc:
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], []
Tue May 08 21:20:27 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_26190.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] []
Tue May 08 21:20:27 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_2873.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] []
Tue May 08 21:20:27 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_4518.trc:
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], []
Tue May 08 21:20:27 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_22469.trc:
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], []
Tue May 08 21:20:28 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_26440.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] []
Tue May 08 21:20:29 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_762.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] []
Tue May 08 21:20:29 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_26106.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] []
Tue May 08 21:20:30 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_1597.trc:
ORA-07445: 出现异常错误: 核心转储 [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] []
Tue May 08 21:20:30 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_856.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] []
.
.
.

检查对应的TRACE文件,发现导致错误语句执行的是TABLE_A表的插入:

ksedmp: internal OR fatal error
ORA-00600: 内部错误代码, 参数: [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], []
CURRENT SQL statement FOR this SESSION:
INSERT INTO TABLE_A (O_ID, P_ID, S_ID, F_ITEM, F_NAME, T_MON, D_MON, I_MON, P_STATE, P_TIME, R1, R2) VALUES (:B10 , SUBSTR(:B10 , LENGTH(:B10 ) - 1, 2), :B9 , :B8 , :B7 , :B6 , :B5 , :B4 , :B3 , NULL, :B2 , :B1 )
----- PL/SQL Call Stack -----
  object      line  object
  handle    NUMBER  name
c00000203c2cc550       119  package body U1.P_O_I
c00000203b788200       288  PROCEDURE U1.U_B_O_I
c00000203719b8d0         1  anonymous block
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
ksedst()+64          CALL     ksedst1()            000000000 ? 000000001 ?
ksedmp()+2176        CALL     ksedst()             000000000 ?
                                                   C000000000000D20 ?
                                                   4000000004037940 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ?
ksfdmp()+112         CALL     ksedmp()             000000003 ?
                                                   9FFFFFFFFFFF1230 ?
                                                   60000000000BA290 ?
                                                   9FFFFFFFFFFF1800 ?
                                                   C000000000000999 ?
                                                   400000000407F9B0 ?
kgerinv()+304        CALL     ksfdmp()             9FFFFFFFFFFF1D90 ?
                                                   000000003 ?
                                                   9FFFFFFFFFFF1810 ?
                                                   60000000000BA290 ?
                                                   C000000000000612 ?
                                                   40000000098C38B0 ?
kgeasnmierr()+144    CALL     kgerinv()            60000000000318D0 ?
                                                   4000000001AD98A0 ?
                                                   6000000000032988 ?
                                                   4000000001AD98A0 ?
                                                   9FFFFFFFFFFF1DD0 ?
$cold_kghstack_unde  CALL     kgeasnmierr()        60000000000318D0 ?
rflow_internal()+36                                9FFFFFFFBF3B1168 ?
8                                                  9FFFFFFFBF3B1178 ?
                                                   6000000000032D00 ?
                                                   000000002 ?
                                                   60000000003002F0 ?
                                                   000000001 ? 000000006 ?
kghstack_free()+208  CALL     $cold_kghstack_unde  60000000000318D0 ?
                              rflow_internal()     60000000003002F0 ?
                                                   60000000000BA290 ?
                                                   C000000000000B1D ?
                                                   4000000002F7A510 ?
                                                   00002C87B ?
                                                   6000000000031A10 ?
ksmfrs()+48          CALL     kghstack_free()      60000000000318D0 ?
                                                   60000000003002F0 ?
rpiswu2()+1312       CALL     ksmfrs()             60000000003002F0 ?
                                                   C000000000001026 ?
                                                   4000000002F78960 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ?
rpidrv()+2352        CALL     rpiswu2()            9FFFFFFFFFFF2AF0 ?
                                                   4000000002F7AE60 ?
                                                   00002F833 ?
                                                   60000000000A7D20 ?
                                                   9FFFFFFFFFFF1E20 ?
                                                   C000000000001ABD ?
                                                   4000000000F14558 ?
                                                   60000000000C6CF0 ?
psddr0()+864         CALL     rpidrv()             000000018 ? 000000066 ?
                                                   9FFFFFFFFFFF3700 ?
                                                   000000038 ?
                                                   9FFFFFFFFFFF2B20 ?
                                                   60000000000BA290 ?
psdnal()+736         CALL     psddr0()             000000018 ? 000000066 ?
                                                   9FFFFFFFFFFF3700 ?
                                                   000000030 ?
pevm_EXECC()+832     CALL     psdnal()             9FFFFFFFFFFF54D0 ?
                                                   C00000203489A9F8 ?
                                                   C0000000000011AA ?
                                                   9FFFFFFFBE832840 ?
                                                   C00000203C2CC550 ?
                                                   400000000313C770 ?
                                                   000026035 ?
pfrinstr_EXECC()+16  CALL     pevm_EXECC()         9FFFFFFFBE83D1D0 ?
0                                                  9FFFFFFFBE8328B0 ?
                                                   000000020 ?
pfrrun_no_tool()+19  CALL     pfrinstr_EXECC()     9FFFFFFFBE83D1D0 ?
2                                                  C000001DA198D61C ?
                                                   9FFFFFFFBE83D238 ?
pfrrun()+1376        CALL     pfrrun_no_tool()     9FFFFFFFBE83D1D0 ?
                                                   000002001 ?
                                                   9FFFFFFFBE83D238 ?
                                                   60000000000BA290 ?
                                                   C000000000000A1C ?
                                                   4000000003163040 ?
                                                   9FFFFFFFBE83D620 ?
                                                   9FFFFFFFBE83D298 ?
plsql_run()+1328     CALL     pfrrun()             9FFFFFFFFFFF3788 ?
                                                   9FFFFFFFFFFF3770 ?
                                                   60000000000BA290 ?
                                                   9FFFFFFFFFFF4370 ?
                                                   9FFFFFFFFFFF4370 ?
                                                   C000000000000E23 ?
                                                   4000000002C34D70 ?
peicnt()+560         CALL     plsql_run()          9FFFFFFFFFFF4380 ?
                                                   000000001 ? 000000000 ?
                                                   9FFFFFFFFFFF4380 ?
                                                   60000000000BA290 ?
                                                   9FFFFFFFFFFF4E90 ?
kkxexe()+1008        CALL     peicnt()             9FFFFFFFFFFF54D0 ?
                                                   9FFFFFFFBE83D1D0 ?
                                                   9FFFFFFFFFFF4EA0 ?
                                                   60000000000BA290 ?
                                                   9FFFFFFFFFFF5420 ?
                                                   C000000000000A1C ?
                                                   000000020 ?
                                                   9FFFFFFFFFFF4F00 ?
opiexe()+8016        CALL     kkxexe()             0000051F0 ?
.
.
.

这个SQL语句是正常的业务操作,而这种简单的INSERT都会导致ORA-600和ORA-7445错误,说明插入的表或索引本身存在问题。
检查表和索引的状态,发现索引部分分区状态异常:

CREATE INDEX "U1"."IDX_F_2" 
ON "U1"."TABLE_A" ("S_ID")
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(
BUFFER_POOL DEFAULT) LOCAL
(PARTITION "P1"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS7" ,
PARTITION "P2"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS8" ,
PARTITION "P3"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS9" ,
PARTITION "P4"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS10" ,
PARTITION "P5"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS1" ,
PARTITION "P6"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS2" ,
PARTITION "P7"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS3" ,
PARTITION "P8"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS4" ,
PARTITION "P9"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS5" ,
PARTITION "P10"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS6" )
ALTER INDEX "U1"."IDX_F_2" MODIFY PARTITION "P6" UNUSABLE
ALTER INDEX "U1"."IDX_F_2" MODIFY PARTITION "P7" UNUSABLE

和客户沟通后,确认今天问题发生之前有人对索引进行维护操作,操作语句包括:

ALTER INDEX u1.IDX_F_2 rebuild partition p6 parallel
ALTER INDEX U1.IDX_T_2 rebuild partition P9 nologging parallel online
ALTER INDEX U1.IDX_T_1 rebuild partition P3 nologging parallel online
ALTER INDEX U1.IDX_F_2 rebuild partition P1 nologging parallel online
ALTER INDEX U1.IDX_T_4 rebuild partition P1 nologging parallel online
ALTER INDEX U1.IDX_F_4 rebuild partition P5 nologging parallel online

在对索引进行维护之后,数据库中出现了大量的enq: TM – contention等待,TM锁出现的原和是索引IDX_F_2的PARTITION P6重建语句没有添加ONLINE参数有直接关系。随后大量会话被阻塞,而整个索引重建操作被人为中止,其中包括正在运行的ONLINE REBUILD操作,而ONLINE REBUILD操作被中止是十分危险的,很容易导致bug的产生,而当前就导致了ORA-600和ORA-7445的产生。
由于客户着急解决问题,因此对数据库进行了重启。重启后,ORA-600错误信息不再出现。但是前台应用报错,索引失效。

SELECT 'ALTER INDEX U1.' || INDEX_NAME || ' REBUILD PARTITION ' || PARTITION_NAME || ';'
FROM DBA_IND_PARTITIONS
WHERE INDEX_OWNER = 'U1'
AND STATUS != 'USABLE';

获取所有失效的索引重建语句,将索引重新编译后,数据库恢复正常。
在Oracle的MOS上没有找到与之最相关的bug信息,反而是找到了一个相关性很高的补丁信息,而对应的bug信息并没有公开:补丁程序13737888: ONLINE DDL:ORA-600[KGHSTACK_UNDERFLOW_INTERNAL_3], [0X2B7F4E1E7678], [KEYBUF]。

RAC环境启动单实例报错ORA-1105

客户的4节点RAC环境,其中一个节点实例出现故障,发现无法正常启动。
检查CLUSTER和告警日志信息,发现节点1心跳超时,被踢出集群。服务器重新启动后,数据库实例没有自动启动。
告警日志信息为:

Mon Apr 16 03:42:39 2012
Thread 1 advanced TO log SEQUENCE 22348 (LGWR switch)
CURRENT log# 16 seq# 22348 mem# 0: +DATA/orcl/onlinelog/group_16.291.766326571
CURRENT log# 16 seq# 22348 mem# 1: +DATA/orcl/onlinelog/group_16.293.766330969
Mon Apr 16 15:02:58 2012
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface TYPE 1 eth1 10.0.0.0 configured FROM OCR FOR USE AS a cluster interconnect
Interface TYPE 1 eth0 192.168.1.0 configured FROM OCR FOR USE AS a public interface
Picked latch-free SCN scheme 3
Autotune OF undo retention IS turned ON.
LICENSE_MAX_USERS = 0
SYS auditing IS disabled
ksdpec: called FOR event 13740 prior TO event GROUP initialization
Starting up ORACLE RDBMS Version: 10.2.0.4.0.
System parameters WITH non-DEFAULT VALUES:
processes = 1500
sessions = 1655
sga_max_size = 19830669312
pre_page_sga = FALSE
lock_sga = FALSE
__shared_pool_size = 3674210304
__large_pool_size = 16777216
__java_pool_size = 16777216
__streams_pool_size = 33554432
spfile = +DATA/orcl/spfileorcl.ora
sga_target = 19830669312
control_files = +DATA/orcl/controlfile/CURRENT.274.720740395
db_block_size = 8192
__db_cache_size = 16072572928
compatible = 10.2.0.3.0
log_archive_dest_1 = LOCATION=+DATA/
log_archive_format = %t_%s_%r.dbf
db_file_multiblock_read_count= 16
cluster_database = TRUE
cluster_database_instances= 4
db_create_file_dest = +DATA
_gc_affinity_time = 0
_gc_affinity_limit = 10000000
_gc_affinity_minimum = 10000000
thread = 1
instance_number = 1
undo_management = AUTO
undo_tablespace = UNDOTBS1
remote_login_passwordfile= EXCLUSIVE
db_domain =
dispatchers = (PROTOCOL=TCP) (SERVICE=orclXDB)
local_listener = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.21)(PORT = 1521))
remote_listener = LISTENERS_ORCL
job_queue_processes = 20
cursor_sharing = FORCE
background_dump_dest = /u01/app/oracle/admin/orcl/bdump
user_dump_dest = /u01/app/oracle/admin/orcl/udump
core_dump_dest = /u01/app/oracle/admin/orcl/cdump
audit_file_dest = /u01/app/oracle/admin/orcl/adump
db_name = orcl
open_cursors = 1000
pga_aggregate_target = 5872025600
Cluster communication IS configured TO USE the following interface(s) FOR this instance
10.0.0.11
Mon Apr 16 15:02:59 2012
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
PMON started WITH pid=2, OS id=20588
DIAG started WITH pid=3, OS id=20590
PSP0 started WITH pid=4, OS id=20592
LMON started WITH pid=5, OS id=20594
LMD0 started WITH pid=6, OS id=20596
LMS0 started WITH pid=7, OS id=20603
LMS1 started WITH pid=8, OS id=20607
LMS2 started WITH pid=9, OS id=20611
LMS3 started WITH pid=10, OS id=20615
MMAN started WITH pid=11, OS id=20619
DBW0 started WITH pid=12, OS id=20621
DBW1 started WITH pid=13, OS id=20623
LGWR started WITH pid=14, OS id=20625
CKPT started WITH pid=15, OS id=20627
SMON started WITH pid=16, OS id=20629
RECO started WITH pid=17, OS id=20631
CJQ0 started WITH pid=18, OS id=20633
MMON started WITH pid=19, OS id=20635
Mon Apr 16 15:03:00 2012
starting up 1 dispatcher(s) FOR network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started WITH pid=20, OS id=20637
Mon Apr 16 15:03:00 2012
starting up 1 shared server(s) ...
Mon Apr 16 15:03:02 2012
lmon registered WITH NM - instance id 1 (internal mem no 0)
Mon Apr 16 15:03:05 2012
Reconfiguration started (OLD inc 0, NEW inc 30)
List OF nodes:
0 1 2 3
Global Resource Directory frozen
* allocate DOMAIN 0, invalid = TRUE
Communication channels reestablished
* DOMAIN 0 valid according TO instance 3
* DOMAIN 0 valid = 1 according TO instance 1
Mon Apr 16 15:03:05 2012
Master broadcasted resource hash VALUE bitmaps
Non-LOCAL Process blocks cleaned OUT
Mon Apr 16 15:03:05 2012
LMS 0: 0 GCS shadows cancelled, 0 closed
Mon Apr 16 15:03:05 2012
LMS 1: 0 GCS shadows cancelled, 0 closed
Mon Apr 16 15:03:05 2012
LMS 2: 0 GCS shadows cancelled, 0 closed
Mon Apr 16 15:03:05 2012
LMS 3: 0 GCS shadows cancelled, 0 closed
SET master node info
Submitted ALL remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
ALL grantable enqueues GRANTED
Mon Apr 16 15:03:07 2012
LMS 2: 0 GCS shadows traversed, 0 replayed
Mon Apr 16 15:03:07 2012
LMS 1: 0 GCS shadows traversed, 0 replayed
Mon Apr 16 15:03:07 2012
LMS 3: 0 GCS shadows traversed, 0 replayed
Mon Apr 16 15:03:07 2012
LMS 0: 0 GCS shadows traversed, 0 replayed
Mon Apr 16 15:03:07 2012
Submitted ALL GCS remote-cache requests
Post SMON TO START 1st pass IR
Fix WRITE IN gcs resources
Reconfiguration complete
LCK0 started WITH pid=23, OS id=20699
Mon Apr 16 15:03:12 2012
ALTER DATABASE MOUNT
Mon Apr 16 15:03:12 2012
Starting background process ASMB
ASMB started WITH pid=25, OS id=20710
Starting background process RBAL
RBAL started WITH pid=26, OS id=20714
Mon Apr 16 15:03:17 2012
SUCCESS: diskgroup DATA was mounted
Mon Apr 16 15:03:21 2012
Setting recovery target incarnation TO 2
Mon Apr 16 15:03:21 2012
SUCCESS: diskgroup DATA was dismounted
Mon Apr 16 15:03:21 2012
ORA-1105 signalled during: ALTER DATABASE MOUNT...

这个ORA-1105错误只是说明当前实例的某些参数设置和RAC其他实例设置的不符,并不能说明导致错误的真正原因。
为了找到问题只有手工启动实例:

[oracle@rac1 ~]$ sqlplus / AS sysdba
SQL*Plus: Release 10.2.0.4.0 - Production ON 星期一 416 17:04:54 2012
Copyright (c) 1982, 2007, Oracle. ALL Rights Reserved.
Connected TO an idle instance.
SQL> STARTUP MOUNT
ORACLE instance started.
Total System Global Area 1.9831E+10 bytes
Fixed SIZE 2119216 bytes
Variable SIZE 3741321680 bytes
DATABASE Buffers 1.6073E+10 bytes
Redo Buffers 14655488 bytes
ORA-01105: mount IS incompatible WITH mounts BY other instances
ORA-01606: gc_files_to_locks NOT identical TO that OF another mounted instance

通过手工执行,可以了解具体导致错误产生的原因。不过gc_files_to_locks并没有设置为不同的值:

SQL> SHOW parameter gc_files_to_locks
NAME                TYPE                VALUE
------------------- ------------------- -----------------
gc_files_to_locks   string
SQL> SELECT sid, name, VALUE FROM v$spparameter WHERE name = 'gc_files_to_locks'; 
SID        NAME                           VALUE
---------- ------------------------------ --------------------------------
*          gc_files_to_locks

不过导致问题产生的确实与GC设置有关,问题并非是gc_files_to_locks参数导致,而是SPFILE中设置的_gc_affinity_time参数。这个参数是静态参数,只有重启后才能生效,而在SPFILE中设置后,会导致重启的实例1生效了该参数,因此和没有重启过的其他实例产生了不兼容。
解决方法有两个,一个是重启所有的节点,另外一个是去掉SPFILE中这个参数的设置:

SQL> ALTER system reset "_gc_affinity_time" scope = spfile sid = '*';
System altered.
SQL> shutdown immediate
ORA-01507: DATABASE NOT mounted
 
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.
Total System Global Area 1.9831E+10 bytes
Fixed SIZE 2119216 bytes
Variable SIZE 3741321680 bytes
DATABASE Buffers 1.6073E+10 bytes
Redo Buffers 14655488 bytes
DATABASE mounted.
SQL> ALTER DATABASE OPEN;
DATABASE altered.

10G开始Oracle区分物化视图和表

在9i以前,很多功能都是不区分表和物化视图的区别的,到了10g以后。很多功能会将表和物化视图区分对待。
原本通用的COMMENT ON TABLE语句,对物化视图不再有效,必须要使用COMMENT ON MATERIALIZED VIEW语句代替。

SQL> SELECT * FROM V$VERSION;
BANNER
----------------------------------------------------------------
Oracle DATABASE 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE 10.2.0.4.0 Production
TNS FOR Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
SQL> CREATE TABLE T_BASE (ID NUMBER, TYPE VARCHAR2(30), NUM NUMBER);
TABLE created.
SQL> CREATE MATERIALIZED VIEW LOG ON T_BASE
2 WITH ROWID, SEQUENCE (TYPE, NUM)
3 INCLUDING NEW VALUES;
Materialized VIEW log created.
SQL> CREATE MATERIALIZED VIEW MV_BASE
2 REFRESH FAST ENABLE QUERY REWRITE AS
3 SELECT TYPE, SUM(NUM) SUM_NUM, COUNT(NUM) CNT_NUM, COUNT(*) CNT
4 FROM T_BASE
5 GROUP BY TYPE;
Materialized VIEW created.
SQL> COMMENT ON TABLE MV_BASE IS 'COMMENT ON A MATERIALIZED VIEW ';
COMMENT ON TABLE MV_BASE IS 'COMMENT ON A MATERIALIZED VIEW '
*
ERROR at line 1:
ORA-12098: cannot comment ON the materialized VIEW
 
SQL> COMMENT ON MATERIALIZED VIEW MV_BASE IS 'COMMENT ON A MATERIALIZED VIEW ';
Comment created.
SQL> COL COMMENTS FOR A60
SQL> SELECT * FROM USER_MVIEW_COMMENTS;
MVIEW_NAME                     COMMENTS
------------------------------ ------------------------------------------------------------
MV_BASE                        COMMENT ON A MATERIALIZED VIEW

其实不只是COMMENT发生了变化,关于物化视图的执行计划Oracle也对其进行细化,将物化视图的扫描和全表扫描区分开:

SQL> SET AUTOT ON EXP
SQL> SELECT COUNT(*) FROM MV_BASE;
  COUNT(*)
----------
         0
Execution Plan
----------------------------------------------------------
Plan hash VALUE: 3034976462
-------------------------------------------------------------------------
| Id  | Operation             | Name    | ROWS  | Cost (%CPU)| TIME     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |     1 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |         |     1 |            |          |
|   2 |   MAT_VIEW ACCESS FULL| MV_BASE |     1 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------
Note
-----
   - dynamic sampling used FOR this statement
SQL> SELECT /*+ REWRITE */ TYPE, COUNT(*) FROM T_BASE GROUP BY TYPE;
no ROWS selected
Execution Plan
----------------------------------------------------------
Plan hash VALUE: 1008429399
----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | ROWS  | Bytes | Cost (%CPU)| TIME     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |     1 |    30 |     2   (0)| 00:00:01 |
|   1 |  MAT_VIEW REWRITE ACCESS FULL| MV_BASE |     1 |    30 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Note
-----
   - dynamic sampling used FOR this statement

在9i以前,很难从执行计划中区分扫描的是表还是物化视图,但是现在一目了然了。
总的来说,这种改进还是很有意义的,用户可以更清楚的了解处理的对象到底是表还是物化视图。