在线索引重建被取消导致大量ORA-600(kghstack_underflow_internal_3)错误

客户在比较繁忙的时刻执行了索引的REBUILD操作,导致大量会话被锁,最终对ONLINE REBUILD操作执行了取消操作,引发了一系列更严重的ORA-600错误。
登录数据库后,发现两个节点上ORACLE_BASE所在目录都已经100%占用。数据库无法正常通过/ AS SYSDBA方式登录。
查询告警日志发现大量的ORA-600和ORA-7445错误:

Tue May 08 21:20:26 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_1555.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] []
Tue May 08 21:20:27 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_3891.trc:
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], []
Tue May 08 21:20:27 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_26190.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] []
Tue May 08 21:20:27 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_2873.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] []
Tue May 08 21:20:27 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_4518.trc:
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], []
Tue May 08 21:20:27 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_22469.trc:
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], []
Tue May 08 21:20:28 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_26440.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] []
Tue May 08 21:20:29 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_762.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] []
Tue May 08 21:20:29 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_26106.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] []
Tue May 08 21:20:30 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_1597.trc:
ORA-07445: 出现异常错误: 核心转储 [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] []
Tue May 08 21:20:30 EAT 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_856.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions FOR mapped object] [0x000000000] [] []
.
.
.

检查对应的TRACE文件,发现导致错误语句执行的是TABLE_A表的插入:

ksedmp: internal OR fatal error
ORA-00600: 内部错误代码, 参数: [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], []
CURRENT SQL statement FOR this SESSION:
INSERT INTO TABLE_A (O_ID, P_ID, S_ID, F_ITEM, F_NAME, T_MON, D_MON, I_MON, P_STATE, P_TIME, R1, R2) VALUES (:B10 , SUBSTR(:B10 , LENGTH(:B10 ) - 1, 2), :B9 , :B8 , :B7 , :B6 , :B5 , :B4 , :B3 , NULL, :B2 , :B1 )
----- PL/SQL Call Stack -----
  object      line  object
  handle    NUMBER  name
c00000203c2cc550       119  package body U1.P_O_I
c00000203b788200       288  PROCEDURE U1.U_B_O_I
c00000203719b8d0         1  anonymous block
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
ksedst()+64          CALL     ksedst1()            000000000 ? 000000001 ?
ksedmp()+2176        CALL     ksedst()             000000000 ?
                                                   C000000000000D20 ?
                                                   4000000004037940 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ?
ksfdmp()+112         CALL     ksedmp()             000000003 ?
                                                   9FFFFFFFFFFF1230 ?
                                                   60000000000BA290 ?
                                                   9FFFFFFFFFFF1800 ?
                                                   C000000000000999 ?
                                                   400000000407F9B0 ?
kgerinv()+304        CALL     ksfdmp()             9FFFFFFFFFFF1D90 ?
                                                   000000003 ?
                                                   9FFFFFFFFFFF1810 ?
                                                   60000000000BA290 ?
                                                   C000000000000612 ?
                                                   40000000098C38B0 ?
kgeasnmierr()+144    CALL     kgerinv()            60000000000318D0 ?
                                                   4000000001AD98A0 ?
                                                   6000000000032988 ?
                                                   4000000001AD98A0 ?
                                                   9FFFFFFFFFFF1DD0 ?
$cold_kghstack_unde  CALL     kgeasnmierr()        60000000000318D0 ?
rflow_internal()+36                                9FFFFFFFBF3B1168 ?
8                                                  9FFFFFFFBF3B1178 ?
                                                   6000000000032D00 ?
                                                   000000002 ?
                                                   60000000003002F0 ?
                                                   000000001 ? 000000006 ?
kghstack_free()+208  CALL     $cold_kghstack_unde  60000000000318D0 ?
                              rflow_internal()     60000000003002F0 ?
                                                   60000000000BA290 ?
                                                   C000000000000B1D ?
                                                   4000000002F7A510 ?
                                                   00002C87B ?
                                                   6000000000031A10 ?
ksmfrs()+48          CALL     kghstack_free()      60000000000318D0 ?
                                                   60000000003002F0 ?
rpiswu2()+1312       CALL     ksmfrs()             60000000003002F0 ?
                                                   C000000000001026 ?
                                                   4000000002F78960 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ?
rpidrv()+2352        CALL     rpiswu2()            9FFFFFFFFFFF2AF0 ?
                                                   4000000002F7AE60 ?
                                                   00002F833 ?
                                                   60000000000A7D20 ?
                                                   9FFFFFFFFFFF1E20 ?
                                                   C000000000001ABD ?
                                                   4000000000F14558 ?
                                                   60000000000C6CF0 ?
psddr0()+864         CALL     rpidrv()             000000018 ? 000000066 ?
                                                   9FFFFFFFFFFF3700 ?
                                                   000000038 ?
                                                   9FFFFFFFFFFF2B20 ?
                                                   60000000000BA290 ?
psdnal()+736         CALL     psddr0()             000000018 ? 000000066 ?
                                                   9FFFFFFFFFFF3700 ?
                                                   000000030 ?
pevm_EXECC()+832     CALL     psdnal()             9FFFFFFFFFFF54D0 ?
                                                   C00000203489A9F8 ?
                                                   C0000000000011AA ?
                                                   9FFFFFFFBE832840 ?
                                                   C00000203C2CC550 ?
                                                   400000000313C770 ?
                                                   000026035 ?
pfrinstr_EXECC()+16  CALL     pevm_EXECC()         9FFFFFFFBE83D1D0 ?
0                                                  9FFFFFFFBE8328B0 ?
                                                   000000020 ?
pfrrun_no_tool()+19  CALL     pfrinstr_EXECC()     9FFFFFFFBE83D1D0 ?
2                                                  C000001DA198D61C ?
                                                   9FFFFFFFBE83D238 ?
pfrrun()+1376        CALL     pfrrun_no_tool()     9FFFFFFFBE83D1D0 ?
                                                   000002001 ?
                                                   9FFFFFFFBE83D238 ?
                                                   60000000000BA290 ?
                                                   C000000000000A1C ?
                                                   4000000003163040 ?
                                                   9FFFFFFFBE83D620 ?
                                                   9FFFFFFFBE83D298 ?
plsql_run()+1328     CALL     pfrrun()             9FFFFFFFFFFF3788 ?
                                                   9FFFFFFFFFFF3770 ?
                                                   60000000000BA290 ?
                                                   9FFFFFFFFFFF4370 ?
                                                   9FFFFFFFFFFF4370 ?
                                                   C000000000000E23 ?
                                                   4000000002C34D70 ?
peicnt()+560         CALL     plsql_run()          9FFFFFFFFFFF4380 ?
                                                   000000001 ? 000000000 ?
                                                   9FFFFFFFFFFF4380 ?
                                                   60000000000BA290 ?
                                                   9FFFFFFFFFFF4E90 ?
kkxexe()+1008        CALL     peicnt()             9FFFFFFFFFFF54D0 ?
                                                   9FFFFFFFBE83D1D0 ?
                                                   9FFFFFFFFFFF4EA0 ?
                                                   60000000000BA290 ?
                                                   9FFFFFFFFFFF5420 ?
                                                   C000000000000A1C ?
                                                   000000020 ?
                                                   9FFFFFFFFFFF4F00 ?
opiexe()+8016        CALL     kkxexe()             0000051F0 ?
.
.
.

这个SQL语句是正常的业务操作,而这种简单的INSERT都会导致ORA-600和ORA-7445错误,说明插入的表或索引本身存在问题。
检查表和索引的状态,发现索引部分分区状态异常:

CREATE INDEX "U1"."IDX_F_2" 
ON "U1"."TABLE_A" ("S_ID")
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(
BUFFER_POOL DEFAULT) LOCAL
(PARTITION "P1"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS7" ,
PARTITION "P2"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS8" ,
PARTITION "P3"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS9" ,
PARTITION "P4"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS10" ,
PARTITION "P5"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS1" ,
PARTITION "P6"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS2" ,
PARTITION "P7"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS3" ,
PARTITION "P8"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS4" ,
PARTITION "P9"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS5" ,
PARTITION "P10"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS6" )
ALTER INDEX "U1"."IDX_F_2" MODIFY PARTITION "P6" UNUSABLE
ALTER INDEX "U1"."IDX_F_2" MODIFY PARTITION "P7" UNUSABLE

和客户沟通后,确认今天问题发生之前有人对索引进行维护操作,操作语句包括:

ALTER INDEX u1.IDX_F_2 rebuild partition p6 parallel
ALTER INDEX U1.IDX_T_2 rebuild partition P9 nologging parallel online
ALTER INDEX U1.IDX_T_1 rebuild partition P3 nologging parallel online
ALTER INDEX U1.IDX_F_2 rebuild partition P1 nologging parallel online
ALTER INDEX U1.IDX_T_4 rebuild partition P1 nologging parallel online
ALTER INDEX U1.IDX_F_4 rebuild partition P5 nologging parallel online

在对索引进行维护之后,数据库中出现了大量的enq: TM – contention等待,TM锁出现的原和是索引IDX_F_2的PARTITION P6重建语句没有添加ONLINE参数有直接关系。随后大量会话被阻塞,而整个索引重建操作被人为中止,其中包括正在运行的ONLINE REBUILD操作,而ONLINE REBUILD操作被中止是十分危险的,很容易导致bug的产生,而当前就导致了ORA-600和ORA-7445的产生。
由于客户着急解决问题,因此对数据库进行了重启。重启后,ORA-600错误信息不再出现。但是前台应用报错,索引失效。

SELECT 'ALTER INDEX U1.' || INDEX_NAME || ' REBUILD PARTITION ' || PARTITION_NAME || ';'
FROM DBA_IND_PARTITIONS
WHERE INDEX_OWNER = 'U1'
AND STATUS != 'USABLE';

获取所有失效的索引重建语句,将索引重新编译后,数据库恢复正常。
在Oracle的MOS上没有找到与之最相关的bug信息,反而是找到了一个相关性很高的补丁信息,而对应的bug信息并没有公开:补丁程序13737888: ONLINE DDL:ORA-600[KGHSTACK_UNDERFLOW_INTERNAL_3], [0X2B7F4E1E7678], [KEYBUF]。

This entry was posted in ORACLE and tagged , , , , , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *