在一个客户的数据库告警日志中,频繁出现ORA-600(kdsgrp1)错误。
详细错误信息为:
/oracle/admin/tjsb/udump/tjsb1_ora_1701138.trc Oracle DATABASE 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production WITH the Partitioning, REAL Application Clusters, DATA Mining AND REAL Application Testing options ORACLE_HOME = /oracle/product/10.2.0/db_1 System name: AIX Node name: p570c Release: 3 Version: 5 Machine: 00C971744C00 Instance name: tjsb1 Redo thread mounted BY this instance: 1 Oracle process NUMBER: 883 Unix process pid: 1701138, image: oracle@p570c *** 2011-07-26 09:59:41.111 *** ACTION NAME:(abc) 2011-07-26 09:59:41.105 *** MODULE NAME:(archive.exe) 2011-07-26 09:59:41.105 *** SERVICE NAME:(tjsb) 2011-07-26 09:59:41.105 *** SESSION ID:(2466.64103) 2011-07-26 09:59:41.105 ROW 0766ce34.a continuation at file# 29 block# 2543156 slot 11 NOT found ************************************************** KDSTABN_GET: 0 ..... ntab: 1 curSlot: 11 ..... nrows: 110 ************************************************** *** 2011-07-26 09:59:41.111 ksedmp: internal OR fatal error ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [] CURRENT SQL statement FOR this SESSION: SELECT DISTINCT KC01.AAC001 AAC001 FROM AC04, AC02, KC01 WHERE KC01.AAB001 = :B2 AND KC01.AKC021 IN ('11', '12') AND AC02.AAE140 = ' 3' AND AC02.AAC031 = '1' AND AC02.AAC001 = KC01.AAC001 AND AC04.AAC001 = KC01.AAC001 AND AC04.AAE030 <= TO_DATE(:B1 , 'yyyymm') AND (AC04.AAE031 >= TO_DATE(:B1 , 'yyyymm') OR AC04.AAE031 IS NULL) AND NVL(AC04.AKC010, 0) > 0 AND NOT EXISTS (SELECT 1 FROM AC01 WHERE AAB001 = :B2 AND NVL(AAC033, '0') = '3' AND AAC001 = KC01.AAC001) AND NOT EXISTS (SELECT 1 FROM DUAL WHERE :B3 <= (SELECT NVL(SUM(C KC010), 0) FROM KC33 WHERE AAC001 = KC01.AAC001 AND AAE001 = TO_NUMBER(SUBSTR(:B1 , 1, 4)))) UNION SELECT DISTINCT KC01.AAC001 AAC00 1 FROM AC02, KC01 WHERE KC01.AAB001 = :B2 AND KC01.AKC021 IN ('21', '22', '23') AND AC02.AAE140 = '3' AND AC02.AAC031 = '1' AND AC02 .AAC001 = KC01.AAC001 AND NOT EXISTS (SELECT 1 FROM AC01 WHERE AAB001 = :B2 AND NVL(AAC033, '0') > '0' AND AAC001 = KC01.AAC001) AND NOT EXISTS (SELECT 1 FROM KC33 WHERE AAC001 = KC01.AAC001 AND AAE002 = :B1 ) AND (NOT EXISTS (SELECT 1 FROM DUAL WHERE :B4 <= (SELE CT NVL(SUM(CKC010), 0) FROM KC33 WHERE AAC001 = KC01.AAC001 AND AAE001 = TO_NUMBER(SUBSTR(:B1 , 1, 4))) AND NVL(TO_CHAR(KC01.AIC162, 'yyyymm'),'199001') < SUBSTR(:B1 , 1, 4) || '01')) AND (NOT EXISTS (SELECT 1 FROM DUAL WHERE :B3 <= (SELECT NVL(SUM(CKC010), 0) FRO M KC33 WHERE AAC001 = KC01.AAC001 AND AAE001 = TO_NUMBER(SUBSTR(:B1 , 1, 4))) AND NVL(TO_CHAR(KC01.AIC162, 'yyyymm'),'199001') >= SU BSTR(:B1 , 1, 4) || '01')) ----- PL/SQL Call Stack ----- object line object handle NUMBER name 70000078a8a6960 8456 package body TJSI.PKG_A_JJZJ 70000078bd54dc0 1 anonymous block ----- Call Stack Trace ----- calling CALL entry argument VALUES IN hex location TYPE point (? means dubious VALUE) -------------------- -------- -------------------- ---------------------------- ksedst+001c bl ksedst1 000000000 ? 000000000 ? ksedmp+0290 bl ksedst 104522370 ? ksfdmp+0018 bl 03F2C1FC kgerinv+00dc bl _ptrgl kgeasnmierr+004c bl kgerinv 000000000 ? 000000001 ? 10D00000000 ? 000000000 ? 110648E58 ? kdsgrp+0460 bl 01F92FB8 kdsfbr+0268 bl kdsgrp 000000001 ? 1105E2988 ? 000000000 ? qertbFetchByRowID+0 bl kdsfbr FFFFFFFFFFF1FA0 ? 000000000 ? 9d0 1040DC78C ? 000000000 ? 1105FC720 ? 1101FB888 ? 1105FCF40 ? 000000000 ? rwsfcd+0054 bl _ptrgl qerflFetchOutside+0 bl 03F2BB90 09c rwsfcd+0054 bl _ptrgl qeruaFetch+013c bl 03F2BB90 qersoFetch+0110 bl 01F92FBC opifch2+141c bl 01F92FBC opifch+003c bl opifch2 000000001 ? 11062BDC0 ? FFFFFFFFFFF36D0 ? opipls+21a4 bl opifch FFFFFFFFFFF3858 ? 110173648 ? FFFFFFFFFFF3870 ? opiodr+0ae0 bl _ptrgl rpidrus+01bc bl opiodr 6686B0E7F0 ? 684B93430 ? FFFFFFFFFFF67E0 ? D86B0E750 ? skgmstack+00c8 bl _ptrgl rpidru+0088 bl skgmstack 110173648 ? FFFFFFFFFFF5830 ? 000000002 ? 000000000 ? FFFFFFFFFFF5FA8 ? rpiswu2+034c bl _ptrgl rpidrv+095c bl rpiswu2 7000007837E2BD8 ? 3A00000000 ? 10451BB80 ? 000000000 ? 110196C78 ? 3A10196C60 ? FFFFFFFFFFF6008 ? 000000000 ? psddr0+02bc bl 01F96D24 psdnal+01d0 bl psddr0 D013BC57C ? 6610573D78 ? FFFFFFFFFFF67E0 ? 3000000000 ? pevm_BFTCHC+03b4 bl _ptrgl pfrinstr_FTCHC+0138 bl pevm_BFTCHC 101479974 ? 000000000 ? 000000010 ? pfrrun_no_tool+005c bl _ptrgl pfrrun+1014 bl pfrrun_no_tool FFFFFFFFFFF6B20 ? 70000078BD54DC0 ? 3000009BB ? plsql_run+06b4 bl pfrrun 110469588 ? peicnt+0224 bl plsql_run 110469588 ? 10001102381F8 ? 000000000 ? kkxexe+0250 bl peicnt FFFFFFFFFFF7E38 ? 110469588 ? opiexe+2ef8 bl kkxexe 11064CEA0 ? kpoal8+0edc bl opiexe FFFFFFFFFFFB454 ? FFFFFFFFFFFB180 ? FFFFFFFFFFF9628 ? opiodr+0ae0 bl _ptrgl ttcpip+1020 bl _ptrgl opitsk+1124 bl 01F95258 opiino+0990 bl opitsk 0FFFFD490 ? 000000000 ? opiodr+0ae0 bl _ptrgl opidrv+0484 bl 01F947C0 sou2o+0090 bl opidrv 3C02D9855C ? 44065F000 ? FFFFFFFFFFFF390 ? opimai_real+01bc bl 01F92214 main+0098 bl opimai_real 000000000 ? 000000000 ? __start+0098 bl main 000000000 ? 000000000 ? |
查询了一下metalink,发现错误号为kdsgrp1的已知600错误有20多个之多。当前的版本是10204,而这些已知bug又有80%都可能在这个版本上发生。不过仔细看一下错误的说明,这些错误有几分相似之处,多半都和表或索引的逻辑损坏有关。
在这个trace文件中,有明确的标识损坏之处:
ROW 0766ce34.a continuation at file# 29 block# 2543156 slot 11 NOT found |
显然这里出现了一个逻辑坏块,那么其实不用再去关心当前的问题属于哪个已知的bug,只要解决了这个坏块,就解决了这个ORA-600的错误。
有了FILE和BLOCK的序号,可以通过DUMP的手段,也可以查询DBA_EXTENTS视图,来确认坏块的内容。
如果坏块属于索引,那么可以通过删除索引然后重建的方式,很容易的解决问题。
如果坏块属于表,那么坏块中的部分行信息可能丢失。如果找不到出现错误之前的备份,可以通过DBMS_REPAIR包进行修复,或者设置alter session set events ‘10231 trace name context forever, level 10’,然后导出或CREATE TABLE AS SELECT将坏块意外的数据进行重建。
可以参考metalink文档ID 468883.1,获取更详细的信息。