客户数据库告警日志出现了这个错误信息。
错误信息如下:
Fri Mar 18 09:52:26 2011 Errors IN file /oracle/db/admin/B1EDB/udump/b1edb_ora_20076.trc: ORA-07445: exception encountered: core dump [qeshIHInsertPRowInBkt()+48] [SIGSEGV] [Address NOT mapped TO object] [0x000B2B7B0] [] [] |
对应的TRACE文件信息为:
/oracle/db/admin/B1EDB/udump/b1edb_ora_20076.trc Oracle DATABASE 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production WITH the Partitioning, OLAP, DATA Mining AND REAL Application Testing options ORACLE_HOME = /oracle/db System name: HP-UX Node name: EDB Release: B.11.23 Version: U Machine: ia64 Instance name: B1EDB Redo thread mounted BY this instance: 1 Oracle process NUMBER: 36 Unix process pid: 20076, image: oracle@EDB *** 2011-03-18 09:52:26.338 *** ACTION NAME:() 2011-03-18 09:52:26.261 *** MODULE NAME:(busobj.exe) 2011-03-18 09:52:26.261 *** SERVICE NAME:(B1EDB) 2011-03-18 09:52:26.261 *** SESSION ID:(501.23160) 2011-03-18 09:52:26.261 Exception signal: 11 (SIGSEGV), code: 1 (Address NOT mapped TO object), addr: 0xb2b7b0, PC: [0x4000000002d51390, qeshIHInsertPRowInBkt()+48] r1: 60000000000b5e18 r20: 1 br5: 0 r2: c000000040e04000 r21: 9fffffffab91bd50 br6: c000000000369bd0 r3: f04b r22: 9fffffffbbc4a01f br7: c000000000218e60 r4: 0 r23: 1fd6 ip: 4000000002d51390 r5: c000000000000408 r24: 9fffffffbf251110 iipa: 0 r6: c000000000043810 r25: 8 cfm: 4b1e r7: 9fffffffbf7f8de8 r26: 7 um: 1a r8: 9fffffffab91bd69 r27: 9fffffffab91bd50 rsc: 1f r9: 1fffff r28: 4 bsp: 9fffffffbf800600 r10: 1fd2 r29: 9fffffffbf250100 bspstore: 9fffffffbf800600 r11: 9fffffffab91bd88 r30: 1fdc rnat: c0000002d4fd7b88 r12: 9fffffffffff5e20 r31: 9fffffffab91bd00 ccv: 0 r13: 9fffffffbf455430 NaTs: 0 unat: 0 r14: 9fffffffab91bd88 PRs: c84f fpsr: 9804c8e76433f r15: 1f9f br0: 4000000002cb5b70 pfs: c00000000000038e r16: 9fffffffbbc4a023 br1: c000000000218e60 lc: 0 r17: 9fffffffab91bd50 br2: 0 ec: 0 r18: 9fffffffbbc4a056 br3: 0 isr: 9fffffffbf800600 r19: 1fd2 br4: 0 ifa: 0 Reason code: 0008 *** 2011-03-18 09:52:26.393 ksedmp: internal OR fatal error ORA-07445: exception encountered: core dump [qeshIHInsertPRowInBkt()+48] [SIGSEGV] [Address NOT mapped TO object] [0x000B2B7B0] [] [] CURRENT SQL statement FOR this SESSION: SELECT EDS_LOT_RUN.EQPID, SUM((to_date(EDS_LOT_RUN.TRACK_OUT_TIME, 'YYYYMMDD HH24MISS') - to_date(EDS_LOT_RUN.TRACK_IN_TIME ,'YYYYMMDD HH24MISS')) * 24*60), substr(EDS_LOT_EVENT_HIST.EVENT_TIME,1,8) ||' '|| substr(EDS_LOT_EVENT_HIST.EVENT_TIME,9,6) FROM EDS_LOT_RUN, EDS_LOT_EVENT_HIST, EDS_LOT WHERE ( EDS_LOT.LINE=EDS_LOT_EVENT_HIST.LINE AND EDS_LOT.FACTORY = EDS_LOT_EVENT_HIST.FACTORY AND EDS_LOT.LOTID= EDS_LOT_EVENT_HIST.LOTID ) AND ( EDS_LOT.LINE=EDS_LOT_RUN.LINE AND EDS_LOT.LOTID=EDS_LOT_RUN.LOTID ) AND ( EDS_LOT_RUN.EVENT = 'TRACKOUT' ) GROUP BY EDS_LOT_RUN.EQPID, substr(EDS_LOT_EVENT_HIST.EVENT_TIME,1,8) ||' '|| substr(EDS_LOT_EVENT_HIST.EVENT_TIME,9,6) ----- Call Stack Trace ----- calling CALL entry argument VALUES IN hex location TYPE point (? means dubious VALUE) -------------------- -------- -------------------- ---------------------------- ksedst()+64 CALL ksedst1() 000000001 ? 000000001 ? ksedmp()+2176 CALL ksedst() 000000001 ? C000000000000C9F ? 400000000404CC40 ? 000000000 ? 000000000 ? 000000000 ? ssexhd()+1264 CALL ksedmp() 000000003 ? 9FFFFFFFFFFECBA0 ? 60000000000B5E18 ? 9FFFFFFFFFFED170 ? C000000000000B9F ? 400000000696A380 ? 00000B6CD ? 60000000000C2ED0 ? <kernel> CALL ssexhd() C00000033F4BF750 ? 60000000000C4DC0 ? C000000040E0B7C0 ? 60000000000B5E18 ? qeshIHInsertPRowInB CALL <kernel> 9FFFFFFFFFFF1000 ? kt()+48 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? qeshIHInsertPRow()+ CALL qeshIHInsertPRowInB 9FFFFFFFBF331430 ? 80 kt() 9FFFFFFFBF256438 ? 05C1656F6 ? 0001656F6 ? 000B2B7B0 ? 9FFFFFFFAB91BD68 ? 000000000 ? qeshHDGrow()+1200 CALL qeshIHInsertPRow() 9FFFFFFFBF331430 ? 9FFFFFFFBF256438 ? 05C1656F6 ? 9FFFFFFFAB91BD68 ? 000000000 ? C00000000000091A ? 4000000005DEF540 ? 000000000 ? $cold_qeshPTInsertU CALL qeshHDGrow() 9FFFFFFFBF3307B8 ? RowForGBY()+256 9FFFFFFFBF251518 ? 9FFFFFFFFFFF5E48 ? 9FFFFFFFFFFF5E50 ? 60000000000B5E18 ? 000000000 ? C000000000000B1D ? qeshLoadRowForGBY() CALL $cold_qeshPTInsertU 9FFFFFFFFFFF5FE0 ? +544 RowForGBY() 9FFFFFFFFFFF6040 ? 9FFFFFFFBF251518 ? 9FFFFFFFFFFF5FE0 ? 9FFFFFFFBF3307B8 ? 9FFFFFFFBF251548 ? 000030E3C ? 9FFFFFFF881971E0 ? qerghRowP()+992 CALL qeshLoadRowForGBY() 9FFFFFFFBF3307B8 ? 9FFFFFFFBF3CB860 ? 000000005 ? 60000000000B5E18 ? C000000000000EA5 ? 4000000002EEBE20 ? 9FFFFFFFFFFF6040 ? 9FFFFFFFBF330850 ? qerhjWalkHashBucket CALL qerghRowP() C0000002D4071220 ? 2()+1632 000007FFF ? 60000000000B5E18 ? C000000000001026 ? 4000000007E45220 ? 00000814B ? qerhjFetchPhase2()+ CALL qerhjWalkHashBucket C0000002D40714F8 ? 1536 2() 4000000001C184B0 ? C0000002D4071220 ? 000007FFF ? 9FFFFFFFBF3CB490 ? 9FFFFFFFFFFF6180 ? 000000000 ? $cold_qerhjFetch()+ CALL qerhjFetchPhase2() C0000002D40714F8 ? 4080 4000000001C184B0 ? C0000002D4071220 ? 000007FFF ? 60000000000A5B28 ? C000000000000F24 ? qerghFetch()+528 CALL $cold_qerhjFetch() C0000002D40714F8 ? 4000000001C184B0 ? C0000002D4071220 ? 000007FFF ? 60000000000B5E18 ? opifch2()+9632 CALL qerghFetch() C0000002D4071220 ? 4000000001C47AD0 ? 9FFFFFFFFFFF6550 ? 0000000C9 ? 60000000000B5E18 ? C000000000001F46 ? 4000000002EB3640 ? 000018375 ? kpoal8()+6240 CALL opifch2() 9FFFFFFFFFFF7510 ? 4000000002AE5700 ? 00001E80B ? 9FFFFFFFFFFF63C0 ? 60000000000B5E18 ? C0000000000015B3 ? 60000000000314F0 ? 040002D91 ? opiodr()+2128 CALL kpoal8() 9FFFFFFFFFFF7C40 ? C000000000001530 ? 000000000 ? 9FFFFFFFFFFF7560 ? 60000000000B5E18 ? 9FFFFFFFBF3CCE40 ? ttcpip()+3088 CALL opiodr() 00000005E ? 000000017 ? 4000000001B5F640 ? 0000046B0 ? 9FFFFFFFFFFF7C50 ? 60000000000B5E18 ? 9FFFFFFFFFFF9DC0 ? C000000000001430 ? opitsk()+2336 CALL ttcpip() 600000000003D0F0 ? 9FFFFFFFFFFFA278 ? 9FFFFFFFFFFFA320 ? 0000001AF ? 9FFFFFFFFFFFA490 ? 9FFFFFFFFFFFA284 ? 4000000001C45750 ? 000000000 ? opiino()+1840 CALL opitsk() 000000000 ? 000000000 ? 60000000000B5E18 ? 4000000002851570 ? 0000180CD ? 4000000001B5F658 ? opiodr()+2128 CALL opiino() 00000003C ? 9FFFFFFFFFFFCCE0 ? 9FFFFFFFFFFFF480 ? 9FFFFFFFFFFFC1A0 ? 60000000000B5E18 ? C000000000001530 ? opidrv()+1088 CALL opiodr() 00000003C ? 000000004 ? 4000000001B5F0F0 ? 0000046B0 ? 9FFFFFFFFFFFCCF0 ? 60000000000B5E18 ? sou2o()+336 CALL opidrv() 00000003C ? 9FFFFFFFFFFFF480 ? 60000000000C2790 ? opimai_real()+224 CALL sou2o() 9FFFFFFFFFFFF4A0 ? 00000003C ? 000000004 ? 9FFFFFFFFFFFF480 ? main()+368 CALL opimai_real() 000000000 ? 9FFFFFFFFFFFF4D0 ? main_opd_entry()+80 CALL main() 000000002 ? 9FFFFFFFFFFFF978 ? 60000000000B5E18 ? C000000000000004 ? . . . ============ Plan TABLE ============ ------------------------------------------+--------------------------------+---------------+ | Id| Operation | Name |ROWS |Bytes|Cost |TIME |Pstart|Pstop | ------------------------------------------+--------------------------------+---------------+ | 0 | SELECT STATEMENT | | | | 922K| | | | | 1 | HASH GROUP BY | | 25M|2876M| 922K|03:09:50| | | | 2 | HASH JOIN | | 25M|2876M| 259K|00:53:59| | | | 3 | HASH JOIN | |2734K| 216M| 61K|00:13:32| | | | 4 | PARTITION RANGE ALL | | 940K| 19M| 6167|00:01:15 | 1 | 16 | | 5 | TABLE ACCESS FULL | EDS_LOT | 940K| 19M| 6167|00:01:15 | 1 | 16 | | 6 | PARTITION RANGE ALL | |2734K| 160M| 44K|00:09:05 | 1 | 16 | | 7 | TABLE ACCESS FULL | EDS_LOT_RUN |2734K| 160M| 44K|00:09:05 | 1 | 16 | | 8 | PARTITION RANGE ALL | | 17M| 624M| 145K|00:30:41 | 1 | 16 | | 9 | TABLE ACCESS FULL | EDS_LOT_EVENT_HIST| 17M| 624M| 145K|00:30:41 | 1 | 16 | -------------------------------------------+-------------------------------+---------------+ |
根据TRACE信息和执行计划分析,这个问题是MOS文档ID 6818207.8描述的Bug 6818207 – Intermittent Wrong Results/Dump (qeshIHInsertPRowInBkt) using hash GROUP BY of large dataset。
从执行计划不难看出,HASH GROUP BY要处理的数据量很大,预计运行时间也很长,所有因素都有bug相关描述相符,这个bug在11.2.0.2中被FIXED,在当前版本中,可以通过设置”_gby_hash_aggregation_enabled”隐患函数为FALSE,来禁止HASH GROUP BY执行计划,从而避免错误的产生。
如果错误发生频率很低,也可以通过优化SQL的方式解决或者简单忽略之。