ORA-7445(qeshIHInsertPRowInBkt)错误

客户数据库告警日志出现了这个错误信息。
错误信息如下:

Fri Mar 18 09:52:26 2011
 Errors IN file /oracle/db/admin/B1EDB/udump/b1edb_ora_20076.trc:
 ORA-07445: exception encountered: core dump [qeshIHInsertPRowInBkt()+48] [SIGSEGV] [Address NOT mapped TO object] [0x000B2B7B0] [] []

对应的TRACE文件信息为:

/oracle/db/admin/B1EDB/udump/b1edb_ora_20076.trc
 Oracle DATABASE 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
 WITH the Partitioning, OLAP, DATA Mining AND REAL Application Testing options
 ORACLE_HOME = /oracle/db
 System name: HP-UX
 Node name: EDB
 Release: B.11.23
 Version: U
 Machine: ia64
 Instance name: B1EDB
 Redo thread mounted BY this instance: 1
 Oracle process NUMBER: 36
 Unix process pid: 20076, image: oracle@EDB
 
*** 2011-03-18 09:52:26.338
 *** ACTION NAME:() 2011-03-18 09:52:26.261
 *** MODULE NAME:(busobj.exe) 2011-03-18 09:52:26.261
 *** SERVICE NAME:(B1EDB) 2011-03-18 09:52:26.261
 *** SESSION ID:(501.23160) 2011-03-18 09:52:26.261
 Exception signal: 11 (SIGSEGV), code: 1 (Address NOT mapped TO object), addr: 0xb2b7b0, PC: [0x4000000002d51390, qeshIHInsertPRowInBkt()+48]
   r1: 60000000000b5e18       r20:                1       br5:                0
   r2: c000000040e04000       r21: 9fffffffab91bd50       br6: c000000000369bd0
   r3:             f04b       r22: 9fffffffbbc4a01f       br7: c000000000218e60
   r4:                0       r23:             1fd6        ip: 4000000002d51390
   r5: c000000000000408       r24: 9fffffffbf251110      iipa:                0
   r6: c000000000043810       r25:                8       cfm:             4b1e
   r7: 9fffffffbf7f8de8       r26:                7        um:               1a
   r8: 9fffffffab91bd69       r27: 9fffffffab91bd50       rsc:               1f
   r9:           1fffff       r28:                4       bsp: 9fffffffbf800600
  r10:             1fd2       r29: 9fffffffbf250100  bspstore: 9fffffffbf800600
  r11: 9fffffffab91bd88       r30:             1fdc      rnat: c0000002d4fd7b88
  r12: 9fffffffffff5e20       r31: 9fffffffab91bd00       ccv:                0
  r13: 9fffffffbf455430      NaTs:                0      unat:                0
  r14: 9fffffffab91bd88       PRs:             c84f      fpsr:    9804c8e76433f
  r15:             1f9f       br0: 4000000002cb5b70       pfs: c00000000000038e
  r16: 9fffffffbbc4a023       br1: c000000000218e60        lc:                0
  r17: 9fffffffab91bd50       br2:                0        ec:                0
  r18: 9fffffffbbc4a056       br3:                0       isr: 9fffffffbf800600
  r19:             1fd2       br4:                0       ifa:                0
 Reason code: 0008
 *** 2011-03-18 09:52:26.393
 ksedmp: internal OR fatal error
 ORA-07445: exception encountered: core dump [qeshIHInsertPRowInBkt()+48] [SIGSEGV] [Address NOT mapped TO object] [0x000B2B7B0] [] []
 CURRENT SQL statement FOR this SESSION:
 SELECT    
  EDS_LOT_RUN.EQPID, 
  SUM((to_date(EDS_LOT_RUN.TRACK_OUT_TIME, 'YYYYMMDD HH24MISS')  - to_date(EDS_LOT_RUN.TRACK_IN_TIME ,'YYYYMMDD HH24MISS')) * 24*60), 
  substr(EDS_LOT_EVENT_HIST.EVENT_TIME,1,8) ||' '|| substr(EDS_LOT_EVENT_HIST.EVENT_TIME,9,6)
 FROM 
  EDS_LOT_RUN, 
  EDS_LOT_EVENT_HIST, 
  EDS_LOT 
WHERE 
  ( EDS_LOT.LINE=EDS_LOT_EVENT_HIST.LINE AND 
EDS_LOT.FACTORY = EDS_LOT_EVENT_HIST.FACTORY AND 
EDS_LOT.LOTID= EDS_LOT_EVENT_HIST.LOTID  ) 
  AND  ( EDS_LOT.LINE=EDS_LOT_RUN.LINE AND  EDS_LOT.LOTID=EDS_LOT_RUN.LOTID  ) 
  AND  ( EDS_LOT_RUN.EVENT = 'TRACKOUT'  ) 
GROUP BY 
  EDS_LOT_RUN.EQPID,  
  substr(EDS_LOT_EVENT_HIST.EVENT_TIME,1,8) ||' '|| substr(EDS_LOT_EVENT_HIST.EVENT_TIME,9,6) 
----- Call Stack Trace -----
 calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
 ksedst()+64          CALL     ksedst1()            000000001 ? 000000001 ?
 ksedmp()+2176        CALL     ksedst()             000000001 ?
                                                    C000000000000C9F ?
                                                    400000000404CC40 ?
                                                    000000000 ? 000000000 ?
                                                    000000000 ?
 ssexhd()+1264        CALL     ksedmp()             000000003 ?
                                                    9FFFFFFFFFFECBA0 ?
                                                    60000000000B5E18 ?
                                                    9FFFFFFFFFFED170 ?
                                                    C000000000000B9F ?
                                                    400000000696A380 ?
                                                    00000B6CD ?
                                                    60000000000C2ED0 ?
 <kernel>             CALL     ssexhd()             C00000033F4BF750 ?
                                                    60000000000C4DC0 ?
                                                    C000000040E0B7C0 ?
                                                    60000000000B5E18 ?
 qeshIHInsertPRowInB  CALL     <kernel>             9FFFFFFFFFFF1000 ?
 kt()+48                                            000000000 ? 000000000 ?
                                                    000000000 ? 000000000 ?
                                                    000000000 ? 000000000 ?
                                                    000000000 ?
 qeshIHInsertPRow()+  CALL     qeshIHInsertPRowInB  9FFFFFFFBF331430 ?
 80                            kt()                 9FFFFFFFBF256438 ?
                                                    05C1656F6 ? 0001656F6 ?
                                                    000B2B7B0 ?
                                                    9FFFFFFFAB91BD68 ?
                                                    000000000 ?
 qeshHDGrow()+1200    CALL     qeshIHInsertPRow()   9FFFFFFFBF331430 ?
                                                    9FFFFFFFBF256438 ?
                                                    05C1656F6 ?
                                                    9FFFFFFFAB91BD68 ?
                                                    000000000 ?
                                                    C00000000000091A ?
                                                    4000000005DEF540 ?
                                                    000000000 ?
 $cold_qeshPTInsertU  CALL     qeshHDGrow()         9FFFFFFFBF3307B8 ?
 RowForGBY()+256                                    9FFFFFFFBF251518 ?
                                                    9FFFFFFFFFFF5E48 ?
                                                    9FFFFFFFFFFF5E50 ?
                                                    60000000000B5E18 ?
                                                    000000000 ?
                                                    C000000000000B1D ?
 qeshLoadRowForGBY()  CALL     $cold_qeshPTInsertU  9FFFFFFFFFFF5FE0 ?
 +544                          RowForGBY()          9FFFFFFFFFFF6040 ?
                                                    9FFFFFFFBF251518 ?
                                                    9FFFFFFFFFFF5FE0 ?
                                                    9FFFFFFFBF3307B8 ?
                                                    9FFFFFFFBF251548 ?
                                                    000030E3C ?
                                                    9FFFFFFF881971E0 ?
 qerghRowP()+992      CALL     qeshLoadRowForGBY()  9FFFFFFFBF3307B8 ?
                                                    9FFFFFFFBF3CB860 ?
                                                    000000005 ?
                                                    60000000000B5E18 ?
                                                    C000000000000EA5 ?
                                                    4000000002EEBE20 ?
                                                    9FFFFFFFFFFF6040 ?
                                                    9FFFFFFFBF330850 ?
 qerhjWalkHashBucket  CALL     qerghRowP()          C0000002D4071220 ?
 2()+1632                                           000007FFF ?
                                                    60000000000B5E18 ?
                                                    C000000000001026 ?
                                                    4000000007E45220 ?
                                                    00000814B ?
 qerhjFetchPhase2()+  CALL     qerhjWalkHashBucket  C0000002D40714F8 ?
 1536                          2()                  4000000001C184B0 ?
                                                    C0000002D4071220 ?
                                                    000007FFF ?
                                                    9FFFFFFFBF3CB490 ?
                                                    9FFFFFFFFFFF6180 ?
                                                    000000000 ?
 $cold_qerhjFetch()+  CALL     qerhjFetchPhase2()   C0000002D40714F8 ?
 4080                                               4000000001C184B0 ?
                                                    C0000002D4071220 ?
                                                    000007FFF ?
                                                    60000000000A5B28 ?
                                                    C000000000000F24 ?
 qerghFetch()+528     CALL     $cold_qerhjFetch()   C0000002D40714F8 ?
                                                    4000000001C184B0 ?
                                                    C0000002D4071220 ?
                                                    000007FFF ?
                                                    60000000000B5E18 ?
 opifch2()+9632       CALL     qerghFetch()         C0000002D4071220 ?
                                                    4000000001C47AD0 ?
                                                    9FFFFFFFFFFF6550 ?
                                                    0000000C9 ?
                                                    60000000000B5E18 ?
                                                    C000000000001F46 ?
                                                    4000000002EB3640 ?
                                                    000018375 ?
 kpoal8()+6240        CALL     opifch2()            9FFFFFFFFFFF7510 ?
                                                    4000000002AE5700 ?
                                                    00001E80B ?
                                                    9FFFFFFFFFFF63C0 ?
                                                    60000000000B5E18 ?
                                                    C0000000000015B3 ?
                                                    60000000000314F0 ?
                                                    040002D91 ?
 opiodr()+2128        CALL     kpoal8()             9FFFFFFFFFFF7C40 ?
                                                    C000000000001530 ?
                                                    000000000 ?
                                                    9FFFFFFFFFFF7560 ?
                                                    60000000000B5E18 ?
                                                    9FFFFFFFBF3CCE40 ?
 ttcpip()+3088        CALL     opiodr()             00000005E ? 000000017 ?
                                                    4000000001B5F640 ?
                                                    0000046B0 ?
                                                    9FFFFFFFFFFF7C50 ?
                                                    60000000000B5E18 ?
                                                    9FFFFFFFFFFF9DC0 ?
                                                    C000000000001430 ?
 opitsk()+2336        CALL     ttcpip()             600000000003D0F0 ?
                                                    9FFFFFFFFFFFA278 ?
                                                    9FFFFFFFFFFFA320 ?
                                                    0000001AF ?
                                                    9FFFFFFFFFFFA490 ?
                                                    9FFFFFFFFFFFA284 ?
                                                    4000000001C45750 ?
                                                    000000000 ?
 opiino()+1840        CALL     opitsk()             000000000 ? 000000000 ?
                                                    60000000000B5E18 ?
                                                    4000000002851570 ?
                                                    0000180CD ?
                                                    4000000001B5F658 ?
 opiodr()+2128        CALL     opiino()             00000003C ?
                                                    9FFFFFFFFFFFCCE0 ?
                                                    9FFFFFFFFFFFF480 ?
                                                    9FFFFFFFFFFFC1A0 ?
                                                    60000000000B5E18 ?
                                                    C000000000001530 ?
 opidrv()+1088        CALL     opiodr()             00000003C ? 000000004 ?
                                                    4000000001B5F0F0 ?
                                                    0000046B0 ?
                                                    9FFFFFFFFFFFCCF0 ?
                                                    60000000000B5E18 ?
 sou2o()+336          CALL     opidrv()             00000003C ?
                                                    9FFFFFFFFFFFF480 ?
                                                    60000000000C2790 ?
 opimai_real()+224    CALL     sou2o()              9FFFFFFFFFFFF4A0 ?
                                                    00000003C ? 000000004 ?
                                                    9FFFFFFFFFFFF480 ?
 main()+368           CALL     opimai_real()        000000000 ?
                                                    9FFFFFFFFFFFF4D0 ?
 main_opd_entry()+80  CALL     main()               000000002 ?
                                                    9FFFFFFFFFFFF978 ?
                                                    60000000000B5E18 ?
                                                    C000000000000004 ?
 .
 .
 .
 ============
 Plan TABLE
 ============
 ------------------------------------------+--------------------------------+---------------+
 | Id| Operation               | Name              |ROWS |Bytes|Cost |TIME    |Pstart|Pstop |
 ------------------------------------------+--------------------------------+---------------+
 | 0 | SELECT STATEMENT        |                   |     |     | 922K|        |      |      |
 | 1 |  HASH GROUP BY          |                   |  25M|2876M| 922K|03:09:50|      |      |
 | 2 |   HASH JOIN             |                   |  25M|2876M| 259K|00:53:59|      |      |
 | 3 |    HASH JOIN            |                   |2734K| 216M|  61K|00:13:32|      |      |
 | 4 |     PARTITION RANGE ALL |                   | 940K|  19M| 6167|00:01:15 | 1    | 16   |
 | 5 |      TABLE ACCESS FULL  | EDS_LOT           | 940K|  19M| 6167|00:01:15 | 1    | 16   |
 | 6 |     PARTITION RANGE ALL |                   |2734K| 160M|  44K|00:09:05 | 1    | 16   |
 | 7 |      TABLE ACCESS FULL  | EDS_LOT_RUN       |2734K| 160M|  44K|00:09:05 | 1    | 16   |
 | 8 |    PARTITION RANGE ALL  |                   |  17M| 624M| 145K|00:30:41 | 1    | 16   |
 | 9 |     TABLE ACCESS FULL   | EDS_LOT_EVENT_HIST|  17M| 624M| 145K|00:30:41 | 1    | 16   |
 -------------------------------------------+-------------------------------+---------------+

根据TRACE信息和执行计划分析,这个问题是MOS文档ID 6818207.8描述的Bug 6818207 – Intermittent Wrong Results/Dump (qeshIHInsertPRowInBkt) using hash GROUP BY of large dataset。

从执行计划不难看出,HASH GROUP BY要处理的数据量很大,预计运行时间也很长,所有因素都有bug相关描述相符,这个bug在11.2.0.2中被FIXED,在当前版本中,可以通过设置”_gby_hash_aggregation_enabled”隐患函数为FALSE,来禁止HASH GROUP BY执行计划,从而避免错误的产生。

如果错误发生频率很低,也可以通过优化SQL的方式解决或者简单忽略之。

This entry was posted in BUG and tagged , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *