ORA-600(2037)错误

最近已经碰到多起客户数据库无法打开的情况,这就是其中一次。
这是一个10201 for Windows 64bit数据库,在一次掉电后,数据库无法启动,在后台告警日志中出现下列的错误:

Mon Oct 31 00:00:28 2011
ALTER DATABASE OPEN
Mon Oct 31 00:00:28 2011
Beginning crash recovery OF 1 threads
parallel recovery started WITH 16 processes
Mon Oct 31 00:00:28 2011
Started redo scan
Mon Oct 31 00:00:28 2011
Completed redo scan
12587 redo blocks READ, 161 DATA blocks need recovery
Mon Oct 31 00:00:29 2011
Started redo application at
Thread 1: logseq 661, block 34225
Mon Oct 31 00:00:29 2011
Recovery OF Online Redo Log: Thread 1 GROUP 3 Seq 661 Reading mem 0
Mem# 0 errs 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\YCBG\REDO03.LOG
Mon Oct 31 00:00:29 2011
Completed redo application
Mon Oct 31 00:00:29 2011
Errors IN file e:\oracle\product\10.2.0\admin\ycbg\bdump\ycbg_p002_2196.trc:
ORA-00600: internal error code, arguments: [2037], [21598604], [2441871360], [0], [0], [0], [3729260873], [536936448]
Mon Oct 31 00:00:29 2011
Errors IN file e:\oracle\product\10.2.0\admin\ycbg\bdump\ycbg_p006_2212.trc:
ORA-00600: internal error code, arguments: [2037], [4198106], [249167872], [229], [159], [0], [1456275520], [100740646]
.
.
.
Mon Oct 31 00:00:30 2011
Errors IN file e:\oracle\product\10.2.0\admin\ycbg\bdump\ycbg_p002_2196.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [kcbs_dump_adv_state+1127] [PC:0x60D4E1] [ADDR:0xFFFFFFFFFFFFFFFF] [UNABLE_TO_READ] []
ORA-00600: internal error code, arguments: [2037], [21598604], [2441871360], [0], [0], [0], [3729260873], [536936448]
.
.
.
Mon Oct 31 00:00:31 2011
Errors IN file e:\oracle\product\10.2.0\admin\ycbg\bdump\ycbg_p012_2236.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [ksl_cleanup+1220] [PC:0x49DA9C] [ADDR:0xFFFFFFFFFFFFFFFF] [UNABLE_TO_READ] []
ORA-00081: address range [0x179B63600000, 0x179B63600004) IS NOT readable
ORA-00600: internal error code, arguments: [2037], [21649715], [1496514560], [228], [10], [0], [3855679818], [100820580]
Mon Oct 31 00:00:31 2011
Errors IN file e:\oracle\product\10.2.0\admin\ycbg\bdump\ycbg_p007_2216.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [kslwte_tm+754] [PC:0x49FC16] [ADDR:0x82FD2228AB8] [UNABLE_TO_WRITE] []
.
.
.
Mon Oct 31 00:00:35 2011
Errors IN file e:\oracle\product\10.2.0\admin\ycbg\udump\ycbg_ora_2256.trc:
ORA-00600: internal error code, arguments: [ksuapc1], [0x110000078], [0x7FFD63F4E90], [131072], [0x7FFD62F4E98], [], [], []
.
.
.
Corrupt block relative dba: 0x014895b9 (file 5, block 562617)
Bad header found during crash/instance recovery
DATA IN bad block:
TYPE: 91 format: 5 rdba: 0x95b90000
LAST CHANGE scn: 0x0229.e7210148 seq: 0x0 flg: 0x00
spare1: 0x6 spare2: 0xa2 spare3: 0x8b1d
consistency VALUE IN tail: 0x06010000
CHECK VALUE IN block header: 0x401
block checksum disabled
.
.
.
Mon Oct 31 00:01:23 2011
Errors IN file e:\oracle\product\10.2.0\admin\ycbg\bdump\ycbg_ora_2164.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [ksliwat+3150] [PC:0x4A1D92] [ADDR:0xFFFFFFFFFFFFFFFF] [UNABLE_TO_READ] []
.
.
.
Mon Oct 31 00:01:26 2011
Errors IN file e:\oracle\product\10.2.0\admin\ycbg\bdump\ycbg_reco_2148.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [kslwte_tm+754] [PC:0x49FC16] [ADDR:0x82FD2228AB8] [UNABLE_TO_WRITE] []
.
.
.
Mon Oct 31 00:09:13 2011
USER: terminating instance due TO error 472
Mon Oct 31 00:09:13 2011
Errors IN file e:\oracle\product\10.2.0\admin\ycbg\udump\ycbg_ora_2772.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [ksuitm+1655] [PC:0x417C41] 
.
.
.
Mon Oct 31 00:41:26 2011
Errors IN file e:\oracle\product\10.2.0\admin\ycbg\bdump\ycbg_ora_2080.trc:
ORA-00600: internal error code, arguments: [kmcpsched:Bad caller], [], [], [], [], [], [], []
.
.
.
Mon Oct 31 04:58:05 2011
Errors IN file e:\oracle\product\10.2.0\admin\ycbg\bdump\ycbg_p007_2960.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [kksOnErrorMutexCleanup+53] [PC:0x4A7ED5] [ADDR:0xFFFFFFFFFFFFFFFF] [UNABLE_TO_READ] []
ORA-00081: address range [0x163A63600000, 0x163A63600004) IS NOT readable
ORA-00600: internal error code, arguments: [ksfdrmms1], [0x1111B5080], [], [], [], [], [], []
.
.
.
Mon Oct 31 05:04:57 2011
Errors IN file e:\oracle\product\10.2.0\admin\ycbg\bdump\ycbg_p004_3920.trc:
ORA-00600: internal error code, arguments: [ksfdchkfob1], [0x1111B8138], [0x1112694580000], [], [], [], [], []
.
.
.
Mon Oct 31 05:04:58 2011
Errors IN file e:\oracle\product\10.2.0\admin\ycbg\bdump\ycbg_mmon_2384.trc:
ORA-00600: internal error code, arguments: [KSSRMP2], [0x1112697F8], [0], [0], [], [], [], []
ORA-27041: unable TO OPEN file
OSD-04001: 逻辑块大小无效 (OS 33554432)
.
.
.
Mon Oct 31 05:15:08 2011
Errors IN file e:\oracle\product\10.2.0\admin\ycbg\bdump\ycbg_p013_2484.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [kksOnErrorMutexCleanup+53] [PC:0x4A7ED5] [ADDR:0xFFFFFFFFFFFFFFFF] [UNABLE_TO_READ] []
ORA-00081: address range [0x17F663600000, 0x17F663600004) IS NOT readable
ORA-00600: internal error code, arguments: [545], [0x110563848], [0], [16], [], [], [], []

这里只是记录了不重复错误,每个错误都会出现很多次,简直是ORA-600和ORA-7445错误的大集合。由于一个错误,一次性引发这么多的不同ORA-600和ORA-7445,我还是第一次碰到。
虽然错误信息很多,但是后面的大部分错误,除了个别几个在MOS中没有记录外,一些错误与不正常的启动有关,一些错误与实例崩溃有关,另一些是与文件损坏有关。也就是说,这些错误的产生只是现象,而问题的主要原因还要从第一个错误上入手。

*** 2011-10-31 00:00:29.259
ksedmp: internal OR fatal error
ORA-00600: internal error code, arguments: [2037], [21598604], [2441871360], [0], [0], [0], [3729260873], [536936448]
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
ksedst+55            CALL???  ksedst1+573          970A00000000 00001D688
                                                   014BBDE10 00001E5F0
ksedmp+663           CALL???  ksedst+55            003A750A8 000000000 014BBC888
                                                   000000000
ksfdmp+19            CALL???  ksedmp+663           000000003 013EE4650 013A78A88
                                                   003A975A0
kgeriv+184           CALL???  ksfdmp+19            00001E5F0 013EE4010 000AEBD23
                                                   27F00001FA0
kgesiv+102           CALL???  kgeriv+184           000000000 000000000 013EE4010
                                                   000000000
ksesic7+125          CALL???  kgesiv+102           7FFD2001EC0 232FBC020
                                                   400000000 000000000
kcoexam+249          CALL???  ksesic7+125          1000007F5 000000000 00149918C
                                                   000000000
kcbtema+647          CALL???  kcoexam+249          01B7C5FA0 10F776000 000000001
                                                   000000007
kcrpap+356           CALL???  kcbtema+647          1A1FBB198 7FF9EFC0B88
                                                   7FF9EFC0604 013A7A558
kcrpdv+1521          CALL???  kcrpap+356           003E5B9A8 00000000B 000000000
                                                   000000000
kxfprdp+1360         CALL???  kcrpdv+1521          000000001 003A7AC60 000000000
                                                   000000000
opirip+1250          CALL???  kxfprdp+1360         30315C740000001E 003A782E8
                                                   014BBFA70 000000000
opidrv+860           CALL???  opirip+1250          000000032 000000004 014BBFDA0
                                                   000000000
sou2o+52             CALL???  opidrv+860           000000032 000000004 014BBFDA0
                                                   0068679D2
opimai_real+272      CALL???  sou2o+52             000000000 0139C0000 000000828
                                                   000000000
opimai+96            CALL???  opimai_real+272      000000000 000000000 000000000
                                                   000000000
BackgroundThreadSta  CALL???  opimai+96            014BBFEF0 000000001 000000000
rt+530                                             000000000
0000000078D3B6DA     CALL???  BackgroundThreadSta  0068675C0 000000000 000000000
                              rt+530               014BBFFA8
--------------------- Binary Stack Dump ---------------------

查询MOS后发现,文档During Startup (Open Database) Alert Log Shows ORA-600[2037] and ORA-7445[kcbs_dump_adv_state] [ID 551993.1]和当前描述的现象最为接近:不但2037错误和kcbs_dump_adv_state错误信息一致,数据库版本相符,连文档中提到的kcoexam kcbtema kcrpap kcrpdv kxfprdp堆栈信息在TRACE中也可以一点不差的找到。
如果在处理分布式事务的时候使用了IMU(In Memory Undo)技术,那么如果这个时候数据库出现了崩溃,那么很不幸,在Oracle重启的时候尝试实例恢复的过程中,就会导致REDO或UNDO的损坏,从而导致数据库无法打开。而当前数据库的报错集中发生在PNNN进程上,这说明Oracle在启用并行恢复,而错误信息中还可以看到RECO进程,说明Oracle在处理分布式事务的恢复,所有的现象都很好的符合了bug的描述。
这个Bug 4899479影响10.2.0.1到10.2.0.3版本,在10.2.0.4中被FIXED。而解决这个问题的常规手段只有通过备份进行及时点不完全恢复。

This entry was posted in BUG and tagged , , , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *