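In the past, when I wrote up two ORA-600 errors in a single post, it was usually because they occurred at the same time, as different symptoms of one failure. This time the two errors appeared on separate occasions.
A customer's logical standby database running 10.2.0.4 hit these two errors several times over a period of months: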
Thu Jun 16 13:45:05 2011
Errors in file /u01/app/oracle/admin/db/bdump/db_pmon_27660.trc:
ORA-07445: exception encountered: core dump [kggchk()+77] [SIGSEGV] [Address not mapped to object] [0x000000000] [] []
Thu Jun 16 13:45:13 2011
CKPT: terminating instance due to error 472
Instance terminated by CKPT, pid = 27670
. . .
Sat Jun 25 01:44:02 2011
Errors in file /u01/app/oracle/admin/db/bdump/db_pmon_18907.trc:
ORA-00600: internal error code, arguments: [kcbshlc_1], [5], [], [], [], [], [], []
Sat Jun 25 01:44:04 2011
Errors in file /u01/app/oracle/admin/db/bdump/db_pmon_18907.trc:
ORA-00600: internal error code, arguments: [kcbshlc_1], [5], [], [], [], [], [], []
Sat Jun 25 01:44:04 2011
PMON: terminating instance due to error 472
Sat Jun 25 01:44:04 2011
krvxerpt: Errors detected in process 20, role reader.
Sat Jun 25 01:44:04 2011
krvxmrs: Leaving by exception: 472
Sat Jun 25 01:44:04 2011
Errors in file /u01/app/oracle/admin/db/bdump/db_p000_19090.trc:
ORA-00472: PMON process terminated with error
LOGSTDBY status: ORA-00472: PMON process terminated with error
. . .
Mon Oct 31 23:23:03 2011
Errors in file /u01/app/oracle/admin/db/bdump/db_pmon_20147.trc:
ORA-07445: exception encountered: core dump [kggchk()+77] [SIGSEGV] [Address not mapped to object] [0x000000000] [] []
Mon Oct 31 23:23:07 2011
CJQ0: terminating instance due to error 472
Mon Oct 31 23:23:07 2011
krvxerpt: Errors detected in process 20, role reader.
Mon Oct 31 23:23:07 2011
krvxmrs: Leaving by exception: 472
Mon Oct 31 23:23:07 2011
Errors in file /u01/app/oracle/admin/db/bdump/db_p000_20224.trc:
ORA-00472: PMON process terminated with error
LOGSTDBY status: ORA-00472: PMON process terminated with error
Mon Oct 31 23:23:07 2011
Errors in file /u01/app/oracle/admin/db/bdump/db_psp0_20149.trc:
ORA-00472: PMON process terminated with error
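When the same instance crashes months apart, it helps to condense the alert log into one line per internal error before comparing incidents. The following is only a minimal sketch, assuming the 10g alert log layout shown above (a timestamp line, an "Errors in file" line, then the ORA- line). The function name `summarize` and the regular expressions are my own illustration, not an Oracle tool.

```python
import re
from datetime import datetime

# Assumed layout: timestamp lines look like "Thu Jun 16 13:45:05 2011".
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$")
# Only errors that carry a bracketed first argument (ORA-00600 / ORA-07445 style).
ERR_RE = re.compile(r"^(ORA-\d{5}): .*?\[([^\]]*)\]")
# Background process name embedded in the trace file name, e.g. db_pmon_27660.trc.
PROC_RE = re.compile(r"_([a-z0-9]+)_\d+\.trc")


def summarize(alert_text):
    """Return (timestamp, process, error code, first argument) tuples."""
    events = []
    ts = proc = None
    for raw in alert_text.splitlines():
        line = raw.strip()
        if TS_RE.match(line):
            ts = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            proc = None
        elif line.lower().startswith("errors in file"):
            m = PROC_RE.search(line)
            proc = m.group(1) if m else None
        else:
            m = ERR_RE.match(line)
            if m and ts is not None:
                events.append((ts, proc, m.group(1), m.group(2)))
    return events


SAMPLE = """\
Thu Jun 16 13:45:05 2011
Errors in file /u01/app/oracle/admin/db/bdump/db_pmon_27660.trc:
ORA-07445: exception encountered: core dump [kggchk()+77] [SIGSEGV] [Address not mapped to object] [0x000000000] [] []
Sat Jun 25 01:44:02 2011
Errors in file /u01/app/oracle/admin/db/bdump/db_pmon_18907.trc:
ORA-00600: internal error code, arguments: [kcbshlc_1], [5], [], [], [], [], [], []
Sat Jun 25 01:44:04 2011
Errors in file /u01/app/oracle/admin/db/bdump/db_p000_19090.trc:
ORA-00472: PMON process terminated with error
"""

if __name__ == "__main__":
    for ts, proc, code, arg in summarize(SAMPLE):
        print(ts.isoformat(), proc, code, arg)
```

The ORA-00472 lines are skipped on purpose: they are only the downstream report that PMON died, so the summary keeps just the internal errors raised in PMON itself.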
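There are reasons for treating the two errors together. First, both the ORA-600 [kcbshlc_1] error and the ORA-7445 [kggchk] error were raised in the PMON process, and both directly crashed the database. Second, a logical standby is normally used for read-only workloads, so the processes most likely to fail are the apply processes. The two errors behave identically in this respect: each crashed the database, yet after a restart the error did not recur immediately and log apply proceeded without trouble, which indicates that the errors have no necessary causal link to log apply. Third, and most important, in the detailed trace for the ORA-7445 the function that appears immediately before kggchk in the call stack is kcbshlc: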
*** 2011-10-31 23:23:03.108
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [kggchk()+77] [SIGSEGV] [Address not mapped to object] [0x000000000] [] []
----- Call Stack Trace -----
calling                  call     entry                 argument values in hex
location                 type     point                 (? means dubious value)
------------------------ -------- --------------------- ----------------------------
ksedst()+31              call     ksedst1()             000000000 ? 000000001 ? 2A97172D50 ? 2A97172DB0 ? 2A97172CF0 ? 000000000 ?
ksedmp()+610             call     ksedst()              000000000 ? 000000001 ? 2A97172D50 ? 2A97172DB0 ? 2A97172CF0 ? 000000000 ?
ssexhd()+629             call     ksedmp()              000000003 ? 000000001 ? 2A97172D50 ? 2A97172DB0 ? 2A97172CF0 ? 000000000 ?
__funlockfile()+64       call     ssexhd()              00000000B ? 2A97173D70 ? 2A97173C40 ? 2A97172DB0 ? 2A97172CF0 ? 000000000 ?
kggchk()+77              signal   __funlockfile()       0066876E0 ? 000000000 ? 000000018 ? 0010F4468 ? 000000000 ? 0052EBEA0 ?
kcbshlc()+105            call     kggchk()              0066876E0 ? 000000000 ? 000000018 ? 0010F4468 ? 000000000 ? 0052EBEA0 ?
kslilcr()+770            call     kcbshlc()             0066876E0 ? 84EC40698 ? 000000018 ? 0010F4468 ? 000000000 ? 0052EBEA0 ?
ksl_cleanup()+1567       call     kslilcr()             0010F4468 ? 000000000 ? 000000000 ? 84EC40698 ? 0066876E0 ? 0052EBEA0 ?
ksuxfl()+492             call     ksl_cleanup()         000000000 ? 000000000 ? 000000000 ? 84EC40698 ? 0066876E0 ? 0052EBEA0 ?
ksuxda()+55              call     ksuxfl()              85F3A6168 ? 000000000 ? 000000000 ? 84EC40698 ? 0066876E0 ? 0052EBEA0 ?
ksucln()+1390            call     ksuxda()              85F3A6168 ? 000000000 ? 000000000 ? 84EC40698 ? 0066876E0 ? 0052EBEA0 ?
ksbrdp()+794             call     ksucln()              060008100 ? 000000000 ? 043FC1A0B ? 84EC40698 ? 0066876E0 ? 0052EBEA0 ?
opirip()+616             call     ksbrdp()              060008100 ? 000000000 ? 000000001 ? 060008100 ? 0066876E0 ? 0052EBEA0 ?
opidrv()+582             call     opirip()              000000032 ? 000000004 ? 7FBFFFF738 ? 060008100 ? 0066876E0 ? 0052EBEA0 ?
sou2o()+114              call     opidrv()              000000032 ? 000000004 ? 7FBFFFF738 ? 060008100 ? 0066876E0 ? 0052EBEA0 ?
opimai_real()+317        call     sou2o()               7FBFFFF710 ? 000000032 ? 000000004 ? 7FBFFFF738 ? 0066876E0 ? 0052EBEA0 ?
main()+116               call     opimai_real()         000000003 ? 7FBFFFF7A0 ? 000000004 ? 7FBFFFF738 ? 0066876E0 ? 0052EBEA0 ?
__libc_start_main()+219  call     main()                000000003 ? 7FBFFFF7A0 ? 000000004 ? 7FBFFFF738 ? 0066876E0 ? 0052EBEA0 ?
_start()+42              call     __libc_start_main()   000713988 ? 000000001 ? 7FBFFFF8E8 ? 005288D00 ? 000000000 ? 000000003 ?
--------------------- Binary Stack Dump ---------------------
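The third point, that kcbshlc is the direct caller of kggchk, can also be checked mechanically instead of by eye. The following is a rough sketch, assuming each frame in the trace has the form "location()+offset call|signal entry()" as in the stack above. The helper names `frames` and `caller_of` are hypothetical and exist only for this illustration.

```python
import re

# One frame: "<location>()+<offset>  call|signal  <entry point>()".
# IGNORECASE tolerates traces where the type column was upper-cased in transit.
FRAME_RE = re.compile(r"(\w+)\(\)\+\d+\s+(call|signal)\s+(\w+)\(\)", re.IGNORECASE)


def frames(trace_text):
    """Return (location, type, entry point) for each frame, innermost first."""
    return [
        (m.group(1), m.group(2).lower(), m.group(3))
        for m in FRAME_RE.finditer(trace_text)
    ]


def caller_of(trace_text, func):
    """Return the function whose frame calls `func`, or None if not found."""
    for location, kind, entry in frames(trace_text):
        if kind == "call" and entry == func:
            return location
    return None


# Excerpt of the frames around the failure, arguments omitted.
STACK = """\
kggchk()+77          signal   __funlockfile()
kcbshlc()+105        call     kggchk()
kslilcr()+770        call     kcbshlc()
ksl_cleanup()+1567   call     kslilcr()
"""

if __name__ == "__main__":
    print(caller_of(STACK, "kggchk"))
    print([loc for loc, _, _ in frames(STACK)])
```

Because the pattern is matched against the whole text rather than line by line, it also works on a trace whose line breaks were lost when it was pasted.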
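Based on these three points, the two errors are most likely caused by the same bug. Of the documents found on MOS, ORA-600 [kcbshlc_1] [ID 1274837.1] describes the closest match. The problem can be resolved by upgrading the database to 10.2.0.4.3 or 10.2.0.5.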