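ORA-29702 on a RAC Node That Restarts Frequently

A database running Oracle 10.2.0.4 RAC for Windows was hit by frequent node restarts.

According to the alert log, the restarts generally occur right when the node is starting up or shutting down: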

Thu May 03 17:22:45 2012
cluster interconnect IPC version:Oracle 9i Winsock2 TCP/IP IPC
IPC Vendor 0 proto 0
Version 0.0
PMON started WITH pid=2, OS id=1616
DIAG started WITH pid=3, OS id=120
PSP0 started WITH pid=4, OS id=6104
LMON started WITH pid=5, OS id=3844
LMD0 started WITH pid=6, OS id=6120
LMS0 started WITH pid=7, OS id=3548
LMS1 started WITH pid=8, OS id=5688
LMS2 started WITH pid=9, OS id=3636
LMS3 started WITH pid=10, OS id=3588
MMAN started WITH pid=11, OS id=3168
DBW0 started WITH pid=12, OS id=3208
DBW1 started WITH pid=13, OS id=5784
LGWR started WITH pid=14, OS id=6208
CKPT started WITH pid=15, OS id=3100
SMON started WITH pid=16, OS id=5948
RECO started WITH pid=17, OS id=3748
CJQ0 started WITH pid=18, OS id=7152
MMON started WITH pid=19, OS id=4552
MMNL started WITH pid=20, OS id=6940
Thu May 03 17:22:46 2012
lmon registered WITH NM - instance id 1 (internal mem no 0)
Thu May 03 17:22:46 2012
Reconfiguration started (OLD inc 0, NEW inc 8)
List OF nodes:
0 1
Global Resource Directory frozen
* allocate DOMAIN 0, invalid = TRUE 
Communication channels reestablished
Error: KGXGN aborts the instance (6)
Thu May 03 17:22:51 2012
Errors IN file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_lmon_3844.trc:
ORA-29702: ???????????
LMON: terminating instance due TO error 29702
Thu May 03 17:22:51 2012
Errors IN file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_pmon_1616.trc:
ORA-29702: ???????????
Thu May 03 17:22:51 2012
Errors IN file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_psp0_6104.trc:
ORA-29702: ???????????
Thu May 03 17:22:51 2012
Errors IN file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_dbw0_3208.trc:
ORA-29702: ???????????
Thu May 03 17:22:51 2012
Errors IN file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_mman_3168.trc:
ORA-29702: ???????????
Thu May 03 17:22:51 2012
Errors IN file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_dbw1_5784.trc:
ORA-29702: ???????????
Thu May 03 17:22:51 2012
Errors IN file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_ckpt_3100.trc:
ORA-29702: ???????????
Thu May 03 17:22:51 2012
Errors IN file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_lgwr_6208.trc:
ORA-29702: ???????????
Thu May 03 17:22:52 2012
Errors IN file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_reco_3748.trc:
ORA-29702: ???????????
Thu May 03 17:22:52 2012
Errors IN file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_smon_5948.trc:
ORA-29702: ???????????
Thu May 03 17:22:52 2012
Errors IN file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_lms1_5688.trc:
ORA-29702: ???????????
Thu May 03 17:22:52 2012
Errors IN file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl1_lms0_3548.trc:
ORA-29702: ???????????
Instance TERMINATED BY LMON, pid = 3844

The CSSD log file shows the following:

[ CSSD]2012-04-29 16:26:07.953 [7112] >TRACE: clssgmReconfigThread: completed FOR reconfig(13), WITH STATUS(1)
2012-04-30 09:07:04.718: [ OCROSD]utgdv:11:could NOT READ reg VALUE ocrmirrorconfig_loc os error= The system could not find the environment option that was entered.
2012-04-30 09:07:04.718: [ OCROSD]utgdv:11:could NOT READ reg VALUE ocrmirrorconfig_loc os error= The system could not find the environment option that was entered.
[ CSSD]2012-04-30 09:07:04.765 >USER: Copyright 2012, Oracle version 10.2.0.4.0
[ CSSD]2012-04-30 09:07:04.765 >USER: CSS daemon log FOR node crct-oadb, NUMBER 1, IN cluster crs
[ CSSD]2012-04-30 09:07:04.765 [3780] >TRACE: clssscmain: local-ONLY SET TO FALSE
[ clsdmt]Listening TO (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=61180))
[ CSSD]2012-04-30 09:07:04.781 [3780] >TRACE: clssnmReadNodeInfo: added node 1 (crct-oadb) TO cluster
[ CSSD]2012-04-30 09:07:04.781 [3780] >TRACE: clssnmReadNodeInfo: added node 2 (crct-oapt) TO cluster
[ CSSD]2012-04-30 09:07:04.828 [3724] >TRACE: clssnm_skgxninit: Compatible vendor clusterware NOT IN USE
[ CSSD]2012-04-30 09:07:04.828 [3724] >TRACE: clssnm_skgxnmon: skgxn init failed
[ CSSD]2012-04-30 09:07:04.843 [3780] >TRACE: clssnmNMInitialize: misscount SET TO (60)
[ CSSD]2012-04-30 09:07:04.843 [3780] >TRACE: clssnmNMInitialize: Network heartbeat thresholds are: impending reconfig 30000 ms, reconfig START (misscount) 60000 ms
[ CSSD]2012-04-30 09:07:04.843 [3780] >TRACE: clssnmDiskStateChange: state FROM 1 TO 2 disk (0/\\.\votedsk1)
[ CSSD]2012-04-30 09:07:04.843 [3112] >TRACE: clssnmvDPT: spawned FOR disk 0 (\\.\votedsk1)
[ CSSD]2012-04-30 09:07:06.843 [3112] >TRACE: clssnmDiskStateChange: state FROM 2 TO 4 disk (0/\\.\votedsk1)
[ CSSD]2012-04-30 09:07:06.843 [4492] >TRACE: clssnmvKillBlockThread: spawned FOR disk 0 (\\.\votedsk1) initial sleep INTERVAL (1000)ms

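A search based on this information shows that this is a known bug in 10.2.0.4: 10gR2/11gR1: Instances Abort With ORA-29702 When The Server is rebooted or shut down [ID 752399.1]. The bug affects 10.2.0.1 through 10.2.0.4 as well as 11.1.0.6 and 11.1.0.7.
Oracle's solution is to replace the K96 link invoked by the operating system's rc scripts with a K19 link. Since this environment runs on Windows, that fix obviously does not apply. Short of upgrading, there is probably no good alternative.
It is rare to see a production system deployed on Windows, and RAC on Windows is rarer still. Most sites that deploy this way not only lack familiarity with Oracle but are also unclear about the difference in stability between Windows and Linux, so they naturally run into problems far more often.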

Posted in BUG

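ORA-27504 When Connecting Locally with sqlplus

A customer's database reports an error when sqlplus connects directly (a local connection), whereas connections through tnsnames work normally.
The detailed error is: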

Thu Apr 26 10:17:56 2012
Errors IN file /oracle/admin/trs/udump/trs2_ora_2619.trc:
ORA-00603: ORACLE server SESSION TERMINATED BY fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:IPC init failed WITH STATUS: 65
ORA-27301: OS failure message: Package NOT installed
ORA-27302: failure occurred at: skgxpcini
ORA-27303: additional information: libskgxpd.so called
libskgxp10.so should reference REAL implementation.

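According to the MOS note, the problem arises because the environment variables point to the CRS home, so the correct libraries are not found: sqlplus Local connection to Instance is not possible , remote Using tns is fine . [ID 859778.1].
The fix is to remove the CRS home from the SHLIB_PATH and LIBPATH environment variables so that Oracle correctly finds the lib directory under ORACLE_HOME.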
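As a rough illustration of that cleanup, the sketch below filters a colon-separated library path and drops every entry that lives under the CRS home. The function name `strip_crs_home` and the example directories are hypothetical and not taken from the MOS note; the real values of SHLIB_PATH and LIBPATH depend on the site.

```python
def strip_crs_home(path_value, crs_home, sep=":"):
    """Return path_value without the entries located under crs_home."""
    crs_home = crs_home.rstrip("/")
    kept = []
    for entry in path_value.split(sep):
        if not entry:
            continue  # skip empty entries left by doubled separators
        normalized = entry.rstrip("/")
        # Drop the CRS home itself and anything below it.
        if normalized == crs_home or normalized.startswith(crs_home + "/"):
            continue
        kept.append(entry)
    return sep.join(kept)


# Hypothetical directories, for illustration only.
example = "/oracle/crs/lib:/oracle/product/10.2.0/db_1/lib:/usr/lib"
print(strip_crs_home(example, "/oracle/crs"))
```

The cleaned value would then be exported in the profile of the oracle user in place of the old SHLIB_PATH or LIBPATH. Matching on a trailing slash keeps a sibling directory such as /oracle/crs2 from being removed by accident.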


ORA-600(krboReadBitmap_badbitmap) Error

An error triggered by a backup.
The database is 10.2.0.3 RAC on HP-UX, and this ORA-600 error appeared while backing up the database:

Sat Oct 29 04:39:31 2011
Errors IN file /u01/app/oracle/admin/ORCL/udump/orcl2_ora_29252.trc:
ORA-00600: internal error code, arguments: [krboReadBitmap_badbitmap], [57599987], [+DATA/orcl/datafile/ts1.ora], [6], [30], [17629205], [1024], [52822033]
Sat Oct 29 04:39:53 2011
Backup optimization FOR file +DATA/orcl/datafile/ts1.ora stopped due TO errors:
Sat Oct 29 04:39:53 2011
Errors IN file /u01/app/oracle/admin/ORCL/udump/orcl2_ora_29252.trc:
ORA-00600: internal error code, arguments: [krboReadBitmap_badbitmap], [57599987], [+DATA/orcl/datafile/ts1.ora], [6], [30], [17629205], [1024], [52822033]

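Errors raised during an RMAN backup like this are quite uncommon. A MOS search shows it is a bug in 10.2.0.3: ORA-00600[krboReadBitmap_badbitmap] During RMAN Backup [ID 456391.1]. In short, after 10.2.0.2 Oracle introduced a backup optimization: for locally managed tablespaces it uses the space bitmap to back up only the allocated blocks instead of reading every block. If the bitmap information is wrong, this ORA-600 error is raised.
Besides upgrading to 10.2.0.4, the problem can be worked around by adding the BLOCKS ALL keyword to the BACKUP command, which avoids the backup optimization.
In addition, depending on the tablespace's segment space management, either the ASSM_TABLESPACE_VERIFY or the TABLESPACE_VERIFY procedure of the DBMS_SPACE_ADMIN package can be used to check whether the tablespace bitmap is damaged.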


ORA-7445(_kgscReleaseCursor) and ORA-4030 Errors

Another error caused by ORA-4030.
The database is 10.2.0.3 on 64-bit Windows. The errors are:

Tue Mar 13 20:51:28 China Standard Time 2012
Errors IN file d:\oracle\admin\orcl\udump\orcl_ora_6136.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [_kgscReleaseCursor+174] [PC:0x423AA92] [ADDR:0xC] [UNABLE_TO_READ] []
ORA-04030: out of process memory when trying to allocate 244 bytes (kxs-heap-w,kntx.1)
Tue Mar 13 20:51:32 China Standard Time 2012
Errors IN file d:\oracle\admin\orcl\udump\orcl_ora_6136.trc:
ORA-04030: out of process memory when trying to allocate 82444 bytes (pga heap,control file i/o buffer)
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [_kgscReleaseCursor+174] [PC:0x423AA92] [ADDR:0xC] [UNABLE_TO_READ] []
ORA-04030: out of process memory when trying to allocate 244 bytes (kxs-heap-w,kntx.1)

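This ORA-7445 error is very rare, and nothing about it can be found on METALINK. The problem is not hard to diagnose, though: the ORA-7445 [_kgscReleaseCursor] error is evidently caused by insufficient PGA memory. According to the messages, Oracle failed to allocate PGA memory because PGA_AGGREGATE_TARGET was set too low, and the ORA-7445 itself was raised in the function that releases cursors.
PGA_AGGREGATE_TARGET was set to only 300M. After increasing it and restarting the database, the problem disappeared. Note that this parameter can be changed dynamically, but after a dynamic change the ORA-4030 and ORA-7445 errors kept appearing. Evidently the new value applies only to newly created processes, while existing sessions remain bound by the previous setting.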


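ORA-4068 Caused by Compiling a Procedure

A bug in 10.2.0.3, and the first time I have run into this kind of problem.
In 10.2.0.3 there is a bug that can cause the compilation of a procedure or view to fail. The failure does not just leave that view or procedure unusable: it can affect the data dictionary as a whole, so that stored procedures raise ORA-4068 when executed.
The error messages are: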

ORA-04068: existing state OF packages has been discarded.
ORA-04065: NOT executed, altered OR dropped stored PROCEDURE P_PACKAGE.P_PRO
ORA-06508: PL/SQL: could NOT find program unit being called: P_PACKAGE.P_PRO
ORA-06512: at line 1

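The corresponding bug is Bug 6136074 - ORA-4068 / ORA-4065 ORA-6508 on VALID objects [ID 6136074.8]. The cause is that compiling an object leaves the timestamps recorded for its PL/SQL dependencies inconsistent with the timestamps of the parent objects.
The Oracle note gives a SQL statement to check for the problem: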

SELECT do.obj# d_obj,do.name d_name, do.type# d_type,
po.obj# p_obj,po.name p_name,
to_char(p_timestamp,'DD-MON-YYYY HH24:MI:SS') "P_Timestamp",
to_char(po.stime ,'DD-MON-YYYY HH24:MI:SS') "STIME",
decode(sign(po.stime-p_timestamp),0,'SAME','*DIFFER*') X
FROM sys.obj$ do, sys.dependency$ d, sys.obj$ po
WHERE P_OBJ#=po.obj#(+)
AND D_OBJ#=do.obj#
AND do.status=1 /*dependent is valid*/
AND po.status=1 /*parent is valid*/
AND po.stime!=p_timestamp /*parent timestamp not match*/
ORDER BY 2,1;

Recompiling the objects returned by this script resolves the problem. To prevent a recurrence, upgrade the database to 10.2.0.4 or later.
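The recompilation step can be scripted. The sketch below is not part of the Oracle note: it assumes the d_name and d_type columns of the check query have already been fetched into Python as pairs, and it maps the common obj$.type# codes to ALTER ... COMPILE statements. The helper name `recompile_statements` and the sample rows are hypothetical. The query does not return the owner, so the generated statements carry no schema prefix and would have to be run as the owning user or be extended with one.

```python
# obj$.type# codes for the object types that can be recompiled.
COMPILE_TEMPLATES = {
    4: "ALTER VIEW {name} COMPILE",
    7: "ALTER PROCEDURE {name} COMPILE",
    8: "ALTER FUNCTION {name} COMPILE",
    9: "ALTER PACKAGE {name} COMPILE",
    11: "ALTER PACKAGE {name} COMPILE BODY",
    12: "ALTER TRIGGER {name} COMPILE",
    13: "ALTER TYPE {name} COMPILE",
    14: "ALTER TYPE {name} COMPILE BODY",
}


def recompile_statements(rows):
    """Build one COMPILE statement per distinct (d_name, d_type) pair."""
    statements = []
    seen = set()
    for d_name, d_type in rows:
        template = COMPILE_TEMPLATES.get(d_type)
        if template is None or (d_name, d_type) in seen:
            continue  # unsupported type, or already handled
        seen.add((d_name, d_type))
        statements.append(template.format(name=d_name) + ";")
    return statements


# Hypothetical rows, for illustration only.
rows = [("P_PACKAGE", 11), ("P_PACKAGE", 11), ("V_ORDERS", 4), ("T", 2)]
for statement in recompile_statements(rows):
    print(statement)
```

A dependent usually appears once per mismatched parent in the query output, which is why duplicates are collapsed before the statements are emitted.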


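ORA-600(kglobpn-bad-pga) Error

This error is rather rare; neither Google nor METALINK turns up any related information.
The database is an 11.2.0.2 RAC environment. Part of the reason the error is so rare is that 11.2 is still so new that many problems have yet to surface.
The error is: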

2012-03-29 04:06:02.353000 +08:00
Errors IN file /oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_18355.trc (incident=144401):
ORA-00600: internal error code, arguments: [kglobpn-bad-pga], [], [], [], [], [], [], [], [], [], [], []
Incident details IN: /oracle/diag/rdbms/orcl/orcl1/incident/incdir_144401/orcl1_ora_18355_i144401.trc
2012-03-29 04:10:12.017000 +08:00
Dumping diagnostic DATA IN directory=[cdmp_20120329041012], requested BY (instance=1, osid=18355), summary=[incident=144401].
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
2012-03-29 04:10:13.750000 +08:00
Sweep [inc][144401]: completed
Sweep [inc2][144401]: completed

The detailed trace is:

*** 2012-03-29 04:06:02.385
*** SESSION ID:(106.45) 2012-03-29 04:06:02.385
*** CLIENT ID:() 2012-03-29 04:06:02.385
*** SERVICE NAME:(orcl) 2012-03-29 04:06:02.385
*** MODULE NAME:(PL/SQL Developer) 2012-03-29 04:06:02.385
*** ACTION NAME:(Main SESSION) 2012-03-29 04:06:02.385
Dump continued FROM file: /oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_18355.trc
ORA-00600: internal error code, arguments: [kglobpn-bad-pga], [], [], [], [], [], [], [], [], [], [], []
========= Dump FOR incident 144401 (ORA 600 [kglobpn-bad-pga]) ========
*** 2012-03-29 04:06:02.426
dbkedDefDump(): Starting incident DEFAULT dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=8m7jgqrwrtyj1) -----
CREATE OR REPLACE TRIGGER N_TRIGGER
  AFTER INSERT OR UPDATE ON T
  FOR EACH ROW
DECLARE
.
.
.
END;
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
ksedst1()+96         CALL     skdstdst()           FFFFFFFF7FFEED60 ?
                                                   1006B0C80 ? 000000000 ?
                                                   00000000A ? 000000001 ?
                                                   10BD95B00 ?
ksedst()+60          CALL     ksedst1()            000000000 ? 000000001 ?
                                                   00010C212 ? 00010C000 ?
                                                   10C20A000 ? 00010C20A ?
dbkedDefDump()+2032  CALL     ksedst()             000000000 ? 10B25B000 ?
                                                   10B25B2B0 ? 10C212000 ?
                                                   00010B000 ? 00010C212 ?
dbgexPhaseII()+1800  PTR_CALL dbkedDefDump()       000000003 ? 000000002 ?
                                                   10A6EC2C8 ? 0000014B0 ?
                                                   10C20A000 ? 000000003 ?
dbgexProcessError()  CALL     dbgexPhaseII()       10C3B4650 ?
+1248                                              FFFFFFFF7BE3A578 ?
                                                   FFFFFFFF7FFF3AB8 ?
                                                   0018E0000 ? 10A6E35B8 ?
                                                   000001C00 ?
dbgePostErrorKGE()+  CALL     dbgeExecuteForError  10AE0C3FD ?
1320                          ()                   FFFFFFFFFEC0B62D ?
                                                   001050000 ?
                                                   FFFFFFFF7FFF6268 ?
                                                   001060000 ? 000000028 ?
dbkePostKGE_kgsf()+  CALL     dbgePostErrorKGE()   10C20AC90 ? 000000000 ?
44                                                 FFFFFFFF7BE3A578 ?
                                                   000000000 ? 000000258 ?
                                                   00010C000 ?
kgerinv_internal()+  CALL     kgeadse()            10C20AC90 ?
72                                                 FFFFFFFF7D1042C0 ?
                                                   000000258 ? 000002868 ?
                                                   10A6E4000 ? 00010A6E4 ?
kgerinv()+40         CALL     kgerinv_internal()   10C20AC90 ? 004EA2360 ?
                                                   10BDED4D0 ? 000000258 ?
                                                   000000000 ? 000000000 ?
kgeasnmierr()+28     CALL     kgerinv()            10C20AC90 ?
                                                   FFFFFFFF7D1042C0 ?
                                                   10BDED4D0 ? 000000000 ?
                                                   FFFFFFFF7FFF7630 ?
                                                   000001400 ?
kglobpn()+2972       CALL     kgeasnmierr()        10C20AC90 ?
                                                   FFFFFFFF7D1042C0 ?
                                                   10BDED4D0 ? 000000000 ?
                                                   0000060A6 ? 00000005B ?
kglpim()+444         CALL     kglobpn()            FFFFFFFFFFBEC700 ?
                                                   10C20AC90 ? 0000020BE ?
                                                   000000000 ? CF6AF45C8 ?
                                                   D163B01D0 ?
kktcrt()+1204        CALL     kglpim()             000001154 ?
                                                   FFFFFFFF7FFF7EC8 ?
                                                   D163B01D0 ? 000000002 ?
                                                   000000006 ? 00000004E ?
opiexe()+24660       CALL     kktcrt()             000006000 ?
                                                   FFFFFFFF7BC19328 ?
                                                   D1772AAE0 ? 10C20A950 ?
                                                   FFFFFFFF7BC190C0 ?
                                                   000000002 ?
opiosq0()+6416       CALL     opiexe()             000000004 ? 000000000 ?
                                                   FFFFFFFF7FFFA43C ?
                                                   00010A6E3 ? 10A6E3000 ?
                                                   FFFFFFFF7FFF9E10 ?
kpooprx()+232        CALL     opiosq0()            00000004A ? 00000000E ?
                                                   FFFFFFFF7FFFA600 ?
                                                   0000000A4 ? 00010A6E3 ?
                                                   10C20AC90 ?
kpoal8()+3884        CALL     kpooprx()            FFFFFFFF7FFFDF3C ?
                                                   FFFFFFFF7DFD0660 ?
                                                   0000015EC ? 0000000A4 ?
                                                   00010C000 ? 0000000A4 ?
opiodr()+1428        PTR_CALL kpoal8()             00000005E ? 00000001C ?
                                                   FFFFFFFF7FFFDF38 ?
                                                   00010C000 ? 10C20A000 ?
                                                   000001648 ?
ttcpip()+1056        PTR_CALL opiodr()             00010A795 ? 00000001C ?
                                                   103EAD460 ? 00010A400 ?
                                                   000001400 ? 10C20A000 ?
opitsk()+1528        CALL     ttcpip()             000000000 ? 10A6C7694 ?
                                                   10C20AC90 ?
                                                   FFFFFFFF7FFFDF38 ?
                                                   FFFFFFFF7FFFC980 ?
                                                   10C221848 ?
opiino()+1000        CALL     opitsk()             10A6C7694 ? 10C226C98 ?
                                                   10C221654 ? 10C21F958 ?
                                                   000000000 ? 10C20A950 ?
opiodr()+1428        PTR_CALL opiino()             00010C000 ? 10C2216D0 ?
                                                   10C2216D0 ? 000380000 ?
                                                   0000000E9 ?
                                                   FFFFFFFF7FFFF890 ?
opidrv()+1100        CALL     opiodr()             10C221000 ? 000000004 ?
                                                   1035DD740 ? 00010C000 ?
                                                   000001400 ? 10C20A000 ?
sou2o()+92           CALL     opidrv()             00000003C ? 000000004 ?
                                                   FFFFFFFF7FFFF890 ?
                                                   0001EB250 ?
                                                   FFFFFFFF7C945110 ?
                                                   FFFFFFFF7FFFFC98 ?
opimai_real()+304    CALL     sou2o()              FFFFFFFF7FFFF868 ?
                                                   00000003C ? 000000004 ?
                                                   FFFFFFFF7FFFF890 ?
                                                   00010C000 ? 00010B800 ?
ssthrdmain()+320     PTR_CALL opimai_real()        000000000 ?
                                                   FFFFFFFF7FFFFB38 ?
                                                   FFFFFFFF7F900768 ?
                                                   00010B800 ? 000000001 ?
                                                   000000002 ?
main()+308           CALL     ssthrdmain()         00010C000 ? 000000002 ?
                                                   00044D000 ? 100644B40 ?
                                                   10C230000 ? 00010C230 ?
_start()+380         CALL     main()               000000002 ?
                                                   FFFFFFFF7FFFFC48 ?
                                                   000000000 ?
                                                   FFFFFFFF7FFFFB48 ?
                                                   FFFFFFFF7FFFFC58 ?
                                                   FFFFFFFF7DB00200 ?
--------------------- Binary Stack Dump ---------------------

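Evidently a trigger was being created with PL/SQL Developer. The trigger body is long, so it is omitted here.
Although there is no MOS note to back this up, it is not hard to conclude that the problem lies in how PL/SQL Developer interacts with Oracle when compiling the trigger. I cannot be certain that Developer is the root cause, but the error is almost certainly tool-related, because many rare errors are introduced by tools. Even I, who hardly use tools at all, have already run into quite a few ORA-7445 and ORA-600 errors introduced by Developer, so this one may well be an incompatibility between Developer and the latest Oracle release.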


ORA-7445(ksuklms) Error

A customer's 11.2.0.2 RAC environment hit this ORA-7445 [ksuklms] error.
The error is:

2012-05-10 16:25:38.561000 +08:00
Exception [TYPE: SIGSEGV, Address NOT mapped TO object] [ADDR:0x163A] [PC:0x100A554A8, ksuklms()+392] [flags: 0x0, COUNT: 1]
Errors IN file /app/diag/rdbms/orcl/orcl1/trace/orcl1_ora_28387.trc (incident=449642):
ORA-07445: exception encountered: core dump [ksuklms()+392] [SIGSEGV] [ADDR:0x163A] [PC:0x100A554A8] [Address NOT mapped TO object] []
Incident details IN: /app/diag/rdbms/orcl/orcl1/incident/incdir_449642/orcl1_ora_28387_i449642.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
2012-05-10 16:25:40.465000 +08:00
Dumping diagnostic DATA IN directory=[cdmp_20120510162540], requested BY (instance=1, osid=28387), summary=[incident=449642].
2012-05-10 16:25:42.061000 +08:00
Sweep [inc][449642]: completed
Sweep [inc2][449642]: completed

The detailed error is:

*** 2012-05-10 16:25:38.800
*** SESSION ID:(917.28879) 2012-05-10 16:25:38.800
*** CLIENT ID:() 2012-05-10 16:25:38.800
*** SERVICE NAME:(SYS$USERS) 2012-05-10 16:25:38.800
*** MODULE NAME:(sqlplus@ecsyhdb1 (TNS V1-V3)) 2012-05-10 16:25:38.800
*** ACTION NAME:() 2012-05-10 16:25:38.800
Dump continued FROM file: /app/diag/rdbms/orcl/orcl1/trace/orcl1_ora_28387.trc
ORA-07445: exception encountered: core dump [ksuklms()+392] [SIGSEGV] [ADDR:0x163A] [PC:0x100A554A8] [Address NOT mapped TO object] []
========= Dump FOR incident 449642 (ORA 7445 [ksuklms()+392]) ========
----- Beginning of Customized Incident Dump(s) -----
Exception [TYPE: SIGSEGV, Address NOT mapped TO object] [ADDR:0x163A] [PC:0x100A554A8, ksuklms()+392] [flags: 0x0, COUNT: 1]
Registers:
----------
%i0: 0xffffffff7fff72e4 %i1: 0xffffffff7fff72d6 %i2: 0xffffffff7fff72ee
%i3: 0xffffffff7fff72e4 %i4: 0x0000000000000000 %i5: 0xffffffff7a6068f8
%i6: 0xffffffff7fff6a21 %fp: 0xffffffff7fff7220 %i7: 0x0000000100a54020
%l0: 0xffffffff7fff721c %l1: 0x000000038000b840 %l2: 0x00000000000019d8
%l3: 0x0000000000001670 %l4: 0x000000000000163a %l5: 0x0000000000000000
%l6: 0xffffffff7fff72d6 %l7: 0x00000000000019e0
%o0: 0x0000000000000000 %o1: 0x0000000000008343 %o2: 0x0000000000000b4a
%o3: 0xffffffff7fff72e8 %o4: 0x000000000000001a %o5: 0x0000000000000000
%o6: 0xffffffff7fff68c1 %sp: 0xffffffff7fff70c0 %o7: 0x0000000100a5546c
%g1: 0x0000000be0d2fbd8 %g2: 0x0000000000008342 %g3: 0x0000000000008342
%g4: 0x0000000000041a10 %g5: 0x0000000000000000 %g6: 0x0000000000000000
%g7: 0xffffffff7c100200
%pc: 0x0000000100a554a8 %npc: 0x0000000100a554ac %y: 0x0000000000000000
Stack info:
----------
ss_sp: 0xffffffff7e000000 ss_size: 0x0000000002000000 ss_flags: 0
Swap entries = 1 
path=/dev/md/dsk/d1, SIZE=75500789760, free=75500789760, LENGTH=147462480
*** 2012-05-10 16:25:38.814
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x3, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=55z15wr7xssjk) -----
ALTER system KILL SESSION '33603,2890'
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
ksedst1()+96         CALL     skdstdst()           FFFFFFFF7A5FD640 ?
                                                   000000000 ? 000000000 ?
                                                   00000000A ? 000000001 ?
                                                   10BD552E0 ?
ksedst()+60          CALL     ksedst1()            000000001 ? 000000001 ?
                                                   00010C1D1 ? 00010C000 ?
                                                   10C1CA000 ? 00010C1CA ?
dbkedDefDump()+2032  CALL     ksedst()             000000001 ? 10B21A000 ?
                                                   10B21AA90 ? 10C1D2000 ?
                                                   00010B000 ? 00010C1D2 ?
ssexhd()+2196        CALL     ksedmp()             000000003 ? 000000003 ?
                                                   000000B4A ? 00038000B ?
                                                   000380000 ? 000000003 ?
__sighndlr()+12      PTR_CALL ssexhd()             10C1CE000 ? BE8D30A10 ?
                                                   10B86D110 ?
                                                   FFFFFFFF7A601EF0 ?
                                                   00010C1C9 ? 0000003E8 ?
call_user_handler()  CALL     __sighndlr()         00000000B ?
+992                                               FFFFFFFF7A601EF0 ?
                                                   FFFFFFFF7A601C10 ?
                                                   1046F1AA0 ? 000000000 ?
                                                   00000000A ?
ksuklms()+392        PTR_CALL 0000000000000000     FFFFFFFF7C100200 ?
                                                   FFFFFFFF7C100200 ?
                                                   FFFFFFFF7A601C10 ?
                                                   000000009 ? 000000000 ?
                                                   000000000 ?
ksukil()+480         CALL     ksuklms()            FFFFFFFF7FFF72E4 ?
                                                   FFFFFFFF7FFF72D6 ?
                                                   FFFFFFFF7FFF72EE ?
                                                   FFFFFFFF7FFF72E4 ?
                                                   000000000 ?
                                                   FFFFFFFF7A6068F8 ?
kkyasy()+4988        CALL     ksukil()             000000000 ? 000000001 ?
                                                   AE4BE0C42 ? AE4BE0B08 ?
                                                   10529ACB0 ? 000000B4A ?
kksExecutorclmand()  CALL     kkyasy()             000000001 ?
+2244                                              FFFFFFFF7AA56F28 ?
                                                   07FFFFFFF ? 000000001 ?
                                                   000000000 ? 07FFFFFFF ?
opiexe()+13404       CALL     kksExecutorclmand()  FFFFFFFF7AA56F28 ?
                                                   00010C1E3 ? 000000004 ?
                                                   BF0F4AB10 ? 10C1CE900 ?
                                                   10C1CE4C8 ?
kpoal8()+2368        CALL     opiexe()             000000049 ? 000000003 ?
                                                   FFFFFFFF7FFFA91C ?
                                                   000000000 ? 000000000 ?
                                                   0BFFFFFFF ?
opiodr()+1428        PTR_CALL kpoal8()             00000005E ? 00000001C ?
                                                   FFFFFFFF7FFFDDD8 ?
                                                   00010C000 ? 10C1CA000 ?
                                                   000001648 ?
ttcpip()+1056        PTR_CALL opiodr()             00010A755 ? 00000001C ?
                                                   103E6CC40 ? 00010A400 ?
                                                   000001400 ? 10C1CA000 ?
opitsk()+1528        CALL     ttcpip()             000000000 ? 10A686E74 ?
                                                   10C1CA3E0 ?
                                                   FFFFFFFF7FFFDDD8 ?
                                                   FFFFFFFF7FFFC820 ?
                                                   10C1E0F98 ?
opiino()+1000        CALL     opitsk()             10A686E74 ? 10C1E63E8 ?
                                                   10C1E0DA4 ? 10C1DF0A8 ?
                                                   000000000 ? 10C1CA0A0 ?
opiodr()+1428        PTR_CALL opiino()             000002270 ? 10C1E0E20 ?
                                                   00010C000 ? 000380000 ?
                                                   0000000FC ?
                                                   FFFFFFFF7FFFF730 ?
opidrv()+1100        CALL     opiodr()             10C1E0000 ? 000000004 ?
                                                   10359CF20 ? 00010C000 ?
                                                   000001400 ? 10C1CA000 ?
sou2o()+92           CALL     opidrv()             00000003C ? 000000004 ?
                                                   FFFFFFFF7FFFF730 ?
                                                   0001EA190 ?
                                                   FFFFFFFF7AF42F10 ?
                                                   10C3D42B0 ?
opimai_real()+304    CALL     sou2o()              FFFFFFFF7FFFF708 ?
                                                   00000003C ? 000000004 ?
                                                   FFFFFFFF7FFFF730 ?
                                                   00010C000 ? 00010B800 ?
ssthrdmain()+320     PTR_CALL opimai_real()        000000000 ?
                                                   FFFFFFFF7FFFF9D8 ?
                                                   FFFFFFFF7DF3AEB8 ?
                                                   00010B800 ? 000000001 ?
                                                   000000002 ?
main()+308           CALL     ssthrdmain()         00010C000 ? 000000002 ?
                                                   00044D000 ? 100604320 ?
                                                   10C1EF000 ? 00010C1EF ?
_start()+380         CALL     main()               000000002 ?
                                                   FFFFFFFF7FFFFAE8 ?
                                                   000000000 ?
                                                   FFFFFFFF7FFFF9E8 ?
                                                   FFFFFFFF7FFFFAF8 ?
                                                   FFFFFFFF7C100200 ?
--------------------- Binary Stack Dump ---------------------

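This error was introduced in 10.2.0.4 and fixed by Oracle in 10.2.0.5, so it was a surprise to see it again in 11.2.0.2. In 10.2.0.4, killing a session in a RAC environment can crash the node; in 11.2 the same error appears, but the database instance does not crash. For a description of the problem, see Bug 7038750 - Dump (ksuklms) / instance crash [ID 7038750.8].
In 11.2 the problem can simply be ignored. In a 10.2.0.4 environment, besides upgrading the database to 10.2.0.5 or 10.2.0.4.1, the event '10422 trace name context forever, level 1' can be added to the initialization parameter file to keep this error from crashing the instance.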

I recently ran into this bug on 10.2.0.4, and on checking MOS found that Oracle has updated its status: the 11.2 problem is now recorded in Bug 14024668 - ORA-7445 [ksuklms] from 'alter system kill session (non-existent)' [ID 14024668.8].

The confirmed affected version is 11.2.0.3, and Oracle confirms that the error is fixed in 11.2.0.4.

ORA-600(kjbrref:pkey) and ORA-600(kjbmprlst:shadow) Errors

These two errors are caused by the same bug.
The environment is 11.2.0.2 RAC on Solaris SPARC. The errors are:

2012-01-29 06:15:10.168000 +08:00
Errors IN file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_81.trc (incident=384590):
ORA-00600: internal error code, arguments: [kjbrref:pkey], [332269], [202], [137064], [0], [], [], [], [], [], [], []
Incident details IN: /app/diag/rdbms/orcl/orcl1/incident/incdir_384590/orcl1_lms3_81_i384590.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
2012-01-29 06:15:11.923000 +08:00
Dumping diagnostic DATA IN directory=[cdmp_20120129061511], requested BY (instance=1, osid=81 (LMS3)), summary=[incident=384590].
Sweep [inc][384590]: completed
Sweep [inc2][384590]: completed
2012-01-29 06:15:17.289000 +08:00
Errors IN file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_81.trc:
ORA-00600: internal error code, arguments: [kjbrref:pkey], [332269], [202], [137064], [0], [], [], [], [], [], [], []
LMS3 (ospid: 81): terminating the instance due TO error 484
2012-01-29 06:15:20.910000 +08:00
ORA-1092 : opitsk aborting process
2012-01-29 06:15:22.384000 +08:00
.
.
.
2012-04-17 04:26:44.373000 +08:00
Errors in file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms1_8678.trc (incident=432578):
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /app/diag/rdbms/orcl/orcl1/incident/incdir_432578/orcl1_lms1_8678_i432578.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2012-04-17 04:26:45.864000 +08:00
Dumping diagnostic data in directory=[cdmp_20120417042645], requested by (instance=1, osid=8678 (LMS1)), summary=[incident=432578].
Errors in file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms1_8678.trc:
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
2012-04-17 04:26:47.359000 +08:00
Sweep [inc][432578]: completed
Sweep [inc2][432578]: completed
2012-04-17 04:26:53.095000 +08:00
Errors in file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms1_8678.trc:
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 8678): terminating the instance due to error 484
2012-04-17 04:26:56.593000 +08:00
ORA-1092 : opitsk aborting process
2012-04-17 04:26:58.088000 +08:00
Instance terminated by LMS1, pid = 8678

As you can see, both the kjbrref:pkey error and the kjbmprlst:shadow error led directly to an instance crash, so both are very serious problems. Both also occurred in LMSn processes.
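The pattern described above (an ORA-600 raised in an LMSn trace file, followed by that same process terminating the instance) can be checked mechanically. The following is a minimal sketch in Python, not an Oracle tool: the function name `summarize_alert_log` and the regular expressions are assumptions written against the alert-log lines quoted in this post, and matching is case-insensitive because the excerpts show some keywords in mixed case.

```python
import re
from collections import defaultdict

# Patterns are assumptions derived from the alert-log excerpts in this post.
ERR_FILE = re.compile(r"^Errors in file \S*?_([a-z0-9]+)_\d+\.trc", re.IGNORECASE)
ORA600 = re.compile(r"ORA-00600:.*?\[([^\]]*)\]")
TERMINATING = re.compile(r"^(\w+) \(ospid: \d+\): terminating the instance", re.IGNORECASE)


def summarize_alert_log(text):
    """Return ({first ORA-600 argument: {process names}}, [terminating processes])."""
    by_error = defaultdict(set)
    terminators = []
    current_process = None
    for line in text.splitlines():
        m = ERR_FILE.match(line)
        if m:
            # Process name is taken from the trace file name, e.g. orcl1_lms3_81.trc
            current_process = m.group(1).upper()
            continue
        m = ORA600.search(line)
        if m and current_process:
            by_error[m.group(1)].add(current_process)
            continue
        m = TERMINATING.match(line)
        if m:
            terminators.append(m.group(1).upper())
    return dict(by_error), terminators


SAMPLE = """\
Errors in file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_81.trc:
ORA-00600: internal error code, arguments: [kjbrref:pkey], [332269], [202], [137064], [0], [], [], [], [], [], [], []
LMS3 (ospid: 81): terminating the instance due to error 484
Errors in file /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms1_8678.trc:
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 8678): terminating the instance due to error 484
"""

errors, terminators = summarize_alert_log(SAMPLE)
print(errors)
print(terminators)
```

Run against a full alert log, a summary like this makes it easy to see whether every occurrence of the two errors comes from an LMS process and whether each one ends with the instance being terminated.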

*** 2012-01-29 06:15:10.194
*** SESSION ID:(1009.1) 2012-01-29 06:15:10.194
*** CLIENT ID:() 2012-01-29 06:15:10.194
*** SERVICE NAME:(SYS$BACKGROUND) 2012-01-29 06:15:10.194
*** MODULE NAME:() 2012-01-29 06:15:10.194
*** ACTION NAME:() 2012-01-29 06:15:10.194
Dump continued from file: /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_81.trc
ORA-00600: internal error code, arguments: [kjbrref:pkey], [332269], [202], [137064], [0], [], [], [], [], [], [], []
========= Dump for incident 384590 (ORA 600 [kjbrref:pkey]) ========
----- Beginning of Customized Incident Dump(s) -----
 GCS RESOURCE 0xb92d0cfa0 hashq [0xbb35eddc8,0xc0f9b1f60] name[0x511ed.ca] pkey 136931.0
   GRANT 0xb94a7e8f8 cvt 0x0 send 0x0@1,0 WRITE 0x0,0@65536
   flag 0x2 mdrole 0x1 mode 1 scan 0.0 ROLE LOCAL
   disk: 0x0000.00000000 WRITE: 0x0000.00000000 cnt 0x0 hist 0x0
   xid 0x0000.000.00000000 sid 3 pkwait 0s rmacks 0
   refpcnt 0 weak: 0x0000.00000000
   pkey 136931.0
   hv 91 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 12, dom 0]
   kjga st 0x4, step 0.35.0, cinc 18, rmno 6345, flags 0x20
   lb 16384, hb 32767, myb 16957, drmb 16957, apifrz 1
   GCS SHADOW 0xb94a7e8f8,626 resp[0xb92d0cfa0,0x511ed.ca] pkey 136931.0
     GRANT 1 cvt 0 mdrole 0x1 st 0x100 lst 0x40 GRANTQ rl LOCAL
     master 1 owner 2 sid 3 remote[0x68fde3ef0,11] hist 0x10c30086180431f
     history 0x1f.0x6.0x1.0xc.0x6.0x1.0xc.0x6.0x1.0x0.
     cflag 0x0 sender 0 flags 0x0 replay# 0 abast 0x0.x0.1 dbmap 0x0
     disk: 0x0000.00000000 WRITE request: 0x0000.00000000
     pi scn: 0x0000.00000000 sq[0xb92d0cfd0,0xb92d0cfd0]
     msgseq 0x1 updseq 0x0 reqids[11,0,0] infop 0x0 lockseq x67d9
   GCS SHADOW END
 GCS RESOURCE END
----- End of Customized Incident Dump(s) -----
*** 2012-01-29 06:15:10.261
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst1()+96         CALL     skdstdst()           FFFFFFFF7FFF4C00 ?
                                                   100670460 ? 000000000 ?
                                                   00000000A ? 000000001 ?
                                                   10BD552E0 ?
ksedst()+60          CALL     ksedst1()            000000000 ? 000000001 ?
                                                   00010C1D1 ? 00010C000 ?
                                                   10C1CA000 ? 00010C1CA ?
dbkedDefDump()+2032  CALL     ksedst()             000000000 ? 10B21A000 ?
                                                   10B21AA90 ? 10C1D2000 ?
                                                   00010B000 ? 00010C1D2 ?
dbgexPhaseII()+1800  PTR_CALL dbkedDefDump()       000000003 ? 000000002 ?
                                                   10A6ABAA8 ? 0000014B0 ?
                                                   10C1C9000 ? 000000003 ?
dbgexExplicitEndInc  CALL     dbgexPhaseII()       10C373D30 ?
()+728                                             FFFFFFFF7A634920 ?
                                                   FFFFFFFF7FFF8FDC ?
                                                   0018E0001 ? 10A6A2D98 ?
                                                   000001C00 ?
dbgeEndDDEInvocatio  CALL     dbgexExplicitEndInc  10A6A2C50 ?
nImpl()+704                   ()                   FFFFFFFF7A634920 ?
                                                   FFFFFFFF7FFF8F28 ?
                                                   FFFFFFFF7FFFC620 ?
                                                   000000000 ?
                                                   FFFFFFFFFE4E26A0 ?
kjbrref()+1496       CALL     dbgeEndDDEInvocatio  10C373D30 ? 001B1D800 ?
                              n()                  FFFFFFFFFEC0AF31 ?
                                                   FFFFFFFF7FFFC620 ?
                                                   000002868 ? 0018E0001 ?
kjblreplay()+7380    CALL     kjbrref()            000002868 ? 10C1CA3E0 ?
                                                   000021768 ? A681AFA10 ?
                                                   B92D0CFA0 ? C0F96F920 ?
kjbldrmrpst()+4864   CALL     kjblreplay()         000000000 ? 000000001 ?
                                                   10C1CA0A0 ? BDA03C9B8 ?
                                                   000000000 ? 10C1E8890 ?
kjmprcfgsync()+1424  CALL     kjbldrmrpst()        A681AFA10 ? 000000001 ?

The other trace file:

*** 2012-04-17 04:26:44.389
*** SESSION ID:(673.1) 2012-04-17 04:26:44.389
*** CLIENT ID:() 2012-04-17 04:26:44.389
*** SERVICE NAME:(SYS$BACKGROUND) 2012-04-17 04:26:44.389
*** MODULE NAME:() 2012-04-17 04:26:44.389
*** ACTION NAME:() 2012-04-17 04:26:44.389
Dump continued from file: /app/diag/rdbms/orcl/orcl1/trace/orcl1_lms1_8678.trc
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
========= Dump for incident 432578 (ORA 600 [kjbmprlst:shadow]) ========
----- Beginning of Customized Incident Dump(s) -----
 FUSION MSG 0xffffffff79c40b80,39 FROM 2 spnum 14 ver[38,11161] ln 144 sq[2,8]
        REPLAY 1 [0x103699.c7, 151132.0] c[0x7e7bd3240,55] [0x494e,x38]
        GRANT 2 CONVERT 0 ROLE x0
        pi [0x0.0x0] flags 0x0 state 0x100
        disk scn 0x0.0 writereq scn 0x0.0 rreqid x0
        msgRM# 11161 bkt# 18131 drmbkt# 18131
    pkey 151132.0 undo 0 stat 5 masters[32768, 2->32768] reminc 38 RM# 11152
 flg x0 TYPE x0 afftime x8517cf38
 nreplays BY lms 0 = 4046 
 nreplays BY lms 1 = 4105 
 nreplays BY lms 2 = 4176 
 nreplays BY lms 3 = 4214 
 nreplays BY lms 4 = 4158 
 nreplays BY lms 5 = 4162 
   hv 125 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 36, dom 0]
   kjga st 0x4, step 0.36.0, cinc 38, rmno 11161, flags 0x20
   lb 16384, hb 32767, myb 18131, drmb 18131, apifrz 1
 FUSION MSG DUMP END
 GCS RESOURCE 0xbb93a40e8 hashq [0xba8f40298,0xc27d16700] name[0x103699.c7] pkey 151008.0
   GRANT 0xb99d64f38 cvt 0x0 send 0x0@1,0 WRITE 0x0,0@65536
   flag 0x2 mdrole 0x1 mode 1 scan 0.0 ROLE LOCAL
   disk: 0x0000.00000000 WRITE: 0x0000.00000000 cnt 0x0 hist 0x0
   xid 0x0000.000.00000000 sid 1 pkwait 0s rmacks 0
   refpcnt 0 weak: 0x0000.00000000
   pkey 151008.0
   hv 125 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 36, dom 0]
   kjga st 0x4, step 0.36.0, cinc 38, rmno 11161, flags 0x20
   lb 16384, hb 32767, myb 18131, drmb 18131, apifrz 1
   GCS SHADOW 0xb99d64f38,42 resp[0xbb93a40e8,0x103699.c7] pkey 151008.0
     GRANT 1 cvt 0 mdrole 0x1 st 0x100 lst 0x40 GRANTQ rl LOCAL
     master 1 owner 2 sid 1 remote[0x85fed2220,13] hist 0xb93e302087234c9f
     history 0x1f.0x19.0xd.0x39.0x8.0x4.0xc.0x1f.0x39.0x1.
     cflag 0x0 sender 0 flags 0x0 replay# 0 abast 0x0.x0.1 dbmap 0x0
     disk: 0x0000.00000000 WRITE request: 0x0000.00000000
     pi scn: 0x0000.00000000 sq[0xbb93a4118,0xbb93a4118]
     msgseq 0x1 updseq 0x0 reqids[13,0,0] infop 0x0 lockseq xf0d1
   GCS SHADOW END
 GCS RESOURCE END
----- End of Customized Incident Dump(s) -----
*** 2012-04-17 04:26:44.478
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst1()+96         CALL     skdstdst()           FFFFFFFF7FFF4D20 ?
                                                   100670460 ? 000000000 ?
                                                   00000000A ? 000000001 ?
                                                   10BD552E0 ?
ksedst()+60          CALL     ksedst1()            000000000 ? 000000001 ?
                                                   00010C1D1 ? 00010C000 ?
                                                   10C1CA000 ? 00010C1CA ?
dbkedDefDump()+2032  CALL     ksedst()             000000000 ? 10B21A000 ?
                                                   10B21AA90 ? 10C1D2000 ?
                                                   00010B000 ? 00010C1D2 ?
dbgexPhaseII()+1800  PTR_CALL dbkedDefDump()       000000003 ? 000000002 ?
                                                   10A6ABAA8 ? 0000014B0 ?
                                                   10C1C9000 ? 000000003 ?
dbgexExplicitEndInc  CALL     dbgexPhaseII()       10C373D30 ?
()+728                                             FFFFFFFF7A634920 ?
                                                   FFFFFFFF7FFF90FC ?
                                                   0018E0001 ? 10A6A2D98 ?
                                                   000001C00 ?
dbgeEndDDEInvocatio  CALL     dbgexExplicitEndInc  10A6A2C50 ?
nImpl()+704                   ()                   FFFFFFFF7A634920 ?
                                                   FFFFFFFF7FFF9048 ?
                                                   FFFFFFFF7FFFC740 ?
                                                   000000000 ?
                                                   FFFFFFFFFE4E26A0 ?
kjbmprlst()+13504    CALL     dbgeEndDDEInvocatio  10C373D30 ? 001B1D800 ?
                              n()                  FFFFFFFFFEC0AF31 ?
                                                   FFFFFFFF7FFFC740 ?
                                                   0013F5000 ? 0018E0001 ?
kjmxmpm()+796        PTR_CALL kjbmprlst()          101782000 ? 00010C1CA ?
                                                   10C1EA000 ? 10C1CA000 ?
                                                   10A6A3000 ? 10A6A3000 ?
kjmpbmsg()+4584      CALL     kjmxmpm()            00010A400 ? 000000000 ?
                                                   0852DA2C5 ? 00010C000 ?
                                                   10A7EE000 ? BE22AF0C0 ?
kjmsm()+11308        CALL     kjmpbmsg()           00010A400 ? 00000009C ?
                                                   00010C000 ? 10A7EE000 ?
                                                   000000001 ? 000000027 ?
ksbrdp()+1236        PTR_CALL kjmsm()              000001888 ? 25916872D1 ?
                                                   000002000 ? 000000000 ?
                                                   00000024B ? 000001000 ?
opirip()+1008        CALL     ksbrdp()             10BB56000 ? BD8C0B680 ?
                                                   000000001 ? 000001400 ?
                                                   00010B800 ? 10AC212D8 ?
opidrv()+780         CALL     opirip()             10A6A3000 ? 380013D50 ?
                                                   000380002 ? 3800055C0 ?
                                                   380002000 ? 00010C000 ?
sou2o()+92           CALL     opidrv()             000000032 ? 000000004 ?
                                                   FFFFFFFF7FFFF780 ?
                                                   0001EA190 ?
                                                   FFFFFFFF7AF42F10 ?
                                                   FFFFFFFF7FFFFBB8 ?
opimai_real()+516    CALL     sou2o()              FFFFFFFF7FFFF758 ?

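As you can see, the two trace files are also very similar; even the first few function names on the error stack are exactly the same.
A MOS search confirms this as Bug 12834027 ORA-600 [kjbmprlst:shadow] / ORA-600 [kjbrasr:pkey] with RAC read mostly locking. The problem is fixed in the latest 11.2.0.3.1 PSU. Besides applying the patch, you can also consider setting the hidden parameter "_gc_read_mostly_locking"=FALSE to disable read-mostly object locking. Disabling DRM also avoids the error.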
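The observation that the two traces share their leading stack frames can be reproduced with a small script. This is only a sketch under stated assumptions: `frame_names` and `common_prefix` are hypothetical helpers, the regular expression is written against the call-stack layout shown in the traces above, and a function name that the trace wraps onto a second line (for example `dbgexExplicitEndInc` followed by `()+728`) is compared by its first fragment only.

```python
import re

# A frame line starts in column 0 and has CALL or PTR_CALL as its second field.
# The column-header line ("calling ... entry") is excluded explicitly.
FRAME = re.compile(r"^(?!calling\s)(\S+)\s+(?:PTR_CALL|CALL)\s")


def frame_names(trace_text):
    """Return the calling-location function names, in the order they appear."""
    names = []
    for line in trace_text.splitlines():
        m = FRAME.match(line)
        if m:
            # Drop the "()+offset" suffix, keeping only the function name.
            names.append(m.group(1).split("(", 1)[0])
    return names


def common_prefix(a, b):
    """Return the leading frames that two stacks have in common."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


TRACE_A = """\
ksedst1()+96         CALL     skdstdst()           FFFFFFFF7FFF4C00 ?
                                                   100670460 ? 000000000 ?
dbgexExplicitEndInc  CALL     dbgexPhaseII()       10C373D30 ?
()+728                                             FFFFFFFF7A634920 ?
kjbrref()+1496       CALL     dbgeEndDDEInvocatio  10C373D30 ? 001B1D800 ?
"""

TRACE_B = """\
ksedst1()+96         CALL     skdstdst()           FFFFFFFF7FFF4D20 ?
dbgexExplicitEndInc  CALL     dbgexPhaseII()       10C373D30 ?
()+728                                             FFFFFFFF7A634920 ?
kjbmprlst()+13504    CALL     dbgeEndDDEInvocatio  10C373D30 ? 001B1D800 ?
"""

shared = common_prefix(frame_names(TRACE_A), frame_names(TRACE_B))
print(shared)
```

The first frame at which the two stacks diverge (`kjbrref` versus `kjbmprlst` here) is the function that raised each ORA-600, which is usually the most useful term to search for on MOS.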


ORA-600 (qerrmOStart2) error

A customer's 11.2.0.2 RAC for Solaris 10 hit an ORA-600 error.
The error message is:

Fri May 11 14:22:45 2012
Errors in file /oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_25552.trc (incident=194913):
ORA-00600: internal error code, arguments: [qerrmOStart2], [1740], [ORA-01740: missing double quote in identifier
], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/orcl/orcl2/incident/incdir_194913/orcl2_ora_25552_i194913.trc
Fri May 11 14:23:29 2012
Fri May 11 14:23:29 2012
Dumping diagnostic data in directory=[cdmp_20120511142329], requested by (instance=2, osid=25552), summary=[incident=194913].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Exception [type: SIGSEGV, Stack Overflow] [ADDR:0x8] [PC:0x10611C144, qerrmOdcl()+36] [flags: 0x0, count: 1]
Errors in file /oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_25552.trc (incident=194914):
ORA-07445: exception encountered: core dump [qerrmOdcl()+36] [SIGSEGV] [ADDR:0x8] [PC:0x10611C144] [Stack Overflow] []
ORA-00600: internal error code, arguments: [qerrmOStart2], [1740], [ORA-01740: missing double quote in identifier
], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/orcl/orcl2/incident/incdir_194914/orcl2_ora_25552_i194914.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fri May 11 14:23:30 2012
Sweep [inc][194913]: completed
Sweep [inc][194914]: completed
Sweep [inc2][194913]: completed

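MOS has nothing describing this case, but the error message itself suggests that the ORA-600 is only a marker: the real error is the ORA-1740 carried in the arguments that follow it, and that message looks more like a statement parsing error.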
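Reading the real error out of the ORA-600 argument list can be sketched in a few lines of Python. The helper names `split_args` and `nested_error` are hypothetical, and the sample text is the English form of the message shown in the alert log; the bracket matching does not depend on the message language and tolerates the line break inside the third argument.

```python
import re

# Each ORA-600 argument is enclosed in square brackets; an argument may span lines.
BRACKET = re.compile(r"\[([^\]]*)\]")
NESTED = re.compile(r"ORA-(\d{5})")


def split_args(message):
    """Return the bracketed arguments of an ORA-600 message, whitespace-trimmed."""
    return [arg.strip() for arg in BRACKET.findall(message)]


def nested_error(message):
    """Return (first argument, nested ORA number, nested text), or Nones if absent."""
    args = split_args(message)
    for arg in args[1:]:
        m = NESTED.search(arg)
        if m:
            return args[0], int(m.group(1)), arg
    return (args[0] if args else None), None, None


MESSAGE = (
    "ORA-00600: internal error code, arguments: [qerrmOStart2], [1740], "
    "[ORA-01740: missing double quote in identifier\n"
    "], [], [], [], [], [], [], [], [], []"
)

print(nested_error(MESSAGE))
```

Here the first argument identifies where the assertion was raised, while the nested ORA-01740 is the error that was actually returned for the statement.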

$ more /oracle/diag/rdbms/orcl/orcl2/incident/incdir_194913/orcl2_ora_25552_i194913.trc
Dump file /oracle/diag/rdbms/orcl/orcl2/incident/incdir_194913/orcl2_ora_25552_i194913.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /oracle/product/11.2.0/dbhome_1
System name:    SunOS
Node name:      orcl2
Release:        5.10
Version:        Generic_147440-02
Machine:        sun4u
Instance name: orcl2
Redo thread mounted by this instance: 2
Oracle process number: 364
Unix process pid: 25552, image: oracle@orcl2
 
 
*** 2012-05-11 14:22:45.579
*** SESSION ID:(1590.33807) 2012-05-11 14:22:45.579
*** CLIENT ID:() 2012-05-11 14:22:45.579
*** SERVICE NAME:(orcl) 2012-05-11 14:22:45.579
*** MODULE NAME:(PL/SQL Developer) 2012-05-11 14:22:45.579
*** ACTION NAME:(SQL Window - New) 2012-05-11 14:22:45.579
 
Dump continued from file: /oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_25552.trc
ORA-00600: internal error code, arguments: [qerrmOStart2], [1740], [ORA-01740: missing double quote in identifier
], [], [], [], [], [], [], [], [], []
 
========= Dump for incident 194913 (ORA 600 [qerrmOStart2]) ========
 
*** 2012-05-11 14:22:45.606
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=2z1m6fp3p7qjd) -----
SELECT o_no,
       c_state,
       d_state,
       D_CONTENT,
       O_TIME,
       o_id,
       O_NAME,
       l_name,
       l_order
 FROM
(SELECT b.o_no,a.c_state,'' d_state, a.D_CONTENT,a.O_TIME,a.o_id,a.O_NAME,
       l_name,l_order,
       ROW_NUMBER() OVER(PARTITION BY a.O_ID ORDER BY a.O_TIME DESC) RN
  FROM T_O_D@DB_A A, T_B_O@DB_A B,t_b_o_l@db_a c
 WHERE A.O_ID = B.O_ID
   AND a.o_id = c.o_id
   AND A.O_TIME >= TRUNC(SYSDATE - 1)
   AND A.O_TIME < TRUNC(SYSDATE)
   AND NOT EXISTS (SELECT 1
          FROM T_B_O_E@DB_A O
         WHERE O.O_ID = A.O
           AND O.O_TIME >= TRUNC(SYSDATE - 1)
           AND O.O_TIME < TRUNC(SYSDATE))
    UNION ALL
SELECT b.o_no,a.c_state, 'TK',a.s_remark,a.O_TIME,a.o_id,a.O_NAME,
       l_name,l_order,
       ROW_NUMBER() OVER(PARTITION BY a.O_ID ORDER BY a.O_TIME DESC) RN
  FROM T_B_O_E@DB_A A, T_B_O@DB_A B,t_b_o_l@db_a c
 WHERE A.O_ID = B.O_ID
   AND a.o_id = c.o_id
   AND A.O_TIME >= TRUNC(SYSDATE - 1)
   AND A.O_TIME < TRUNC(SYSDATE)
   AND b.r_time >= TRUNC(SYSDATE - 1)
   AND b.r_time < TRUNC(SYSDATE)
   AND b.p_state = '2'
   AND a.e_type = '14'
    UNION ALL
SELECT b.o_no,a.c_state,'TD', a.s_remark,a.O_TIME,a.o_id,a.O_NAME,
       l_name,l_order,
       ROW_NUMBER() OVER(PARTITION BY a.O_ID ORDER BY a.O_TIME DESC) RN
  FROM T_B_O_E@DB_A  A, T_B_O@DB_A B,t_b_o_l@db_a c
 WHERE A.O_ID = B.O_ID
   AND a.o_id = c.o_id
   AND A.O_TIME >= TRUNC(SYSDATE - 1)
   AND A.O_TIME < TRUNC(SYSDATE)
   AND b.c_tag = '1'
   AND a.e_type = '04'
UNION ALL
SELECT to_char(P_ID), STATE, DECODE(STATE, '1', 'S', '2', 'ZDTD'), NULL,
       U_TIME, NULL, NULL, NULL, NULL, 1
  FROM T_B_B_I@DB_A A
 WHERE STATE IN ('1', '2')
   AND SCODE = 'EMAL'
   AND U_TIME >= TRUNC(SYSDATE - 1)
   AND U_TIME < TRUNC(SYSDATE))
   WHERE rn = 1
 
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst1()+96         CALL     skdstdst()           FFFFFFFF7FFEEED0 ?
                                                   1006B0C80 ? 000000000 ?
                                                   00000000A ? 000000001 ?
                                                   10BD95B00 ?
ksedst()+60          CALL     ksedst1()            000000000 ? 000000001 ?
                                                   00010C212 ? 00010C000 ?
                                                   10C20A000 ? 00010C20A ?
dbkedDefDump()+2032  CALL     ksedst()             000000000 ? 10B25B000 ?
                                                   10B25B2B0 ? 10C212000 ?
                                                   00010B000 ? 00010C212 ?
dbgexPhaseII()+1800  PTR_CALL dbkedDefDump()       000000003 ? 000000002 ?
                                                   10A6EC2C8 ? 0000014B0 ?
                                                   10C20A000 ? 000000003 ?
dbgexProcessError()  CALL     dbgexPhaseII()       10C3B4650 ?
+1248                                              FFFFFFFF7BE3A578 ?
                                                   FFFFFFFF7FFF3C28 ?
                                                   0018E0000 ? 10A6E35B8 ?
                                                   000001C00 ?
dbgePostErrorKGE()+  CALL     dbgeExecuteForError  10AE0C3FD ?
1320                          ()                   FFFFFFFFFEC0B62D ?
                                                   001050000 ?
                                                   FFFFFFFF7FFF63D8 ?
                                                   001060000 ? 000000028 ?
dbkePostKGE_kgsf()+  CALL     dbgePostErrorKGE()   10C20AC90 ? 000000000 ?
44                                                 FFFFFFFF7BE3A578 ?
                                                   000000000 ? 000000258 ?
                                                   00010C000 ?
kgerinv_internal()+  CALL     kgeadse()            10C20AC90 ?
72                                                 FFFFFFFF7BC22F20 ?
                                                   000000258 ? 000002868 ?
                                                   10A6E4000 ? 00010A6E4 ?
kgerinv()+40         CALL     kgerinv_internal()   10C20AC90 ? 004EA2360 ?
                                                   10BC2F628 ? 000000258 ?
                                                   000000000 ? 000000002 ?
kgesinv()+20         CALL     kgerinv()            10C20AC90 ?
                                                   FFFFFFFF7BC22F20 ?
                                                   10BC2F628 ? 000000002 ?
                                                   FFFFFFFF7FFF7840 ?
                                                   000001400 ?
ksesin()+92          CALL     kgesinv()            10C20AC90 ?
                                                   FFFFFFFF7BC22F20 ?
                                                   10BC2F628 ? 000000002 ?
                                                   FFFFFFFF7FFF7840 ?
                                                   00010C212 ?
OCIKSIN()+412        CALL     ksesin()             10BC2F628 ? 10C212000 ?
                                                   00010C000 ? 00010C000 ?
                                                   00010C20A ? 00010C212 ?
qerrmOStart()+516    CALL     OCIKSIN()            FFFFFFFFFFFFFFFF ?
                                                   0000006CC ? 10BC2F628 ?
                                                   00010BA9D ? 10BA9D890 ?
                                                   FFFFFFFF7FFF786C ?
qerrmStart()+1528    CALL     qerrmOStart()        10E603CD58 ?
                                                   FFFFFFFF7BC68170 ?
                                                   000000000 ?
                                                   FFFFFFFF7DF91088 ?
                                                   000106000 ?
                                                   FFFFFFFF7BC67F2C ?
selexe0()+976        PTR_CALL qerrmStart()         FFFFFFFF7BC68170 ?
                                                   000000003 ? 10E603CD58 ?
                                                   FFFFFFFF7BC75250 ?
                                                   000000001 ?
                                                   FFFFFFFF7DFBE5C8 ?
opiexe()+11664       CALL     selexe0()            FFFFFFFF7BC367C8 ?
                                                   10C21FAE0 ?
                                                   FFFFFFFF7BC41C90 ?
                                                   000000000 ? 10E60B2EE0 ?
                                                   10C21F000 ?
kpoal8()+2368        CALL     opiexe()             000000049 ? 000000003 ?
                                                   FFFFFFFF7FFF9E8C ?
                                                   000000000 ? 000000000 ?
                                                   0BFFFFFFF ?
opiodr()+1428        PTR_CALL kpoal8()             00000005E ? 00000001C ?
                                                   FFFFFFFF7FFFD348 ?
                                                   00010C000 ? 10C20A000 ?
                                                   000001648 ?
ttcpip()+1056        PTR_CALL opiodr()             00010A795 ? 00000001C ?
                                                   103EAD460 ? 00010A400 ?
                                                   000001400 ? 10C20A000 ?
opitsk()+1528        CALL     ttcpip()             000000000 ? 10A6C7694 ?
                                                   10C20AC90 ?
                                                   FFFFFFFF7FFFD348 ?
                                                   FFFFFFFF7FFFBD90 ?
                                                   10C221848 ?
opiino()+1000        CALL     opitsk()             10A6C7694 ? 10C226C98 ?
                                                   10C221654 ? 10C21F958 ?
                                                   000000000 ? 10C20A950 ?
opiodr()+1428        PTR_CALL opiino()             00010C000 ? 10C2216D0 ?
                                                   10C2216D0 ? 000380000 ?
                                                   0000000EB ?
                                                   FFFFFFFF7FFFECA0 ?
opidrv()+1100        CALL     opiodr()             10C221000 ? 000000004 ?
                                                   1035DD740 ? 00010C000 ?
                                                   000001400 ? 10C20A000 ?
sou2o()+92           CALL     opidrv()             00000003C ? 000000004 ?

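Clearly the problem was triggered when PL/SQL Developer executed or parsed the statement. In essence it is an ordinary compilation error, but Oracle reported it as an internal ORA-600 error.
This error can simply be ignored.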


ORA-600 (qkaffsindex5) error

An error on a customer's 11.2.0.2 RAC for Solaris 10.
The error message is:

2012-05-04 22:00:04.768000 +08:00
Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
2012-05-04 22:00:17.279000 +08:00
Errors in file /oracle/diag/rdbms/orcl/orcl1/trace/orcl1_j002_5730.trc (incident=231810):
ORA-00600: internal error code, arguments: [qkaffsindex5], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/orcl/orcl1/incident/incdir_231810/orcl1_j002_5730_i231810.trc
2012-05-04 22:01:38.428000 +08:00
Dumping diagnostic data in directory=[cdmp_20120504220138], requested by (instance=1, osid=5730 (J002)), summary=[incident=231810].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sweep [inc][231810]: completed
Sweep [inc2][231810]: completed

Looking at the details:

$ more /oracle/diag/rdbms/orcl/orcl1/incident/incdir_231810/orcl1_j002_5730_i231810.trc
Dump file /oracle/diag/rdbms/orcl/orcl1/incident/incdir_231810/orcl1_j002_5730_i231810.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /oracle/product/11.2.0/dbhome_1
System name:    SunOS
Node name:      racdb1
Release:        5.10
Version:        Generic_142900-14
Machine:        sun4u
Instance name: orcl1
Redo thread mounted by this instance: 1
Oracle process number: 129
Unix process pid: 5730, image: oracle@racdb1 (J002)
 
 
*** 2012-05-04 22:00:17.299
*** SESSION ID:(25.62649) 2012-05-04 22:00:17.299
*** CLIENT ID:() 2012-05-04 22:00:17.299
*** SERVICE NAME:(SYS$USERS) 2012-05-04 22:00:17.299
*** MODULE NAME:(DBMS_SCHEDULER) 2012-05-04 22:00:17.299
*** ACTION NAME:(ORA$AT_SQ_SQL_SW_6463) 2012-05-04 22:00:17.299
 
Dump continued from file: /oracle/diag/rdbms/orcl/orcl1/trace/orcl1_j002_5730.trc
ORA-00600: internal error code, arguments: [qkaffsindex5], [], [], [], [], [], [], [], [], [], [], []
 
========= Dump for incident 231810 (ORA 600 [qkaffsindex5]) ========
 
*** 2012-05-04 22:00:17.319
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=c9nhsv0f2b021) -----
/* SQL Analyze(25,1) */ SELECT MENU_ID,MENU_NAME,PROV_CODE FROM VA_MENU WHERE STATUS = :1  AND (PROV_CODE = '098' OR PROV_CODE = :2 ) ORDER BY PROV_CODE DESC
, SEQ_NUM
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
9e289c2e0     11816  package body SYS.DBMS_SQLTUNE_INTERNAL
9f7693938         7  SYS.WRI$_ADV_SQLTUNE
9e86e5c88       587  package body SYS.PRVT_ADVISOR
9e86e5c88      2655  package body SYS.PRVT_ADVISOR
5a34f0858       241  package body SYS.DBMS_ADVISOR
9e5ef9668       821  package body SYS.DBMS_SQLTUNE
9e8456960         4  anonymous block
 
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst1()+96         CALL     skdstdst()           FFFFFFFF7FFD27F0 ?
                                                   100670460 ? 000000000 ?
                                                   00000000A ? 000000001 ?
                                                   10BD552E0 ?
ksedst()+60          CALL     ksedst1()            000000000 ? 000000001 ?
                                                   00010C1D1 ? 00010C000 ?
                                                   10C1CA000 ? 00010C1CA ?
dbkedDefDump()+2032  CALL     ksedst()             000000000 ? 10B21A000 ?
                                                   10B21AA90 ? 10C1D2000 ?
                                                   00010B000 ? 00010C1D2 ?
dbgexPhaseII()+1800  PTR_CALL dbkedDefDump()       000000003 ? 000000002 ?
                                                   10A6ABAA8 ? 0000014B0 ?
                                                   10C1C9000 ? 000000003 ?
dbgexProcessError()  CALL     dbgexPhaseII()       10C373D30 ?
+1248                                              FFFFFFFF7A632830 ?
                                                   FFFFFFFF7FFD7548 ?
                                                   0018E0000 ? 10A6A2D98 ?
                                                   000001C00 ?
dbgePostErrorKGE()+  CALL     dbgeExecuteForError  10ADCBBDD ?
1320                          ()                   FFFFFFFFFEC0B62D ?
                                                   001050000 ?
                                                   FFFFFFFF7FFD9CF8 ?
                                                   001060000 ? 000000028 ?
dbkePostKGE_kgsf()+  CALL     dbgePostErrorKGE()   10C1CA3E0 ? 000000000 ?
44                                                 FFFFFFFF7A632830 ?
                                                   000000000 ? 000000258 ?
                                                   00010C000 ?
kgerinv_internal()+  CALL     kgeadse()            10C1CA3E0 ?
72                                                 FFFFFFFF7A63ADC0 ?
                                                   000000258 ? 000002868 ?
                                                   10A6A3000 ? 00010A6A3 ?
kgerinv()+40         CALL     kgerinv_internal()   10C1CA3E0 ? 004EA2360 ?
                                                   10B77E7B0 ? 000000258 ?
                                                   000000000 ? 000000000 ?
kgeasnmierr()+28     CALL     kgerinv()            10C1CA3E0 ?
                                                   FFFFFFFF7A63ADC0 ?
                                                   10B77E7B0 ? 000000000 ?
                                                   FFFFFFFF7FFDB0C0 ?
                                                   000001400 ?
qkaffsindex()+7648   CALL     kgeasnmierr()        10C1CA3E0 ?
                                                   FFFFFFFF7A63ADC0 ?
                                                   10B77E7B0 ? 000000000 ?
                                                   10C1CA000 ? 00010C1D1 ?
qkatab()+4060        CALL     qkaffsindex()        FFFFFFFF7A03ACB8 ?

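The error occurred during SQL tuning, so it is most likely an Oracle bug. A MOS search indeed turns up the document Bug 12869386 : DBMS_SQLTUNE.EXECUTE_TUNING_TASK REPORTS ORA-600 [QKAFFSINDEX5], which records this problem. Although Oracle has confirmed the bug, it has not yet provided a definite fix.
Fortunately, the problem occurs in the SQLTUNE feature, so even when the tuning task fails, the running database is not affected.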
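The judgement that the incident came from the automatic SQL tuning job rests partly on the session header lines of the incident trace (MODULE NAME and ACTION NAME). A minimal Python sketch for reading that header follows. `session_header` and `looks_like_auto_sql_tuning` are hypothetical helpers, and the `ORA$AT_SQ_SQL` action prefix is an assumption taken from this single trace, not a documented rule.

```python
import re

# Header lines look like: *** MODULE NAME:(DBMS_SCHEDULER) 2012-05-04 22:00:17.299
HEADER = re.compile(r"^\*\*\* ([A-Z ]+):\((.*)\) \d{4}-\d{2}-\d{2} ")


def session_header(trace_text):
    """Collect the '*** NAME:(value) timestamp' header fields into a dict."""
    fields = {}
    for line in trace_text.splitlines():
        m = HEADER.match(line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def looks_like_auto_sql_tuning(fields):
    """Heuristic only: the action prefix is assumed from the trace in this post."""
    return (
        fields.get("MODULE NAME") == "DBMS_SCHEDULER"
        and fields.get("ACTION NAME", "").startswith("ORA$AT_SQ_SQL")
    )


TRACE_HEADER = """\
*** 2012-05-04 22:00:17.299
*** SESSION ID:(25.62649) 2012-05-04 22:00:17.299
*** CLIENT ID:() 2012-05-04 22:00:17.299
*** SERVICE NAME:(SYS$USERS) 2012-05-04 22:00:17.299
*** MODULE NAME:(DBMS_SCHEDULER) 2012-05-04 22:00:17.299
*** ACTION NAME:(ORA$AT_SQ_SQL_SW_6463) 2012-05-04 22:00:17.299
"""

fields = session_header(TRACE_HEADER)
print(fields["MODULE NAME"], fields["ACTION NAME"])
print(looks_like_auto_sql_tuning(fields))
```

The same header parser applies to the qerrmOStart2 trace earlier, where the module field shows PL/SQL Developer rather than a scheduler job.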
