统计信息收集出现DFS等待导致实例HANG死

客户10.2.0.4 RAC环境,出现大量的library cache lock和cursor: pin S wait on X等待,经分析是由于统计信息收集僵死导致的。
数据库在8点到9点期间,数据库两个节点都存在明显的cursor: pin S wait on X和library cache lock的等待:

Event

Waits

Time(s)

Avg   Wait(ms)

%   Total Call Time

Wait   Class

cursor:   pin S wait on X

1,573,056

30,651

19

146.2

Concurrency

library   cache lock

31,757

7,009

221

33.4

Concurrency

CPU   time

6,416

30.6

DFS   lock handle

12,381

2,979

241

14.2

Other

latch:   library cache

1,646

1,974

1,199

9.4

Concurrency

Event

Waits

Time(s)

Avg   Wait(ms)

%   Total Call Time

Wait   Class

cursor:   pin S wait on X

17,586,451

342,437

19

812.1

Concurrency

library   cache lock

63,657

30,153

474

71.5

Concurrency

CPU   time

3,820

9.1

db   file sequential read

241,761

1,094

5

2.6

User   I/O

inactive   session

1,105

1,090

986

2.6

Other

两个节点的等待现象基本一致,而节点1上还存在明显的DFS lock handle等待事件。
通过分析ASH信息,发现library cache lock和cursor: pin S wait on X等待基本上都是6点之后才开始出现:

SQL> SELECT trunc(sample_time, 'hh24') TIME, COUNT(*)
  2  FROM WRH$_ACTIVE_SESSION_HISTORY ash, wrh$_event_name en
  3  WHERE ash.event_id = en.event_id
  4  AND sample_time >= to_timestamp('20130703', 'yyyymmdd')
  5  AND event_name IN ('cursor: pin S wait on X', 'library cache lock')
  6  GROUP BY trunc(sample_time, 'hh24')
  7  ORDER BY 1;
TIME                  COUNT(*)
------------------- ----------
2013-07-03 00:00:00          4
2013-07-03 01:00:00          8
2013-07-03 02:00:00          3
2013-07-03 03:00:00          8
2013-07-03 04:00:00          9
2013-07-03 05:00:00         14
2013-07-03 06:00:00        348
2013-07-03 07:00:00       2138
2013-07-03 08:00:00      41576
2013-07-03 09:00:00      50108
2013-07-03 10:00:00       2452

而观察出现cursor: pin S wait on X和library cache lock等待会话的BLOCKING SESSION,发现大部分处于等待的会话被会话37锁定:

SQL> SELECT to_char(sample_time, 'hh24:mi:ss.ff') TIME, session_id sid, 
  2  decode(en.event_name, 'cursor: pin S wait on X', to_number(substr(ltrim(to_char(p2, '0xxxxxxxxxxxxxxx')), 1, 8), 'xxxxxxxx'), blocking_session) b_sid, 
  3  en.event_name  
  4  FROM WRH$_ACTIVE_SESSION_HISTORY ash, wrh$_event_name en
  5  WHERE ash.event_id = en.event_id(+)
  6  AND instance_number = 1
  7  AND sample_time >= to_timestamp('201307030800', 'yyyymmddhh24mi')
  8  AND sample_time <= to_timestamp('201307030801', 'yyyymmddhh24mi')
  9  ORDER BY 1;
TIME                      SID      B_SID EVENT_NAME
------------------ ---------- ---------- ------------------------------
08:00:09.569              457 4294967291 latch: library cache
08:00:09.569               52        981 cursor: pin S wait ON X
08:00:09.569              168         73 library cache LOCK
08:00:09.569             1763 4294967291 SQL*Net message FROM client
08:00:09.569              123 4294967295 kksfbc child completion
08:00:09.569              476 4294967291 kst: async disk IO
08:00:09.569              500 4294967292 db file sequential READ
08:00:09.569              327 4294967291 SQL*Net message TO client
08:00:09.569              233 4294967291 db file scattered READ
08:00:09.569               73         73 DFS LOCK handle
08:00:09.569             1889 4294967291 gc CURRENT block 2-way
08:00:09.569             1969        981 cursor: pin S wait ON X
08:00:09.569              857        981 cursor: pin S wait ON X
08:00:09.569              812        168 cursor: pin S wait ON X
08:00:09.569              981         73 library cache LOCK
08:00:19.669               52        981 cursor: pin S wait ON X
08:00:19.669              812        168 cursor: pin S wait ON X
08:00:19.669              857        981 cursor: pin S wait ON X
08:00:19.669             1969        981 cursor: pin S wait ON X
08:00:19.669              233 4294967291 db file scattered READ
08:00:19.669              500 4294967292 db file sequential READ
08:00:19.669             1763 4294967291 db file sequential READ
08:00:19.669              476 4294967291 kst: async disk IO
08:00:19.669              457 4294967291 kksfbc child completion
08:00:19.669               73         73 DFS LOCK handle
08:00:19.669              654 4294967291 SQL*Net message TO client
08:00:19.669              168         73 library cache LOCK
08:00:19.669              981         73 library cache LOCK
08:00:29.769             1969        981 cursor: pin S wait ON X
08:00:29.769               52        981 cursor: pin S wait ON X
08:00:29.769              457 4294967291 kksfbc child completion
08:00:29.769              981         73 library cache LOCK
08:00:29.769              168         73 library cache LOCK
08:00:29.769              500 4294967292 db file sequential READ
08:00:29.769               73         73 DFS LOCK handle
08:00:29.769              654 4294967291 SQL*Net message TO client
08:00:29.769              327 4294967291 SQL*Net message TO client
08:00:29.769             1763 4294967291 db file sequential READ
08:00:29.769              857        981 cursor: pin S wait ON X
08:00:29.769              123 4294967295 latch: library cache
08:00:29.769              196 4294967295 latch: library cache
08:00:29.769              233 4294967295 latch: library cache
08:00:29.769              812        168 cursor: pin S wait ON X
08:00:29.769              476 4294967291 kst: async disk IO
08:00:39.879              476 4294967291 kst: async disk IO
08:00:39.879              658 4294967291 db file sequential READ
08:00:39.879              500 4294967292 db file sequential READ
08:00:39.879             1969        981 cursor: pin S wait ON X
08:00:39.879              857        981 cursor: pin S wait ON X
08:00:39.879              812        168 cursor: pin S wait ON X
08:00:39.879               52        981 cursor: pin S wait ON X
08:00:39.879              981         73 library cache LOCK
08:00:39.879              168         73 library cache LOCK
08:00:39.879             1849 4294967291 SQL*Net message TO client
08:00:39.879              327 4294967291 SQL*Net message TO client
08:00:39.879              235 4294967291 rdbms ipc reply
08:00:39.879               19 4294967291 rdbms ipc reply
08:00:39.879               73         73 DFS LOCK handle
08:00:39.879              123 4294967294 enq: WF - contention
08:00:49.987              476 4294967291 kst: async disk IO
08:00:49.987              500 4294967292 db file sequential READ
08:00:49.987              382 4294967291 gc cr block busy
08:00:49.987             1969        981 cursor: pin S wait ON X
08:00:49.987              857        981 cursor: pin S wait ON X
08:00:49.987              812        168 cursor: pin S wait ON X
08:00:49.987              235 4294967295 kksfbc child completion
08:00:49.987              658 4294967295 latch: library cache
08:00:49.987               19 4294967291 latch: library cache
08:00:49.987              981         73 library cache LOCK
08:00:49.987              168         73 library cache LOCK
08:00:49.987             1889 4294967291 SQL*Net message FROM client
08:00:49.987               73         73 DFS LOCK handle
08:00:49.987               52        981 cursor: pin S wait ON X

可以判断,导致数据库实例HANG死的原因是因为SID为73的进程:

SQL> SELECT to_char(sample_time, 'hh24:mi:ss'), program, action, event_name 
  2  FROM wrh$_active_session_history ash, wrh$_event_name en
  3  WHERE session_id = 73
  4  AND ash.event_id = en.event_id(+)
  5  AND sample_time >= to_date('201307030550', 'yyyymmddhh24mi')
  6  AND sample_time <= to_date('201307030601', 'yyyymmddhh24mi')
  7  ORDER BY 1;
TO_CHAR( PROGRAM                                  ACTION                         EVENT_NAME
-------- ---------------------------------------- ------------------------------ ---------------------------
05:50:07 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:50:17 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:50:27 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:50:37 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:50:47 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               gc cr block 2-way
05:50:57 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:51:08 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:51:18 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file scattered READ
05:51:28 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               gc cr multi block request
05:51:38 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               direct path WRITE temp
05:51:48 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:51:58 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:52:08 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:52:18 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file scattered READ
05:52:28 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:52:38 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:52:49 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:52:59 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file scattered READ
05:53:09 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:53:19 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file scattered READ
05:53:29 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:53:39 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:53:49 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               gc CURRENT GRANT 2-way
05:53:59 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:54:09 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:54:20 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:54:30 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:54:40 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:54:50 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:55:00 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:55:10 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:55:20 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:55:30 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:55:40 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:55:51 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:56:01 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:56:11 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:56:21 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:56:31 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:56:41 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:56:51 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               gc CURRENT GRANT 2-way
05:57:01 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               library cache pin
05:57:11 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
05:57:21 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
05:57:32 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:57:42 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:57:52 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
05:58:02 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:58:12 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:58:22 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:58:32 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:58:42 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:58:52 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:59:03 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:59:13 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:59:23 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:59:33 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:59:43 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:59:53 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
06:00:03 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
06:00:13 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
06:00:23 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
06:00:34 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
06:00:44 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
06:00:54 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle

根据MOS文档,Bug 6011045 – DBMS_STATS causes deadlock between ‘cursor: pin S wait on X’ and ‘library cache lock’ [ID 6011045.8],这个问题是RAC环境中Oracle收集统计信息的BUG,在自动收集数据字典信息时,可能会出现进程HANG死,并导致大量的library cache lock和cursor: pin S wait on X等待,此外还可能出现DFS lock handle以及row cache lock的等待。
这个BUG确认影响版本包括10.2.0.4和10.2.0.5,这个BUG Oracle在10.2.0.5.5 PSU中进行了修复。

Posted in BUG | Tagged , , , , | Leave a comment

ORA-600(kposcws-2)错误

客户11.2.0.2 RAC环境出现ORA-600[kposcws-2]错误。
错误信息为:

Thu Sep 27 10:48:15 2012
Errors IN file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_34145026.trc  (incident=682292):
ORA-00600: 内部错误代码, 参数: [kposcws-2], [], [], [], [], [], [], [], [], [], [], []
Incident details IN: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_682292/orcl1_ora_34145026_i682292.trc

根据MOS文档Bug 6615409 – ORA-600 [kposcws-2] on scrollable cursor [ID 6615409.8],导致问题是由于使用了scrollable游标。
不过这个BUG应该在10.2.0.5和11.2.0.1中已经被修正,虽然问题在11.2.0.2中重现的可能性不大,但是根据已知的BUG描述,似乎BUG重现是最合理的解释。
由于没有11.2.0.2上问题出现的描述,因此也不会存在单独的补丁,唯一可以尝试的版本就是将数据库版本升级到11.2.0.3以上。

Posted in BUG | Tagged , , | Leave a comment

ORA-600(kollasg:client-side tmp lob)错误

客户11.2.0.3 RAC环境,出现ORA-600[kollasg:client-side tmp lob]错误。
错误信息为:

Thu Apr 28 20:54:53 2011
Errors IN file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_31916032.trc  (incident=148665):
ORA-00600: 内部错误代码, 参数: [kollasg:client-side tmp lob], [], [], [], [], [], [], [], [], [], [], []
Incident details IN: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_148665/orcl1_ora_31916032_i148665.trc
Thu Apr 28 20:54:59 2011
Trace dumping IS performing id=[cdmp_20110428205459]
Thu Apr 28 20:54:59 2011
Sweep [inc][148665]: completed
Sweep [inc2][148665]: completed

从错误信息上很容易判断,问题应该和客户端处理临时LOB有关。根据MOS文档ORA-00600 [kollasg:client-side tmp lob] Fetching Row With LOB Column in OCI Application [ID 1418135.1],在11.1以上版本中,如果读取一个没有被初始化过的LOB列,会引发这个600错误。
Oracle给出的解决方案是对于空的CLOB,使用EMPTY_CLOB函数对齐进行初始化。

Posted in BUG | Tagged , , , | Leave a comment

ORA-600(ksxp_rm_check0)错误

客户11.2.0.3 RAC的ASM实例出现ORA-600[ksxp_rm_check0]错误。
错误信息如下:

Tue Jul 02 19:22:59 2013
Errors IN file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_6488960.trc  (incident=165804):
ORA-04031: unable TO allocate 3000 bytes OF shared memory ("shared pool","select  INSTANCE_NUMBER , IN...","sga heap(1,0)","call")
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
Tue Jul 02 19:22:59 2013
ERROR: Unable TO normalize symbol name FOR the following short stack (at offset 174):
dbgexProcessError()+180<-dbgeExecuteForError()+72<-dbgePostErrorKGE()+2048<-dbkePostKGE_kgsf()+68<-kgeadse()+380<-kgerinv_internal()+48<-kgerinv()+48<-kgeasnmierr()+72<-IPRA.$ksxp_rm_check()+188<-ksxp_rm_check()+80<-ksliwat()+9376<-kslwaitctx()+180<-kmgs_immediate_req()+2396<-ksmasg()+144<-kghnospc()+632<-kghalo()+1964<-ksmdacnk()+276<-ksmdget()+628<-kssadpm()+324<-ksucrp()+904<-ksvrdp()+156<-opirip()+724<-opidrv()+608<-sou2o()+136<-opimai_real()+188<-ssthrdmain()+268<-main()+204<-__start()+112
Errors IN file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_m000_22151498.trc  (incident=186272):
ORA-00600: internal error code, arguments: [ksxp_rm_check0], [0x110A6FE08], [0], [0x700000059D65FD8], [], [], [], [], [], [], [], []
Incident details IN: /u01/app/grid/diag/asm/+asm/+ASM2/incident/incdir_186272/+ASM2_m000_22151498_i186272.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
Tue Jul 02 19:23:01 2013
Process m000 died, see its trace file
Tue Jul 02 19:23:01 2013
Dumping diagnostic DATA IN directory=[cdmp_20130702192301], requested BY (instance=2, osid=22151498 (M000)), summary=[incident=186272].

从错误信息上判断,这个ORA-600错误和前面出现了ORA-4031应该有直接的关系。根据MOS文档Bug 12925089 – ORA-600 [ksxp_rm_check0] can occur on process startup [ID 12925089.8],当一个进程启动时除非了ORA-4031错误,就可能会导致ORA-600[ksxp_rm_check()]的错误。而当前是m000进程启动出现了ORA-4031错误。至于当前ASM的ORA-4031错误,是由于11.2.0.3默认的MEMORY_TARGET不足导致的。
这个错误影响的版本为11.2.0.2和11.2.0.3,Oracle在12.1.0.1中以及将要发布的11.2.0.4中解决了这个问题。其实要避免这个错误的根源是给ASM实例的MEMORY_TARGET分配足够的空间,至少应分配1536M以上。

Posted in BUG | Tagged , , , , , , , , | Leave a comment

11.2.0.3 ASM实例出现ORA-4031导致数据库归档失败

客户的11.2.0.3 RAC数据库出现了归档失败的情况,导致单个实例出现HANG死的状况。
检查错误信息发现:

Tue Jul 02 16:49:13 2013
ARC1: Error 19504 Creating archive log file TO '+DATA02'
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance orcl1 - Archival Error
ORA-16038: log 14 SEQUENCE# 68244 cannot be archived
ORA-19504: failed TO CREATE file ""
ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.264.792274883'
ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.265.792274889'
Archiver process freed FROM errors. No longer stopped
Tue Jul 02 16:50:37 2013
ARC0: LGWR IS actively archiving destination LOG_ARCHIVE_DEST_3
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance orcl1 - Archival Error
ORA-16014: log 14 SEQUENCE# 68244 NOT archived, no available destinations
ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.264.792274883'
ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.265.792274889'
ARC0: Archive log rejected (thread 1 SEQUENCE 68240) at host 'orclsh'
FAL[server, ARC0]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance orcl1 - Archival Error. Archiver continuing.

由于归档失败发生在ASM磁盘上,首先检查ASM磁盘空间以及DB_RECOVERY_FILE_DEST_SIZE,ASM磁盘空间是足够的,而且由于只有一个节点出现出现了无法归档的问题,也可以排除是空间不足造成的。确认两个节点的DB_RECOVERY_FILE_DEST_SIZE参数设置都是0,基本上可以判断问题和当前节点的ASM实例状态不正常有关。
检查ASM实例的错误信息:

Tue Jul 02 16:41:43 2013
Dumping diagnostic DATA IN directory=[cdmp_20130702164115], requested BY (instance=2, osid=2032294 (LMD0)), summary=[incident=165521].
Tue Jul 02 16:49:13 2013
Dumping diagnostic DATA IN directory=[cdmp_20130702164845], requested BY (instance=2, osid=2032294 (LMD0)), summary=[incident=165522].
Tue Jul 02 16:55:45 2013
Dumping diagnostic DATA IN directory=[cdmp_20130702165517], requested BY (instance=2, osid=2032294 (LMD0)), summary=[incident=165523].
Tue Jul 02 17:01:48 2013
Dumping diagnostic DATA IN directory=[cdmp_20130702170120], requested BY (instance=2, osid=2032294 (LMD0)), summary=[incident=165524].
Tue Jul 02 17:07:27 2013
Dumping diagnostic DATA IN directory=[cdmp_20130702170659], requested BY (instance=2, osid=2032294 (LMD0)), summary=[incident=165525].

当前节点ASM实例出现了的这个信息,说明报错发生在实例2上:

Tue Jul 02 18:29:55 2013
Errors IN file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_lmd0_2032294.trc  (incident=186256):
ORA-04031: unable TO allocate 3768 bytes OF shared memory ("shared pool","unknown object","sga heap(1,0)","ges enqueues")
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
 Insufficient shared pool TO allocate a GES object (ospid 2032294)
Tue Jul 02 18:29:55 2013
Sweep [inc][186256]: completed
Tue Jul 02 18:36:49 2013
Errors IN file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_lmd0_2032294.trc  (incident=186257):
ORA-04031: unable TO allocate 3768 bytes OF shared memory ("shared pool","unknown object","sga heap(1,0)","ges enqueues")
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
 Insufficient shared pool TO allocate a GES object (ospid 2032294)

果然实例2上的ASM出现了大量ORA-4031错误。检查ASM启动的参数配置:

Sat Aug 25 20:06:55 2012
NOTE: No asm libraries found IN the system
ERROR: -5(Duplicate disk DATA_DG01:ASM_DISK1)
ERROR: -5(Duplicate disk DATA_DG01:ASM_DISK2)
MEMORY_TARGET defaulting TO 411041792.
* instance_number obtained FROM CSS = 2, checking FOR the existence OF node 0... 
* node 0 does NOT exist. instance_number = 2 
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Private Interface 'en1' configured FROM GPnP FOR USE AS a private interconnect.
  [name='en1', TYPE=1, ip=169.254.78.6, mac=00-1a-64-bb-50-7d, net=169.254.0.0/16, mask=255.255.0.0, USE=haip:cluster_interconnect/62]
Public Interface 'en0' configured FROM GPnP FOR USE AS a public interface.
  [name='en0', TYPE=1, ip=10.1.16.35, mac=00-1a-64-bb-50-7c, net=10.1.16.32/27, mask=255.255.255.224, USE=public/1]
Picked latch-free SCN scheme 3
USING LOG_ARCHIVE_DEST_1 parameter DEFAULT VALUE AS /u01/app/11.2.0.3/grid/dbs/arch
Autotune OF undo retention IS turned ON. 
LICENSE_MAX_USERS = 0
SYS auditing IS disabled
NOTE: Volume support  enabled
Starting up:
Oracle DATABASE 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
WITH the REAL Application Clusters AND Automatic Storage Management options.
ORACLE_HOME = /u01/app/11.2.0.3/grid
System name:	AIX
Node name:	orcldb2
Release:	1
Version:	6
Machine:	00C94E064C00
USING parameter settings IN server-side pfile /u01/app/11.2.0.3/grid/dbs/init+ASM2.ora
System parameters WITH non-DEFAULT VALUES:
  large_pool_size          = 12M
  instance_type            = "asm"
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "/dev/ocr_*"
  asm_diskstring           = "/dev/voting_*"
  asm_diskstring           = "/dev/asm_*"
  asm_diskgroups           = "DATA"
  asm_diskgroups           = "DATA_DG01"
  asm_diskgroups           = "SPFILE_DG"
  asm_power_limit          = 1
  diagnostic_dest          = "/u01/app/grid"
Cluster communication IS configured TO USE the following interface(s) FOR this instance
  169.254.78.6
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2

当前ASM实例使用默认的MEMORY_TARGET配置,分配大小大约是400M,根据Oracle的MOS文章:ASM & Shared Pool (ORA-4031) [ID 437924.1],在11.2.0.3中,Oracle增加了ASM实例所允许的默认进程数PROCESSES,但是默认的MEMORY_TARGET参数没有增加。
根据Oracle的建议,11.2.0.3的MEMORY_TARGET至少应该设置到1536M,而MEMORY_MAX_TARGET设置为4096M。

SQL> ALTER system SET memory_max_target=4096m scope=spfile;
SQL> ALTER system SET memory_target=1536m scope=spfile;

对于当前的情况,如果短时间内无法重启DB和ASM实例,可以在问题节点配置一个第二本地归档路径,设置目标路径为本地磁盘,从而避免归档无法完成而导致的实例HANG死。

Posted in BUG | Tagged , , , , , , , , | Leave a comment

ORA-7445(ptmax)错误

一个9.2.0.8上的bug。
虽然12c已经发布,但是仍然有部分客户的核心数据库还跑在9i环境上,有碰到了一个9208上的bug。

Thu May 16 14:18:48 2013
Errors IN file /oracle/oracle/admin/orcl/udump/orcl_ora_11922.trc:
ORA-07445: 出现异常: 核心转储 [00000001023A0F08] [SIGSEGV] [Address NOT mapped TO object] [0x000000010] [] []
Thu May 16 14:19:49 2013
Errors IN file /oracle/oracle/admin/orcl/udump/orcl_ora_12389.trc:
ORA-07445: 出现异常: 核心转储 [00000001023A0F08] [SIGSEGV] [Address NOT mapped TO object] [0x000000010] [] []

详细TRACE信息:

*** SESSION ID:(1298.47235) 2013-05-16 14:18:48.666
Exception signal: 11 (SIGSEGV), code: 1 (Address NOT mapped TO object), addr: 0x10, PC: [0x1023a0f08, 00000001023A0F08]
*** 2013-05-16 14:18:48.669
ksedmp: internal OR fatal error
ORA-07445: 出现异常: 核心转储 [00000001023A0F08] [SIGSEGV] [Address NOT mapped TO object] [0x000000010] [] []
CURRENT SQL statement FOR this SESSION:
BEGIN
  -- Call the procedure
  z_r_res_pk.g_num;
END;
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
ksedmp()+328         CALL     ksedst()             00000000B ? 000000000 ?
                                                   000000000 ? 1033CF6C8 ?
                                                   00000003E ?
                                                   FFFFFFFF7FFF1188 ?
ssexhd()+676         CALL     ksedmp()             00010380C ? 10380C000 ?
                                                   10380C718 ? 103811000 ?
                                                   000102C00 ? 000000000 ?
__sighndlr()+12      PTR_CALL 0000000000000000     000103814 ?
                                                   FFFFFFFF7FFF8400 ?
                                                   103814000 ? 1038118E8 ?
                                                   000000000 ? 103814908 ?
call_user_handler()  CALL     __sighndlr()         00000000B ?
+992                                               FFFFFFFF7FFF8400 ?
                                                   FFFFFFFF7FFF8120 ?
                                                   10023F860 ? 000000000 ?
                                                   00000000A ?
sigacthandler()+104  CALL     call_user_handler()  FFFFFFFF7DB00200 ?
                                                   FFFFFFFF7DB00200 ?
                                                   FFFFFFFF7FFF8120 ?
                                                   00000000C ? 000000000 ?
                                                   000000000 ?
ptmak()+424          PTR_CALL 0000000000000000     000000000 ?
                                                   FFFFFFFF7FFF8400 ?
                                                   FFFFFFFF7FFF8120 ?
                                                   FFFFFFFF7DB00200 ?
                                                   000000000 ?
                                                   FFFFFFFF7DB3E000 ?
pdtidc()+4716        CALL     ptmak()              FFFFFFFF7FFF9300 ?
                                                   000000000 ? 000000004 ?
                                                   000000002 ? 000000000 ?
                                                   FFFFFFFF7CE61800 ?
pdlifu()+620         CALL     pdtidc()             000000000 ? 00002000C ?
                                                   000000001 ? 000000001 ?
                                                   102FBB000 ? 000102C00 ?
phpcmp()+1016        CALL     phpcog()             000000000 ?
                                                   FFFFFFFF7CE5C150 ?
                                                   00000000C ? 000000011 ?
                                                   FFFFFFFF7FFF9300 ?
                                                   F988B6748 ?
pcicog()+184         CALL     phpcmp()             FFFFFFFF7FFF9300 ?
                                                   F988B6748 ? 000000000 ?
                                                   000000004 ? 000000004 ?
                                                   000000000 ?
kkxcog()+288         CALL     pcicog()             FFFFFFFF7FFF9300 ?
                                                   F988B6748 ? 10380C49C ?
                                                   00000AB90 ? 000000001 ?
                                                   000000225 ?
opitca()+4796        CALL     kkxcog()             F988B6748 ? 3800231C0 ?
                                                   000380000 ? 000103813 ?
                                                   103813000 ? 103813F34 ?
rpiswu2()+384        PTR_CALL 0000000000000000     FFFFFFFF7FFF9A60 ?
                                                   000000001 ? 00000000A ?
                                                   000000001 ? 00000001A ?
                                                   F988B6748 ?
kkslod()+5196        CALL     rpiswu2()            F46434118 ? 00010380C ?
                                                   10380C940 ? 10380C978 ?
                                                   000000000 ? 10338B000 ?
kglobld()+696        PTR_CALL 0000000000000000     000000000 ?
                                                   FFFFFFFF7CE64658 ?
                                                   000000000 ? 10380C978 ?
                                                   F98D09B58 ? FD7970EF0 ?
kglobpn()+1284       CALL     kglobld()            000000001 ? 000000000 ?
                                                   F98ECBA88 ? 10380CAB8 ?
                                                   000000001 ? F988B6748 ?
kglpim()+236         CALL     kglobpn()            000000001 ? F922092E8 ?
                                                   000000001 ? 10390E848 ?
                                                   10390EA08 ? 10380DA10 ?
kglpin()+764         CALL     kglpim()             0000000FF ? 10380CAB8 ?
                                                   102DC74E8 ? 010010000 ?
                                                   F988B6748 ? F922092E8 ?
kksfbc()+6956        CALL     kglpin()             000010000 ?
                                                   FFFFFFFF7FFFB278 ?
                                                   F9100F9C8 ? 000000000 ?
                                                   FFFFFFFF7FFFAE28 ?
                                                   FFFFFFFF7FFFA950 ?
kkspsc0()+988        CALL     kksfbc()             000004000 ? 000004000 ?
                                                   000103800 ? 000000000 ?
                                                   000004000 ? 000000000 ?
opiosq0()+936        CALL     kkspsc0()            FFFFFFFF7CE60400 ?
                                                   000000024 ? 102FA3B10 ?
                                                   000000003 ? 000000004 ?
                                                   FFFFFFFF7FFFC5D0 ?
kpooprx()+204        CALL     opiosq0()            10104C000 ? 10380EC78 ?
                                                   000000042 ?
                                                   FFFFFFFF7FFFCC90 ?
                                                   10380EA84 ? 000000016 ?
kpoal8()+528         CALL     kpooprx()            FFFFFFFF7FFFEC2C ?
                                                   FFFFFFFF7FFFCC90 ?
                                                   000000042 ? 103814B90 ?
                                                   000000000 ? 000000024 ?
opiodr()+1720        PTR_CALL 0000000000000000     10380E000 ? 000000000 ?
                                                   FFFFFFFF7FFFEC28 ?
                                                   000000024 ? 000000000 ?
                                                   000000000 ?
ttcpip()+1564        PTR_CALL 0000000000000000     000102DD9 ? 00010380C ?
                                                   103814B98 ? 10380CAB8 ?
                                                   10380EFF8 ?
                                                   FFFFFFFF7FFFC5D0 ?
opitsk()+1156        CALL     ttcpip()             103814B90 ? 000000001 ?
                                                   FFFFFFFF7FFFEC28 ?
                                                   000000014 ?
                                                   FFFFFFFF7FFFD718 ?
                                                   FFFFFFFF7FFFD714 ?
opiino()+1504        CALL     opitsk()             000000000 ? 103814B20 ?
                                                   000000000 ? 000000000 ?
                                                   102E484C8 ? 10380EFE0 ?
opiodr()+1720        PTR_CALL 0000000000000000     000000000 ? 000000000 ?
                                                   000000000 ?
                                                   FFFFFFFF7FFFFAA0 ?
                                                   000000001 ? 000000001 ?
opidrv()+764         CALL     opiodr()             000102DD9 ? 00010380C ?
                                                   103814B98 ? 10380CAB8 ?
                                                   10380EFF8 ?
                                                   FFFFFFFF7FFFF5C0 ?
sou2o()+16           CALL     opidrv()             00010380C ? 000000004 ?
                                                   10380CAB8 ? 00000003C ?
                                                   10380CAB8 ? 10380C718 ?
main()+184           CALL     sou2o()              FFFFFFFF7FFFFAC0 ?
                                                   00000003C ? 000000004 ?
                                                   FFFFFFFF7FFFFAA0 ?
                                                   00003B000 ? 00003B030 ?
_start()+380         CALL     main()               000000002 ?
                                                   FFFFFFFF7FFFFC08 ?
                                                   FFFFFFFF7FFFFC20 ?
                                                   000000000 ?
                                                   FFFFFFFF7E100100 ?
                                                   FFFFFFFF7DB00200 ?
--------------------- Binary Stack Dump ---------------------

进一步检查TRACE,发现当前程序为DEVELOPER在进行DEBUG操作:

PROCESS STATE
-------------
Process global information:
     process: f474d5360, CALL: f95814378, xact: 0, curses: f46434118, usrses: f46434118
  ----------------------------------------
  SO: f474d5360, TYPE: 2, owner: 0, flag: INIT/-/-/0x00
  (process) Oracle pid=848, calls cur/top: f95814378/f95814378, flag: (0) -
            INT error: 0, CALL error: 0, sess error: 0, txn error 0
  (post info) LAST post received: 0 0 0
              LAST post received-location: No post
              LAST process TO post me: NONE
              LAST post sent: 0 0 0
              LAST post sent-location: No post
              LAST process posted BY me: NONE
    (latch info) wait_event=0 bits=0
    Process GROUP: DEFAULT, pseudo proc: f443c8d50
    O/S info: USER: oracle, term: UNKNOWN, ospid: 11922
    OSD pid info: Unix process pid: 11922, image: oracle@ysun7 (TNS V1-V3)
    ----------------------------------------
    SO: f46434118, TYPE: 4, owner: f474d5360, flag: INIT/-/-/0x00
    (SESSION) trans: 0, creator: f474d5360, flag: (41) USR/- BSY/-/-/-/-/-
              DID: 0001-0350-00106A26, short-term DID: 0000-0000-00000000
              txn branch: 0
              oct: 47, prv: 0, SQL: f98ecb960, psql: fdcc324f8, USER: 65/GZMPCMIS
    O/S info: USER: Administrator, term: S-E-SERVE, ospid: 5944:4000, machine: WORKGROUP\S-E-SERVE
              program: plsqldev.exe
    application name: PL/SQL Developer, hash VALUE=1190136663
    action name: Test Window - Script FOR procedu, hash VALUE=3367543201
    LAST wait FOR 'SQL*Net message from client' blocking sess=0x0 seq=34 wait_time=8952
                driver id=54435000, #bytes=1, =0
    TEMPORARY object counter: 0
      ----------------------------------------
      SO: f922092e8, TYPE: 52, owner: f46434118, flag: INIT/-/-/0x00
      LIBRARY OBJECT PIN: pin=f922092e8 handle=f988b6748 mode=X LOCK=f9100f9c8
      USER=f46434118 SESSION=f46434118 COUNT=0 mask=00ff SAVEPOINT=195 flags=[00]
      ----------------------------------------
.
.
.
      SO: f6a2eaba8, TYPE: 51, owner: f46434118, flag: INIT/-/-/0x00
      LIBRARY OBJECT LOCK: LOCK=f6a2eaba8 handle=f8dbc9320 mode=N
      CALL pin=faf005300 SESSION pin=0
      htl=f6a2eac18[fa6f5b618,fa6f5b618] htb=fa6f5b618
      USER=f46434118 SESSION=f46434118 COUNT=1 flags=[00] SAVEPOINT=186
      LIBRARY OBJECT HANDLE: handle=f8dbc9320
      name=SYS.DBMS_DEBUG 
      hash=2c51e752 TIMESTAMP=10-04-2010 00:55:36
      namespace=BODY/TYBD flags=KGHP/TIM/SML/[02000000]
      kkkk-dddd-llll=0000-0011-0011 LOCK=N pin=S latch#=21
      lwt=f8dbc9350[f8dbc9350,f8dbc9350] ltm=f8dbc9360[f8dbc9360,f8dbc9360]
      pwt=f8dbc9380[f8dbc9380,f8dbc9380] ptm=f8dbc9410[f8dbc9410,f8dbc9410]
      REF=f8dbc9330[f8dbc9330, f8dbc9330] lnd=f8dbc9428[f8dbc9428,f8dbc9428]
        LIBRARY OBJECT: object=fb741e720
        TYPE=PKBD flags=EXS/LOC[0005] pflags=/SWR [08] STATUS=VALD LOAD=0
        DEPENDENCIES: COUNT=10 SIZE=16
        ACCESSES: COUNT=9 SIZE=16
        DATA BLOCKS:
        DATA#     heap  pointer STATUS pins CHANGE
        ----- -------- -------- ------ ---- ------
            0 f94627a40 fb741e8c8 I/P/A     0 NONE  
            4 fb741e840 fb0c81e68 I/P/A     1 NONE  
      ----------------------------------------
      SO: f85863c38, TYPE: 51, owner: f46434118, flag: INIT/-/-/0x00
      LIBRARY OBJECT LOCK: LOCK=f85863c38 handle=fcbfb58d0 mode=N
      CALL pin=f83bafa18 SESSION pin=0
      htl=f85863ca8[fa6f5b608,fa6f5b608] htb=fa6f5b608
      USER=f46434118 SESSION=f46434118 COUNT=1 flags=[00] SAVEPOINT=186
      LIBRARY OBJECT HANDLE: handle=fcbfb58d0
      name=SYS.DBMS_DEBUG 
      hash=2c51e751 TIMESTAMP=10-04-2010 00:55:32
      namespace=TABL/PRCD/TYPE flags=KGHP/TIM/SML/[02000000]
      kkkk-dddd-llll=0000-0215-0215 LOCK=N pin=S latch#=22
      lwt=fcbfb5900[fcbfb5900,fcbfb5900] ltm=fcbfb5910[fcbfb5910,fcbfb5910]
      pwt=fcbfb5930[fcbfb5930,fcbfb5930] ptm=fcbfb59c0[fcbfb59c0,fcbfb59c0]
      REF=fcbfb58e0[fcbfb58e0, fcbfb58e0] lnd=fcbfb59d8[fa81f4ca0,f9a3a9ef0]
        LIBRARY OBJECT: object=fb62f8710
        TYPE=PCKG flags=EXS/LOC[0005] pflags= [00] STATUS=VALD LOAD=0
        DEPENDENCIES: COUNT=2 SIZE=16
        ACCESSES: COUNT=1 SIZE=16
        DATA BLOCKS:
        DATA#     heap  pointer STATUS pins CHANGE
        ----- -------- -------- ------ ---- ------
            0 fafa8d3f8 fb62f88b8 I/P/A     0 NONE  
            2 fb62f8830 fb68a95f8 I/-/A     0 NONE  
            4 fb6f38c50 fd2b7d280 I/P/A     1 NONE  
            9 fb6f38ba0 fb6a0f450 I/-/A     0 NONE  
      ----------------------------------------

根据MOS文档ORA-07445 [ptmak()+107] During PL/SQL Debug [ID 726611.1],这个问题应该是DEBUG模式下编译对象导致的。
虽然BUG的信息是10.2上的,但是不排除9.2.0.8上存在相同的问题,不要使用PL/SQL的DEBUG功能,或者禁止过程的DEBUG选项,可以避免这个错误的产生。

Posted in BUG | Tagged , , , | Leave a comment

新特性手册12cr1

赶赶时髦,开始看Oracle 12C的新特性。
其实早在去年OOW上,就介绍了大量的12C新特性,这样导致我对12C有点审美疲劳。当然OOW中介绍的只是新特性中最值得称道的,更多重要的新特性还是要看新特性文档。而事实上最全面的新特性介绍,分布到各个的专门的文档中,比如管理员手册、性能优化手册、工具手册、升级手册……
最近手头的事情太多,因此新特性的测试文章可能会慢一点,不过没有关系,现在网上已经到处都是12C的文章了。
12C的在线阅读地址:http://www.oracle.com/pls/db121/to_toc?pathname=server.121/e17906/toc.htm,还没有看过的DBA赶紧去补课了。

Posted in BOOKS | Tagged , , | Leave a comment

Timesten问题诊断手册总结

感觉Timesten中很多内容和思路与Oracle差别不大。
之所以会有这个感觉,主要源于两个方面。一是文档中确实介绍了大量Oracle相关的内容,尤其是诊断Timesten作为Oracle数据库CACHE部分,很多引发Timesten的错误实际上就是Oracle中的错误。二是Oracle在收购了Timesten之后,利用Oracle关系型数据库的优势,将很多Oracle的特性也放到了Timesten中,因此看一些概念以及诊断的手段会有似曾相识的感觉。
不过Timesten的锁机制显然没有得到Oracle数据库的真谛,而和其他大部分数据库一样。比如在Timesten中一个聚集物化视图,可能导致基表所有记录被锁定,而Oracle则通过ON COMMIT时刻的刷新,避免了物化视图扩大锁范围的问题。

Posted in BOOKS | Leave a comment

ORA-600(ktsfbfmt:objdchk_kcbnew_3)错误

客户的11.2.0.3 RAC环境出现ORA-600[ktsfbfmt:objdchk_kcbnew_3]错误。
错误信息为:

Sat May 18 01:37:23 2013
Errors IN file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_j009_13090.trc  (incident=1002558):
ORA-00600: 内部错误代码, 参数: [ktsfbfmt:objdchk_kcbnew_3], [3], [5370496], [0], [], [], [], [], [], [], [], []
Incident details IN: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_1002558/orcl1_j009_13090_i1002558.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.

详细TRACE信息如下:

*** 2013-05-18 01:43:46.304
*** SESSION ID:(956.39309) 2013-05-18 01:43:46.304
*** CLIENT ID:() 2013-05-18 01:43:46.304
*** SERVICE NAME:(SYS$USERS) 2013-05-18 01:43:46.304
*** MODULE NAME:(DBMS_SCHEDULER) 2013-05-18 01:43:46.304
*** ACTION NAME:(G_JOB_MON) 2013-05-18 01:43:46.304
 
Dump continued FROM file: /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_j009_13090.trc
ORA-00600: 内部错误代码, 参数: [ktsfbfmt:objdchk_kcbnew_3], [3], [5370496], [0], [], [], [], [], [], [], [], []
ORA-06512: 在 "C_T.T_P_T_M_IN", line 198
ORA-06512: 在 "C_T.P_T_MON", line 3
ORA-06512: 在 line 1
 
========= Dump FOR incident 1002559 (ORA 600 [ORA-00600: 内部错误代码, 参数: [ktsfbfmt:objdchk_kcbnew_3], [3], [5370496], [0], [], [], [], [], [], [], [], []
ORA-06]) ========
 
*** 2013-05-18 01:43:46.304
dbkedDefDump(): Starting incident DEFAULT dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
CURRENT SQL information unavailable - no cursor.
 
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
skdstdst()+36        CALL     kgdsdst()            000000000 ? 000000000 ?
                                                   7FFF70CA4588 ? 000000001 ?
                                                   000000001 ? 000000002 ?
ksedst1()+98         CALL     skdstdst()           000000000 ? 000000000 ?
                                                   7FFF70CA4588 ? 000000001 ?
                                                   000000000 ? 000000002 ?
ksedst()+34          CALL     ksedst1()            000000000 ? 000000001 ?
                                                   7FFF70CA4588 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbkedDefDump()+2741  CALL     ksedst()             000000000 ? 000000001 ?
                                                   7FFF70CA4588 ? 000000001 ?
                                                   000000000 ? 000000002 ?
ksedmp()+36          CALL     dbkedDefDump()       000000003 ? 000000002 ?
                                                   7FFF70CA4588 ? 000000001 ?
                                                   000000000 ? 000000002 ?
ksfdmp()+64          CALL     ksedmp()             000000003 ? 000000002 ?
                                                   7FFF70CA4588 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbgexPhaseII()+1764  CALL     ksfdmp()             000000003 ? 000000002 ?
                                                   7FFF70CA4588 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbgexProcessError()  CALL     dbgexPhaseII()       2B1258896710 ? 2B1258C4C980 ?
+2675                                              7FFF70CB0900 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbgeExecuteForError  CALL     dbgexProcessError()  2B1258896710 ? 2B1258C4C980 ?
()+83                                              000000001 ? 000000000 ?
                                                   100000000 ? 000000002 ?
dbgePostErrorKGE()+  CALL     dbgeExecuteForError  2B1258896710 ? 2B1258C4C980 ?
2138                          ()                   000000001 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbkePostKGE_kgsf()+  CALL     dbgePostErrorKGE()   00BAF3FA0 ? 2B1258E7B7F8 ?
66                                                 000000258 ? 2B1258C4C980 ?
                                                   100000000 ? 000000002 ?
kgeade()+351         CALL     dbkePostKGE_kgsf()   00BAF3FA0 ? 2B1258E7B7F8 ?
                                                   000000258 ? 2B1258C4C980 ?
                                                   100000000 ? 000000002 ?
kgereml()+83         CALL     kgeade()             00BAF3FA0 ? 00BAF4150 ?
                                                   2B1258E7B7F8 ? 000000258 ?
                                                   100000000 ? 000000002 ?
jslvGetOCIError()+3  CALL     kgereml()            00BAF3FA0 ? 2B1258E7B7F8 ?
31                                                 000000258 ? 000000000 ?
                                                   100000000 ? 000000002 ?
jslvec_execcb()+226  CALL     jslvGetOCIError()    00BAF3FA0 ? 2B1258E7B7F8 ?
8                                                  000000258 ? 000000000 ?
                                                   100000000 ? 000000002 ?
jslvswu()+56         CALL     jslvec_execcb()      7FFF70CB2EAC ? 2B1258E7B7F8 ?
                                                   2B1258D0D8D8 ? 000000000 ?
                                                   100000000 ? 000000002 ?
jslve_execute0()+22  CALL     jslvswu()            000000053 ? 100000000 ?
48                                                 000000002 ? 000000000 ?
                                                   100000000 ? 000000002 ?
jslve_execute()+327  CALL     jslve_execute0()     7FFF70CB4DC4 ? 0000955DF ?
                                                   000000002 ? 7FFF70CB4DB0 ?
                                                   000000000 ? 28FFFFFFFF ?
rpiswu2()+1618       CALL     jslve_execute()      7FFF70CB4C60 ? 000000002 ?
                                                   7FFF70CB4DC4 ? 0000955DF ?
                                                   7FFF70CB4DB0 ? 28FFFFFFFF ?
kkjex1e()+374        CALL     rpiswu2()            7142C06C08 ? 000000000 ?
                                                   7FFF70CB4C80 ? 000000002 ?
                                                   7FFF70CB4CA0 ? 000000000 ?
kkjsexe()+706        CALL     kkjex1e()            7FFF70CB4DC4 ? 0000955DF ?
                                                   000000002 ? 7FFF70CB4DB0 ?
                                                   723E996390 ? 7FFF70CB4D18 ?
kkjrdp()+689         CALL     kkjsexe()            7FFF70CB4DC4 ? 0000955DF ?
                                                   000000002 ? 7FFF70CB4DB0 ?
                                                   723E996390 ? 7FFF70CB4D18 ?
opirip()+953         CALL     kkjrdp()             7FFF70CB4DC4 ? 0000955DF ?
                                                   000000002 ? 7FFF70CB4DB0 ?
                                                   723E996390 ? 7FFF70CB4D18 ?
opidrv()+598         CALL     opirip()             000000032 ? 000000004 ?
                                                   7FFF70CB6538 ? 7FFF70CB4DB0 ?
                                                   723E996390 ? 7FFF70CB4D18 ?
sou2o()+98           CALL     opidrv()             000000032 ? 000000004 ?
                                                   7FFF70CB6538 ? 7FFF70CB4DB0 ?
                                                   723E996390 ? 7FFF70CB4D18 ?
opimai_real()+261    CALL     sou2o()              7FFF70CB6510 ? 000000032 ?
                                                   000000004 ? 7FFF70CB6538 ?
                                                   723E996390 ? 7FFF70CB4D18 ?
ssthrdmain()+252     CALL     opimai_real()        000000000 ? 7FFF70CB6700 ?
                                                   000000004 ? 7FFF70CB6538 ?
                                                   723E996390 ? 7FFF70CB4D18 ?
main()+196           CALL     ssthrdmain()         000000003 ? 7FFF70CB6700 ?
                                                   000000001 ? 000000000 ?
                                                   723E996390 ? 7FFF70CB4D18 ?
__libc_start_main()  CALL     main()               000000003 ? 7FFF70CB68A0 ?
+244                                               000000001 ? 000000000 ?
                                                   723E996390 ? 7FFF70CB4D18 ?
_start()+36          CALL     __libc_start_main()  000A0AF38 ? 000000001 ?
                                                   7FFF70CB6898 ? 000000000 ?
                                                   723E996390 ? 000000003 ?
--------------------- Binary Stack Dump ---------------------

根据MOS文档,关于这个ORA-600错误的已知bug只有Bug 12540788 – ORA-600 [ktsfbfmt:objdchk_kcbnew_3] / memory corruption fetching across commit from a TEMPORARY table [ID 12540788.8],而客户的环境中确实大量的使用临时表,尤其是通过JOB进行批处理计算的时候。
唯一的疑点是,当前的数据库版本就是11.2.0.3,而这个错误影响的版本是11.2.0.2,在11.2.0.3中应该被修复。这已经是最近碰到的第三个疑似在11.2.0.3上修复的bug再次重现了。

Posted in BUG | Tagged , , | Leave a comment

ORA-600(kcbchg1_12)和ORA-600(kdifind:kcbget_24)错误

客户11.2.0.3 RAC for Aix上出现的错误。
在客户环境中,上述两个错误都出现了:
Sun May 19 01:20:41 2013
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_j003_32468.trc (incident=1002135):
ORA-00600: 内部错误代码, 参数: [kdifind:kcbget_24], [5296768], [1], [0], [1], [], [], [], [], [], [], []
ORA-06512: 在 “CP_TMS.TMP_P_TMS_MAIL_AGENCY_OUT”, line 386
ORA-06512: 在 line 2
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_1002135/orcl1_j003_32468_i1002135.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
.
.
.
Thu May 23 05:46:23 2013
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_j020_622.trc (incident=1006416):
ORA-00600: 内部错误代码, 参数: [kcbchg1_12], [], [], [], [], [], [], [], [], [], [], []
ORA-06512: 在 “CP_TMS.TMP_P_TMS_MAIL_AGENCY_OUT”, line 384
ORA-06512: 在 line 2
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_1006416/orcl1_j020_622_i1006416.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
可以看到,两个错误都发生在JOB调用过程中,且错误发生在同一个过程上。根据MOS文档,Bug 13464002 – ORA-600 [kcbchg1_12] or ORA-600 [kdifind:kcbget_24] [ID 13464002.8],Oracle在尝试进行非连续性的多块读时可能碰到这个错误。
确认影响的版本包括11.1.0.7、11.2.0.1、11.2.0.2和11.2.0.3,Oracle在11.2.0.3.4、11.2.0.4和12.1中修正了这个bug。此外还可以通过设置隐含参数“_db_file_no_contig_mblock_read_count”=0来避免这个错误的产生。

Posted in BUG | Tagged , , , | Leave a comment