统计信息收集出现DFS等待导致实例HANG死

客户10.2.0.4 RAC环境,出现大量的library cache lock和cursor: pin S wait on X等待,经分析是由于统计信息收集僵死导致的。
数据库在8点到9点期间,数据库两个节点都存在明显的cursor: pin S wait on X和library cache lock的等待:

Event

Waits

Time(s)

Avg   Wait(ms)

%   Total Call Time

Wait   Class

cursor:   pin S wait on X

1,573,056

30,651

19

146.2

Concurrency

library   cache lock

31,757

7,009

221

33.4

Concurrency

CPU   time

6,416

30.6

DFS   lock handle

12,381

2,979

241

14.2

Other

latch:   library cache

1,646

1,974

1,199

9.4

Concurrency

Event

Waits

Time(s)

Avg   Wait(ms)

%   Total Call Time

Wait   Class

cursor:   pin S wait on X

17,586,451

342,437

19

812.1

Concurrency

library   cache lock

63,657

30,153

474

71.5

Concurrency

CPU   time

3,820

9.1

db   file sequential read

241,761

1,094

5

2.6

User   I/O

inactive   session

1,105

1,090

986

2.6

Other

两个节点的等待现象基本一致,而节点1上还存在明显的DFS lock handle等待事件。
通过分析ASH信息,发现library cache lock和cursor: pin S wait on X等待基本上都是6点之后才开始出现:

SQL> SELECT trunc(sample_time, 'hh24') TIME, COUNT(*)
  2  FROM WRH$_ACTIVE_SESSION_HISTORY ash, wrh$_event_name en
  3  WHERE ash.event_id = en.event_id
  4  AND sample_time >= to_timestamp('20130703', 'yyyymmdd')
  5  AND event_name IN ('cursor: pin S wait on X', 'library cache lock')
  6  GROUP BY trunc(sample_time, 'hh24')
  7  ORDER BY 1;
TIME                  COUNT(*)
------------------- ----------
2013-07-03 00:00:00          4
2013-07-03 01:00:00          8
2013-07-03 02:00:00          3
2013-07-03 03:00:00          8
2013-07-03 04:00:00          9
2013-07-03 05:00:00         14
2013-07-03 06:00:00        348
2013-07-03 07:00:00       2138
2013-07-03 08:00:00      41576
2013-07-03 09:00:00      50108
2013-07-03 10:00:00       2452

而观察出现cursor: pin S wait on X和library cache lock等待会话的BLOCKING SESSION,发现大部分处于等待的会话被会话37锁定:

SQL> SELECT to_char(sample_time, 'hh24:mi:ss.ff') TIME, session_id sid, 
  2  decode(en.event_name, 'cursor: pin S wait on X', to_number(substr(ltrim(to_char(p2, '0xxxxxxxxxxxxxxx')), 1, 8), 'xxxxxxxx'), blocking_session) b_sid, 
  3  en.event_name  
  4  FROM WRH$_ACTIVE_SESSION_HISTORY ash, wrh$_event_name en
  5  WHERE ash.event_id = en.event_id(+)
  6  AND instance_number = 1
  7  AND sample_time >= to_timestamp('201307030800', 'yyyymmddhh24mi')
  8  AND sample_time <= to_timestamp('201307030801', 'yyyymmddhh24mi')
  9  ORDER BY 1;
TIME                      SID      B_SID EVENT_NAME
------------------ ---------- ---------- ------------------------------
08:00:09.569              457 4294967291 latch: library cache
08:00:09.569               52        981 cursor: pin S wait ON X
08:00:09.569              168         73 library cache LOCK
08:00:09.569             1763 4294967291 SQL*Net message FROM client
08:00:09.569              123 4294967295 kksfbc child completion
08:00:09.569              476 4294967291 kst: async disk IO
08:00:09.569              500 4294967292 db file sequential READ
08:00:09.569              327 4294967291 SQL*Net message TO client
08:00:09.569              233 4294967291 db file scattered READ
08:00:09.569               73         73 DFS LOCK handle
08:00:09.569             1889 4294967291 gc CURRENT block 2-way
08:00:09.569             1969        981 cursor: pin S wait ON X
08:00:09.569              857        981 cursor: pin S wait ON X
08:00:09.569              812        168 cursor: pin S wait ON X
08:00:09.569              981         73 library cache LOCK
08:00:19.669               52        981 cursor: pin S wait ON X
08:00:19.669              812        168 cursor: pin S wait ON X
08:00:19.669              857        981 cursor: pin S wait ON X
08:00:19.669             1969        981 cursor: pin S wait ON X
08:00:19.669              233 4294967291 db file scattered READ
08:00:19.669              500 4294967292 db file sequential READ
08:00:19.669             1763 4294967291 db file sequential READ
08:00:19.669              476 4294967291 kst: async disk IO
08:00:19.669              457 4294967291 kksfbc child completion
08:00:19.669               73         73 DFS LOCK handle
08:00:19.669              654 4294967291 SQL*Net message TO client
08:00:19.669              168         73 library cache LOCK
08:00:19.669              981         73 library cache LOCK
08:00:29.769             1969        981 cursor: pin S wait ON X
08:00:29.769               52        981 cursor: pin S wait ON X
08:00:29.769              457 4294967291 kksfbc child completion
08:00:29.769              981         73 library cache LOCK
08:00:29.769              168         73 library cache LOCK
08:00:29.769              500 4294967292 db file sequential READ
08:00:29.769               73         73 DFS LOCK handle
08:00:29.769              654 4294967291 SQL*Net message TO client
08:00:29.769              327 4294967291 SQL*Net message TO client
08:00:29.769             1763 4294967291 db file sequential READ
08:00:29.769              857        981 cursor: pin S wait ON X
08:00:29.769              123 4294967295 latch: library cache
08:00:29.769              196 4294967295 latch: library cache
08:00:29.769              233 4294967295 latch: library cache
08:00:29.769              812        168 cursor: pin S wait ON X
08:00:29.769              476 4294967291 kst: async disk IO
08:00:39.879              476 4294967291 kst: async disk IO
08:00:39.879              658 4294967291 db file sequential READ
08:00:39.879              500 4294967292 db file sequential READ
08:00:39.879             1969        981 cursor: pin S wait ON X
08:00:39.879              857        981 cursor: pin S wait ON X
08:00:39.879              812        168 cursor: pin S wait ON X
08:00:39.879               52        981 cursor: pin S wait ON X
08:00:39.879              981         73 library cache LOCK
08:00:39.879              168         73 library cache LOCK
08:00:39.879             1849 4294967291 SQL*Net message TO client
08:00:39.879              327 4294967291 SQL*Net message TO client
08:00:39.879              235 4294967291 rdbms ipc reply
08:00:39.879               19 4294967291 rdbms ipc reply
08:00:39.879               73         73 DFS LOCK handle
08:00:39.879              123 4294967294 enq: WF - contention
08:00:49.987              476 4294967291 kst: async disk IO
08:00:49.987              500 4294967292 db file sequential READ
08:00:49.987              382 4294967291 gc cr block busy
08:00:49.987             1969        981 cursor: pin S wait ON X
08:00:49.987              857        981 cursor: pin S wait ON X
08:00:49.987              812        168 cursor: pin S wait ON X
08:00:49.987              235 4294967295 kksfbc child completion
08:00:49.987              658 4294967295 latch: library cache
08:00:49.987               19 4294967291 latch: library cache
08:00:49.987              981         73 library cache LOCK
08:00:49.987              168         73 library cache LOCK
08:00:49.987             1889 4294967291 SQL*Net message FROM client
08:00:49.987               73         73 DFS LOCK handle
08:00:49.987               52        981 cursor: pin S wait ON X

可以判断,导致数据库实例HANG死的原因是因为SID为73的进程:

SQL> SELECT to_char(sample_time, 'hh24:mi:ss'), program, action, event_name 
  2  FROM wrh$_active_session_history ash, wrh$_event_name en
  3  WHERE session_id = 73
  4  AND ash.event_id = en.event_id(+)
  5  AND sample_time >= to_date('201307030550', 'yyyymmddhh24mi')
  6  AND sample_time <= to_date('201307030601', 'yyyymmddhh24mi')
  7  ORDER BY 1;
TO_CHAR( PROGRAM                                  ACTION                         EVENT_NAME
-------- ---------------------------------------- ------------------------------ ---------------------------
05:50:07 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:50:17 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:50:27 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:50:37 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:50:47 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               gc cr block 2-way
05:50:57 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:51:08 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:51:18 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file scattered READ
05:51:28 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               gc cr multi block request
05:51:38 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               direct path WRITE temp
05:51:48 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:51:58 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:52:08 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:52:18 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file scattered READ
05:52:28 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:52:38 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:52:49 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:52:59 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file scattered READ
05:53:09 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:53:19 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file scattered READ
05:53:29 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:53:39 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:53:49 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               gc CURRENT GRANT 2-way
05:53:59 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:54:09 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:54:20 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:54:30 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:54:40 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:54:50 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:55:00 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:55:10 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:55:20 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:55:30 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:55:40 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:55:51 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:56:01 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:56:11 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:56:21 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:56:31 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:56:41 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:56:51 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               gc CURRENT GRANT 2-way
05:57:01 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               library cache pin
05:57:11 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
05:57:21 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
05:57:32 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:57:42 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:57:52 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
05:58:02 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:58:12 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:58:22 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:58:32 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:58:42 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:58:52 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:59:03 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:59:13 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:59:23 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:59:33 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:59:43 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:59:53 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
06:00:03 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
06:00:13 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
06:00:23 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
06:00:34 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
06:00:44 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
06:00:54 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle

根据MOS文档,Bug 6011045 – DBMS_STATS causes deadlock between ‘cursor: pin S wait on X’ and ‘library cache lock’ [ID 6011045.8],这个问题是RAC环境中Oracle收集统计信息的BUG,在自动收集数据字典信息时,可能会出现进程HANG死,并导致大量的library cache lock和cursor: pin S wait on X等待,此外还可能出现DFS lock handle以及row cache lock的等待。
这个BUG确认影响版本包括10.2.0.4和10.2.0.5,这个BUG Oracle在10.2.0.5.5 PSU中进行了修复。

This entry was posted in BUG and tagged , , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *