ORA-600 (kjbldrmrpst:pkey) error

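A customer's 10.2.0.3 RAC environment ran into a DRM (Dynamic Resource Mastering) problem after startup, which crashed the instance.
The alert log shows: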

Mon Jan 19 12:42:58 2009
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=28, OS id=4496
Mon Jan 19 12:43:01 2009
Completed: ALTER DATABASE OPEN
Mon Jan 19 12:43:32 2009
Thread 2 advanced to log sequence 2081
  Current log# 3 seq# 2081 mem# 0: +DISKGROUP1/orcl1/onlinelog/group_3.258.628169029
Mon Jan 19 12:53:36 2009
Thread 2 advanced to log sequence 2082
  Current log# 17 seq# 2082 mem# 0: +DISKGROUP1/orcl1/onlinelog/group_17.289.628705767
Mon Jan 19 16:09:17 2009
Thread 2 advanced to log sequence 2083
  Current log# 18 seq# 2083 mem# 0: +DISKGROUP1/orcl1/onlinelog/group_18.290.628705771
Mon Jan 19 19:55:50 2009
Thread 2 advanced to log sequence 2084
  Current log# 19 seq# 2084 mem# 0: +DISKGROUP1/orcl1/onlinelog/group_19.291.628705777
Mon Jan 19 20:01:02 2009
Errors in file /u01/app/oracle/admin/ORCL1/bdump/orcl12_lmon_2696.trc:
ORA-00600: internal error code, arguments: [kjbldrmrpst:pkey], [2026036], [786432], [7211917], [7220109], [], [], []
Mon Jan 19 20:01:03 2009
Errors in file /u01/app/oracle/admin/ORCL1/bdump/orcl12_lmon_2696.trc:
ORA-00600: internal error code, arguments: [kjbldrmrpst:pkey], [2026036], [786432], [7211917], [7220109], [], [], []
Mon Jan 19 20:01:03 2009
LMON: terminating instance due to error 481
Mon Jan 19 20:01:03 2009
Trace dumping is performing id=[cdmp_20090119200103]
Mon Jan 19 20:01:03 2009
Shutting down instance (abort)
License high water mark = 10
Mon Jan 19 20:01:07 2009
Instance terminated by LMON, pid = 2696
Mon Jan 19 20:01:08 2009
Instance terminated by USER, pid = 14328

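The error message already makes it clear that this problem is related to DRM. Most likely, after the instance restarted, mastership of some objects was reassigned, which triggered a DRM operation and hit this bug.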
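To check whether objects really are being remastered, the remastering history can be inspected from a surviving instance. This is only a sketch: it assumes the `V$GCSPFMASTER_INFO` view is available in this release and that you are connected with sufficient privileges. The original analysis did not run this query.

```sql
-- Sketch only: list objects that have been remastered, most frequently moved first.
-- Assumes V$GCSPFMASTER_INFO exists in this release; SELECT * is used because
-- the object-id column is named differently across versions.
SELECT *
  FROM v$gcspfmaster_info
 WHERE remaster_cnt > 0
 ORDER BY remaster_cnt DESC;
```

A high `REMASTER_CNT`, or a `CURRENT_MASTER` that differs from `PREVIOUS_MASTER` shortly after a restart, would be consistent with the DRM activity suspected here.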
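Oracle has many similar bugs, and the name of the first function that raises the ORA-600 is much the same in all of them.
According to MOS note ORA-600 [kjbldrmrpst:pkey] [ID 1489014.1], the closest matching bug is Bug 14409183 ORA-600 [kjblpkeydrmqscchk:pkey] or similar / session hangs on "gc buffer busy acquire". That bug is documented mainly for 11.2, but the same problem cannot be ruled out on 10.2, especially since the current failure is clearly caused by DRM.
The bug information given by MOS is not an exact match, and upgrading to 11.2.0.3.4 is not a very practical option, so disabling DRM remains the best solution for this problem at present.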
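For reference, on 10.2 DRM is commonly switched off outright through hidden parameters that cannot be changed dynamically, which is why that route needs a restart. A minimal sketch, assuming an spfile shared by all instances. The parameters `_gc_affinity_time` and `_gc_undo_affinity` are not mentioned in the original, and as with any underscore parameter they should be confirmed with Oracle Support before use:

```sql
-- Sketch only: disable DRM completely on 10.2 (assumed parameters, not from the original post).
-- Neither parameter is dynamic, so the change goes to the spfile and
-- takes effect only after the instances are restarted.
ALTER SYSTEM SET "_gc_affinity_time" = 0 SCOPE = SPFILE SID = '*';
ALTER SYSTEM SET "_gc_undo_affinity" = FALSE SCOPE = SPFILE SID = '*';
```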
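For environments where the whole cluster cannot be shut down and restarted, raising the DRM thresholds as follows is the best solution: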

SQL> ALTER SYSTEM SET "_gc_affinity_limit" = 1000000 SCOPE = BOTH;
System altered.
SQL> ALTER SYSTEM SET "_gc_affinity_minimum" = 10000000 SCOPE = BOTH;
System altered.
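
Hidden parameters do not appear in `SHOW PARAMETER` unless they have been set explicitly, so it can help to confirm the values actually in effect on each instance. A sketch of the usual check against the `X$` fixed tables, assuming a connection as SYS (this query is not part of the original post):

```sql
-- Sketch only: show the current values of the DRM-related hidden parameters.
-- Must be run as SYS, since X$ tables are not visible to ordinary users.
SELECT a.ksppinm  AS name,
       b.ksppstvl AS value,
       a.ksppdesc AS description
  FROM x$ksppi a, x$ksppcv b
 WHERE a.indx = b.indx
   AND a.ksppinm IN ('_gc_affinity_limit',
                     '_gc_affinity_minimum',
                     '_gc_affinity_time');
```

Running this on every node is a cheap way to make sure the new thresholds are in place cluster-wide.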