ORA-600(3689)错误

这是一个10.2.0.1到10.2.0.4上都存在的bug。
错误发生在DG的STANDBY数据库中:

Fri Oct 19 12:01:16 2012
Errors IN file /DATA/oracle/admin/vodapp/bdump/vodapp_mrp0_21304.trc:
ORA-00600: internal error code, arguments: [3689], [33], [], [], [], [], [], []
Errors WITH log /baceldba/archive/1_30374_730819916.dbf
MRP0: Background Media Recovery TERMINATED WITH error 600

导致这个错误的原因很简单,在主库执行了ALTER TABLESPACE DROP DATAFILE后,并不会导致DATAFILE的被删除,后随后在增加新的数据文件时,就可能导致备库出现ORA-600[3689]的错误。详细描述可参考文档Bug 5623467 – Corrupt redo from ALTER TABLESPACE DROP DATAFILE [ID 5623467.8]。
这个bug在10.2.0.5和11.1.0.6中被FIXED。而Oracle给出的解决方案是避免使用ALTER TABLESPACE DROP DATAFILE语句。虽然这个语法本身使用概率很低,但是一旦出现了这个错误,Oracle并没有提供任何有价值的方案。
将问题表空间的最新备份拷贝到备库,然后进行恢复应该可以避免这个错误对备库的影响。

Posted in BUG | Tagged , , , , , | Leave a comment

ORA-7445(kgskhighthreshold)错误

10.2.0.4 RAC环境的ASM实例出现这个ORA-7445错误。
错误信息为:

Fri DEC  3 19:49:49 2010
Errors IN file /opt/app/oracle/admin/+ASM/bdump/+asm2_lck0_19876.trc:
Fri DEC  3 19:50:44 2010
Errors IN file /opt/app/oracle/admin/+ASM/bdump/+asm2_ckpt_19863.trc:
ORA-07445: exception encountered: core dump [kgskhighthreshold()+32] [SIGSEGV] [Address NOT mapped TO object] [0x0000001F6] [] []
Fri DEC  3 19:51:00 2010
Trace dumping IS performing id=[cdmp_20101203195100]
Fri DEC  3 19:51:27 2010
ospid 19851: network interface WITH IP address 10.10.10.2 IS now running
Fri DEC  3 19:51:42 2010
Errors IN file /opt/app/oracle/admin/+ASM/bdump/+asm2_pmon_19843.trc:
ORA-00469: CKPT process TERMINATED WITH error
Fri DEC  3 19:51:42 2010
Errors IN file /opt/app/oracle/admin/+ASM/bdump/+asm2_lmon_19849.trc:
ORA-00469: CKPT process TERMINATED WITH error
Fri DEC  3 19:51:43 2010
PMON: terminating instance due TO error 469

显然这个ORA-7445的错误最终导致了实例的崩溃,而检查对应的CKPT进程,确发现,导致问题的原因是ORADEBUG命令:

*** 2010-12-03 19:49:49.146
*** SERVICE NAME:() 2010-12-03 19:49:47.380
*** SESSION ID:(36.1) 2010-12-03 19:49:47.380
Received ORADEBUG command 'dump errorstack 1' FROM process Unix process pid: 19845, image: 
*** 2010-12-03 19:49:49.164
ksedmp: internal OR fatal error
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
惗?ksedst()+31          CALL     ksedst1()            000000000 ? 000000001 ?
                                                   7FBFFFB0F0 ? 7FBFFFB150 ?
                                                   7FBFFFB090 ? 000000000 ?
ksedmp()+610         CALL     ksedst()             000000000 ? 000000001 ?
                                                   7FBFFFB0F0 ? 7FBFFFB150 ?
                                                   7FBFFFB090 ? 000000000 ?
ksdxfdmp()+1118      CALL     ksedmp()             000000001 ? 000000001 ?
                                                   7FBFFFB0F0 ? 7FBFFFB150 ?
                                                   7FBFFFB090 ? 000000000 ?
ksdxcb()+1547        CALL     ksdxfdmp()           7FBFFFC240 ? 000000011 ?
                                                   000000003 ? 7FBFFFC380 ?
                                                   7FBFFFC2E0 ? 000000000 ?
sspuser()+111        CALL     ksdxcb()             000000002 ? 000000011 ?
                                                   000000001 ? 000000001 ?
                                                   7FBFFFC2E0 ? 000000000 ?
__funlockfile()+64   CALL     sspuser()            000000002 ? 000000011 ?
                                                   000000001 ? 000000001 ?
                                                   7FBFFFC2E0 ? 000000000 ?
__libc_poll()+47     signal   __funlockfile()      7FBFFFCE20 ? 000000002 ?
                                                   000000BB8 ?
                                                   FFFFFFFFFFFFFFFF ?
                                                   000000BB8 ? 000000000 ?
--------------------- Binary Stack Dump ---------------------

本身ORADEBUG就是Oracle未公开的工具,而在不恰当的时候去执行一些命令,导致Oracle出现异常也是非常正常的。
这个故障提醒我们,任何内部的工具都是双刃剑,使用不当就可能会造成严重的后果,因此在产生环境中使用一定要谨慎。

Posted in BUG | Tagged , , , , , , | Leave a comment

ORA-600(kjbldrmrpstp:key)错误

客户10.2.0.3 RAC环境启动后碰到DRM问题导致实例崩溃。
错误信息为:

Mon Jan 19 12:42:58 2009
DATABASE Characterset IS ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started WITH pid=28, OS id=4496
Mon Jan 19 12:43:01 2009
Completed: ALTER DATABASE OPEN
Mon Jan 19 12:43:32 2009
Thread 2 advanced TO log SEQUENCE 2081
  CURRENT log# 3 seq# 2081 mem# 0: +DISKGROUP1/orcl1/onlinelog/group_3.258.628169029
Mon Jan 19 12:53:36 2009
Thread 2 advanced TO log SEQUENCE 2082
  CURRENT log# 17 seq# 2082 mem# 0: +DISKGROUP1/orcl1/onlinelog/group_17.289.628705767
Mon Jan 19 16:09:17 2009
Thread 2 advanced TO log SEQUENCE 2083
  CURRENT log# 18 seq# 2083 mem# 0: +DISKGROUP1/orcl1/onlinelog/group_18.290.628705771
Mon Jan 19 19:55:50 2009
Thread 2 advanced TO log SEQUENCE 2084
  CURRENT log# 19 seq# 2084 mem# 0: +DISKGROUP1/orcl1/onlinelog/group_19.291.628705777
Mon Jan 19 20:01:02 2009
Errors IN file /u01/app/oracle/admin/ORCL1/bdump/orcl12_lmon_2696.trc:
ORA-00600: internal error code, arguments: [kjbldrmrpst:pkey], [2026036], [786432], [7211917], [7220109], [], [], []
Mon Jan 19 20:01:03 2009
Errors IN file /u01/app/oracle/admin/ORCL1/bdump/orcl12_lmon_2696.trc:
ORA-00600: internal error code, arguments: [kjbldrmrpst:pkey], [2026036], [786432], [7211917], [7220109], [], [], []
Mon Jan 19 20:01:03 2009
LMON: terminating instance due TO error 481
Mon Jan 19 20:01:03 2009
Trace dumping IS performing id=[cdmp_20090119200103]
Mon Jan 19 20:01:03 2009
Shutting down instance (abort)
License high water mark = 10
Mon Jan 19 20:01:07 2009
Instance TERMINATED BY LMON, pid = 2696
Mon Jan 19 20:01:08 2009
Instance TERMINATED BY USER, pid = 14328

从错误信息上已经可以看出,这个问题和DRM有关。应该是实例重启后,有些对象的属主进行了重新分配,产生了DRM,并引发了这个bug。
Oracle类似的BUG很多,导致了600错误的第一个函数名称也都比较接近。
根据MOS文档ORA-600 [kjbldrmrpst:pkey] [ID 1489014.1],这个问题最接近的bug描述为Bug 14409183 ORA-600 [kjblpkeydrmqscchk:pkey] or similar / session hangs on “gc buffer busy acquire”,虽然后者这篇文档主要发生在11.2上,但是不排除10.2上有同样的问题。何况当前的问题显而易见是DRM问题导致的。
MOS给出的bug信息不是特别相符,且升级到11.2.0.3.4也不是特别靠谱的事情,因此禁止DRM还是目前这个问题的最佳接近方案。
对于无法将整个集群关闭重启的环境,下面的调高阈值的方式最佳接近方案:

SQL> ALTER system SET "_gc_affinity_limit" = 1000000 scope = BOTH;
System altered.
SQL> ALTER system SET "_gc_affinity_minimum"=10000000 scope = BOTH;
System altered.
Posted in BUG | Tagged , , , , , , | Leave a comment

ORA-7445(qsmmixComputeClusteringFactor)错误

Oracle 11.2.0.2上SQL TUNING的bug。
数据库版本为11.2.0.2 RAC for Solaris,错误信息为:

Thu Jul 05 22:00:01 2012
BEGIN automatic SQL Tuning Advisor run FOR special tuning task  "SYS_AUTO_SQL_TUNING_TASK"
Thu Jul 05 22:28:35 2012
Exception [TYPE: SIGSEGV, Address NOT mapped TO object] [ADDR:0x4] [PC:0x102B3CC08, qsmmixComputeClusteringFactor()+1640] [flags: 0x0, COUNT: 1]
Errors IN file /opt/oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_j002_7688.trc  (incident=192450):
ORA-07445: exception encountered: core dump [qsmmixComputeClusteringFactor()+1640] [SIGSEGV] [ADDR:0x4] [PC:0x102B3CC08] [Address NOT mapped TO object] []
Incident details IN: /opt/oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_192450/orcl1_j002_7688_i192450.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
Thu Jul 05 22:28:38 2012
Dumping diagnostic DATA IN directory=[cdmp_20120705222838], requested BY (instance=1, osid=7688 (J002)), summary=[incident=192450].
Thu Jul 05 22:28:41 2012
Sweep [inc][192450]: completed
Sweep [inc2][192450]: completed

不需要其他信息,上面的报错就足以判断,问题发生在SQL TUNING自动运行的JOB中。显然这是Oracle的bug。
果然MOS文档Bug 9746210 – ORA-7445 [qsmmixComputeClusteringFactor] from SQL tuning [ID 9746210.8]描述了这个问题,且确认影响版本就是11.2.0.2。Oracle在11.2.0.3和11.2.0.2.4中修正了这个问题。
这个问题对于数据库的运行本身也没有什么影响,可以直接忽略掉,甚至可以将SQL TUNING对应的SCHEDULER直接停止。

Posted in BUG | Tagged , , , | Leave a comment

抽取AWR信息遭遇ORA-39147错误

今天在使用awrextr.sql脚本导出AWR数据时碰到这个错误。
数据库版本为10.2.0.5 RAC for HP_UX,错误信息为:

USING the dump file prefix: awrdat_15222_15448
|
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|  The AWR EXTRACT dump file will be located
|  IN the following directory/file:
|   /orabak01/orabak
|   awrdat_15222_15448.dmp
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|  *** AWR EXTRACT Started ...
|
|  This operation will take a few moments. The
|  progress OF the AWR EXTRACT operation can be
|  monitored IN the following directory/file:
|   /orabak01/orabak
|   awrdat_15222_15448.log
|
Exception encountered IN AWR_EXTRACT
ORA-31626: job does NOT exist
ORA-31637: cannot CREATE job SYS_EXPORT_TABLE_01 FOR USER SYS
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPV$FT_INT", line 672
ORA-39147: cannot migrate DATA Pump queue TABLE ownership TO this instance
 
BEGIN
*
ERROR at line 1:
ORA-31623: a job IS NOT attached TO this SESSION via the specified handle
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 79
ORA-06512: at "SYS.DBMS_DATAPUMP", line 938
ORA-06512: at "SYS.DBMS_DATAPUMP", line 4946
ORA-06512: at "SYS.DBMS_SWRF_INTERNAL", line 691
ORA-06512: at "SYS.DBMS_SWRF_INTERNAL", line 1007
ORA-06512: at line 4

Oracle的awrextr.sql脚本,本质是利用数据泵导出AWR基表。从错误信息也可以看出,报错发生在DATA PUMP过程中。如果对数据泵有点了解,就会知道,Oracle的数据泵会使用队列表的特性,而ORA-39147错误就是队列表在RAC环境中才可能碰到的问题。
对队列表有一点了解,因此直接在另外的节点上完成了AWR的抽取工作。而回来之后查询了一下MOS,发现导致问题的原因很可能是另外一个节点上存在一个正在运行的数据泵操作。而Oracle的数据泵只允许同时在一个节点上执行。
Mos文档IMPDP Started In RAC Environment Fails With ORA-39147 [ID 879148.1]对这个问题进行了说明,可以通过SELECT STATE FROM DBA_DATAPUMP_JOBS来查看是否存在正在运行的数据泵操作。

Posted in ORACLE | Tagged , , , , , , , | Leave a comment

10.2出现reliable message信息导致进程hang死

已经在多个RAC环境碰到因为等待reliable message导致进程hang死的情况了。
这个问题在RAC环境比较常见,在很多客户中都碰到过这个问题,而数据库版本多集中在10.2.0.4,个别的版本为10.2.0.5。
而出现这个信息的进程也不太相同,最常见的的m000进程,这个进程的僵死会导致一个节点的AWR无法自动收集。
另外比较常见的就是导致高级队列机制异常,直接影响是数据泵导出无法正常运行,会导致进程挂起。
最近又碰到了类似的情况,在MOS文档中发现Bug 6148054 RAC hang waiting for “reliable message”文章描述的BUG与之前的情况非常类似,不过根据这个bug的描述,问题似乎不会在10.2.0.5上发生。其他方面现象都比较相符,而且之前碰到的大部分案例中,除了reliable message等待之外,还会有wait for unread message on broadcast channel等待。
这个bug确认的修复版本是11.1.0.6,从这个角度讲,10.2.0.5很可能也会有类似的问题。除了升级版本为,可以通过补丁7801939来解决这个问题。

Posted in BUG | Tagged , , | Leave a comment

11.2使用KEEP池导致ENQ: KO – Fast Object Checkpoint等待

前一段时间在客户测试TPCC的时候碰到这个问题,今天在MOS上找到问题的原因,简单记录一下。

在11.2中,如果表设置了KEEP池,而在初始化参数中没有指定DB_KEEP_CACHE_SIZE的值,就可能会造成数据库中出现明显的ENQ: KO – Fast Object Checkpoint的等待,同时还伴有reliable message的等待。

MOS文档ENQ: KO – Fast Object Checkpoint And Reliable Message Causing Bad Performance [ID 1377830.1]描述了这个问题,给出的解决方案是不使用KEEP池,改为使用DEFAULT池,或者设置DB_KEEP_CACHE_SIZE的值为非0。

而当时测试的过程中,问题与当前的现象很像,版本是11.2.0.2,在执行完压力测试后,所有程序断开后。如果执行关闭操作SHUTDOWN IMMEDIATE,此时就可以发现,数据库经历漫长的ENQ: KO – Fast Object Checkpoint等待,然后数据库才可以正常关闭。不过唯一的不同之处在于,当时测试环境中设置了DB_KEEP_CACHE_SIZE的值,因此问题和当前问题描述很像,但并不一样。

由于环境已经消失,现在无法验证这个问题,但是根据印象判断,这个问题可能确实和使用了KEEP池有关,在使用KEEP池之前使用默认的DEFAULT池时,数据库关闭并没有经历如此严重的等待。因此,在11.2中使用非DEFAULT池,可能会引发异常

 

Posted in BUG | Tagged , , , , | Leave a comment

ORA-7445(pevm_icd_call_common)错误

客户11.2 RAC环境出现ORA-7445[pevm_icd_call_common]错误。
详细错误信息为:

Mon Aug 20 17:31:49 2012
Exception [TYPE: SIGSEGV, Address NOT mapped TO object] [ADDR:0xD8] [PC:0x93F1E34, pevm_icd_call_common()+754] [flags: 0x0, COUNT: 1]
Redo thread 1 internally disabled at seq 9 (SMON)
Errors IN file /oracle/diag/rdbms/orcl/orcl2/trace/orcl2_m003_21741.trc  (incident=12433):
ORA-07445: exception encountered: core dump [pevm_icd_call_common()+754] [SIGSEGV] [ADDR:0xD8] [PC:0x93F1E34] [Address NOT mapped TO object] []
ORA-04067: NOT executed, package body "GGS.DDLREPLICATION" does NOT exist
ORA-06508: PL/SQL: could NOT find program unit being called: "GGS.DDLREPLICATION"
ORA-04067: NOT executed, package body "GGS.DDLREPLICATION" does NOT exist
ORA-06508: PL/SQL: could NOT find program unit being called: "GGS.DDLREPLICATION"
Incident details IN: /oracle/diag/rdbms/orcl/orcl2/incident/incdir_12433/orcl2_m003_21741_i12433.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
Dumping diagnostic DATA IN directory=[cdmp_20120820173151], requested BY (instance=2, osid=21741 (M003)), summary=[incident=12433].

根据MOS文档ORA-03113: During an Upgrade whille Running Catuppst.sql ORA-6508 “GGUSER.DDLREPLICATION” in alert.log [ID 1506156.1],导致问题的原因是OGG的触发器影响了数据库的升级脚本的执行。
在升级过程中,应该先将OGG的触发器禁止掉:alter trigger sys.GGS_DDL_TRIGGER_BEFORE disable 然后再运行升级脚本,否则在catuppst.sql执行时就会引发这个错误。

Posted in BUG | Tagged , , , , , , | Leave a comment

ORA-600(kghfrf1)错误

客户的10.2.0.4 FOR AIX数据库在刷新物化视图时碰到这个错误。
错误信息为:

Fri Oct  5 06:20:27 2012
Errors IN file /home/oracle/app/admin/orcl/udump/orcl_ora_4272180.trc:
ORA-00600: internal error code, arguments: [kghfrf1], [0x000000000], [], [], [], [], [], []

从TRACE文件可以看到,出现错误时在执行的语句是物化视图的刷新:

*** 2012-10-05 06:20:27.599
ksedmp: internal OR fatal error
ORA-00600: internal error code, arguments: [kghfrf1], [0x000000000], [], [], [], [], [], []
CURRENT SQL statement FOR this SESSION:
BEGIN u1.refre(:1) ; END;
----- PL/SQL Call Stack -----
  object      line  object
  handle    NUMBER  name
70000018ea40960        81  package body SYS.DBMS_SNAPSHOT
70000018ea40960      1727  package body SYS.DBMS_SNAPSHOT
70000018ea40960      2220  package body SYS.DBMS_SNAPSHOT
70000018ea40960      2457  package body SYS.DBMS_SNAPSHOT
70000018d8668a0       685  package body SYS.DBMS_IREFRESH
70000018ea4ff60       195  package body SYS.DBMS_REFRESH
7000001751170d8         3  PROCEDURE U1.REFRE
70000018eb80490         1  anonymous block

根据MOS文档ORA-600 [Kghfrf1] On Execution of MV_JOB_PROC Job [ID 816733.1],这个问题是在物化视图刷新时清理内存引发的错误。确认影响的版本为10.2.0.4和9.2.0.7,该问题在11.1.0.6和10.2.0.5中进行了FIXED。
由于该问题并非每次刷新都会引发,加上物化视图刷新的JOB有自动重试的功能,因此该问题在不频繁出现的情况下,也可以采取忽略的策略。

Posted in BUG | Tagged , , , , , | Leave a comment

ORA-600(15419)和ORA-6544(2603)错误

客户9208 RAC环境出现ORA-600(15419)和ORA-6544(2603)错误。
错误信息为:

Sun Feb  5 15:58:52 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_250852.trc:
ORA-00600: internal error code, arguments: [15419], [severe error during PL/SQL execution], [], [], [], [], [], []
ORA-06544: PL/SQL: internal error, arguments: [2603], [], [], [], [], [], [], []
ORA-06553: PLS-707: unsupported construct OR internal error [2603]
Sun Feb  5 15:58:53 2012
Trace dumping IS performing id=[cdmp_20120205155853]

从第一个ORA-600错误就可以判断,错误来自PL/SQL的执行,而从第二个错误判断,在PL/SQL执行过程中出现了参数为2603的错误。
查询MOS确认,该问题是由于工具PL/SQL DEVELOPER使用DEBUG模式编译PL/SQL代码造成的:ORA-600 [15419] And ORA-6544 [2603] From PL/SQL Developer [ID 436359.1]。
解决方法也很简单,升级PL/SQL DEVELOPER到与数据库兼容的版本,或者禁止在编译PL/SQL代码的时候添加DEBUG信息。

Posted in BUG | Tagged , , , , , , | Leave a comment