系统存在严重的latch: undo global data等待

客户10.2.0.5 RAC环境出现了严重的latch: undo global data等待。

问题时刻AWR的TOP如下:

Event

Waits

Time(s)

Avg Wait(ms)

% Total Call   Time

Wait Class

latch: undo   global data

6,245,400

1,372,583

220

22.0

Other

gc buffer busy

114,190,782

1,329,749

12

21.3

Cluster

enq: TX – row   lock contention

1,377,980

685,454

497

11.0

Application

CPU time

460,041

7.4

enq: TX – index   contention

602,648

285,683

474

4.6

Concurrency

 

等待最明显的是latch: undo global data和gc buffer busy,后者是RAC中比较常见的等待,也可以根据报告后面的SQL等待部分很容易定位到导致问题的SQL语句,而前者的等待并不常见。

根据MOS文档”LATCH: UNDO GLOBAL DATA” In The Top Wait Events [ID 1451536.1]描述,这个等待和隐含参数_undo_autotune设置为FALSE情况下的UNDO空间不足有关

当前数据库确实关闭了_undo_autotune功能。且LATCH undo global data最多的等待发生在ktusm_stealext: KSLBEGIN处,这说明会话在寻找新的UNDO EXTENTS时,不得不Steal未过期的UNDO EXTENTS。

解决方案有三个:减少UNDO_RETENTION参数设置的时间长度;增加UNDO_TABLESPACE的空间大小;将_undo_autotune隐含参数设置为TRUE。

 

Posted in BUG | Tagged , , , | Leave a comment

20130712 LSI渠道启动峰会

参加了LSI渠道启动峰会,还碰到了沃趣科技的CEO。
原则上讲云和恩墨不算是LSI的渠道商,不过之前与LSI合作对Nytro WarpDrive卡和Nytro MegaRAID对于Oracle数据库性能的提升。因此LSI把我们作为合作伙伴也邀请到了今天的渠道峰会的会场。
最近恰好也在一个客户处进行LSI的POC,客户的数据库和SQL都进行了优化,优化后数据库的主要压力落在IO上,因此考虑通过缓存的方式来进行优化。如果缓存的测试效果非常明显,到时候会将对比效果公布出来。
而沃趣科技之前一直和FusionIO合作,本次来参加LSI的会议,应该也是作为技术合作伙伴。和grassbell还是第一次见面,虽然早在04年就已经在ITPUB上相识,不过随着他到了阿里,此后就一直没有机会见面,没想到这次见面的机会还是LSI提供给我们的。

Posted in NEWS | Leave a comment

Oracle DBA实战攻略:运维管理、诊断优化、高可用与最佳实践——序

印象中很少帮人写序,提笔后才发现这也不是件轻松的事情。
初识周亮应该是2012年10月到杭州出差,经一个杭州的同事介绍认识的,虽然是第一次见面,但是聊得却很投机:一方面大家都是这个圈子里的人,虽然之前没有打过交道,但是有着很多共同的朋友,因此感觉比较亲切;另一方面是我们两个的相似点非常多,因此共同语言也就比较多。大家都是Oracle的DBA,都对Oracle技术感兴趣,也都在Oracle的圈子内积累了不少年头,即便是只聊技术,也能找到说不完的话题。更何况我们两个人的工作性质几乎完全一样。我们两个目前都是从事乙方运维的工作,我是2011年开始从事Oracle数据库的乙方的运维管理,而周亮的乙方运维经验则比我长得多,如果从这个角度讲,他的Oracle数据库运维经验要比我丰富得多,这无疑也使我们增加了更多的共同话题。于是当天晚上,从Oracle数据库聊到了具体的案例,从工作聊到了客户,从技术聊到了团队,估计当天如果在聊的晚一点,就差谈人生和理想了。
也就是在当晚的聊天过程中,我听说他正在写书,而且已经动笔几个月了。当时Eygle的新书刚刚出版不久,而我几个熟悉的朋友也恰好都在写书,所以这方面的话题也比较多,也就多聊了几句,当时对他的新书有了一个第一印象:这本书是根据案例和实践经验整理而成的,当时跟我的感觉应该和DBA手记系列比较相似。
之后的几个月,听说他仍然在笔耕不缀。乙方工作的辛苦自己是深有感触的,能在日常繁忙工作的基础上,把一些知识要点记录下来已经是很少有人能做到的事情了,而还能坚持不断的写作,这绝对是需要毅力才能完成的工作。因为写作和写BLOG是不同的,虽然二者可能都是需要每天挤出半个小时到一个小时的时间,前者要求的整块的时间,否则难以展开思路,而对于后者而言,如果有了什么思路或者碰到了什么案例,可以用很短的时间先记录下来,随后也可以通过零碎的时间进行实验和验证,说起来似乎是同样的一小时时间,前者的坚持和付出要远远高于后者。
到了最近他成书后邀请我写序,我才发现他的书并不是简单的案例集合,而是存在一条主线将各个知识点串联在一起,这在成书的困难程度上就要比DBA手记之类的案例集合至少上了一个台阶。
从严格意义上讲,我没有写过书,只是写过文章。虽然参与了《Oracle数据库性能优化》以及《DBA手记》等书的编写,但形式仍然是供稿方式。这几本书的最大特点在于没有一条明确的主线来穿起所有的内容,书中的各个章节独立,因此成书相对来说要容易得多。而如果要独立写一本书,就需要考虑完整的架构、背景知识、贯穿全书的主线、以及选取的案例是否合适等等,显然这要比纯案例的组合困难得多。而即便是案例类的书籍,成书过程也并不简单。先不说素材、案例的选取,知识点深入程度的把控这些对于技术含量要求很高的事情。只说整体书籍编写工作完成后,对全书的修订和审阅工作也不是一件轻松的事情。因为经历过几次这种不断的修订的工作,每次都是一个很痛苦的过程,以至于当现在为止还记忆犹新。而如果要我独立写一本书,要有明确的主线,要有详尽的基础知识介绍,要覆盖Oracle的主要体系结构点,还要有大量深入的案例作为佐证,最好还要有一些尚未公开的研究结果,光想想就已经足够了。因此,我一直很钦佩那些能独立完成一本书的人,无疑作者也是令我钦佩的大牛中的一员。
作者令我钦佩的不仅仅是技术上的积累,也不只是我提到的为了成书而付出的艰辛,而更令我钦佩的是其对技术几年如一日的持续专注的态度。最近微博上还有人讨论是否应该写书。国内的技术和出版的大环境,决定了靠写技术书籍赚钱绝对是Mission impossible。即使是Eygle这种几乎一年一本的速度,且每本都很畅销的情况,也完全不足以通过写书养活自己,那就更不用说其他人了。通过写书来出名也越来越难,其实很多人是因为作者本身的名气才买的书。因此有位网友说得很有道理,别人知道你是因为你做了什么,而不是你写了什么。因此想要靠写书来成就自己的名气也不是件容易的事情。在现阶段无论是收名还是获利,想通过写书的方式来实现,其投入产出比都非常不好的。但是也正因为如此,以名利为目的作者越来越少,而写书更多的变成了一些有技术追求的人,把自己的技术沉淀、总结和提升的过程。大浪淘沙始得金,我们欣喜的看到,近期出版的和即将要出版的几本Oracle书籍,无不是业内专业人士的呕心力作。而作者的这边《Oracle DBA实战攻略:运维管理、诊断优化、高可用与最佳实践》正是其中之一。
作者根据他多年的运维诊断经验,从数据库如何创建开始,循序渐进的介绍了数据库的启动关闭过程,如何配置监听并连接到数据库,如果对数据库空间进行管理和监控,SGA的调整和优化方法,数据库的CHECKPOINT和SCN机制与备份恢复,数据库性能优化的方法论以及Oracle Data Guard的配置和管理。仅看目录似乎都是非常基础的内容,似乎这是本针对Oracle初学者的入门级书籍,其实恰恰相反,所谓大道至简,能把最基础的东西写出新意并加入自己的理解本身就是一件非常见功底的事情。而且作者结合了大量的真实案例,把自己多年的宝贵经验融入其中,通过一些复杂案例的诊断过程来说明这些简单的原理和知识点,这正是作者高明的地方。更何况作者并没有简单的停留在案例诊断分析的层面上,而是根据大量案例的经验汇总,把问题的优化、诊断和解决提升到了方法论的层面上,这就不仅仅是通过几年经验积累就可以轻易达成的,必要要求作者不断的思考、分析、归纳和验证,才能上升到理论指导实践的层面。
最后希望周亮的《Oracle DBA实战攻略:运维管理、诊断优化、高可用与最佳实践》一书,可以帮助更多的数据库技术爱好者解决日常碰到的技术困难,给那些以日常操作入门的运维人员指出一条深入学习提高的道路。

Posted in BOOKS | Leave a comment

ORA-600(ktrgcm_3)错误

客户10.2.0.5 RAC数据库出现ORA-600[ktrgcm_3]错误。
错误信息如下:

Sun Jul 07 01:06:00 EAT 2013
Errors IN file /oracle/admin/orcl/udump/orcl1_ora_28947.trc:
ORA-00600: internal error code, arguments: [ktrgcm_3], [], [], [], [], [], [], []

详细TRACE如下:

*** ACTION NAME:() 2013-07-07 00:49:57.500
*** MODULE NAME:(g@orcl (TNS V1-V3)) 2013-07-07 00:49:57.500
*** SERVICE NAME:(orcl) 2013-07-07 00:49:57.500
*** SESSION ID:(3332.36419) 2013-07-07 00:49:57.500
*** 2013-07-07 00:49:57.500
ksedmp: internal OR fatal error
ORA-00600: internal error code, arguments: [ktrgcm_3], [], [], [], [], [], [], []
CURRENT SQL statement FOR this SESSION:
INSERT INTO T_B_T_IN (TRADE_ID,SUBSCRIBE_ID,BPM_ID,TRADE_TYPE_CODE,IN_MODE_CODE,PRIORITY,SUBSCRIBE_STATE, NEXT_DEAL_TAG,PRODUCT_ID,BRAND_CODE,USER_ID,CUST_ID,USECUST_ID,ACCT_ID,USER_DIFF_CODE, NET_TYPE_CODE,SERIAL_NUMBER,CUST_NAME,ACCEPT_DATE,ACCEPT_MONTH,TRADE_STAFF_ID, TRADE_DEPART_ID,TRADE_CITY_CODE,TRADE_EPARCHY_CODE,TERM_IP,EPARCHY_CODE,CITY_CODE, OLCOM_TAG,EXEC_TIME,FINISH_DATE,OPER_FEE,FOREGIFT,ADVANCE_PAY,INVOICE_NO,FEE_STATE, FEE_TIME,FEE_STAFF_ID,CANCEL_TAG,CANCEL_DATE,CANCEL_STAFF_ID,CANCEL_DEPART_ID, CANCEL_CITY_CODE,CANCEL_EPARCHY_CODE,CHECK_TYPE_CODE,CHK_TAG,AUDIT_TAG, AUDIT_BATCH_NO,ACTOR_NAME,ACTOR_CERTTYPEID,ACTOR_PHONE,ACTOR_CERTNUM, CONTACT,CONTACT_PHONE,CONTACT_ADDRESS,REMARK,IF_MAINTENANCE, PROVINCE_CODE,TRADE_PROVINCE_CODE,CANCEL_PROVINCE_CODE,RSRV_TAG1) SELECT A.TRADE_ID,SUBSCRIBE_ID,BPM_ID,TRADE_TYPE_CODE,IN_MODE_CODE,PRIORITY,SUBSCRIBE_STATE, NEXT_DEAL_TAG,A.PRODUCT_ID,NVL(D.BRAND_CODE,A.BRAND_CODE), A.USER_ID,A.CUST_ID,A.USECUST_ID,ACCT_ID,A.USER_DIFF_CODE, DECODE(A.PRODUCT_ID,'-1','99',NVL(D.NET_TYPE_CODE,A.NET_TYPE_CODE)),A.SERIAL_NUMBER,CUST_NAME,ACCEPT_DATE,A.ACCEPT_MONTH,TRADE_STAFF_ID, TRADE_DEPART_ID,TRADE_CITY_CODE,TRADE_EPARCHY_CODE,TERM_IP,A.EPARCHY_CODE,A.CITY_CODE, OLCOM_TAG,EXEC_TIME,FINISH_DATE,OPER_FEE,FOREGIFT,ADVANCE_PAY,INVOICE_NO,FEE_STATE, FEE_TIME,FEE_STAFF_ID, CASE WHEN B.TRADE_ID_A IS NOT NULL AND A.CANCEL_DATE IS NULL THEN DECODE(A.CANCEL_TAG,'0','4',A.CANCEL_TAG) ELSE CASE WHEN B.TRADE_ID_A IS NOT NULL AND TRUNC(A.ACCEPT_DATE,'DD')<TRUNC(A.CANCEL_DATE,'DD') THEN DECODE(A.CANCEL_TAG,'3','0',A.CANCEL_TAG) ELSE CASE WHEN TRUNC(A.ACCEPT_DATE,'DD')<TRUNC(A.CANCEL_DATE,'DD') THEN DECODE(A.CANCEL_TAG,'3','0',A.CANCEL_TAG) ELSE DECODE(A.CANCEL_TAG,'1','0','4','0',A.CANCEL_TAG) END END END CANCEL_TAG, CASE WHEN A.CANCEL_TAG IN ('1','4','3') AND TRUNC(A.ACCEPT_DATE,'DD')<TRUNC(A.CANCEL_DATE,'DD') THEN NULL ELSE A.CANCEL_DATE END CANCEL_DATE, CASE WHEN A.CANCEL_TAG IN ('1','4','3') AND TRUNC(A.ACCEPT_DATE,'DD')<TRUNC(A.CANCEL_DATE,'DD') THEN NULL ELSE A.CANCEL_STAFF_ID END CANCEL_STAFF_ID, CASE WHEN A.CANCEL_TAG IN ('1','4','3') AND TRUNC(A.ACCEPT_DATE,'DD')<TRUNC(A.CANCEL_DATE,'DD') THEN NULL ELSE A.CANCEL_DEPART_ID END CANCEL_DEPART_ID, CASE WHEN A.CANCEL_TAG IN ('1','4','3') AND TRUNC(A.ACCEPT_DATE,'DD')<TRUNC(A.CANCEL_DATE,'DD') THEN NULL ELSE A.CANCEL_CITY_CODE END CANCEL_CITY_CODE, CASE WHEN A.CANCEL_TAG IN ('1','4','3') AND TRUNC(A.ACCEPT_DATE,'DD')<TRUNC(A.CANCEL_DATE,'DD') THEN NULL ELSE A.CANCEL_EPARCHY_CODE END CANCEL_EPARCHY_CODE, CHECK_TYPE_CODE,CHK_TAG,AUDIT_TAG, AUDIT_BATCH_NO,ACTOR_NAME,ACTOR_CERTTYPEID,ACTOR_PHONE,ACTOR_CERTNUM, CONTACT,CONTACT_PHONE,CONTACT_ADDRESS,A.REMARK,IF_MAINTENANCE, F_SYS_GETVALUE('get_province_code',NULL,DECODE(LENGTH(A.EPARCHY_CODE),4,A.EPARCHY_CODE,A.TRADE_EPARCHY_CODE),NULL,NULL) PROVINCE_CODE, F_SYS_GETVALUE('get_province_code',NULL,A.TRADE_EPARCHY_CODE,NULL,NULL) TRADE_PROVINCE_CODE, F_SYS_GETVALUE('get_province_code',NULL,A.CANCEL_EPARCHY_CODE,NULL,NULL) CANCEL_PROVINCE_CODE, CASE WHEN C.NEW_STATE_CODE='H' THEN '9130' ELSE CASE WHEN C.OLD_STATE_CODE='H' THEN '9140' END END FROM TF_BH_TRADE A, (SELECT TRADE_ID_A,TRADE_ID_B FROM TF_B_TRADE_REL WHERE RELATION_TYPE_CODE='02' AND STATE IN ('0','A') )B,TF_B_TRADE_STATE_TRANS C,TD_B_PRODUCT D WHERE A.TRADE_ID=B.TRADE_ID_B(+) AND A.TRADE_ID=C.TRADE_ID(+) AND A.PRODUCT_ID = D.PRODUCT_ID(+) AND A.ACCEPT_MONTH = :B3 AND A.ACCEPT_DATE >= :B2 AND A.ACCEPT_DATE < :B1 AND A.TRADE_TYPE_CODE NOT IN ('116','616') AND A.NEXT_DEAL_TAG IN (SELECT VALUEC1 FROM TD_SD_COMMPARA WHERE PARAM_TYPE_CODE ='NEXT_DEAL_TAG') AND EXISTS (SELECT X.TRADE_ID FROM (SELECT B.TRADE_ID TRADE_ID FROM TF_B_TRADE_BATDEAL A,TF_BH_TRADE B WHERE A.TRADE_ID = B.TRADE_ID AND A.CANCEL_TAG = B.CANCEL_TAG AND A.STATE ='2' AND A.B_UPDATE_DATE >= :B2 AND A.B_UPDATE_DATE < :B1 AND A.BATCH_OPER_TYPE IN ('CREATEPREUSER','CREATEUSER','DISCNTCHG','ADDGRPMEMBER','BATREMOVEMEMBER','PRODUCTCHG','STATISTICS','BATCHARGEFEE','SCOREADJUST','CANCELPREUSER') MINUS SELECT TRADE_ID FROM T_B_T_IN ) X WHERE A.TRADE_ID = X.TRADE_ID)
----- PL/SQL Call Stack -----
  object      line  object
  handle    NUMBER  name
c00000142951a9f8       434  PROCEDURE U_S_1.P_S_D_T_IN
c0000013baacb568         1  anonymous block
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
ksedst()+64          CALL     ksedst1()            000000000 ? 000000001 ?
ksedmp()+2176        CALL     ksedst()             000000000 ?
                                                   C000000000000D20 ?
                                                   4000000004066EE0 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ?
ksfdmp()+112         CALL     ksedmp()             000000003 ?
                                                   9FFFFFFFFFFE8EB0 ?
                                                   60000000000BA288 ?
                                                   9FFFFFFFFFFE9480 ?
                                                   C000000000000999 ?
                                                   40000000040AEF50 ?
kgerinv()+304        CALL     ksfdmp()             9FFFFFFFFFFE9A10 ?
                                                   000000003 ?
                                                   9FFFFFFFFFFE9490 ?
                                                   60000000000BA288 ?
                                                   C000000000000612 ?
                                                   4000000009903790 ?
kgeasnmierr()+144    CALL     kgerinv()            60000000000318D0 ?
                                                   4000000001AF87F0 ?
                                                   6000000000032988 ?
                                                   4000000001AF87F0 ?
                                                   9FFFFFFFFFFE9A50 ?
$cold_ktrgcm()+1188  CALL     kgeasnmierr()        60000000000318D0 ?
8                                                  60000000003139F8 ?
                                                   6000000000313A08 ?
                                                   6000000000032D00 ?
                                                   000000000 ? 000000000 ?
                                                   C0000014D3B02A54 ?
                                                   C0000014BC6E8AEC ?
ktrget()+1088        CALL     $cold_ktrgcm()       9FFFFFFFBF349F60 ?
                                                   60000000000BA288 ?
                                                   C000000000000C1D ?
                                                   4000000002FB1560 ?
                                                   00003811B ?
kdirfrs()+8160       CALL     ktrget()             9FFFFFFFBF349F60 ?
                                                   000000000 ?
                                                   C00000089B458014 ?
qerixFetchFastFullS  CALL     kdirfrs()            9FFFFFFFBF349F58 ?
can()+7504                                         60000000000BA288 ?
                                                   4000000003057DA0 ?
                                                   00003825F ?
                                                   9FFFFFFFBF34A070 ?
                                                   00003821F ? 000000000 ?
                                                   C000000000001430 ?
qergiFetch()+816     CALL     qerixFetchFastFullS  C0000013CB1A0CA0 ?
                              can()                4000000001CC4CE0 ?
                                                   9FFFFFFFFFFEB750 ?
                                                   000007FFF ? 000007FFF ?
                                                   C0000013CB1A0CA0 ?
rwsfcd()+256         CALL     qergiFetch()         C0000013CB1A08F8 ?
                                                   4000000001CC4CE0 ?
                                                   9FFFFFFFFFFEB750 ?
                                                   000007FFF ?
qerhjFetch()+912     CALL     rwsfcd()             9FFFFFFFBF34A698 ?
                                                   4000000001CC4CE0 ?
                                                   9FFFFFFFFFFEB750 ?
                                                   000007FFF ?
qerflFetch()+336     CALL     qerhjFetch()         C0000013CB1A0740 ?
                                                   4000000001CCEC70 ?
                                                   9FFFFFFFFFFEB820 ?
                                                   000007FFF ?
qersoFetch()+2080    CALL     qerflFetch()         C0000013CB1A05B0 ?
                                                   4000000001CCEED0 ?
                                                   C000001426EB7FA8 ?
                                                   000007FFF ?
                                                   60000000000BA288 ?
qerseFetch()+400     CALL     qersoFetch()         C000001426EB7FA8 ?
                                                   4000000001CCEBF0 ?
                                                   9FFFFFFFFFFEB880 ?
                                                   000007FFF ?
qervwFetch()+368     CALL     qerseFetch()         C000001426EB7AE8 ?
                                                   9FFFFFFFFFFEB8AC ?
                                                   9FFFFFFFFFFEB8AC ?
                                                   9FFFFFFFBF3DC5DA ?
rwsfcd()+256         CALL     qervwFetch()         C000001426EB79C8 ?
                                                   4000000001CC4CF0 ?
                                                   9FFFFFFFFFFEB8D0 ?
                                                   C000001426EB79C0 ?
qerhjFetch()+1152    CALL     rwsfcd()             9FFFFFFFBF3DC5F8 ?
                                                   4000000001CC4CF0 ?
                                                   9FFFFFFFFFFEB8D0 ?
                                                   000007FFF ?
rwsfcd()+256         CALL     qerhjFetch()         C000001426EB7710 ?
                                                   4000000001CC4D00 ?
                                                   9FFFFFFFFFFEB9A0 ?
                                                   000007FFF ?
qerhjFetch()+1728    CALL     rwsfcd()             9FFFFFFFBF3DC800 ?
                                                   4000000001CC4D00 ?
                                                   9FFFFFFFFFFEB9A0 ?
                                                   000007FFF ?
                                                   60000000000BA288 ?
qerjoFetch()+1168    CALL     qerhjFetch()         C0000013AB146D68 ?
                                                   000000000 ? 000000000 ?
                                                   000000001 ?
                                                   60000000000BA288 ?
                                                   C0000000000012AB ?
rwsfcd()+256         CALL     qerjoFetch()         C000001388B4B060 ?
                                                   4000000001CCEC20 ?
                                                   9FFFFFFFFFFEBA80 ?
                                                   000007FFF ?
qerhjFetch()+1728    CALL     rwsfcd()             9FFFFFFFBF3D7CA8 ?
                                                   4000000001CC4D00 ?
                                                   9FFFFFFFFFFEBAB0 ?
                                                   000007FFF ?
                                                   60000000000BA288 ?
qerjoFetch()+1168    CALL     qerhjFetch()         C0000013FF4C97D0 ?
                                                   000000000 ? 000000000 ?
                                                   000000001 ?
                                                   60000000000BA288 ?
                                                   C0000000000012AB ?
qerflFetch()+336     CALL     qerjoFetch()         C000001380167980 ?
                                                   4000000001CCEC70 ?
                                                   9FFFFFFFFFFEBBC0 ?
                                                   000000001 ?
rwsfcd()+256         CALL     qerflFetch()         C0000013801676A0 ?
                                                   000000000 ? 000000000 ?
                                                   000000001 ?
insfch()+304         CALL     rwsfcd()             9FFFFFFFBF3DAEF8 ?
                                                   000000000 ? 000000000 ?
                                                   000000001 ?
insdrv()+912         CALL     insfch()             9FFFFFFFB9F46F60 ?
                                                   C0000000000019BB ?
                                                   60000000000BA288 ?
                                                   40000000031630F0 ?
                                                   00003821D ?
                                                   60000000000C6FE0 ?
                                                   60000000000C6F18 ?
                                                   4000000001C116A0 ?
inscovexe()+1408     CALL     insdrv()             9FFFFFFFFFFEC390 ?
                                                   9FFFFFFFFFFEBBF0 ?
                                                   60000000000BA288 ?
                                                   40000000031B6160 ?
                                                   00003825B ?
                                                   9FFFFFFFFFFEBBF4 ?
insExecStmtExecIniE  CALL     inscovexe()          C0000013E1A08250 ?
ngine()+176                                        C0000013C3B02F08 ?
                                                   9FFFFFFFFFFECB40 ?
insexe()+1040        CALL     insExecStmtExecIniE  C0000013E1A08250 ?
                              ngine()              C0000013C3B02F08 ?
                                                   9FFFFFFFFFFED3D0 ?
                                                   60000000000BA288 ?
                                                   C000000000000FA5 ?
                                                   4000000003166C80 ?
opiexe()+7760        CALL     insexe()             C0000013BE5AB8C0 ?
                                                   9FFFFFFFFFFED3D0 ?
                                                   9FFFFFFFFFFECB50 ?
                                                   60000000000BA288 ?
                                                   9FFFFFFFFFFED0D0 ?
                                                   C0000000000025D3 ?
                                                   4000000002F87610 ?
                                                   00003821B ?
opipls()+3680        CALL     opiexe()             9FFFFFFFFFFEED70 ?
                                                   4000000003119420 ?
                                                   60000000000C6CE0 ?
                                                   9FFFFFFFFFFED0F0 ?
                                                   60000000000BA288 ?
                                                   00002DE7D ?
                                                   C00000000000224C ?
                                                   001121800 ?
opiodr()+2144        CALL     opipls()             9FFFFFFFFFFEF680 ?
                                                   4000000002F58A60 ?
                                                   00002E0B7 ?
                                                   9FFFFFFFFFFEEDD0 ?
                                                   60000000000BA288 ?
                                                   000000001 ?
rpidrus()+368        CALL     opiodr()             000000066 ? 000000006 ?
                                                   4000000001C14510 ?
                                                   0000046C0 ?
                                                   9FFFFFFFFFFEF690 ?
                                                   60000000000BA288 ?
skgmstack()+288      CALL     rpidrus()            9FFFFFFFFFFF1DD0 ?
                                                   9FFFFFFFFFFF1810 ?
                                                   60000000000BA288 ?
                                                   9FFFFFFFFFFF1D90 ?
                                                   C000000000000716 ?
                                                   4000000002F92F80 ?
                                                   00002E07F ?
                                                   9FFFFFFFFFFF1870 ?
rpidru()+240         CALL     skgmstack()          9FFFFFFFFFFF1DF0 ?
                                                   6000000000030F50 ?
                                                   00000F618 ?
                                                   4000000001CCBAD0 ?
                                                   9FFFFFFFFFFF1DD0 ?
rpiswu2()+976        CALL     rpidru()             00000057B ?
                                                   C000000000001026 ?
                                                   C000000000001026 ?
                                                   4000000002F8FD10 ?
                                                   9FFFFFFFFFFF1DE8 ?
                                                   9FFFFFFFFFFF2C10 ?
rpidrv()+2352        CALL     rpiswu2()            9FFFFFFFFFFF2B00 ?
                                                   4000000002F92360 ?
                                                   00002F933 ?
                                                   9FFFFFFFFFFF2590 ?
                                                   60000000000329A8 ?
                                                   C000000000001ABD ?
                                                   4000000000F32248 ?
                                                   60000000000C6CE0 ?
psddr0()+464         CALL     rpidrv()             000000008 ? 000000066 ?
                                                   9FFFFFFFFFFF3710 ?
                                                   000000038 ?
                                                   9FFFFFFFFFFF2B30 ?
                                                   60000000000BA288 ?
psdnal()+736         CALL     psddr0()             000000008 ? 000000066 ?
                                                   9FFFFFFFFFFF3710 ?
                                                   000000030 ?
pevm_EXECC()+832     CALL     psdnal()             9FFFFFFFFFFF54E0 ?
                                                   C0000014FF9104E0 ?
                                                   C0000000000011AA ?
                                                   9FFFFFFFBF3D3060 ?
                                                   C00000142951A9F8 ?
                                                   4000000003159470 ?
                                                   00002C835 ?
pfrinstr_EXECC()+16  CALL     pevm_EXECC()         9FFFFFFFBF3D9F40 ?
0                                                  9FFFFFFFBF3D30D0 ?
                                                   000000020 ?
pfrrun_no_tool()+19  CALL     pfrinstr_EXECC()     9FFFFFFFBF3D9F40 ?
2                                                  C0000013A9EE034C ?
                                                   9FFFFFFFBF3D9FA8 ?
pfrrun()+1376        CALL     pfrrun_no_tool()     9FFFFFFFBF3D9F40 ?
                                                   000002001 ?
                                                   9FFFFFFFBF3D9FA8 ?
                                                   60000000000BA288 ?
                                                   C000000000000A1C ?
                                                   4000000003180540 ?
                                                   9FFFFFFFBF3DA390 ?
                                                   9FFFFFFFBF3DA008 ?
plsql_run()+1328     CALL     pfrrun()             9FFFFFFFFFFF3798 ?
                                                   9FFFFFFFFFFF3780 ?
                                                   60000000000BA288 ?
                                                   9FFFFFFFFFFF4380 ?
                                                   9FFFFFFFFFFF4380 ?
                                                   C000000000000E23 ?
                                                   4000000002C52D50 ?

根据MOS文档,ORA-600[ktrgcm_3]存在三个已知bug,其中和Bug 7225204 – OERI [ktrgcm_3] can occur [ID 7225204.8]的描述非常相似,当前报错的SQL语句就是INSERT SELECT语句,且INSERT对应的表在SELECT中也会出现,这与BUG的描述完全一致。
观察错误语句的执行计划:

============
Plan TABLE
============
---------------------------------------------------------------------+----------------------
| Id | Operation                                | Name               | ROWS | Bytes | Cost |
---------------------------------------------------------------------+----------------------
| 0  | INSERT STATEMENT                         |                    |      |       | 618K |
| 1  |  FILTER                                  |                    |      |       |      |
| 2  |   NESTED LOOPS SEMI                      |                    |    1 |   345 | 618K |
| 3  |    HASH JOIN OUTER                       |                    |    1 |   326 | 618K |
| 4  |     NESTED LOOPS OUTER                   |                    |    1 |   301 | 618K |
| 5  |      HASH JOIN OUTER                     |                    |    1 |   282 | 618K |
| 6  |       HASH JOIN SEMI                     |                    |    1 |   263 | 618K |
| 7  |        PARTITION RANGE SINGLE            |                    |   16 |  4000 |    4 |
| 8  |         TABLE ACCESS BY LOCAL INDEX ROWID| TF_BH_TRADE        |   16 |  4000 |    4 |
| 9  |          INDEX RANGE SCAN                | IDX_TF_BH_TRADE_ACC|    1 |       |    3 |
| 10 |        VIEW                              |                    |  13K |  169K | 618K |
| 11 |         MINUS                            |                    |      |       |      |
| 12 |          SORT UNIQUE                     |                    |  13K |  534K |      |
| 13 |           FILTER                         |                    |      |       |      |
| 14 |            HASH JOIN                     |                    |  13K |  534K | 617K |
| 15 |             TABLE ACCESS FULL            | TF_B_TRADE_BATDEAL |  13K |  377K | 441K |
| 16 |             PARTITION RANGE ALL          |                    | 172M | 2063M | 174K |
| 17 |              INDEX FAST FULL SCAN        | PK_TF_BH_TRADE     | 172M | 2063M | 174K |
| 18 |          SORT UNIQUE                     |                    | 260K | 2599K |      |
| 19 |           INDEX FAST FULL SCAN           | IDX_T_B_T_ID       | 260K | 2599K |  231 |
| 20 |       TABLE ACCESS FULL                  | TF_B_T_STATE_TRANS |    1 |    19 |    2 |
| 21 |      TABLE ACCESS BY INDEX ROWID         | TD_B_PRODUCT       |    1 |    19 |    2 |
| 22 |       INDEX UNIQUE SCAN                  | PK_TD_B_PRODUCT    |    1 |       |    1 |
| 23 |     TABLE ACCESS FULL                    | TF_B_TRADE_REL     | 116K | 2888K |  169 |
| 24 |    TABLE ACCESS BY INDEX ROWID           | TD_SD_COMMPARA     |    8 |   152 |    2 |
| 25 |     INDEX RANGE SCAN                     | PK_TD_SD_COMMPARA  |    8 |       |    1 |
---------------------------------------------------------------------+----------------------

配合报错函数信息,不难确认,问题发生在对T_B_T_IN表的索引快速全扫描上。
但是这个错误影响10.2.0.4和11.1.0.7,Oracle在10.2.0.5中应该已经修正了这个错误。观察文档Bug 6157713 – Wrong result from ORA_ROWSCN [ID 6157713.8]的描述,虽然版本信息符合,但是当前并没有调用ORA_ROWSCN伪列,不过这个BUG的描述中,又指向了上面7225204的bug。
最后检查文档Bug 14076510 – ORA-600 [ktrgcm_3] in 10.2.0.5.3 – 10.2.0.5.7 [ID 14076510.8],可以看到这个bug描述在10.2.0.5.3之后出现的ktrgcm_3错误的原因,根据三篇文章的描述,以及当前的现象,感觉三篇文档描述的应该是都一个bug,Oracle在处理自关联查询时引发了这个错误。而这个bug并没有在10.2.0.5中被真正修复。
除了升级到10.2.0.5.8以及11.2外,如果可能,修改语句避免INSERT SELECT语句访问插入表。此外Oracle还提供了设置隐含参数”_row_cr”为FALSE的方式,这种方式会带来性能的影响,如果这个错误频繁发生,也可以考虑进行设置。

Posted in BUG | Tagged , , , , | Leave a comment

ORA-600(krbb2ec_stamp_mismtach)错误

备份归档日志导致的ORA-600错误。
错误信息如下:

Tue Aug 16 02:43:58 2011
ALTER SYSTEM ARCHIVE LOG
Tue Aug 16 02:44:01 2011
Thread 1 advanced TO log SEQUENCE 5940 (LGWR switch)
  CURRENT log# 7 seq# 5940 mem# 0: /dev/orcl3vg1/rdb3vg1_1_redo71
  CURRENT log# 7 seq# 5940 mem# 1: /dev/orcl3vg2/rdb3vg2_1_redo72
Tue Aug 16 03:43:44 2011
Thread 1 advanced TO log SEQUENCE 5941 (LGWR switch)
  CURRENT log# 8 seq# 5941 mem# 0: /dev/orcl3vg2/rdb3vg2_1_redo81
  CURRENT log# 8 seq# 5941 mem# 1: /dev/orcl3vg3/rdb3vg3_1_redo82
Tue Aug 16 03:47:06 2011
Errors IN file /oraclelog/admin/orcl3/udump/orcl31_ora_14006.trc:
ORA-00600: internal error code, arguments: [krbb2ec_stamp_mismtach], [13092], [758175879], [], [], [], [], []
Tue Aug 16 03:47:07 2011
Trace dumping IS performing id=[cdmp_20110816034707]

根据MOS文档Bug 6793372 – RMAN backup archivelog fails ORA-19571 / OERI[krbb2ec_stamp_mismtach] [ID 6793372.8],在使用RMAN备份归档路径时,可能会导致这个ORA-600[krbb2ec_stamp_mismtach]错误。
从错误发生之前出现的ALTER SYSTEM ARCHIVE LOG信息可以推断,问题发生之前,确实在进行数据库的备份操作。且当前数据库版本为10.2.0.4,是确认问题发生的版本之一,其他确认的版本包括10.1.0.5和11.1.0.6。
没有想到备份归档日志这种想到简单的操作也会导致ORA-600错误,而且严重怀疑Oracle的报错函数的名称出现了错误,理论上应该是krbb2ec_stamp_mismatch才对。
Oracle在11.2.0.1、11.1.0.7和10.2.0.5中解决了这个问题。不过由于这个错误并非每次重现,且出现概率很低,也可以选择忽略这个错误。

Posted in BUG | Tagged , , , | Leave a comment

ORA-600(koksadqb1)错误

使用utlrp.sql编译失效对象引发的ORA-600错误。
错误信息如下:

Sat DEC 19 05:29:59 2009
Completed: ALTER DATABASE OPEN
Sat DEC 19 05:30:20 2009
SERVER COMPONENT id=UTLRP_BGN: TIMESTAMP=2009-12-19 05:30:20
Sat DEC 19 05:31:09 2009
Errors IN file /oraclelog/admin/orcl3/bdump/orcl31_cjq0_24952.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable TO allocate 576 bytes OF shared memory ("shared pool","update seq$ set increment$=:...","sql area","ckydef : kkdlcky")
Sat DEC 19 05:31:10 2009
Errors IN file /oraclelog/admin/orcl3/bdump/orcl31_cjq0_24952.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable TO allocate 32 bytes OF shared memory ("shared pool","select job, nvl2(last_date, ...","sql area","tmp")
Sat DEC 19 05:31:14 2009
Errors IN file /oraclelog/admin/orcl3/bdump/orcl31_j001_25676.trc:
ORA-00600: internal error code, arguments: [koksadqb1], [4031], [], [], [], [], [], []
Sat DEC 19 05:31:14 2009
Errors IN file /oraclelog/admin/orcl3/bdump/orcl31_j008_25697.trc:
ORA-12012: error ON auto EXECUTE OF job 136625
ORA-04031: unable TO allocate ORA-04031: unable TO allocate 248 bytes OF shared memory ("shared pool","update seq$ set increment$=:...","Typecheck","qsmksol : qsmg_alloc_sol")
ORA-06512: at "SYS.UTL_RECOMP", line 636
ORA-06512: at line 1
 bytes OF shared memory ("","","","")
Sat DEC 19 05:31:15 2009
Errors IN file /oraclelog/admin/orcl3/bdump/orcl31_cjq0_24952.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable TO allocate 1048 bytes OF shared memory ("shared pool","select job, nvl2(last_date, ...","Typecheck","kgghteInit")
Sat DEC 19 05:31:23 2009
Trace dumping IS performing id=[cdmp_20091219053123]
Sat DEC 19 05:31:39 2009
SERVER COMPONENT id=UTLRP_END: TIMESTAMP=2009-12-19 05:31:39
Sat DEC 19 05:35:09 2009
SERVER COMPONENT id=UTLRP_BGN: TIMESTAMP=2009-12-19 05:35:09
SERVER COMPONENT id=UTLRP_END: TIMESTAMP=2009-12-19 05:35:18

这个ORA-600错误在METALINK上找不到任何记录,不过从之前的ORA-4031错误,以及这个错误的第二个参数可以判断,导致这个错误的原因就是ORA-4031错误。
根据错误之前的信息可以看到,数据库刚刚启动完毕,而马上就出现ORA-4031错误是很不正常的,检查了启动参数中共享池的配置,更是达到了3G以上,这就更没有道理出现ORA-4031错误了。
观察报错之前的上一条信息发现,原来在报错之前在运行UTLRP.SQL脚本对失效对象进行编译。这个现象和一个未确定的BUG描述Bug 8442907 : ORA-4031 RUNNING UTLRP.SQL OR ANY PX QUERY非常接近。
虽然第一次运行编译脚本失败,根据日志信息,马上进行的第二次UTLRP.SQL调用成功了,这说明无论这个ORA-600还是ORA-4031错误的出现都是比较偶然的,考虑到Oracle没有任何关于当前问题的描述,如果碰到了类似的问题,可以考虑在调用UTLRP的时候以串行方式运行。

Posted in BUG | Tagged , , , | Leave a comment

统计信息收集出现DFS等待导致实例HANG死

客户10.2.0.4 RAC环境,出现大量的library cache lock和cursor: pin S wait on X等待,经分析是由于统计信息收集僵死导致的。
数据库在8点到9点期间,数据库两个节点都存在明显的cursor: pin S wait on X和library cache lock的等待:

Event

Waits

Time(s)

Avg   Wait(ms)

%   Total Call Time

Wait   Class

cursor:   pin S wait on X

1,573,056

30,651

19

146.2

Concurrency

library   cache lock

31,757

7,009

221

33.4

Concurrency

CPU   time

6,416

30.6

DFS   lock handle

12,381

2,979

241

14.2

Other

latch:   library cache

1,646

1,974

1,199

9.4

Concurrency

Event

Waits

Time(s)

Avg   Wait(ms)

%   Total Call Time

Wait   Class

cursor:   pin S wait on X

17,586,451

342,437

19

812.1

Concurrency

library   cache lock

63,657

30,153

474

71.5

Concurrency

CPU   time

3,820

9.1

db   file sequential read

241,761

1,094

5

2.6

User   I/O

inactive   session

1,105

1,090

986

2.6

Other

两个节点的等待现象基本一致,而节点1上还存在明显的DFS lock handle等待事件。
通过分析ASH信息,发现library cache lock和cursor: pin S wait on X等待基本上都是6点之后才开始出现:

SQL&gt; SELECT trunc(sample_time, 'hh24') TIME, COUNT(*)
  2  FROM WRH$_ACTIVE_SESSION_HISTORY ash, wrh$_event_name en
  3  WHERE ash.event_id = en.event_id
  4  AND sample_time &gt;= to_timestamp('20130703', 'yyyymmdd')
  5  AND event_name IN ('cursor: pin S wait on X', 'library cache lock')
  6  GROUP BY trunc(sample_time, 'hh24')
  7  ORDER BY 1;
TIME                  COUNT(*)
------------------- ----------
2013-07-03 00:00:00          4
2013-07-03 01:00:00          8
2013-07-03 02:00:00          3
2013-07-03 03:00:00          8
2013-07-03 04:00:00          9
2013-07-03 05:00:00         14
2013-07-03 06:00:00        348
2013-07-03 07:00:00       2138
2013-07-03 08:00:00      41576
2013-07-03 09:00:00      50108
2013-07-03 10:00:00       2452

而观察出现cursor: pin S wait on X和library cache lock等待会话的BLOCKING SESSION,发现大部分处于等待的会话被会话37锁定:

SQL&gt; SELECT to_char(sample_time, 'hh24:mi:ss.ff') TIME, session_id sid, 
  2  decode(en.event_name, 'cursor: pin S wait on X', to_number(substr(ltrim(to_char(p2, '0xxxxxxxxxxxxxxx')), 1, 8), 'xxxxxxxx'), blocking_session) b_sid, 
  3  en.event_name  
  4  FROM WRH$_ACTIVE_SESSION_HISTORY ash, wrh$_event_name en
  5  WHERE ash.event_id = en.event_id(+)
  6  AND instance_number = 1
  7  AND sample_time &gt;= to_timestamp('201307030800', 'yyyymmddhh24mi')
  8  AND sample_time &lt;= to_timestamp('201307030801', 'yyyymmddhh24mi')
  9  ORDER BY 1;
TIME                      SID      B_SID EVENT_NAME
------------------ ---------- ---------- ------------------------------
08:00:09.569              457 4294967291 latch: library cache
08:00:09.569               52        981 cursor: pin S wait ON X
08:00:09.569              168         73 library cache LOCK
08:00:09.569             1763 4294967291 SQL*Net message FROM client
08:00:09.569              123 4294967295 kksfbc child completion
08:00:09.569              476 4294967291 kst: async disk IO
08:00:09.569              500 4294967292 db file sequential READ
08:00:09.569              327 4294967291 SQL*Net message TO client
08:00:09.569              233 4294967291 db file scattered READ
08:00:09.569               73         73 DFS LOCK handle
08:00:09.569             1889 4294967291 gc CURRENT block 2-way
08:00:09.569             1969        981 cursor: pin S wait ON X
08:00:09.569              857        981 cursor: pin S wait ON X
08:00:09.569              812        168 cursor: pin S wait ON X
08:00:09.569              981         73 library cache LOCK
08:00:19.669               52        981 cursor: pin S wait ON X
08:00:19.669              812        168 cursor: pin S wait ON X
08:00:19.669              857        981 cursor: pin S wait ON X
08:00:19.669             1969        981 cursor: pin S wait ON X
08:00:19.669              233 4294967291 db file scattered READ
08:00:19.669              500 4294967292 db file sequential READ
08:00:19.669             1763 4294967291 db file sequential READ
08:00:19.669              476 4294967291 kst: async disk IO
08:00:19.669              457 4294967291 kksfbc child completion
08:00:19.669               73         73 DFS LOCK handle
08:00:19.669              654 4294967291 SQL*Net message TO client
08:00:19.669              168         73 library cache LOCK
08:00:19.669              981         73 library cache LOCK
08:00:29.769             1969        981 cursor: pin S wait ON X
08:00:29.769               52        981 cursor: pin S wait ON X
08:00:29.769              457 4294967291 kksfbc child completion
08:00:29.769              981         73 library cache LOCK
08:00:29.769              168         73 library cache LOCK
08:00:29.769              500 4294967292 db file sequential READ
08:00:29.769               73         73 DFS LOCK handle
08:00:29.769              654 4294967291 SQL*Net message TO client
08:00:29.769              327 4294967291 SQL*Net message TO client
08:00:29.769             1763 4294967291 db file sequential READ
08:00:29.769              857        981 cursor: pin S wait ON X
08:00:29.769              123 4294967295 latch: library cache
08:00:29.769              196 4294967295 latch: library cache
08:00:29.769              233 4294967295 latch: library cache
08:00:29.769              812        168 cursor: pin S wait ON X
08:00:29.769              476 4294967291 kst: async disk IO
08:00:39.879              476 4294967291 kst: async disk IO
08:00:39.879              658 4294967291 db file sequential READ
08:00:39.879              500 4294967292 db file sequential READ
08:00:39.879             1969        981 cursor: pin S wait ON X
08:00:39.879              857        981 cursor: pin S wait ON X
08:00:39.879              812        168 cursor: pin S wait ON X
08:00:39.879               52        981 cursor: pin S wait ON X
08:00:39.879              981         73 library cache LOCK
08:00:39.879              168         73 library cache LOCK
08:00:39.879             1849 4294967291 SQL*Net message TO client
08:00:39.879              327 4294967291 SQL*Net message TO client
08:00:39.879              235 4294967291 rdbms ipc reply
08:00:39.879               19 4294967291 rdbms ipc reply
08:00:39.879               73         73 DFS LOCK handle
08:00:39.879              123 4294967294 enq: WF - contention
08:00:49.987              476 4294967291 kst: async disk IO
08:00:49.987              500 4294967292 db file sequential READ
08:00:49.987              382 4294967291 gc cr block busy
08:00:49.987             1969        981 cursor: pin S wait ON X
08:00:49.987              857        981 cursor: pin S wait ON X
08:00:49.987              812        168 cursor: pin S wait ON X
08:00:49.987              235 4294967295 kksfbc child completion
08:00:49.987              658 4294967295 latch: library cache
08:00:49.987               19 4294967291 latch: library cache
08:00:49.987              981         73 library cache LOCK
08:00:49.987              168         73 library cache LOCK
08:00:49.987             1889 4294967291 SQL*Net message FROM client
08:00:49.987               73         73 DFS LOCK handle
08:00:49.987               52        981 cursor: pin S wait ON X

可以判断,导致数据库实例HANG死的原因是因为SID为73的进程:

SQL&gt; SELECT to_char(sample_time, 'hh24:mi:ss'), program, action, event_name 
  2  FROM wrh$_active_session_history ash, wrh$_event_name en
  3  WHERE session_id = 73
  4  AND ash.event_id = en.event_id(+)
  5  AND sample_time &gt;= to_date('201307030550', 'yyyymmddhh24mi')
  6  AND sample_time &lt;= to_date('201307030601', 'yyyymmddhh24mi')
  7  ORDER BY 1;
TO_CHAR( PROGRAM                                  ACTION                         EVENT_NAME
-------- ---------------------------------------- ------------------------------ ---------------------------
05:50:07 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:50:17 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:50:27 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:50:37 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:50:47 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               gc cr block 2-way
05:50:57 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:51:08 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:51:18 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file scattered READ
05:51:28 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               gc cr multi block request
05:51:38 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               direct path WRITE temp
05:51:48 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:51:58 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:52:08 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:52:18 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file scattered READ
05:52:28 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:52:38 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:52:49 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:52:59 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file scattered READ
05:53:09 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:53:19 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file scattered READ
05:53:29 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:53:39 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:53:49 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               gc CURRENT GRANT 2-way
05:53:59 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:54:09 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:54:20 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:54:30 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:54:40 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:54:50 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:55:00 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:55:10 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:55:20 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:55:30 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:55:40 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:55:51 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:56:01 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:56:11 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:56:21 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:56:31 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:56:41 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:56:51 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               gc CURRENT GRANT 2-way
05:57:01 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               library cache pin
05:57:11 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
05:57:21 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
05:57:32 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:57:42 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:57:52 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
05:58:02 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:58:12 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:58:22 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:58:32 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:58:42 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:58:52 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:59:03 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:59:13 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:59:23 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:59:33 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
05:59:43 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               db file sequential READ
05:59:53 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
06:00:03 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
06:00:13 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               latch: library cache
06:00:23 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
06:00:34 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
06:00:44 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle
06:00:54 oracle@orcldb5 (J002)                     GATHER_STATS_JOB               DFS LOCK handle

根据MOS文档,Bug 6011045 – DBMS_STATS causes deadlock between ‘cursor: pin S wait on X’ and ‘library cache lock’ [ID 6011045.8],这个问题是RAC环境中Oracle收集统计信息的BUG,在自动收集数据字典信息时,可能会出现进程HANG死,并导致大量的library cache lock和cursor: pin S wait on X等待,此外还可能出现DFS lock handle以及row cache lock的等待。
这个BUG确认影响版本包括10.2.0.4和10.2.0.5,这个BUG Oracle在10.2.0.5.5 PSU中进行了修复。

Posted in BUG | Tagged , , , , | Leave a comment

ORA-600(kposcws-2)错误

客户11.2.0.2 RAC环境出现ORA-600[kposcws-2]错误。
错误信息为:

Thu Sep 27 10:48:15 2012
Errors IN file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_34145026.trc  (incident=682292):
ORA-00600: 内部错误代码, 参数: [kposcws-2], [], [], [], [], [], [], [], [], [], [], []
Incident details IN: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_682292/orcl1_ora_34145026_i682292.trc

根据MOS文档Bug 6615409 – ORA-600 [kposcws-2] on scrollable cursor [ID 6615409.8],导致问题是由于使用了scrollable游标。
不过这个BUG应该在10.2.0.5和11.2.0.1中已经被修正,虽然问题在11.2.0.2中重现的可能性不大,但是根据已知的BUG描述,似乎BUG重现是最合理的解释。
由于没有11.2.0.2上问题出现的描述,因此也不会存在单独的补丁,唯一可以尝试的版本就是将数据库版本升级到11.2.0.3以上。

Posted in BUG | Tagged , , | Leave a comment

ORA-600(kollasg:client-side tmp lob)错误

客户11.2.0.3 RAC环境,出现ORA-600[kollasg:client-side tmp lob]错误。
错误信息为:

Thu Apr 28 20:54:53 2011
Errors IN file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_31916032.trc  (incident=148665):
ORA-00600: 内部错误代码, 参数: [kollasg:client-side tmp lob], [], [], [], [], [], [], [], [], [], [], []
Incident details IN: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_148665/orcl1_ora_31916032_i148665.trc
Thu Apr 28 20:54:59 2011
Trace dumping IS performing id=[cdmp_20110428205459]
Thu Apr 28 20:54:59 2011
Sweep [inc][148665]: completed
Sweep [inc2][148665]: completed

从错误信息上很容易判断,问题应该和客户端处理临时LOB有关。根据MOS文档ORA-00600 [kollasg:client-side tmp lob] Fetching Row With LOB Column in OCI Application [ID 1418135.1],在11.1以上版本中,如果读取一个没有被初始化过的LOB列,会引发这个600错误。
Oracle给出的解决方案是对于空的CLOB,使用EMPTY_CLOB函数对齐进行初始化。

Posted in BUG | Tagged , , , | Leave a comment

ORA-600(ksxp_rm_check0)错误

客户11.2.0.3 RAC的ASM实例出现ORA-600[ksxp_rm_check0]错误。
错误信息如下:

Tue Jul 02 19:22:59 2013
Errors IN file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_6488960.trc  (incident=165804):
ORA-04031: unable TO allocate 3000 bytes OF shared memory ("shared pool","select  INSTANCE_NUMBER , IN...","sga heap(1,0)","call")
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
Tue Jul 02 19:22:59 2013
ERROR: Unable TO normalize symbol name FOR the following short stack (at offset 174):
dbgexProcessError()+180<-dbgeExecuteForError()+72<-dbgePostErrorKGE()+2048<-dbkePostKGE_kgsf()+68<-kgeadse()+380<-kgerinv_internal()+48<-kgerinv()+48<-kgeasnmierr()+72<-IPRA.$ksxp_rm_check()+188<-ksxp_rm_check()+80<-ksliwat()+9376<-kslwaitctx()+180<-kmgs_immediate_req()+2396<-ksmasg()+144<-kghnospc()+632<-kghalo()+1964<-ksmdacnk()+276<-ksmdget()+628<-kssadpm()+324<-ksucrp()+904<-ksvrdp()+156<-opirip()+724<-opidrv()+608<-sou2o()+136<-opimai_real()+188<-ssthrdmain()+268<-main()+204<-__start()+112
Errors IN file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_m000_22151498.trc  (incident=186272):
ORA-00600: internal error code, arguments: [ksxp_rm_check0], [0x110A6FE08], [0], [0x700000059D65FD8], [], [], [], [], [], [], [], []
Incident details IN: /u01/app/grid/diag/asm/+asm/+ASM2/incident/incdir_186272/+ASM2_m000_22151498_i186272.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
Tue Jul 02 19:23:01 2013
Process m000 died, see its trace file
Tue Jul 02 19:23:01 2013
Dumping diagnostic DATA IN directory=[cdmp_20130702192301], requested BY (instance=2, osid=22151498 (M000)), summary=[incident=186272].

从错误信息上判断,这个ORA-600错误和前面出现了ORA-4031应该有直接的关系。根据MOS文档Bug 12925089 – ORA-600 [ksxp_rm_check0] can occur on process startup [ID 12925089.8],当一个进程启动时除非了ORA-4031错误,就可能会导致ORA-600[ksxp_rm_check()]的错误。而当前是m000进程启动出现了ORA-4031错误。至于当前ASM的ORA-4031错误,是由于11.2.0.3默认的MEMORY_TARGET不足导致的。
这个错误影响的版本为11.2.0.2和11.2.0.3,Oracle在12.1.0.1中以及将要发布的11.2.0.4中解决了这个问题。其实要避免这个错误的根源是给ASM实例的MEMORY_TARGET分配足够的空间,至少应分配1536M以上。

Posted in BUG | Tagged , , , , , , , , | Leave a comment