ORA-600(4415)错误

客户的11.2的DATA GUARD数据库出现了这个错误。
一般DATA GUARD数据库出现错误的可能性不大,而ORA-600错误的可能性就更小了,而且一旦出现,多半意味着DATA GUARD环境损坏,轻则需要恢复,重则需要重建。
不过这个错误并没有那么大危害:

Fri Sep 16 17:57:13 2011
Errors IN file /u01/app/oracle/diag/rdbms/ora190/ora190/trace/ora190_ora_18743334.trc (incident=288185):
ORA-00600: 内部错误代码, 参数: [4415], [], [], [], [], [], [], [], [], [], [], []
Incident details IN: /u01/app/oracle/diag/rdbms/ora190/ora190/incident/incdir_288185/ora190_ora_18743334_i288185.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
System State dumped TO trace file /u01/app/oracle/diag/rdbms/ora190/ora190/incident/incdir_288185/ora190_ora_18743334_i288185.trc
Errors IN file /u01/app/oracle/diag/rdbms/ora190/ora190/trace/ora190_ora_18743334.trc (incident=288186):
ORA-00603: ORACLE 服务器会话因致命错误而终止
ORA-00600: 内部错误代码, 参数: [4415], [], [], [], [], [], [], [], [], [], [], []
Incident details IN: /u01/app/oracle/diag/rdbms/ora190/ora190/incident/incdir_288186/ora190_ora_18743334_i288186.trc
Fri Sep 16 17:57:20 2011
Dumping diagnostic DATA IN directory=[cdmp_20110916175720], requested BY (instance=1, osid=18743334), summary=[incident=288185].
Fri Sep 16 17:57:21 2011
Sweep [inc][288186]: completed
Sweep [inc][288185]: completed
Sweep [inc2][288185]: completed
opiodr aborting process UNKNOWN ospid (18743334) AS a RESULT OF ORA-603

查询metalink,发现导致问题的原因是对只读打开的STANDBY数据库执行了COMMIT操作,详细内容可以参考ID 9531380.8,这个Bug 9531380在11.2.0.3中将被解决。
查询详细信息,验证是否由于这个问题所致:

bash-3.2$ more incident/incdir_288186/ora1_ora_18743334_i288186.trc
Dump file /u01/app/oracle/diag/rdbms/ora1/ora1/incident/incdir_288186/ora1_ora_18743334_i288186.trc
Oracle DATABASE 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
WITH the Partitioning, Automatic Storage Management, OLAP, DATA Mining
AND REAL Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_1
System name:    AIX
Node name:      node1
Release:        1
Version:        6
Machine:        00F6D0DF4C00
Instance name: ora1
Redo thread mounted BY this instance: 1
Oracle process NUMBER: 23
Unix process pid: 18743334, image: oracle@node1
*** 2011-09-16 17:57:20.043
*** SESSION ID:(1634.93) 2011-09-16 17:57:20.043
*** CLIENT ID:() 2011-09-16 17:57:20.043
*** SERVICE NAME:() 2011-09-16 17:57:20.043
*** MODULE NAME:(TOAD 9.7.2.5) 2011-09-16 17:57:20.043
*** ACTION NAME:() 2011-09-16 17:57:20.043
Dump continued FROM file: /u01/app/oracle/diag/rdbms/ora1/ora1/trace/ora1_ora_18743334.trc
ORA-00603: ORACLE 服务器会话因致命错误而终止
ORA-00600: 内部错误代码, 参数: [4415], [], [], [], [], [], [], [], [], [], [], []
========= Dump FOR incident 288186 (ORA 603) ========
*** 2011-09-16 17:57:20.051
dbkedDefDump(): Starting incident DEFAULT dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
CURRENT SQL information unavailable - no cursor.
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
skdstdst()+40        bl       107c8d960            FFFFFFFFFFF1DF8 ? 000002004 ?
                                                   000000001 ? 000000003 ?
                                                   000000000 ? 000000002 ?
                                                   000000001 ? 000000000 ?
ksedst1()+104        CALL     skdstdst()           FFFFFFFFFFF0E00 ? 000002004 ?
                                                   110A45A80 ? 109FD7514 ?
                                                   110A45A80 ? 000000000 ?
                                                   FFFFFFFFFFF0F30 ? 700000007 ?
ksedst()+40          CALL     ksedst1()            3030000000000 ? 002050033 ?
                                                   109FD7508 ? 700000000025C ?
                                                   000000000 ? 000000000 ?
                                                   109FD6B68 ? 000000000 ?
dbkedDefDump()+2828  CALL     ksedst()             FFFFFFFFFFF0FE0 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 300000003 ?
ksedmp()+76          CALL     dbkedDefDump()       310A45A80 ? 110001050 ?
                                                   FFFFFFFFFFF15E0 ?
                                                   28404040FFFF17BC ?
                                                   100148408 ? 109651628 ?
                                                   FFFFFFFFFFF1630 ? 11064B0D8 ?
ksfdmp()+88          CALL     ksedmp()             000000000 ? 000000000 ?
                                                   009651643 ? 10A7D6B10 ?
                                                   200000000000000 ? 000000000 ?
                                                   110C0C3D8 ? 110A45A80 ?
dbgexPhaseII()+1212  CALL     ksfdmp()             000002004 ? 110A45A80 ?
                                                   000000000 ? FFFFFFFFFFF17A8 ?
                                                   FFFFFFFFFFF16D0 ?
                                                   FFFFFFFFFFF1DF8 ? 1001D0358 ?
                                                   110C0C3D8 ?
dbgexProcessError()  CALL     dbgexPhaseII()       110A45A80 ? 110C0A5E8 ?
+3604                                              0000465BA ? 200000000 ?
                                                   FFFFFFFFFFF23A8 ? 00000007F ?
                                                   FFFFFFFFFFF4FB0 ?
                                                   2480442200000000 ?
dbgeExecuteForError  CALL     dbgexProcessError()  110A45A80 ? 110C0C3D8 ?
()+72                                              14FB82240 ? 02EDA2378 ?
                                                   70000054FB820D0 ?
                                                   7000005232EA080 ? 000000000 ?
                                                   110C0E120 ?
dbgePostErrorKGE()+  CALL     dbgeExecuteForError  FFFFFFFFFFF5830 ? 000000000 ?
1152                          ()                   100135724 ? 000000000 ?
                                                   000000000 ? 11064B0D8 ?
                                                   FFFFFFFFFFF5840 ? 000000000 ?
dbkePostKGE_kgsf()+  CALL     dbgePostErrorKGE()   FFFFFFFFFFF63D0 ? 1100010F8 ?
64                                                 25BFFFF6450 ?
                                                   2820402800000000 ?
                                                   1000CC71C ? 000000002 ?
                                                   02424A228 ? 02424A228 ?
kgeade()+812         CALL     dbkePostKGE_kgsf()   100047FA0 ? FFFFFFFFFFFA634 ?
                                                   10980D050 ? 000000048 ?
                                                   FFFFFFFFFFFA230 ? 10969BAE8 ?
                                                   FFFFFFFFFFF6698 ? 10A0626E8 ?
kgefec()+208         CALL     kgeade()             FFFFFFFFFFF65A0 ? 11067C838 ?
                                                   000000001 ? 11064B0D8 ?
                                                   FFFFFFFFFFF65A0 ? 11064B0D8 ?
                                                   70000054FB820D0 ?
                                                   7000005232EA080 ?
ktc_die()+168        CALL     kgefec()             1100010F8 ? 110CA2518 ?
                                                   0FFFF6BA0 ? 10A0626E8 ?
                                                   70000054C772AD0 ? 10A0626E8 ?
                                                   10A06274C ? 000000000 ?
k2send()+9004        CALL     ktc_die()            218B2B4B0 ? 100000000 ?
                                                   1103870F8 ? 110CB8608 ?
                                                   110CB8008 ? 110CB8648 ?
                                                   FFFFFFFFFFF6AE0 ? 11010AD68 ?
xctRollbackTxn()+11  CALL     k2send()             300000001 ? 10975AC10 ?
48                                                 000000000 ? 000000000 ?
                                                   FFFFFFFFFFF7768 ?
                                                   FFFFFFFFFFF773C ?
                                                   700000000003660 ? 000000080 ?
kpoltxen()+228       CALL     xctRollbackTxn()     000000000 ? FFFFFFFFFFF87D0 ?
                                                   000000000 ? 000000000 ?
                                                   000000001 ? 000000000 ?
                                                   000000006 ? 07FFFFFFF ?
kpotxen()+2016       CALL     kpoltxen()           FFFFFFFFFFF8770 ? 000000048 ?
                                                   FFFFFFFFFFF8800 ? 1100F88B8 ?
                                                   0000003CF ? 000000000 ?
                                                   700000518B2B4B0 ? 000000000 ?
opiodr()+3608        CALL     kpotxen()            100000000 ? 1103757E8 ?
                                                   FFFFFFFFFFF89D0 ?
                                                   4820402000000000 ?
                                                   100D7C570 ? 9001000A0082608 ?
                                                   9001000A000D4C8 ? 11064B0D8 ?
ttcpip()+4628        CALL     opiodr()             68FFFF9F30 ? C00000438 ?
                                                   FFFFFFFFFFFA480 ? 000200048 ?
                                                   FFFFFFFFFFF9F00 ? 000000000 ?
                                                   FFFFFFFFFFF9E50 ? 1101042F8 ?
opitsk()+6956        CALL     ttcpip()             1101041A8 ? 110A49138 ?
                                                   FFFFFFFFFFFA400 ?
                                                   4224244400000000 ?
                                                   100113A34 ? 000000000 ?
                                                   FFFFFFFFFFFA460 ? 1100056F8 ?
opiino()+3000        CALL     opitsk()             110043D10 ? 000000000 ?
                                                   110A45A80 ? FFFFFFFFFFFC560 ?
                                                   FFFFFFFFFFFE5AC ? 000000101 ?
                                                   FFFFFFFFFFFDF30 ?
                                                   FFFFFFFFFFFE5AC ?
opiodr()+3608        CALL     opiino()             3C00000000 ?
                                                   FFFFFFFFFFFD065 ?
                                                   FFFFFFFFFFFE9D0 ? 110AC738D ?
                                                   FFFFFFFFFFFD0C0 ? 000000000 ?
                                                   90000000001094C ? 0E0DDF00D ?
opidrv()+1200        CALL     opiodr()             3C08456C78 ? 47530312F ?
                                                   FFFFFFFFFFFE9D0 ? 01064B0D8 ?
                                                   7264626D732F6F72 ?
                                                   11064B0D8 ?
                                                   3139302F74726163 ?
                                                   652F6F7261313930 ?
sou2o()+192          CALL     opidrv()             3C06401934 ? 400000000 ?
                                                   FFFFFFFFFFFE9D0 ?
                                                   2B00CF0000 ? 0001A9F48 ?
                                                   000000000 ? 9001000A0071F98 ?
                                                   11064B0D8 ?
opimai_real()+428    CALL     sou2o()              FFFFFFFFFFFEA40 ?
                                                   BADC0FFEE0DDF00D ?
                                                   90000000004D38C ?
                                                   BADC0FFEE0DDF00D ?
                                                   000000002 ? 9001000A0082608 ?
                                                   A0000000A000000 ? 10ACFAA60 ?
ssthrdmain()+340     CALL     opimai_real()        FFFFFFFFFFFFFFFF ?
                                                   9001000A0399148 ?
                                                   FFFFFFFFFFFEB30 ?
                                                   9001000A0082608 ?
                                                   900000000001368 ?
                                                   9001000A00091C0 ?
                                                   FFFFFFFFFFFEB50 ?
                                                   9001000A0082608 ?
main()+216           CALL     ssthrdmain()         2F00035C0 ? FFFFFFFFFFFEE88 ?
                                                   FFFFFFFFFFFEEF0 ?
                                                   9FFFFFFF000C4C0 ?
                                                   9FFFFFFF0000A30 ? 000000000 ?
                                                   000000000 ? 9FFFFFFF000C4C0 ?
__start()+112        CALL     main()               000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
--------------------- Binary Stack Dump ---------------------

从连接信息可以看出,是通过TOAD连接到数据库实例的,这说明事务可能是人为操作所致,虽然在SQL语句处没有任何信息,但是从Stack的dump中可以看到xctRollbackTxn函数,虽然Oracle并没有执行COMMIT,但是运行了ROLLBACK。在只读数据库中,COMMIT会导致错误的产生,那么回滚很可能导致相同的问题,在加上版本已经DATA GUARD中备库等环境,基本上可以确认问题了。
这个问题本身没有必要过多关注,因为在只读库上进行事务本身就存在问题。

Posted in BUG | Tagged , , , , , | Leave a comment

版本中删除对象导致当前和子版本中对象不可见

在文档中看到这部分的描述,不过有一个疑点不是很确认,于是验证了一下。
如果一个用户启用了版本,用户执行某个过程或函数时,如果在当前版本中找不到,自动会去父版本寻找,如果还找不到会去祖先版本中寻找,一直到ORA$BASE中都无法找到,才回报错对象不存在。
但是用户明确的删除操作会使得继承关系中断。一个用户如果删除当前版本中的对象,那么父版本中的对象并不会删除,只不过这个对象在当前版本中不在可见。
看到文档上的这个描述,我的疑问是,对于当前版本的子版本,能否判断父版本中对象是被删除,还是根本没有创建。换句话说,当前版本删除对象后,自版本能否绕过当前版本从父版本处继承对象。

-bash-3.2$ sqlplus test/test
SQL*Plus: Release 11.2.0.2.0 Production ON Thu Sep 15 21:04:58 2011
Copyright (c) 1982, 2010, Oracle.  ALL rights reserved.
Connected TO:
Oracle DATABASE 11g Enterprise Edition Release 11.2.0.2.0 - Production
WITH the Partitioning, OLAP, DATA Mining AND REAL Application Testing options
SQL> SET pages 100 LINES 120
SQL> CREATE USER u1 IDENTIFIED BY u1 DEFAULT tablespace users enable editions;
USER created.
SQL> GRANT CONNECT, resource, dba TO u1;
GRANT succeeded.
SQL> conn u1/u1
Connected.
SQL> SELECT sys_context('USERENV', 'CURRENT_EDITION_NAME') FROM dual;
SYS_CONTEXT('USERENV','CURRENT_EDITION_NAME')
------------------------------------------------------------------------
ORA$BASE
SQL> CREATE FUNCTION f1 RETURN varchar2 AS
  2  BEGIN
  3  RETURN 'Edition ORA$BASE';
  4  END;
  5  /
FUNCTION created.
SQL> SELECT f1 FROM dual;
F1
-----------------------------------------------------------------
Edition ORA$BASE
SQL> CREATE FUNCTION f2 RETURN varchar2 AS
  2  BEGIN
  3  RETURN 'Edition ORA$BASE';
  4  END;
  5  /
FUNCTION created.
SQL> SELECT f2 FROM dual;
F2
------------------------------------------------------------------
Edition ORA$BASE
SQL> CREATE edition e1;
Edition created.
SQL> ALTER SESSION SET edition = e1;
SESSION altered.
SQL> SELECT f1 FROM dual;
F1
-------------------------------------------------------------------
Edition ORA$BASE
SQL> SELECT f2 FROM dual;
F2
-------------------------------------------------------------------
Edition ORA$BASE
SQL> DROP FUNCTION f1;
FUNCTION dropped.
SQL> SELECT f1 FROM dual;
SELECT f1 FROM dual
       *
ERROR at line 1:
ORA-00904: "F1": invalid identifier
SQL> SELECT sys_context('USERENV', 'CURRENT_EDITION_NAME') FROM dual;
SYS_CONTEXT('USERENV','CURRENT_EDITION_NAME')
----------------------------------------------------------------
E1
SQL> CREATE edition e2 AS child OF e1;
Edition created.
SQL> ALTER SESSION SET edition = e2;  
SESSION altered.
SQL> SELECT f1 FROM dual;
SELECT f1 FROM dual
       *
ERROR at line 1:
ORA-00904: "F1": invalid identifier
SQL> SELECT f2 FROM dual;
F2
-----------------------------------------------------------------------
Edition ORA$BASE
SQL> ALTER SESSION SET edition = ora$base;
SESSION altered.
SQL> SELECT f1 FROM dual;
F1
-----------------------------------------------------------------------
Edition ORA$BASE

可以看到,在当前版本中删除对象后,父版本中对象仍然存在,但是继承关系中断,而当前版本的子版本也不可能再次继承该过程,当然版本中可以创建同名的对象,但是已经和父版本没有关系了。

Posted in ORACLE | Tagged , | Leave a comment

ITPUB名人堂第二期

感谢ITPUB将我选为名人堂第二期的嘉宾,其他废话就不多说了,下面是访谈内容。
杨总:
您好!很高兴有这个机会能对您进行采访,本次我们准备了以下一些问题需要您来解答,谢谢。
问题1:现在大家都是开始玩微博了,现在累计有两千多篇技术博客文章,您这么多年仍然坚持着每天至少一篇博客的好习惯,是什么样的一种动力让您这么多年一直坚持写下来呢?
答:其实我开始写博客的时候,博客已经流行一段时间了。一开始根本没有想过要申请一个博客,直到有一天自己通过查询文档解决了一个比较复杂的问题后,才发现自己面临一个问题:如果不把这个解决问题的步骤记录下来,那么过不了多长时间,很可能就会忘记一些关键的步骤,那么下次即使碰到同样的问题,对自己来说仍然可能需要重头开始。而如果将其保存在文本文件中,那么时间久了,查找起来就会很不方便,而当时的博客恰好可以满足我的需要,于是我的博客也就这样诞生了。
刚开始的时候手头现有的值得记录的东西较多,于是博客更新的就比较勤。有一段时间比较忙,博客的更新也就基本上停顿了。但是慢慢我发现,如果不去更新博客,那么自己就会相应的比较懒,而工作之外花在学习Oracle上的时间也会随之减少。意识到这一点后,我恢复了博客的更新,并给自己定下了每天更新一篇的目标,于是我的博客变成了督促我每天学习Oracle知识,总结Oracle经验的工具。刚开始的时候坚持可能确实比较困难,但是坚持的时间越长,就越不会轻易放弃。当坚持的时间超过了一年,更新博客就已经变成一种习惯了。

问题2:您写的博客文章包含的内容很丰富,现在有很多人都将您的博客作为oracle的百科全书了。通过您多年的DBA工作经验,对数据库的开发和维护您有哪些方面的体会?
答:其实我是从数据库开发做起,经历了开发DBA的阶段,最终成为一个管理DBA的。因此可能涉猎过的东西比较杂一些,显得博客里面的东西比较多。其实绝大部分东西还都是很基础的东西,至于“百科全书”更是不敢当了。要说是“百科全书”应该是Oracle的官方文档,是metalink。
数据库开发主要涉及建模、表的逻辑和物理结构设计、PL/SQL代码编写、性能优化等方面。而管理或维护DBA的任务包括逻辑备份、物理备份、恢复、迁移和升级等。虽然从工作范围上,二者可以明确区分,但是很多时候很难界定到底属于开发范畴还是管理范围。而且如果想要数据库的后期维护成本小,前期的设计非常重要。也就是说如果DBA在设计物理逻辑结构的时候多考虑一些后期维护工作,就可以把很多复杂且烦琐的维护工作消除在设计阶段。

问题3:凭借您这么多年的工作经历,目前ORACLE数据库在国内的使用在哪些领域,使用oracle数据库方面有什么优势,以及未来您觉得数据库的发展方向是什么?
答:Oracle目前这么大的市场占有率,只要是具有一定数据规模的行业,都会存在Oracle的身影。
如果要说Oracle数据库的优势,最主要还是体现在大数据量OLTP环境中,Oracle特有的锁机制和多版本读一致性,最大限度提升了系统所能承载的并发用户数。而对于小数据量的情况或OLAP环境中,Oracle并没有绝对的优势。
数据库的发展方向我无法预测,不过Oracle数据库的发展方向却比较明显。Oracle数据库会越来越智能、所集成的功能会越来越强大,而数据库和存储的一体化(Exadata)很可能是今后几年Oracle发展的主要方向。

问题4:现在有很多人都在说Oracle数据库现在越来越稳定,功能越来越先进,越来越自动,于是断言DBA这个行业将要逐渐消亡,请问您对此有何看法?DBA的前景如何?
答:说DBA行业将要逐渐消亡显然是杞人忧天,确实Oracle很稳定,而且随着新版本新功能的不断增加,使得原本很多需要人工设置的工作都可以自动完成了。但是我想问一个问题,大家可能都会使用类似crontab + shell + rman的方式每天自动备份数据库,但是有人会通过shell + rman来自动进行数据库的恢复吗?Broker早就支持DATA GUARD的STANDBY数据库自动FAILOVER,但是在产品环境中设置成自动FAILOVER的,我还从来都没有见过。
我到是觉得Oracle还远远不够自动。从8i到11g,看看Oracle的自动化做了哪些:内存空间的自动管理、UNDO空间自动管理、存储参数自动管理、RMAN备份和归档存放的自动管理、统计信息自动收集、AWR数据自动采集,最多在加上一个存储自动管理ASM。
同样看看从8i到11g,DBA所需要掌握的知识发生了哪些变化:RAC从最早的OPS发展到CLUSTER 到最新的GI,基本上一个大版本就会发生一次重大的结构上的改变;DATA GUARD从最早的 PHYSICAL发展到LOGICAL再到SNAPSHOT;数据复制从最早的ADVANCED REPLICATION发展到STREAM到现在GOLDENGATE;SQL提示从HINT到OUTLINE到SQL BASELINE;导入导出工具从EXP/IMP到EXPDP/IMPDP;统计信息工具从ANALYZE到DBMS_STATS;定时任务从JOB到SCHEDULER;性能数据从STATSPACK到AWR。这些还只不过是数据库已有功能的升级和变迁,在加上每个版本所增加的大量新特性,DBA到底是轻松了还是需要学习更多的知识就一目了然了吧。事实上,Oracle自动化的功能也不能掉以轻心,如果不了解实现的细节,一旦出现问题就可能造成严重的后果。
当然随着Oracle越来越自动化,以及很多图形工具使得日常管理越来越傻瓜化,可能会对初级DBA的职位有一定危险,如果Oracle都升级到了11g,你还抱着8或9的东西不放,那么被淘汰也是早晚的事情了。既然上了Oracle这个快行道,就义无反顾的走下去吧。

问题5:针对oracle数据库的学习,有哪些经验可以给大家分享一下?
答:如果能耐得下心,就去看Oracle的官方文档,从Concept看起,把Administrator、Backup and restore、Performance Tuning 等基础的东西看完,把里面所有能尝试的操作亲手做一遍,基本上就算是入门了。以后在有针对性的去看一下工具、高可用、开发、网络、安全之类的文档。这种方式的好处是基础比较扎实,而且获取信息的正确性更可控(官方文档同样存在错误,不过数量很少,初学者可无视),但是缺点也很明显,前期进度比较慢。因此前期可以去类似ITPUB这种专业论坛,多参与交流,多和别人沟通,有利用自己水平的更快提高。

问题6:通过看您的博客文章,发现您对oracle数据库的知识点掌握得很细,您对兴趣驱动型的知识学习和市场驱动型的知识学习,怎么评价?您觉得自己属于哪种类型?对将要进入这个行业的年轻人有什么寄予?
答:个人认为无论动机是什么,只要能保证积极主动的学习并且能保持相当长的一段时间,都会有所收获。只不过一般来说,如果是兴趣驱动的,那么主动性和持久性都会更好一些,如果学习、研究、探索都变成一种享受,那么想要提高水平只不过是水到渠成的事情了。
无论你是因为何种原因出于何种目的进入到Oracle数据库这个行业,希望你在Oracle学习的过程中培养出对Oracle的兴趣来,否则等待你的会是痛苦的且永不休止的学习过程。其实不只是数据库,任何一个行业都是一样的,想要获得成功而不想付出努力是行不通的。

Posted in NEWS | Leave a comment

11.2通过数据库链调用10.2过程报错

测试中无意发现了这个bug,在11.2数据库通过数据库链,执行10.2.0.1上的过程,结果出现ORA-06553: PLS-801: internal error [55916]的错误。
错误可以通过下面的例子重现:

-bash-3.2$ sqlplus / AS sysdba
SQL*Plus: Release 10.2.0.1.0 - Production ON Mon Sep 12 23:12:47 2011
Copyright (c) 1982, 2005, Oracle.  ALL rights reserved.
Connected TO:
Oracle DATABASE 10g Express Edition Release 10.2.0.1.0 – Production
SQL> CREATE USER test IDENTIFIED BY test DEFAULT tablespace users;
USER created.
SQL> SHOW parameter service
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
service_names                        string      XE
SQL> GRANT CONNECT, resource, dba TO test;
GRANT succeeded.
SQL> conn test/test
Connected.
SQL> CREATE PROCEDURE p1 AS
  2  BEGIN
  3  NULL;
  4  END;
  5  /
PROCEDURE created.

首先在10.2.0.1环境中创建一个存储过程。

SQL> CREATE DATABASE link xe CONNECT TO test IDENTIFIED BY test USING '192.168.0.20:1522/XE';
DATABASE link created.
SQL> EXEC p1@xe
BEGIN p1@xe; END;
*
ERROR at line 1:
ORA-04052: error occurred WHEN looking up remote object TEST.P1@XE
ORA-00604: error occurred at recursive SQL level 1
ORA-06544: PL/SQL: internal error, arguments: [55916], [], [], [], [], [], [], []
ORA-06553: PLS-801: internal error [55916]
ORA-02063: preceding 2 LINES FROM XE
SQL> SELECT * FROM dual@xe;
D
-
X
SQL> SELECT * FROM v$version;
BANNER
--------------------------------------------------------------------------------
Oracle DATABASE 11g Enterprise Edition Release 11.2.0.2.0 - Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS FOR Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 – Production
SQL> SELECT * FROM v$version@xe;
BANNER
----------------------------------------------------------------
Oracle DATABASE 10g Express Edition Release 10.2.0.1.0 - Product
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS FOR Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

在11.2中调用这个存储过程报错,尝试使用同义词或动态SQL,都无法绕过这个错误:

SQL> CREATE synonym s1 FOR p1@xe;
Synonym created.
SQL> EXEC s1
BEGIN s1; END;
*
ERROR at line 1:
ORA-04052: error occurred WHEN looking up remote object TEST.P1@XE
ORA-00604: error occurred at recursive SQL level 1
ORA-06544: PL/SQL: internal error, arguments: [55916], [], [], [], [], [], [], []
ORA-06553: PLS-801: internal error [55916]
ORA-02063: preceding 2 LINES FROM XE
SQL> BEGIN
  2  EXECUTE immediate 'begin p1@xe; end;';
  3  END;
  4  /
BEGIN
*
ERROR at line 1:
ORA-04052: error occurred WHEN looking up remote object TEST.P1@XE
ORA-00604: error occurred at recursive SQL level 2
ORA-06544: PL/SQL: internal error, arguments: [55916], [], [], [], [], [], [], []
ORA-06553: PLS-801: internal error [55916]
ORA-02063: preceding 2 LINES FROM XE
ORA-06512: at line 2

在metalink文档Bug 4511371中描述了这个问题:ORA-6544 / ORA-4052 using PLSQL between 10g and 11g。根据文档描述,在10.2.0.1中通过数据库链访问11.2的过程也会出现同样的错误,而解决方法除了升级到10.2.0.2及以上版本外,没有其他的解决方法。

Posted in BUG | Tagged , , , , , , | Leave a comment

从磁带恢复归档出现ORA-19870错误

从磁带中直接恢复归档日志时,出现了这个错误信息。
归档日志前后备份了两次,一次是在备份数据库后做的,另一次是凌晨7点左右进行的,其中第二次备份包含的归档日志比第一次多,最近的归档日志文件备份信息如下:

BS KEY  SIZE       Device TYPE Elapsed TIME Completion TIME    
------- ---------- ----------- ------------ -------------------
37761   3.18G      SBT_TAPE    00:01:22     2011-08-27 03:22:37
        BP KEY: 37761   STATUS: AVAILABLE  Compressed: NO  Tag: TAG20110827T032114
        Handle: al_43199_1_760245675   Media: 
  List OF Archived Logs IN backup SET 37761
  Thrd Seq     Low SCN    Low TIME            NEXT SCN   NEXT TIME
  ---- ------- ---------- ------------------- ---------- ---------
  1    132214  10200825285755 2011-08-27 00:35:39 10200825450980 2011-08-27 00:40:04  
  1    132215  10200825450980 2011-08-27 00:40:04 10200825451525 2011-08-27 00:40:06
  1    132216  10200825451525 2011-08-27 00:40:06 10200826640012 2011-08-27 01:05:42
  1    132217  10200826640012 2011-08-27 01:05:42 10200828085370 2011-08-27 02:08:11
  1    132218  10200828085370 2011-08-27 02:08:11 10200831622590 2011-08-27 03:05:09
  1    132219  10200831622590 2011-08-27 03:05:09 10200831803462 2011-08-27 03:21:02
  1    132220  10200831803462 2011-08-27 03:21:02 10200831803603 2011-08-27 03:21:12
BS KEY  SIZE       Device TYPE Elapsed TIME Completion TIME    
------- ---------- ----------- ------------ -------------------
37763   4.22G      SBT_TAPE    00:06:40     2011-08-27 07:12:26
        BP KEY: 37763   STATUS: AVAILABLE  Compressed: NO  Tag: TAG20110827T070545
        Handle: al_43201_1_760259146   Media: 
  List OF Archived Logs IN backup SET 37763
  Thrd Seq     Low SCN    Low TIME            NEXT SCN   NEXT TIME
  ---- ------- ---------- ------------------- ---------- ---------
  1    132214  10200825285755 2011-08-27 00:35:39 10200825450980 2011-08-27 00:40:04
  1    132215  10200825450980 2011-08-27 00:40:04 10200825451525 2011-08-27 00:40:06
  1    132216  10200825451525 2011-08-27 00:40:06 10200826640012 2011-08-27 01:05:42
  1    132217  10200826640012 2011-08-27 01:05:42 10200828085370 2011-08-27 02:08:11
  1    132218  10200828085370 2011-08-27 02:08:11 10200831622590 2011-08-27 03:05:09
  1    132219  10200831622590 2011-08-27 03:05:09 10200831803462 2011-08-27 03:21:02
  1    132220  10200831803462 2011-08-27 03:21:02 10200831803603 2011-08-27 03:21:12
  1    132221  10200831803603 2011-08-27 03:21:12 10200839298501 2011-08-27 05:59:17
  1    132222  10200839298501 2011-08-27 05:59:17 10200851993798 2011-08-27 07:05:40
  1    132223  10200851993798 2011-08-27 07:05:40 10200851993847 2011-08-27 07:05:44

而且这两个备份并不在同一个磁带中,前面的一份归档备份和全库备份同处于一个磁带中,因此在恢复的过程中,直接将这盘磁带取出,在另一台服务器上进行恢复。
全库的还原没有碰到问题,但是恢复归档日志则碰到了下面的错误:

RMAN> run
2> {
3>         ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE';
4>         SEND 'NB_ORA_SERV=nbusrv, NB_ORA_CLIENT=ccicdbsrv5';
5>         restore archivelog FROM SEQUENCE 132214  until SEQUENCE 132220;
6>         RELEASE CHANNEL ch00;
7> }
allocated channel: ch00
channel ch00: sid=1630 devtype=SBT_TAPE
channel ch00: Veritas NetBackup FOR Oracle - Release 6.5 (2007072323)
sent command TO channel: ch00
Starting restore at 2011-09-11 21:03:48
channel ch00: starting archive log restore TO DEFAULT destination
channel ch00: restoring archive log
archive log thread=1 SEQUENCE=132214
channel ch00: restoring archive log
archive log thread=1 SEQUENCE=132215
channel ch00: restoring archive log
archive log thread=1 SEQUENCE=132216
channel ch00: restoring archive log
archive log thread=1 SEQUENCE=132217
channel ch00: restoring archive log
archive log thread=1 SEQUENCE=132218
channel ch00: restoring archive log
archive log thread=1 SEQUENCE=132219
channel ch00: restoring archive log
archive log thread=1 SEQUENCE=132220
channel ch00: reading FROM backup piece al_43201_1_760259146
ORA-19870: error reading backup piece al_43201_1_760259146
ORA-19507: failed TO retrieve sequential file, handle="al_43201_1_760259146", parms=""
ORA-27029: skgfrtrv: sbtrestore returned error
ORA-19511: Error received FROM media manager layer, error text:
   Failed TO OPEN backup file FOR restore.
failover TO previous backup
channel ch00: starting archive log restore TO DEFAULT destination
channel ch00: restoring archive log
archive log thread=1 SEQUENCE=132214
channel ch00: restoring archive log
archive log thread=1 SEQUENCE=132215
channel ch00: restoring archive log
archive log thread=1 SEQUENCE=132216
channel ch00: restoring archive log
archive log thread=1 SEQUENCE=132217
channel ch00: restoring archive log
archive log thread=1 SEQUENCE=132218
channel ch00: restoring archive log
archive log thread=1 SEQUENCE=132219
channel ch00: restoring archive log
archive log thread=1 SEQUENCE=132220
channel ch00: reading FROM backup piece al_43199_1_760245675
ORA-19870: error reading backup piece al_43199_1_760245675
ORA-19507: failed TO retrieve sequential file, handle="al_43199_1_760245675", parms=""
ORA-27029: skgfrtrv: sbtrestore returned error
ORA-19511: Error received FROM media manager layer, error text:
   Failed TO process backup file <al_43199_1_760245675>
failover TO previous backup
released channel: ch00
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure OF restore command at 09/11/2011 21:04:30
RMAN-06026: SOME targets NOT found - aborting restore
RMAN-06025: no backup OF log thread 1 seq 132220 lowscn 10200831803462 found TO restore
RMAN-06025: no backup OF log thread 1 seq 132219 lowscn 10200831622590 found TO restore
RMAN-06025: no backup OF log thread 1 seq 132218 lowscn 10200828085370 found TO restore
RMAN-06025: no backup OF log thread 1 seq 132217 lowscn 10200826640012 found TO restore
RMAN-06025: no backup OF log thread 1 seq 132216 lowscn 10200825451525 found TO restore
RMAN-06025: no backup OF log thread 1 seq 132215 lowscn 10200825450980 found TO restore
RMAN-06025: no backup OF log thread 1 seq 132214 lowscn 10200825285755 found TO restore

由于要恢复的归档日志在两个备份集中存在,因此RMAN尝试了两次进行恢复。第一次寻找al_43201_1_760259146文件,由于这个文件不在当前的磁带中,恢复失败是正常的,但是al_43199_1_760245675文件就存放在当前的磁带中,而仍然出现无法获取文件的错误就不正常了。
考虑到恢复环境和备份环境并不一致,备份时使用的带库而当前仅包含一个磁带,且磁带的catalog信息很可能也没有导入,因此对于源数据的查找可能存在问题。
通过CHANGE BACKUP TAG ‘TAG20110827T070545’ UNAVAILABLE的方式,避免恢复过程中去寻找目前并不存在的备份。再次使用RMAN执行同样的恢复操作,这次恢复成功。
恢复应该在和备份等同条件下进行,否则至少应该保证带库的CATALOG数据导入,否则就可能出现这种跨磁带查找文件时出现的错误,而利用RMAN直接在当前的磁带中扫描文件则不受影响。

Posted in ORACLE | Tagged , , | Leave a comment

包中添加新过程应在尾部添加

在看高级应用开发手册时,在对象依赖部分,注意到这个细节。
其他对象依赖包中的过程或函数,如果这个过程或函数在包中的位置发生了变化,那么依赖当前过程或函数的对象会被置为INVALID状态。

SQL> CREATE OR REPLACE PACKAGE P_TEST AS
  2     PROCEDURE P1;
  3     FUNCTION F1 RETURN NUMBER;
  4  END;
  5  /
Package created.
SQL> CREATE OR REPLACE FUNCTION F2 RETURN NUMBER AS
  2  BEGIN
  3     RETURN P_TEST.F1;
  4  END;
  5  /
FUNCTION created.
SQL> SELECT OBJECT_NAME, STATUS      
  2  FROM USER_OBJECTS
  3  WHERE OBJECT_NAME IN ('P_TEST', 'F2');
OBJECT_NAME                    STATUS
------------------------------ -------
F2                             VALID
P_TEST                         VALID
SQL> CREATE OR REPLACE PACKAGE P_TEST AS
  2     PROCEDURE P1;
  3     FUNCTION F1 RETURN NUMBER;
  4     PROCEDURE P3;
  5  END;
  6  /
Package created.
SQL> SELECT OBJECT_NAME, STATUS
  2  FROM USER_OBJECTS
  3  WHERE OBJECT_NAME IN ('P_TEST', 'F2');
OBJECT_NAME                    STATUS
------------------------------ -------
F2                             VALID
P_TEST                         VALID
SQL> CREATE OR REPLACE PACKAGE P_TEST AS
  2     PROCEDURE P1;
  3     PROCEDURE P2;
  4     FUNCTION F1 RETURN NUMBER;
  5     PROCEDURE P3;
  6  END;
  7  /
Package created.
SQL> SELECT OBJECT_NAME, STATUS
  2  FROM USER_OBJECTS
  3  WHERE OBJECT_NAME IN ('P_TEST', 'F2');
OBJECT_NAME                    STATUS
------------------------------ -------
F2                             INVALID
P_TEST                         VALID

如果将新增过程置于包的尾部,则不会影响依赖关系。而如果将新增过程置于某些已有过程或函数的前面,则所有依赖这些过程或函数的对象将被置于INVALID状态,这是由于依赖关系和过程或函数在包中的位置有关。
不过这种位置只是过程或函数的排序,和具体所在行数无关,而且只和过程或函数有关,如果新增变量则不会产生影响:

SQL> ALTER FUNCTION F2 COMPILE;
FUNCTION altered.
SQL> SELECT OBJECT_NAME, STATUS
  2  FROM USER_OBJECTS
  3  WHERE OBJECT_NAME IN ('P_TEST', 'F2');
OBJECT_NAME                    STATUS
------------------------------ -------
F2                             VALID
P_TEST                         VALID
SQL> CREATE OR REPLACE PACKAGE P_TEST AS
  2     G_NUM NUMBER;
  3     PROCEDURE P1;
  4     PROCEDURE P2;
  5     FUNCTION F1 RETURN NUMBER;
  6     PROCEDURE P3;
  7  END;
  8  /
Package created.
SQL> SELECT OBJECT_NAME, STATUS
  2  FROM USER_OBJECTS
  3  WHERE OBJECT_NAME IN ('P_TEST', 'F2');
OBJECT_NAME                    STATUS
------------------------------ -------
F2                             VALID
P_TEST                         VALID
Posted in ORACLE | Tagged , , | Leave a comment

Oracle Exadata的TABLE ACCESS STORAGE FULL执行计划

这个TABLE ACCESS STORAGE FULL的执行计划只有在ORACLE EXADATA上才回出现。
Oracle在Exadata上增加了一个硬件Exadata Programmable Storage Server,使得在存储系统可以变得更加智能。以往在进行全表扫描时,即使存在过滤条件,也需要将全部数据读到数据库服务器端,才能过滤掉无用的数据。但是通过这个硬件和存储软件的配合,使得这种过滤直接在存储层进行,而返回给数据库服务器的则是查询需要的结果。一方面在存储直接过滤提高访问性能,另一方面使得返回个服务器的数据量大大下降,这也是Exadata进行全表扫描性能优异的重要原因之一。
在昨天练手的时候,记录了一下这个执行计划,而这个执行计划在自己的测试环境中是不可能出现的:

SQL> SELECT COUNT(*) FROM t;
  COUNT(*)
---------- 
  49527761
Elapsed: 00:00:02.28
Execution Plan
----------------------------------------------------------    
Plan hash VALUE: 2966233522
---------------------------------------------------------------------------
| Id  | Operation                  | Name | ROWS  | Cost (%CPU)| TIME     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |     1 |   114K  (1)| 00:22:59 |
|   1 |  SORT AGGREGATE            |      |     1 |            |          |
|   2 |   TABLE ACCESS STORAGE FULL| T    |   228M|   114K  (1)| 00:22:59 |
---------------------------------------------------------------------------
Note
-----
   - dynamic sampling used FOR this statement (level=2)
Statistics
----------------------------------------------------------     
          0  recursive calls
          0  db block gets
     418736  consistent gets
          0  physical reads
          0  redo SIZE
        529  bytes sent via SQL*Net TO client
        524  bytes received via SQL*Net FROM client
          2  SQL*Net roundtrips TO/FROM client
          0  sorts (memory)
          0  sorts (disk)
          1  ROWS processed
SQL> SELECT COUNT(*) FROM t WHERE owner = 'TEST';
  COUNT(*)
----------
       611
Elapsed: 00:00:02.83
Execution Plan
----------------------------------------------------------
Plan hash VALUE: 2966233522
-----------------------------------------------------------------------------------
| Id  | Operation                  | Name | ROWS  | Bytes | Cost (%CPU)| TIME     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |     1 |    17 |   115K  (2)| 00:23:07 |
|   1 |  SORT AGGREGATE            |      |     1 |    17 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| T    |  6670 |   110K|   115K  (2)| 00:23:07 |
-----------------------------------------------------------------------------------
Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------
   2 - storage("OWNER"='TEST')
       FILTER("OWNER"='TEST')
Note
-----
   - dynamic sampling used FOR this statement (level=2)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     418736  consistent gets
          0  physical reads
          0  redo SIZE
        527  bytes sent via SQL*Net TO client
        524  bytes received via SQL*Net FROM client
          2  SQL*Net roundtrips TO/FROM client
          0  sorts (memory)
          0  sorts (disk)
          1  ROWS processed

可以看到Predicate Information中,访问和过滤条件分别是storage(“OWNER”=’TEST’)和filter(“OWNER”=’TEST’),这说明限制条件被推到了存储层执行,也正是这个原因,使得Oracle估算的访问行数没有太大的偏差。

Posted in ORACLE | Tagged , | Leave a comment

Oracle Exadata试用

有点标题党的嫌疑,但是也没有办法,想了半天找不到一个合适的题目。
其实是一个客户从Oracle预订了EXADATA测试环境,在进行测试,我只不过跑去练练手感受一下而已,满打满算也就是一个多小时的时间。
这是一个半配的X2环境,4节点RAC,后台配置了7个存储主机。
由于是客户的环境,且客户的测试才开始,显然不能进行对数据库有太大影响的测试,事实上,RMAN备份、大数据量的EXPDP/IMPDP、SQLLOARDER等能体现EXADATA性能优势的测试都没有进行,只是通过awrextr.sql调用EXPDP导出了一份AWR数据,基本上对于数据库没有压力可言。
因此大部分测试都是运行一些插入和查询的SQL而已,这个单进程的操作是不太可能给数据库造成多大压力的,总体感觉就是确实全表扫描速度很快,一个几千万的表,运行COUNT(*)只需要2、3秒的时间,而此时这个表中还不断的插入数据。这种性能的提升,显然与EXADATA强劲的硬件架构是分不开的。
可惜测试时间太短,啥时候能接一个Oracle Exadata性能评测的活就好了。BTW:听说Oracle可能会推出一个Mini版的Exadata,看来Exadata离我们的举例越来越近了。

Posted in NEWS | Leave a comment

DBCA建库导致已有数据库出现ORA-27140错误

通过DBCA在服务器上创建一个额外的实例,结束后以外发现,原有的数据库实例出现了异常。
简单描述一下系统环境,Oracle 11.2.0.2 for Linux x86-64,随后又安装了GRID,创建了ASM磁盘组。原有的数据库有部分表空间和REDO存放在ASM磁盘组中。
为了测试,通过DBCA创建了额外的实例,事实上,通过DBCA又先后创建了两个数据库实例,第二个数据库实例创建后一切正常,将第二个实例关闭,通过DBCA创建第三个数据库实例时,导致服务器上的第一个实例出现了异常。
检查实例1上的告警日志,可以发现大量下面的错误信息:

2011-08-30 10:20:54.502000 +08:00
Process W000 died, see its trace file
2011-08-30 10:20:58.518000 +08:00
Process W000 died, see its trace file
2011-08-30 10:21:02.524000 +08:00
Process W000 died, see its trace file
2011-08-30 10:21:06.527000 +08:00
Process W000 died, see its trace file
Errors IN file /u01/app/oracle/diag/rdbms/fhacdb1/fhacdb1/trace/fhacdb1_j000_19030.trc:
ORA-27140: attach TO post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed WITH STATUS: 1
ORA-27301: OS failure message: Operation NOT permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 1000 (oinstall), CURRENT egid = 1200 (dba)
2011-08-30 10:21:07.536000 +08:00
Process J000 died, see its trace file
kkjcre1p: unable TO spawn jobq slave process
Errors IN file /u01/app/oracle/diag/rdbms/fhacdb1/fhacdb1/trace/fhacdb1_cjq0_12308.trc:
Errors IN file /u01/app/oracle/diag/rdbms/fhacdb1/fhacdb1/trace/fhacdb1_j000_19036.trc:
ORA-27140: attach TO post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed WITH STATUS: 1
ORA-27301: OS failure message: Operation NOT permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 1000 (oinstall), CURRENT egid = 1200 (dba)

数据库并非简单的报错而已,实际上即使是SQLPLUS / AS SYSDBA登录数据库,进行任何操作也会显示没有连接到数据库,因此对于实例1来说,只能执行SHUTDOWN ABORT关闭实例并重新启动。
查询METALINK发现不少类似的情况,导致这个错误产生的原因就是ORA_CRS_HOME/bin下的oracle可执行文件或者ORACLE_HOME/bin下的oracle可执行文件发生修改所致。虽然从随后的ORA-2730X信息中无法判断是GRID用户下还是ORACLE用户下的oracle文件被修改,但是可以确定的是,oracle可执行文件的组信息被修改,原本应该是oinstall,而现在变成了dba。
分别检查ORA_CRS_HOME/bin和ORACLE_HOME/bin下的oracle文件,发现是ORACLE目录下的ORACLE_HOME/bin/oracle发生了改变:

[oracle@dbserver1 ~]$ cd $ORACLE_HOME/bin
[oracle@dbserver1 bin]$ ls -l oracle
-rwsr-s--x 1 oracle dba 228881982 Jul 26 16:55 oracle
[oracle@dbserver1 bin]$ ls -l |grep dba
-rwsr-s--x 1 oracle dba      228881982 Jul 26 16:55 oracle
-rwxr-xr-x 1 oracle oinstall     32372 Jul 26 16:55 osdbagrp
-rwxr-xr-x 1 oracle oinstall     35677 Aug 20  2010 osdbagrp0

可以看到,只有oracle文件变成了dba组,bin目录下所有其他文件仍然是oinstall组。为了对比,在另外一个服务器上,有一台同时搭建的相同环境的oracle,检查oracle文件的组信息:

[oracle@dbserver2 ~]$ cd $ORACLE_HOME/bin
[oracle@dbserver2 bin]$ ls -l oracle
-rwsr-s--x. 1 oracle oinstall 217733745 Apr 27 14:02 oracle
[oracle@dbserver2 bin]$ ls -l |grep dba
-rwxr-xr-x. 1 oracle oinstall     32372 Apr 27 14:02 osdbagrp
-rwxr-xr-x. 1 oracle oinstall     35677 Aug 20  2010 osdbagrp0

在这个服务器上,oracle文件的组属性仍然是oinstall,这说明节点1上的oracle组属性确实被修改了。
前面提到了,利用DBCA添加第二个实例的时候,数据库并未出现问题,而添加第三个实例的时候,才导致oracle组信息变化,那么同样是dbca的操作,为什么有所区别呢。对比第二个和第三个实例的区别,问题应该和是否使用ASM有关。第二个实例的所有文件都存储在本地硬盘,而第三个实例的所有文件都保存在ASM磁盘组中,甚至会和实例1共用一些磁盘组,应该就是这个差别导致了问题的产生。
检查ASM告警日志:

2011-08-30 10:21:08.853000 +08:00
NOTE: client al32utf8:al32utf8 registered, osid 19095, mbr 0x1
2011-08-30 10:53:27.195000 +08:00
NOTE: client al32utf8:al32utf8 deregistered
2011-08-30 10:53:31.248000 +08:00
Starting background process ASMB
ASMB started WITH pid=42, OS id=23251
NOTE: client +ASM:+ASM registered, osid 23253, mbr 0x1
2011-08-30 10:53:36.804000 +08:00
NOTE: ASMB process exiting due TO lack OF ASM file activity FOR 5 seconds
2011-08-30 10:53:51.372000 +08:00
NOTE: client al32utf8:al32utf8 registered, osid 23441, mbr 0x1

可以看到,就是第三个实例注册到ASM的时刻,在实例1上出现了错误。应该就是这个时刻,oracle文件的组信息发生了变化。
检查一下实例3上的告警日志:

2011-08-30 10:20:53.722000 +08:00
Adjusting the DEFAULT VALUE OF parameter parallel_max_servers
FROM 960 TO 135 due TO the VALUE OF parameter processes (150)
Starting ORACLE instance (normal)
2011-08-30 10:20:55.149000 +08:00
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Shared memory segment FOR instance monitoring created
Picked latch-free SCN scheme 3
2011-08-30 10:20:56.823000 +08:00
USING LOG_ARCHIVE_DEST_1 parameter DEFAULT VALUE AS /u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch
Autotune OF undo retention IS turned ON.
IMODE=BR
ILAT =29
LICENSE_MAX_USERS = 0
SYS auditing IS disabled
Starting up:
Oracle DATABASE 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
WITH the Partitioning, OLAP, DATA Mining AND REAL Application Testing options.
USING parameter settings IN client-side pfile /u01/app/oracle/admin/al32utf8/pfile/init.ora ON machine dbserver1
System parameters WITH non-DEFAULT VALUES:
processes = 150
memory_target = 6432M
db_block_size = 8192
compatible = "11.2.0.0.0"
db_create_file_dest = "+DATA"
undo_tablespace = "UNDOTBS1"
remote_login_passwordfile= "EXCLUSIVE"
db_domain = ""
dispatchers = "(PROTOCOL=TCP) (SERVICE=al32utf8XDB)"
local_listener = "LISTENER_AL32UTF8"
audit_file_dest = "/u01/app/oracle/admin/al32utf8/adump"
audit_trail = "DB"
db_name = "al32utf8"
open_cursors = 300
diagnostic_dest = "/u01/app/oracle"
2011-08-30 10:21:06.647000 +08:00
PMON started WITH pid=2, OS id=19032
PSP0 started WITH pid=3, OS id=19034
2011-08-30 10:21:07.721000 +08:00
VKTM started WITH pid=4, OS id=19038 at elevated priority
VKTM running at (1)millisec PRECISION WITH DBRM quantum (100)ms
GEN0 started WITH pid=5, OS id=19042
DIAG started WITH pid=6, OS id=19044
DBRM started WITH pid=7, OS id=19046
DIA0 started WITH pid=8, OS id=19048
MMAN started WITH pid=9, OS id=19050
DBW0 started WITH pid=10, OS id=19052
DBW1 started WITH pid=11, OS id=19054
DBW2 started WITH pid=12, OS id=19056
LGWR started WITH pid=13, OS id=19058
CKPT started WITH pid=14, OS id=19060
SMON started WITH pid=15, OS id=19062
RECO started WITH pid=16, OS id=19064
MMON started WITH pid=17, OS id=19066
MMNL started WITH pid=18, OS id=19068
starting up 1 dispatcher(s) FOR network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
starting up 1 shared server(s) ...
ORACLE_BASE FROM environment = /u01/app/oracle
CREATE DATABASE "al32utf8"
MAXINSTANCES 8
MAXLOGHISTORY 1
MAXLOGFILES 16
MAXLOGMEMBERS 3
MAXDATAFILES 100
DATAFILE SIZE 1024M AUTOEXTEND ON NEXT 10240K MAXSIZE UNLIMITED
EXTENT MANAGEMENT LOCAL
SYSAUX DATAFILE SIZE 2048M AUTOEXTEND ON NEXT 10240K MAXSIZE UNLIMITED
SMALLFILE DEFAULT TEMPORARY TABLESPACE TEMP TEMPFILE SIZE 8192M AUTOEXTEND ON NEXT 640K MAXSIZE UNLIMITED
SMALLFILE UNDO TABLESPACE "UNDOTBS1" DATAFILE SIZE 8192M AUTOEXTEND ON NEXT 5120K MAXSIZE UNLIMITED
CHARACTER SET AL32UTF8
NATIONAL CHARACTER SET AL16UTF16
LOGFILE GROUP 1 SIZE 1024M,
GROUP 2 SIZE 1024M,
GROUP 3 SIZE 1024M
USER SYS IDENTIFIED BY *USER SYSTEM IDENTIFIED BY
Starting background process ASMB
ASMB started WITH pid=22, OS id=19092
2011-08-30 10:21:08.901000 +08:00
Starting background process RBAL
RBAL started WITH pid=23, OS id=19105
NOTE: initiating MARK startup
Starting background process MARK
MARK started WITH pid=24, OS id=19108
NOTE: MARK has subscribed
2011-08-30 10:21:12.325000 +08:00
NOTE: Loaded library: System
SUCCESS: diskgroup DATA was mounted
DATABASE mounted IN Exclusive Mode
Lost WRITE protection disabled
ERROR: failed TO establish dependency BETWEEN DATABASE al32utf8 AND diskgroup resource ora.DATA.dg
2011-08-30 10:21:36.862000 +08:00
Successful mount OF redo thread 1, WITH mount id 3103441300
Assigning activation ID 3103441300 (0xb8fac194)
Thread 1 opened at log SEQUENCE 1
CURRENT log# 1 seq# 1 mem# 0: +DATA/al32utf8/onlinelog/group_1.265.760530073
Successful OPEN OF redo thread 1
MTTR advisory IS disabled because FAST_START_MTTR_TARGET IS NOT SET
SMON: enabling cache recovery
processing ?/rdbms/admin/dcore.bsq
CREATE tablespace SYSTEM datafile SIZE 1024M AUTOEXTEND ON NEXT 10240K MAXSIZE UNLIMITED
EXTENT MANAGEMENT LOCAL online
2011-08-30 10:21:44.431000 +08:00
Completed: CREATE tablespace SYSTEM datafile SIZE 1024M AUTOEXTEND ON NEXT 10240K MAXSIZE UNLIMITED

可以确认实例1出现问题的时刻,正是实例3启动并在ASM磁盘组中创建表空间的时刻。
了解了问题的原委后,这个问题并不难解决,只需要重启实例1,就可以解决了。但是这个问题对于包含ASM的产品环境而言,存在很大的隐患,使用DBCA建立新库,会影响现有正在运行的数据库,这是一个埋藏比较深,且有可能造成严重后果的bug。
这个错误应该只会影响11.2以上版本,11.1以前的ASM在ORACLE_HOME目录下,应该不会导致这个问题的产生。

Posted in BUG | Tagged , , , , , , , , | Leave a comment

ORA-600(ksfdsyncdata2)错误

客户的告警日志中发现了这个ORA-600错误。
详细的错误信息为:

Wed Jan 12 22:17:15 2011
KCF: WRITE/OPEN error block=0x3f6689 online=1
file=2 /TEMP/B1EDB/temp02.dbf
error=27072 txt: 'HPUX-ia64 Error: 28: No space left on device
Additional information: 4
Additional information: 4155017
Additional information: -1'
Automatic tempfile offline due TO WRITE error ON
file 2: /TEMP/B1EDB/temp02.dbf
Wed Jan 12 22:17:16 2011
Errors IN file /oracle/db/admin/B1EDB/bdump/b1edb_j001_13014.trc:
ORA-00600: internal error code, arguments: [ksfdsyncdata2], [0xC000000335E58668], [], [], [], [], [], []
ORA-00372: file 302 cannot be modified at this TIME
ORA-01110: DATA file 302: '/TEMP/B1EDB/temp02.dbf'
ORA-00372: file 302 cannot be modified at this TIME
ORA-01110: DATA file 302: '/TEMP/B1EDB/temp02.dbf'
Wed Jan 12 22:17:18 2011
Errors IN file /oracle/db/admin/B1EDB/bdump/b1edb_j000_12908.trc:
ORA-12012: error ON auto EXECUTE OF job 8952
ORA-00376: file ORA-00376: file 302 cannot be READ at this TIME
ORA-01110: DATA file 302: '/TEMP/B1EDB/temp02.dbf'
ORA-06512: at "SYS.PRVT_ADVISOR", line 1624
ORA-06512: at "SYS.DBMS_ADVISOR", line 186
ORA-06512: at "SYS.DBMS_SPACE", line 1500
ORA-06512: at "SYS.DBMS_SPACE", line 1566
cannot be READ at this TIME

这个ORA-600错误本身比较罕见,在整个metalink也找不到说明,不过结合系统的错误信息,到是可以分析出系统的问题。
显然在ORA-600错误出现之前,出现了临时表空间不足的问题。Oracle对于临时表空间,并不会想普通表空间那样将整个数据文件进行初始化,而是在真正用到的时候再去分配。这里的问题是,临时文件所在操作系统目录上空间已经被用光,而当出现排序或连接需要分配临时段时,在写临时文件时,出现了错误。

/oracle/db/admin/B1EDB/bdump/b1edb_j001_13014.trc
Oracle DATABASE 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
WITH the Partitioning, OLAP, DATA Mining AND REAL Application Testing options
ORACLE_HOME = /oracle/db
System name: HP-UX
Node name: EDB
Release: B.11.23
Version: U
Machine: ia64
Instance name: B1EDB
Redo thread mounted BY this instance: 1
Oracle process NUMBER: 20
Unix process pid: 13014, image: oracle@EDB (J001)
 
*** ACTION NAME:(GATHER_STATS_JOB) 2011-01-12 22:17:16.421
*** MODULE NAME:(DBMS_SCHEDULER) 2011-01-12 22:17:16.421
*** SERVICE NAME:(SYS$USERS) 2011-01-12 22:17:16.421
*** SESSION ID:(590.24179) 2011-01-12 22:17:16.421
*** 2011-01-12 22:17:16.421
ksedmp: internal OR fatal error
ORA-00600: internal error code, arguments: [ksfdsyncdata2], [0xC000000335E58668], [], [], [], [], [], []
ORA-00372: file 302 cannot be modified at this TIME
ORA-01110: DATA file 302: '/TEMP/B1EDB/temp02.dbf'
ORA-00372: file 302 cannot be modified at this TIME
ORA-01110: DATA file 302: '/TEMP/B1EDB/temp02.dbf'
CURRENT SQL statement FOR this SESSION:
SELECT i.obj#,i.ts#,i.file#,i.block#,i.intcols,i.type#,i.flags,i.property,i.pctfree$,i.initrans,i.maxtrans,i.blevel,i.leafcnt,i.distkey,i.lblkkey,i.dblkkey,i.clufac,i.cols,i.analyzetime,i.samplesize,i.dataobj#,nvl(i.degree,1),nvl(i.instances,1),i.rowcnt,MOD(i.pctthres$,256),i.indmethod#,i.trunccnt,nvl(c.unicols,0),nvl(c.deferrable#+c.valid#,0),nvl(i.spare1,i.intcols),i.spare4,i.spare2,i.spare6,decode(i.pctthres$,NULL,NULL,MOD(trunc(i.pctthres$/256),256)),ist.cachedblk,ist.cachehit,ist.logicalread FROM ind$ i, ind_stats$ ist, (SELECT enabled, MIN(cols) unicols,MIN(to_number(bitand(defer,1))) deferrable#,MIN(to_number(bitand(defer,4))) valid# FROM cdef$ WHERE obj#=:1 AND enabled > 1 GROUP BY enabled) c WHERE i.obj#=c.enabled(+) AND i.obj# = ist.obj#(+) AND i.bo#=:1 ORDER BY i.obj#
----- PL/SQL Call Stack -----
  object      line  object
  handle    NUMBER  name
c00000033d7f01d8       519  package body SYS.DBMS_STATS_INTERNAL
c00000033d7f01d8       531  package body SYS.DBMS_STATS_INTERNAL
c00000033ee5e870      8533  package body SYS.DBMS_STATS
c00000033ee5e870      8580  package body SYS.DBMS_STATS
c00000033ee5e870      8672  package body SYS.DBMS_STATS
c00000033ee5e870      8752  package body SYS.DBMS_STATS
c00000033ee5e870     12415  package body SYS.DBMS_STATS
c00000033ee5e870     12892  package body SYS.DBMS_STATS
c00000033ee5e870     13345  package body SYS.DBMS_STATS
c00000033ee5e870     19409  package body SYS.DBMS_STATS
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
ksedst()+64          CALL     ksedst1()            000000000 ? 000000001 ?
ksedmp()+2176        CALL     ksedst()             000000000 ?
                                                   C000000000000C9F ?
                                                   400000000404CC40 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ?
ksfdmp()+48          CALL     ksedmp()             000000003 ?
kgerinv()+304        CALL     ksfdmp()             C000000000000612 ?
                                                   000000003 ?
                                                   40000000094FF110 ?
                                                   00001830B ? 000000000 ?
                                                   000000000 ?
kgesinv()+64         CALL     kgerinv()            6000000000031370 ?
                                                   4000000001A4B3E0 ?
                                                   6000000000032428 ?
                                                   4000000001A4B3E0 ?
                                                   9FFFFFFFFFFC5700 ?
ksesin()+176         CALL     kgesinv()            6000000000031370 ?
                                                   60000000001E85A0 ?
                                                   60000000001E85B0 ?
                                                   60000000000327A0 ?
                                                   9FFFFFFFFFFC5700 ?
$cold_ksfdsyncdata(  CALL     ksesin()             4000000000F09090 ?
)+256                                              000000001 ?
                                                   60000000000C2790 ?
                                                   9FFFFFFFFFFC5700 ?
                                                   60000000000C2EC0 ?
                                                   60000000000B5E18 ?
                                                   C000000000000286 ?
                                                   C000000000000998 ?
kcflsync()+160       CALL     $cold_ksfdsyncdata(  C0000003356BF050 ?
                              )                    C00000000000038C ?
                                                   4000000002549860 ?
                                                   C000000335E586B8 ?
                                                   C0000003356BF070 ?

显然,这是JOB调用的DBMS_STATS收集统计信息的会话,从错误发生的时间上看,导致临时空间不足的多半也是收集统计信息的过程。显然这个ORA-600的错误,正是由于DBMS_STATS碰到了临时空间不足所导致的。
这个错误没有什么危害,只要避免空间不足的问题,就可以避免这个ORA-600的产生。

Posted in BUG | Tagged , , , , | Leave a comment