20130626 Oracle12c终于发布

翘盼已久的Oracle 12c今天终于发布了。
Oracle的12c将要发布的消息已经传出来将近1年时间了,就在大家认为是不是要等到OOW的时候,Oracle 12.1.0终于发布了。
第一批发布了三个平台,不用猜也知道肯定是Linux和Solaris,包括Linux X86-64、Solaris Sparc 64和Solaris X86-64。
同时在线文档也发布出来,可以通过http://www.oracle.com/pls/db121/homepage进行在线访问。估计最近一段时间,大部分新特性的文章会不断的涌现出来。

Posted in NEWS | Tagged , , | Leave a comment

私有网络接口丢失导致实例崩溃

客户10.2.0.4 RAC数据库出现网络异常,导致实例崩溃并伴随大量ORA-27300错误。
详细错误信息为:

Wed Nov 21 16:37:36 2012
Errors IN file /u01/oracle/app/admin/orcl/udump/orcl2_ora_29173.trc:
ORA-00603: ORACLE server SESSION TERMINATED BY fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed WITH STATUS: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 10.0.1.2 NOT found. CHECK output FROM ifconfig command
Wed Nov 21 16:37:36 2012
Errors IN file /u01/oracle/app/admin/orcl/udump/orcl2_ora_29198.trc:
ORA-00603: ORACLE server SESSION TERMINATED BY fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed WITH STATUS: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 10.0.1.2 NOT found. CHECK output FROM ifconfig command
Wed Nov 21 16:37:56 2012
Trace dumping IS performing id=[cdmp_20121121163746]
Wed Nov 21 16:38:00 2012
ospid 28424: network interface WITH IP address 10.0.1.2 no longer operational
requested interface 10.0.1.2 NOT found. CHECK output FROM ifconfig command
Wed Nov 21 16:38:07 2012
Error: KGXGN aborts the instance (6)
Wed Nov 21 16:38:07 2012
Errors IN file /u01/oracle/app/admin/orcl/bdump/orcl2_lmon_28422.trc:
ORA-29702: error occurred IN Cluster GROUP Service operation
LMON: terminating instance due TO error 29702
Wed Nov 21 16:38:07 2012
Errors IN file /u01/oracle/app/admin/orcl/bdump/orcl2_lms1_28430.trc:
ORA-29702: error occurred IN Cluster GROUP Service operation
Wed Nov 21 16:38:07 2012
Errors IN file /u01/oracle/app/admin/orcl/bdump/orcl2_lms3_28438.trc:
ORA-29702: error occurred IN Cluster GROUP Service operation
.
.
.
Wed Nov 21 16:38:09 2012
Errors IN file /u01/oracle/app/admin/orcl/bdump/orcl2_j000_28635.trc:
ORA-29702: error occurred IN Cluster GROUP Service operation
ORA-29702: error occurred IN Cluster GROUP Service operation
Wed Nov 21 16:38:09 2012
Errors IN file /u01/oracle/app/admin/orcl/bdump/orcl2_mman_28450.trc:
ORA-29702: error occurred IN Cluster GROUP Service operation
Wed Nov 21 16:38:09 2012
Errors IN file /u01/oracle/app/admin/orcl/bdump/orcl2_asmb_28496.trc:
ORA-15064: communication failure WITH ASM instance
ORA-03113: end-of-file ON communication channel
Wed Nov 21 16:38:10 2012
Errors IN file /u01/oracle/app/admin/orcl/bdump/orcl2_pmon_28416.trc:
ORA-29702: error occurred IN Cluster GROUP Service operation
Wed Nov 21 16:38:10 2012
Errors IN file /u01/oracle/app/admin/orcl/bdump/orcl2_smon_28462.trc:
ORA-29702: error occurred IN Cluster GROUP Service operation
Wed Nov 21 17:25:50 2012
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface TYPE 1 eth1 10.0.1.0 configured FROM OCR FOR USE AS a cluster interconnect
Interface TYPE 1 eth0 172.18.19.0 configured FROM OCR FOR USE AS  a public interface

显然导致RAC节点宕机的问题来自操作系统或硬件层。导致出现ORA-27504错误的原因是操作系统相关的ORA-27300、ORA-27301、ORA-27302和ORA-27303错误。而这些错误明确的之处私有网络接口的地址无法找到,而操作系统命令ifconfig命令输出结果异常。
Oracle的网络心跳依赖于私有网络,而出现了这个问题,导致数据库节点崩溃也是情理之中的了。
显然这不应该算作Oracle的bug,Oracle给出的错误信息已经清晰的指明了问题的原因。找到导致操作系统层面网络接口失效的原因才是解决问题的关键。

Posted in ORACLE | Tagged , , , , , , , , , , , | Leave a comment

ORA-600(kjbrref:pkey)错误

客户数据库碰到这个ORA-600错误,导致实例崩溃。
详细错误信息为:

Fri Nov 23 23:38:46 2012
minact-scn: Master returning AS live inst:2 has inc# mismatch instinc:4 cur:6 errcnt:0
Errors IN file /oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lms1_2097994.trc  (incident=1520217):
ORA-00600: internal error code, arguments: [kjbrref:pkey], [7232], [107], [249516], [0], [], [], [], [], [], [], []
Incident details IN: /oracle/diag/rdbms/orcl/orcl1/incident/incdir_1520217/orcl1_lms1_2097994_i1520217.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
Fri Nov 23 23:38:55 2012
Errors IN file /oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lms0_2032456.trc  (incident=1520209):
ORA-00600: internal error code, arguments: [kjbrref:pkey], [192], [191], [249516], [0], [], [], [], [], [], [], []
Incident details IN: /oracle/diag/rdbms/orcl/orcl1/incident/incdir_1520209/orcl1_lms0_2032456_i1520209.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
Fri Nov 23 23:38:55 2012
Errors IN file /oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_2359346.trc  (incident=1520233):
ORA-00600: internal error code, arguments: [kjbrref:pkey], [192], [187], [249516], [0], [], [], [], [], [], [], []
Incident details IN: /oracle/diag/rdbms/orcl/orcl1/incident/incdir_1520233/orcl1_lms3_2359346_i1520233.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
Fri Nov 23 23:38:56 2012
Dumping diagnostic DATA IN directory=[cdmp_20121123233855], requested BY (instance=2, osid=2623306 (LMS1)), summary=[incident=920114].
Dumping diagnostic DATA IN directory=[cdmp_20121123233857], requested BY (instance=1, osid=2097994 (LMS1)), summary=[incident=1520217].
Errors IN file /oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_2359346.trc:
ORA-00600: internal error code, arguments: [kjbrref:pkey], [192], [187], [249516], [0], [], [], [], [], [], [], []
LMS3 (ospid: 2359346): terminating the instance due TO error 484
Fri Nov 23 23:39:00 2012
ORA-1092 : opitsk aborting process
Fri Nov 23 23:39:00 2012
License high water mark = 128
System state dump requested BY (instance=1, osid=2359346 (LMS3)), summary=[abnormal instance termination].
System State dumped TO trace file /oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_1638792.trc
Instance TERMINATED BY LMS3, pid = 2359346
USER (ospid: 3539564): terminating the instance
Instance TERMINATED BY USER, pid = 3539564

在ORA-600错误出现之前,出现了minact-scn: Master returning as live inst的信息,说明数据库出现了DRM。显然这个问题和DRM有直接关联。
根据MOS的文档Bug 12834027 – ORA-600 [kjbmprlst:shadow] / ORA-600 [kjbrasr:pkey] with RAC read mostly locking [ID 12834027.8],这个问题方式在RAC环境是read-mostly object locking导致的,而解决方案就是禁止DRM或禁止read-mostly object locking:将”_gc_read_mostly_locking”隐含参数设置为FALSE。
这个问题确认影响的版本为11.2.0.2和11.2.0.3,Oracle在11.2.0.2.8和11.2.0.3.1中fixed了这个bug。

Posted in BUG | Tagged , , , , , , | Leave a comment

V$SESSION记录的BLOCKING_SESSION错误

记录一个很诡异的问题,在进行一个测试时碰到的,V$SESSION中记录的BLOCKING_SESSION信息是错误的,可惜这个问题当时没有注意,后续也无法再重现出来,只好作为一个现象记录一下,Oracle给出的一些信息看来也是不完全可信的。
在会话1执行:

[tingkun.yang@node1 ~]$ sqlplus test/test
SQL*Plus: Release 10.2.0.5.0 - Production ON Sun Nov 25 04:04:00 2012
Copyright (c) 1982, 2010, Oracle. ALL Rights Reserved.
Connected TO:
Oracle DATABASE 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
WITH the Partitioning, REAL Application Clusters, OLAP, DATA Mining
AND REAL Application Testing options
SQL> SET pages 100 LINES 140
SQL> CREATE TABLE t_lock (id NUMBER, name varchar2(30), gender varchar2(8));
TABLE created.
SQL> INSERT INTO t_lock VALUES (1, 'a', 'm');
1 ROW created.
SQL> INSERT INTO t_lock VALUES (2, 'b', 'f');
1 ROW created.
SQL> commit;
Commit complete.
SQL> UPDATE t_lock SET name = 'c' WHERE id = 1;
1 ROW updated.
SQL> SELECT sid FROM v$mystat WHERE rownum = 1;
SID
----------
       151

在会话2执行同一行的更新,被锁定:

[tingkun.yang@node1 ~]$ sqlplus test/test
SQL*Plus: Release 10.2.0.5.0 - Production ON Sun Nov 25 04:03:27 2012
Copyright (c) 1982, 2010, Oracle. ALL Rights Reserved.
 
Connected TO:
Oracle DATABASE 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
WITH the Partitioning, REAL Application Clusters, OLAP, DATA Mining
AND REAL Application Testing options
SQL> SET pages 100 LINES 120
SQL> UPDATE t_lock SET name = 'd' WHERE id = 1;

在会话3上执行对V$SESSION的查询,当时没有太注意:

SQL> conn / AS sysdba
Connected.
SQL> SELECT sid, event, blocking_session FROM v$session WHERE usernamE = 'TEST';
       SID EVENT                                                            BLOCKING_SESSION
---------- ---------------------------------------------------------------- ----------------
       131 SQL*Net message FROM client
       151 SQL*Net message FROM client
SQL> SELECT sid, event, blocking_session FROM v$session WHERE usernamE = 'TEST';
       SID EVENT                                                            BLOCKING_SESSION
---------- ---------------------------------------------------------------- ----------------
       131 enq: TX - ROW LOCK contention                                                 161
       151 SQL*Net message FROM client

随后在会话1执行了ROLLBACK:

SQL> roll;
ROLLBACK complete.

检查了会话2的连接信息:

1 ROW updated.
SQL> SELECT sid FROM v$mystat WHERE rownum = 1;
       SID
----------
       131

这时才发现,之前查询的V$SESSION中的BLOCKING_SESSION信息是错误的。当前数据库只有两个测试会话,分别是131和151,而BLOCKING_SESSION给出的结果居然是161。虽然V$SESSION中还有BLOCKING_SESSION_STATUS列来表示当前的BLOCKING_SESSION是否有意义,但是一般而言,如果BLOCKING_SESSION_STATUS的值不是VALID,那么BLOCKING_SESSION的值会是NULL,但是像这种值本身就是错误的,还是第一次碰到。
由于没有办法重现,现在确定不了Oracle为什么会出现这种问题,只能先作为一个异常案例记录一下,如果以后发现类似的错误,再进行分析。不过这提示一点,Oracle视图的返回结果经常作为我们诊断、分析问题的依据,但是如果结果本身不可信,那么在此基础上推导出来的结果很可能也是不可信的。

Posted in BUG | Tagged , | Leave a comment

查询外部表出现无法打开日志文件的错误

在运行ORACLE_LOADER类型的外部表查询时报错,无法打开外部表的日志文件。
查询外部表出现KUP-4040错误:
数据库安装在操作系统oracle用户下,默认组为oinstall,当前使用的是操作系统用户tingkun.yang,同样是oinstall组,因此当前用户可以用SYSDBA身份登录数据库。
当前用户tingkun.yang运行sqlplus连接数据库,创建目录和外部表,在读取外部表时报错:

[tingkun.yang@enmoteam2 ~]$ id
uid=505(tingkun.yang) gid=501(oinstall) groups=501(oinstall),502(dba)
[tingkun.yang@enmoteam2 ~]$ ps -ef|grep lgwr
oracle 1552 1 0 Sep07 ? 00:04:14 ora_lgwr_enmot2
505 28312 28185 0 22:52 pts/0 00:00:00 grep lgwr
[tingkun.yang@enmoteam2 ~]$ sqlplus olasuser/olasuser
SQL*Plus: Release 11.2.0.3.0 Production ON Sat Nov 24 23:24:18 2012
Copyright (c) 1982, 2011, Oracle. ALL rights reserved.
Connected TO:
Oracle DATABASE 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
WITH the Partitioning, OLAP, DATA Mining AND REAL Application Testing options
SQL> CREATE directory d_alert AS '/home/tingkun.yang/sqlldr';
Directory created.
SQL> CREATE TABLE t_alert
2 (text varchar2(4000))
3 organization external
4 (TYPE oracle_loader
5 DEFAULT directory d_alert
6 access parameters
7 (records delimited BY newline
8 FIELDS (text (1:255) CHAR))
9 location ('alert_enmot2.log'));
TABLE created.
Elapsed: 00:00:00.31
SQL> SELECT * FROM t_alert WHERE rownum = 1;
SELECT * FROM t_alert WHERE rownum = 1
*
ERROR at line 1:
ORA-29913: error IN executing ODCIEXTTABLEOPEN callout
ORA-29400: DATA cartridge error
error opening file /home/tingkun.yang/sqlldr/T_ALERT_28321.log

和上一篇文章有所区别,外部表的路径设置在当前用户的目录下,而不是oracle用户的目录,当执行外部表的查询是,由于oracle用户没有在当前用户目录下的写权限,导致了无法打开日志文件的错误。
将当前用户设置为对oinstall组可读写,问题解决:

SQL> host
[tingkun.yang@enmoteam2 ~]$ cd ~/..
[tingkun.yang@enmoteam2 home]$ ls -l |grep tingkun
drwx------ 6 tingkun.yang oinstall 4096 Nov 22 14:04 tingkun.yang
[tingkun.yang@enmoteam2 home]$ chmod 770 tingkun.yang/
[tingkun.yang@enmoteam2 home]$ exit
exit
SQL> SELECT * FROM t_alert WHERE rownum = 1;
TEXT
--------------------------------------------------
Fri Aug 10 13:39:20 2012

其实根据这两次的错误的差别已经可以分析出结果,对于外部表访问而言,除了数据库中必须具备的DIRECTORY的读写权限外,需要oracle用户拥有对DIRECTORY对应的操作系统目录的写权限,以及当前访问的操作系统用户对于DIRECTORY对应的操作系统目录的读权限。

Posted in ORACLE | Tagged , , , , , | Leave a comment

查询外部表出现KUP-4040错误

在运行ORACLE_LOADER类型的外部表查询时出现这个错误。
数据库的情况比较复杂,简单描述一下。数据库是安装在操作系统oracle用户下,默认组为oinstall,当前使用的是操作系统用户tingkun.yang,同样是oinstall组,因此当前用户可以用SYSDBA身份登录数据库。
以当前用户tingkun.yang运行sqlplus连接数据库,创建目录和外部表,在读取外部表时报错:

[tingkun.yang@enmoteam2 ~]$ id
uid=505(tingkun.yang) gid=501(oinstall) groups=501(oinstall),502(dba)
[tingkun.yang@enmoteam2 ~]$ ps -ef|grep lgwr
oracle 1552 1 0 Sep07 ? 00:04:14 ora_lgwr_enmot2
505 28312 28185 0 22:52 pts/0 00:00:00 grep lgwr
[tingkun.yang@enmoteam2 ~]$ sqlplus olasuser/olasuser
SQL*Plus: Release 11.2.0.3.0 Production ON Sat Nov 24 22:53:06 2012
Copyright (c) 1982, 2011, Oracle. ALL rights reserved.
 
Connected TO:
Oracle DATABASE 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
WITH the Partitioning, OLAP, DATA Mining AND REAL Application Testing options
SQL> CREATE directory d_alert AS '/home/oracle/sqlldr';
Directory created.
SQL> CREATE TABLE t_alert
2 (text varchar2(4000))
3 organization external
4 (TYPE oracle_loader
5 DEFAULT directory d_alert
6 access parameters
7 (records delimited BY newline
8 FIELDS (text (1:255) CHAR))
9 location ('alert_enmot2.log'));
TABLE created.
SQL> SELECT * FROM t_alert WHERE rownum = 1;
SELECT * FROM t_alert WHERE rownum = 1
*
ERROR at line 1:
ORA-29913: error IN executing ODCIEXTTABLEOPEN callout
ORA-29400: DATA cartridge error
KUP-04040: file alert_enmot2.log IN D_ALERT NOT found

查询MOS发现所有的已知问题都和当前的现象不同,该文件已经存在于指定的目录下,甚至在这个目录下,错误日志已经生成:

[oracle@enmoteam2 sqlldr]$ pwd
/home/oracle/sqlldr
[oracle@enmoteam2 sqlldr]$ ls -l
total 664
-rw-rw-rw- 1 oracle oinstall 674964 Nov 22 14:06 alert_enmot2.log
-rw-r--r-- 1 oracle oinstall 95 Nov 24 22:54 T_ALERT_28321.log
[oracle@enmoteam2 sqlldr]$ more T_ALERT_28321.log 
 
LOG file opened at 11/24/12 22:54:52
KUP-04040: file alert_enmot2.log IN D_ALERT NOT found

经过多次测试发现,如果使用oracle用户启动sqlplus,则可以读取t_alert表:

[oracle@enmoteam2 ~]$ id
uid=502(oracle) gid=501(oinstall) groups=501(oinstall),502(dba),506(asmdba)
[oracle@enmoteam2 ~]$ sqlplus olasuser/olasuser
SQL*Plus: Release 11.2.0.3.0 Production ON Sat Nov 24 22:58:57 2012
Copyright (c) 1982, 2011, Oracle. ALL rights reserved.
Connected TO:
Oracle DATABASE 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
WITH the Partitioning, OLAP, DATA Mining AND REAL Application Testing options
SQL> SELECT * FROM t_alert WHERE rownum = 1;
TEXT
----------------------------------------------------
Fri Aug 10 13:39:20 2012

一直认为,这种DIRECTORY的读取,使用的是oracle用户的身份和权限,和登录的操作系统用户无关。没有想到,Oracle是根据操作系统用户是否有权限访问这个文件作为外部表访问的条件。但是外部表的日志并没有受权限的控制而已经产生了。

Posted in ORACLE | Tagged , , , , , | Leave a comment

ORA-600(kfioTranslateIO03)和ORA-600(17090)错误

客户的11.2 RAC环境出现这两个ORA-600错误。
详细的错误信息为:

Mon Nov 19 19:32:56 2012
ALTER SYSTEM SET local_listener=' (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.135.128.17)(PORT=1521))))' SCOPE=MEMORY SID='orcl1';
NOTE: Loaded library: System 
ORA-15025: could NOT OPEN disk "/dev/rhdisk10"
ORA-27041: unable TO OPEN file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 11
ORA-15025: could NOT OPEN disk "/dev/rhdisk11"
ORA-27041: unable TO OPEN file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 11
.
.
.
ORA-15025: could NOT OPEN disk "/dev/rhdisk9"
ORA-27041: unable TO OPEN file
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 11
SUCCESS: diskgroup DATA was mounted
Errors IN file /picclife/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ckpt_3342750.trc  (incident=720249):
ORA-00600: internal error code, arguments: [kfioTranslateIO03], [], [], [], [], [], [], [], [], [], [], []
Incident details IN: /picclife/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_720249/orcl1_ckpt_3342750_i720249.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
NOTE: dependency BETWEEN DATABASE orcl AND diskgroup resource ora.DATA.dg IS established
Dumping diagnostic DATA IN directory=[cdmp_20121119193259], requested BY (instance=1, osid=3342750 (CKPT)), summary=[incident=720249].
Errors IN file /picclife/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ckpt_3342750.trc  (incident=720250):
ORA-00600: internal error code, arguments: [17090], [], [], [], [], [], [], [], [], [], [], []
Incident details IN: /picclife/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_720250/orcl1_ckpt_3342750_i720250.trc
Mon Nov 19 19:33:00 2012
Dumping diagnostic DATA IN directory=[cdmp_20121119193300], requested BY (instance=1, osid=3342750 (CKPT)), summary=[incident=720250].
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
ERROR: unrecoverable error ORA-600 raised IN ASM I/O path; terminating process 3342750 
Mon Nov 19 19:33:02 2012
PMON (ospid: 2884748): terminating the instance due TO error 469
System state dump requested BY (instance=1, osid=2884748 (PMON)), summary=[abnormal instance termination].
System State dumped TO trace file /picclife/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_1442204.trc
Dumping diagnostic DATA IN directory=[cdmp_20121119193302], requested BY (instance=1, osid=2884748 (PMON)), summary=[abnormal instance termination].
Mon Nov 19 19:33:02 2012
ORA-1092 : opitsk aborting process
Mon Nov 19 19:33:03 2012
License high water mark = 2
Instance TERMINATED BY PMON, pid = 2884748
USER (ospid: 4458354): terminating the instance
Instance TERMINATED BY USER, pid = 4458354

根据MOS文档ORA-00600 [kfioTranslateIO03] [17090] [ID 1336846.1],导致这个问题的原因在于oracle可执行文件的所在操作系统组必需要有ASM磁盘文件的读写权限。而这个错误就是由于grid和oracle用户的组设置不当导致的。
Oracle推荐的解决方案为,使用grid用户登录,进入GRID_HOME/bin目录,执行下面的SQL语句:

$ ./setasmgidwrap o=$ORACLE_HOME/bin/oracle

其中$ORACLE_HOME为oracle用户下安装的ORACLE目录。

Posted in BUG | Tagged , , , , , , , , | Leave a comment

ORA-600(2901)错误

客户在创建表空间时,出现这个错误。
详细错误信息为:

SQL> CREATE TABLESPACE T_DATA DATAFILE '/dev/tdat01' SIZE 13369336K 
  2  EXTENT MANAGEMENT LOCAL 
  3  SEGMENT SPACE MANAGEMENT AUTO;
CREATE TABLESPACE HS_ARCH_DATA DATAFILE '/dev/tdat01' SIZE 13369336K 
* 
ERROR at line 1: 
ORA-00600: internal error code, arguments: [2901], [1671166], [1671167], [], [], [], [], []

对应的详细TRACE信息为:

*** 2012-11-20 16:25:39.981 
ksedmp: internal OR fatal error 
ORA-00600: internal error code, arguments: [2901], [1671166], [1671167], [], [], [], [], [] 
CURRENT SQL statement FOR this SESSION: 
CREATE TABLESPACE T_DATA DATAFILE '/dev/tdat01' SIZE 13369336K 
EXTENT MANAGEMENT LOCAL 
SEGMENT SPACE MANAGEMENT AUTO 
----- Call Stack Trace ----- 
calling              CALL     entry                argument VALUES IN hex 
location             TYPE     point                (? means dubious VALUE) 
-------------------- -------- -------------------- ---------------------------- 
ksedst+001c          bl       ksedst1              000000000 ? 110074DE8 ? 
ksedmp+0290          bl       ksedst               104C20490 ? 
ksfdmp+02d8          bl       03F34EF4 
kgeriv+0108          bl       _ptrgl 
kgesiv+0080          bl       kgeriv               1104BD230 ? 000000010 ? 
                                                   000000000 ? 1109724D0 ? 
                                                   000000000 ? 
ksesic2+0060         bl       kgesiv               6368646174303100 ?

这个ORA-600[2901]的错误经常出现在创建表空间或添加数据文件的过程中,当Oracle尝试初始化数据文件时发现数据文件的空间不满足创建语句的要求时,就会引发这个错误。
其第二个参数为数据文件可用的BLOCK数,而第三个参数为当前需要创建的数据文件所需的BLOCK数:

SQL> SELECT 13369336*1024/8/1024 FROM dual;
13369336*1024/8/1024
--------------------
             1671167

解决这个问题很简单,只需要将数据文件缩小一点就可以了。相比较导致这个问题的原因更有技术含量。
根据客户的主机工程师反馈,在AIX上创建VG的时候没有将类型设置为BIG类型,导致在创建裸设备时mklv指定的-T O参数没有生效,这样LVCB就占用了裸设备的第一个块,从而导致Oracle创建数据文件时空间不足,从而引发了ORA-600错误。

Posted in BUG | Tagged , , , , , | Leave a comment

ORA-600(k2qgil:gtx)错误

客户11.2.0.3 RAC出现了ORA-600[k21gil:gtx]错误。
错误信息为:

Sat Sep 22 18:45:52 2012
Auto-tuning: Shutting down background process GTX1
Sat Sep 22 19:00:20 2012
Auto-tuning: Starting background process GTX1
Starting background process GTX1
Sat Sep 22 19:00:20 2012
GTX1 started WITH pid=584, OS id=21107 
Sat Sep 22 19:02:41 2012
Auto-tuning: Starting background process GTX2
Starting background process GTX2
Sat Sep 22 19:02:42 2012
GTX2 started WITH pid=482, OS id=25715 
Sat Sep 22 19:12:48 2012
Auto-tuning: Shutting down background process GTX2
Sat Sep 22 19:23:12 2012
Auto-tuning: Shutting down background process GTX1
Sat Sep 22 19:23:15 2012
Errors IN file /opt/app/oracle/product/11.2.0/diag/rdbms/orcl/orcl1/trace/orcl1_ora_3848.trc  (incident=316569):
ORA-00600: internal error code, arguments: [k2qgil:gtx], [1], [1], [], [], [], [], [], [], [], [], []
Incident details IN: /opt/app/oracle/product/11.2.0/diag/rdbms/orcl/orcl1/incident/incdir_316569/orcl1_ora_3848_i316569.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
Sat Sep 22 19:23:20 2012
Dumping diagnostic DATA IN directory=[cdmp_20120922192320], requested BY (instance=1, osid=3848), summary=[incident=316569].
Sat Sep 22 19:23:21 2012
Sweep [inc][316569]: completed
Sweep [inc2][316569]: completed
Sat Sep 22 19:34:22 2012
Auto-tuning: Starting background process GTX1
Starting background process GTX1
Sat Sep 22 19:34:22 2012
GTX1 started WITH pid=568, OS id=27173

这个错误在MOS上只有一个11.1.0.7上的错误记录,Oracle人为需要进一步信息进行判断,没有明确的bug描述。根据错误信息不难判断,问题发生在GTX进程启动时。报错函数也体现了问题和GTX进程有关。而Oracle的GTX进程是Global Transaction Process,用来维护CLUSTER中XA全局事务的全局信息。
在报错发生后,Oracle后续成功启动了GTX进程,说明报错只是偶然性发生。既然Oracle目前没有任何的BUG描述,这个错误只能暂时忽略,目前看这个bug对系统的影响是非常有限的。

Posted in BUG | Tagged , , , , , , | Leave a comment

ORA-7445(kclfusion_1)错误

客户10.2.0.4 RAC环境在启动时出现ORA-7445错误。
错误信息为:

Wed Jun 27 05:10:51 2012
Starting ORACLE instance (normal)
Wed Jun 27 05:11:24 2012
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface TYPE 1 aggr1 192.168.168.0 configured FROM OCR FOR USE AS a cluster interconnect
Interface TYPE 1 bge0 172.25.145.0 configured FROM OCR FOR USE AS  a public interface
Picked latch-free SCN scheme 3
USING LOG_ARCHIVE_DEST_10 parameter DEFAULT VALUE AS USE_DB_RECOVERY_FILE_DEST
Autotune OF undo retention IS turned ON. 
LICENSE_MAX_USERS = 0
SYS auditing IS disabled
ksdpec: called FOR event 13740 prior TO event GROUP initialization
Starting up ORACLE RDBMS Version: 10.2.0.4.0.
System parameters WITH non-DEFAULT VALUES:
  processes                = 1024
  sessions                 = 1131
  __shared_pool_size       = 11895046144
  __large_pool_size        = 16777216
  __java_pool_size         = 16777216
  __streams_pool_size      = 16777216
  spfile                   = /orcldata/oradata/orcl/spfileorcl.ora
  filesystemio_options     = SetAll
  sga_target               = 33554432000
  control_files            = /orcldata/oradata/orcl/control01.ctl, /orcldata/oradata/orcl/control02.ctl, /orcldata/oradata/orcl/control03.ctl
  db_block_size            = 8192
  __db_cache_size          = 21592276992
  compatible               = 10.2.0.3.0
  log_archive_format       = %t_%s_%r.dbf
  db_file_multiblock_read_count= 16
  cluster_database         = TRUE
  cluster_database_instances= 2
  db_recovery_file_dest    = /orcldata/archdata
  db_recovery_file_dest_size= 209715200000
  thread                   = 1
  instance_number          = 1
  undo_management          = AUTO
  undo_tablespace          = UNDOTBS1
  remote_login_passwordfile= EXCLUSIVE
  db_domain                = 
  dispatchers              = (PROTOCOL=TCP) (SERVICE=orclXDB)
  local_listener           = (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.25.145.88)(PORT=1521)))
  remote_listener          = LISTENERS_ORCL
  job_queue_processes      = 10
  background_dump_dest     = /oracle/admin/orcl/bdump
  user_dump_dest           = /oracle/admin/orcl/udump
  core_dump_dest           = /oracle/admin/orcl/cdump
  audit_file_dest          = /oracle/admin/orcl/adump
  db_name                  = orcl
  open_cursors             = 300
  pga_aggregate_target     = 8388608000
Cluster communication IS configured TO USE the following interface(s) FOR this instance
  192.168.168.3
Wed Jun 27 05:11:27 2012
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
PMON started WITH pid=2, OS id=12150
DIAG started WITH pid=3, OS id=12152
PSP0 started WITH pid=4, OS id=12154
LMON started WITH pid=5, OS id=12156
LMD0 started WITH pid=6, OS id=12158
LMS0 started WITH pid=7, OS id=12160
LMS1 started WITH pid=8, OS id=12164
LMS2 started WITH pid=9, OS id=12168
LMS3 started WITH pid=10, OS id=12172
LMS4 started WITH pid=11, OS id=12176
LMS5 started WITH pid=12, OS id=12180
LMS6 started WITH pid=13, OS id=12184
LMS7 started WITH pid=14, OS id=12188
MMAN started WITH pid=15, OS id=12192
DBW0 started WITH pid=16, OS id=12194
DBW1 started WITH pid=17, OS id=12196
DBW2 started WITH pid=18, OS id=12198
DBW3 started WITH pid=19, OS id=12200
LGWR started WITH pid=20, OS id=12202
CKPT started WITH pid=21, OS id=12204
SMON started WITH pid=22, OS id=12206
RECO started WITH pid=23, OS id=12208
CJQ0 started WITH pid=24, OS id=12210
MMON started WITH pid=25, OS id=12212
Wed Jun 27 05:11:28 2012
starting up 1 dispatcher(s) FOR network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started WITH pid=26, OS id=12214
Wed Jun 27 05:11:28 2012
starting up 1 shared server(s) ...
Wed Jun 27 05:11:34 2012
Errors IN file /oracle/admin/orcl/bdump/orcl1_lmon_12156.trc:
ORA-07445: exception encountered: core dump [kjfmact()+12] [SIGSEGV] [Address NOT mapped TO object] [0x000000D5C] [] []
USER: terminating instance due TO error 29702
Instance TERMINATED BY USER, pid = 11923

导致这个ORA-7445错误的原因是由于出现了ORA-29702错误。该现象与MOS文档Bug 11684221 : ORA-29702 AND ORA-304 WHILE STARTING THE INSTANCES描述的非常相似。基本上可以认为是是CLUSTER层面出现了异常导致实例启动报错。

ORA-29702: error occurred IN Cluster GROUP Service operation 
Cause: An unexpected error occurred while performing a CGS operation.
Action: Verify that the LMON process IS still active. Also, CHECK the Oracle LMON trace files FOR errors.

而随后的第二次启动并没有发生同样的问题,说明CLUSTER层面的异常只是一次性的,已经被解决。这个错误和CLUSTER的某些不正常的状态有关,不但出现概率很低,而且不会重现,可以安全的忽略掉。

Posted in BUG | Tagged , , | Leave a comment