ORA-600(kzdugt)错误

创建用户时产生ORA-600(kzdugt)错误。
错误信息为:

Mon Mar 12 12:06:58 CST 2012
Errors IN file /u01/app/oracle/admin/orcl10g/udump/orcl10g_ora_22193.trc:
ORA-00600: internal error code, arguments: [kzdugt], [], [], [], [], [], [], []

详细的信息为:

Dump file /u01/app/oracle/admin/orcl10g/udump/orcl10g_ora_22193.trc
Oracle DATABASE 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
WITH the Partitioning, OLAP, DATA Mining AND REAL Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1
System name:    Linux
Node name:      hpserver2.enmotech.com
Release:        2.6.32-100.28.5.el6.x86_64
Version:        #1 SMP Wed Feb 2 18:40:23 EST 2011
Machine:        x86_64
Instance name: orcl10g
Redo thread mounted BY this instance: 1
Oracle process NUMBER: 13
Unix process pid: 22193, image: oracle@hpserver2.enmotech.com (TNS V1-V3)
*** ACTION NAME:() 2012-03-12 12:06:58.209
*** MODULE NAME:(sqlplus@hpserver2.enmotech.com (TNS V1-V3)) 2012-03-12 12:06:58.209
*** SERVICE NAME:(SYS$USERS) 2012-03-12 12:06:58.209
*** SESSION ID:(31.660) 2012-03-12 12:06:58.209
*** 2012-03-12 12:06:58.209
ksedmp: internal OR fatal error
ORA-00600: internal error code, arguments: [kzdugt], [], [], [], [], [], [], []
CURRENT SQL statement FOR this SESSION:
CREATE USER reptdb
IDENTIFIED BY *******DEFAULT tablespace users
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
ssd_unwind_bp: unhandled instruction at 0x76ab50 instr=f
ksedst()+31          CALL     ksedst1()            000000000 ? 000000001 ?
                                                   7FFFFD029C60 ? 7FFFFD029CC0 ?
                                                   7FFFFD029C00 ? 000000000 ?
ksedmp()+610         CALL     ksedst()             000000000 ? 000000001 ?
                                                   7FFFFD029C60 ? 7FFFFD029CC0 ?
                                                   7FFFFD029C00 ? 000000000 ?
ksfdmp()+63          CALL     ksedmp()             000000003 ? 000000001 ?
                                                   7FFFFD029C60 ? 7FFFFD029CC0 ?
                                                   7FFFFD029C00 ? 000000000 ?
kgerinv()+161        CALL     ksfdmp()             006AE9A40 ? 000000003 ?
                                                   7FFFFD029C60 ? 7FFFFD029CC0 ?
                                                   7FFFFD029C00 ? 000000000 ?
kgesinv()+33         CALL     kgerinv()            006AE9A40 ? 007F5EE48 ?
                                                   7FFFFD029CC0 ? 7FFFFD029C00 ?
                                                   000000000 ? 000000000 ?
ksesin()+211         CALL     kgesinv()            006AE9A40 ? 007F5EE48 ?
                                                   7FFFFD029CC0 ? 7FFFFD029C00 ?
                                                   000000000 ? 000000000 ?
kzdugt()+1435        CALL     ksesin()             006AE9A40 ? 007F5EE48 ?

这个错误是在创建用户时产生的,其对应的bug描述为:ORA-00600 [kzdugt] While Creating An User [ID 734801.1]。
导致问题的原因仍然是数据字典的不一致,如果从USER$查询MAX(USER#)的值大于_NEXT_USER对应的USER#的值,就会导致这个问题。
接近方法除了可以直接更新_NEXT_USER对应的USER#的值外,还可以通过PL/SQL循环动态创建用户并捕获错误,使得_NEXT_USER的值最终超过USER$其他所有的记录,就可以避免这个错误了。

Posted in BUG | Tagged , , , | Leave a comment

分配LOB空间失败导致ORA-600(kddummy_blkchk)错误

测试环境发现一个ORA-600错误,简单重现了一下。
错误信息如下:

Fri Mar 2 23:28:49 2012
Corrupt Block Found
TSN = 4, TSNAME = USERS
RFN = 4, BLK = 395, RDBA = 16777611
OBJN = 0, OBJD = 53647, OBJECT = /6ee738c_AccessorUtilities, SUBOBJECT =
SEGMENT OWNER = PUBLIC, SEGMENT TYPE = Invalid TYPE
Fri Mar 2 23:32:48 2012
Errors IN file /home/xxx/db/admin/ora1020410/udump/ora1020410_ora_13198.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [4], [419], [18038], [], [], [], []

对应的TRACE文件信息为:

*** 2012-03-02 23:28:48.194
ksedmp: internal OR fatal error
ORA-00600: internal error code, arguments: [kddummy_blkchk], [4], [395], [18038], [], [], [], []
CURRENT SQL statement FOR this SESSION:
ALTER TABLE fnd_lobs MODIFY LOB (file_data) (allocate extent (SIZE 4000m))
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
ksedst()+31          CALL     ksedst1()            000000000 ? 000000001 ?
                                                   7FFF0F118E10 ? 7FFF0F118E70 ?
                                                   7FFF0F118DB0 ? 000000000 ?
ksedmp()+610         CALL     ksedst()             000000000 ? 000000001 ?
                                                   7FFF0F118E10 ? 7FFF0F118E70 ?
                                                   7FFF0F118DB0 ? 000000000 ?
ksfdmp()+21          CALL     ksedmp()             000000003 ? 000000001 ?
                                                   7FFF0F118E10 ? 7FFF0F118E70 ?
                                                   7FFF0F118DB0 ? 000000000 ?
kgerinv()+161        CALL     ksfdmp()             000000003 ? 000000001 ?
                                                   7FFF0F118E10 ? 7FFF0F118E70 ?
                                                   7FFF0F118DB0 ? 000000000 ?
kseinpre()+56        CALL     kgerinv()            0068CAD20 ? 007247B10 ?
                                                   7FFF0F118E70 ? 7FFF0F118DB0 ?
                                                   000000000 ? 000000000 ?
ksesin()+177         CALL     kseinpre()           7FFF0F118E70 ? 7FFF0F118DB0 ?
                                                   7FFF0F119B70 ? 7FFF0F118DB0 ?
                                                   000000000 ? 000000000 ?
kco_blkchk()+1451    CALL     ksesin()             0058F5B08 ? 000000003 ?
                                                   000000000 ? 000000004 ?
                                                   000000000 ? 00000018B ?
kcoapl()+1411        CALL     kco_blkchk()         7FFF0F11B600 ? 000000000 ?
                                                   7FFF0F11B668 ? 000002000 ?
                                                   000000000 ? 000000001 ?
kcbapl()+336         CALL     kcoapl()             7FFF0F11B600 ? 0A7C8A000 ?
                                                   000000001 ? 000000004 ?
                                                   000002000 ? 000000000 ?
kcrfw_redo_gen()+12  CALL     kcbapl()             7FFF0F11B600 ? 0A7FE0968 ?
402                                                7F924BFF6E14 ? 000000000 ?
                                                   000000000 ? 000000000 ?
kcbchg1_main()+5258  CALL     kcrfw_redo_gen()     000000002 ? 7FFF0F11A4C8 ?
                                                   7FFF0F11A650 ? 7F924BFF6E14 ?
                                                   000000000 ? 000000001 ?
kcbchg1()+125        CALL     kcbchg1_main()       000000000 ? 000000002 ?
                                                   7FFF0F11AD08 ? 7FFF0F11AD38 ?
                                                   000000000 ? 000000000 ?
ktuchg()+1533        CALL     kcbchg1()            000000000 ? 000000002 ?

查询了一下MOS,可以看到和文档Bug 8198906 – OERI [kddummy_blkchk] / OERI [5467] for an aborted transaction of allocating extents [ID 8198906.8]描述的内容最为接近,在分配EXTENTS的过程中被中止,会导致这个错误的产生,而确认影响的版本就是当前的版本。
为了验证确实属于这个问题,下面尝试重现问题:

SQL> SELECT * FROM V$VERSION;
BANNER
----------------------------------------------------------------
Oracle DATABASE 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE 10.2.0.4.0 Production
TNS FOR Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
SQL> CREATE TABLE T_LOB (ID NUMBER, CONTENTS CLOB);
TABLE created.
SQL> ALTER TABLE T_LOB MODIFY LOB (CONTENTS) (ALLOCATE EXTENT (SIZE 1M));
TABLE altered.
SQL> SELECT TABLESPACE_NAME FROM USER_TABLES WHERE TABLE_NAME = 'T_LOB';
TABLESPACE_NAME
------------------------------
USERS
SQL> SELECT SUM(BYTES)/1024/1024 FROM DBA_FREE_SPACE WHERE TABLESPACE_NAME = 'USERS';
SUM(BYTES)/1024/1024
--------------------
            6616.375
SQL> CONN TEST/TEST
Connected.
SQL> ALTER TABLE T_LOB MODIFY LOB (CONTENTS) (ALLOCATE EXTENT (SIZE 4096M));
TABLE altered.
SQL> ALTER TABLE T_LOB MODIFY LOB (CONTENTS) (ALLOCATE EXTENT (SIZE 8192M));
ALTER TABLE T_LOB MODIFY LOB (CONTENTS) (ALLOCATE EXTENT (SIZE 8192M))
*
ERROR at line 1:
ORA-00607: Internal error occurred while making a CHANGE TO a DATA block
ORA-00600: internal error code, arguments: [kddummy_blkchk], [4], [273907],
[18038], [], [], [], []

可以看到如果分配EXTENT的大小不超过表空间的大小,那么操作可以成功,而如果分配EXTENT的大小超过了表空间的大小,则Oracle分配操作失败,这就引发了bug的产生,并导致了ORA-600[kddummy_blkchk]。
Oracle在10.2.0.5和11.2.0.1中fixed了这个bug,而这个bug影响的版本比较广,10.2.0.4、10.2.0.3、9.2.0.8和9.2.0.6这几个常用版本均在确认影响的版本内。

Posted in BUG | Tagged , , , | Leave a comment

删除表空间报错ORA-600(ktssdrp1)错误

测试环境的新建表空间在删除时报错。
数据库创建一个新的表空间后执行删除,出现了ORA-600的错误信息:

SQL> SELECT tablespace_name FROM dba_tablespaces;
TABLESPACE_NAME
------------------------------
SYSTEM
UNDOTBS
SYSAUX
TEMP
TBS_1
GGX
USERS
PERFSTAT
STATSPACK
ALA_TEST
10 ROWS selected.
SQL> DROP tablespace ala_test including contents AND datafiles;
DROP tablespace ala_test including contents AND datafiles
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [ktssdrp1], [9], [13], [11], [], [], [], []
错误信息为:
Tue Mar 13 18:11:14 CST 2012
DROP tablespace ala_test including contents AND datafiles
Tue Mar 13 18:11:14 CST 2012
Errors IN file /u01/app/oracle/admin/orcl10g/udump/orcl10g_ora_7838.trc:
ORA-00600: internal error code, arguments: [ktssdrp1], [9], [13], [11], [], [], [], []
Tue Mar 13 18:11:18 CST 2012
ORA-600 signalled during: DROP tablespace ala_test including contents AND datafiles...

详细信息为:

Dump file /u01/app/oracle/admin/orcl10g/udump/orcl10g_ora_7838.trc
Oracle DATABASE 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
WITH the Partitioning, OLAP, DATA Mining AND REAL Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1
System name:    Linux
Node name:      hpserver2.enmotech.com
Release:        2.6.32-100.28.5.el6.x86_64
Version:        #1 SMP Wed Feb 2 18:40:23 EST 2011
Machine:        x86_64
Instance name: orcl10g
Redo thread mounted BY this instance: 1
Oracle process NUMBER: 13
Unix process pid: 7838, image: oracle@hpserver2.enmotech.com (TNS V1-V3)
*** ACTION NAME:() 2012-03-13 18:11:14.802
*** MODULE NAME:(sqlplus@hpserver2.enmotech.com (TNS V1-V3)) 2012-03-13 18:11:14.802
*** SERVICE NAME:(SYS$USERS) 2012-03-13 18:11:14.802
*** SESSION ID:(30.124) 2012-03-13 18:11:14.802
*** 2012-03-13 18:11:14.802
ksedmp: internal OR fatal error
ORA-00600: internal error code, arguments: [ktssdrp1], [9], [13], [11], [], [], [], []
CURRENT SQL statement FOR this SESSION:
DROP TABLE "APP"."SERVICE" cascade constraints purge
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex
location             TYPE     point                (? means dubious VALUE)
-------------------- -------- -------------------- ----------------------------
ssd_unwind_bp: unhandled instruction at 0x3d12d3e instr=f
ssd_unwind_bp: unhandled instruction at 0x1344b61 instr=f
ksedst()+31          CALL     ksedst1()            000000000 ? 000000001 ?
                                                   7FFFA35C05C0 ? 7FFFA35C0620 ?
                                                   7FFFA35C0560 ? 000000000 ?
ksedmp()+610         CALL     ksedst()             000000000 ? 000000001 ?
                                                   7FFFA35C05C0 ? 7FFFA35C0620 ?
                                                   7FFFA35C0560 ? 000000000 ?
ksfdmp()+63          CALL     ksedmp()             000000003 ? 000000001 ?
                                                   7FFFA35C05C0 ? 7FFFA35C0620 ?
                                                   7FFFA35C0560 ? 000000000 ?
kgerinv()+161        CALL     ksfdmp()             006AE9A40 ? 000000003 ?
                                                   7FFFA35C05C0 ? 7FFFA35C0620 ?
                                                   7FFFA35C0560 ? 000000000 ?
kgeasnmierr()+163    CALL     kgerinv()            006AE9A40 ? 008754D28 ?
                                                   7FFFA35C0620 ? 7FFFA35C0560 ?
                                                   000000000 ? 000000000 ?
ktssdrp_segment()+2  CALL     kgeasnmierr()        006AE9A40 ? 008754D28 ?
289                                                7FFFA35C0620 ? 7FFFA35C0560 ?
                                                   000000000 ? 000000009 ?
dtbdrp()+1736        CALL     ktssdrp_segment()    7FFFA35C2998 ? 008754D28 ?
                                                   7FFFA35C0620 ? 7FFFA35C0560 ?
                                                   000000000 ? 000000009 ?
dtbdrv()+2732        CALL     dtbdrp()             7FFFA35C2998 ? 0779C1148 ?
                                                   074FB4318 ? 7FFFA35C2CB0 ?
                                                   000000000 ? 7FFFA35C2CE8 ?
opiexe()+13175       CALL     dtbdrv()             7FFFA35C2998 ? 0779C1148 ?
                                                   000000000 ? 7FFFA35C2CB0 ?
                                                   000000000 ? 7FFFA35C2CE8 ?
opiosq0()+3398       CALL     opiexe()             000000004 ? 000000000 ?
                                                   7FFFA35C3F2C ? 000000005 ?
                                                   000000000 ? 7FFFA35C2CE8 ?
opiosq()+14          CALL     opiosq0()            000000003 ? 00000000F ?
                                                   7FFFA35C5600 ? 000000000 ?
                                                   000000000 ? 000000041 ?
opiodr()+1184        CALL     opiosq()             000000003 ? 00000000F ?
                                                   7FFFA35C5600 ? 000000000 ?
                                                   000000000 ? 000000041 ?
ssd_unwind_bp: unhandled instruction at 0x24168a7 instr=f
rpidrus()+196        CALL     opiodr()             00000004A ? 00000000F ?
                                                   7FFFA35C5600 ? 000000005 ?
                                                   005BEBA90 ? 000000041 ?
skgmstack()+158      CALL     rpidrus()            7FFFA35C4E58 ? 00000000F ?
                                                   7FFFA35C5600 ? 000000005 ?
                                                   005BEBA90 ? 000000041 ?
rpidru()+116         CALL     skgmstack()          7FFFA35C4E30 ? 006AE9620 ?
                                                   00000F618 ? 002419628 ?
                                                   7FFFA35C4E58 ? 000000041 ?
rpiswu2()+409        CALL     rpidru()             7FFFA35C54F8 ? 006AE9620 ?
                                                   00000F618 ? 002419628 ?
                                                   7FFFA35C4E58 ? 000000041 ?
rpidrv()+1516        CALL     rpiswu2()            07F1AEA58 ? 000000000 ?
                                                   7FFFA35C54D8 ? 000000002 ?
                                                   7FFFA35C5540 ? 000000000 ?
rpispl()+406         CALL     rpidrv()             000000005 ? 00000004A ?
                                                   7FFFA35C5600 ? 000000008 ?
                                                   7FFFA35C5540 ? 000000000 ?
tbsdrac()+4341       CALL     rpispl()             000000005 ? 000000000 ?
                                                   0087315C0 ? 000000041 ?
                                                   000000000 ? 000000000 ?
dtsdrv()+2882        CALL     tbsdrac()            000000009 ? 000000000 ?
                                                   000000000 ? 000000001 ?
                                                   000000000 ? 000000000 ?
opiexe()+14347       CALL     dtsdrv()             000000009 ? 000000000 ?
                                                   000000000 ? 000000001 ?
                                                   000000000 ? 000000000 ?
opiosq0()+3398       CALL     opiexe()             000000004 ? 000000000 ?
                                                   7FFFA35C7B3C ? 000000002 ?
                                                   000000000 ? 000000000 ?
kpooprx()+318        CALL     opiosq0()            000000003 ? 00000000E ?
                                                   7FFFA35C7E68 ? 0000000A4 ?
                                                   000000000 ? 600000039 ?
kpoal8()+783         CALL     kpooprx()            7FFFA35CB04C ? 7FFFA35C9078 ?
                                                   000000039 ? 000000001 ?
                                                   000000000 ? 600000039 ?
opiodr()+1184        CALL     kpoal8()             00000005E ? 000000017 ?
                                                   7FFFA35CB048 ? 000000001 ?
                                                   000000001 ? 600000039 ?
ttcpip()+1226        CALL     opiodr()             00000005E ? 000000017 ?
                                                   7FFFA35CB048 ? 000000000 ?
                                                   005BEBDB0 ? 600000039 ?
opitsk()+1310        CALL     ttcpip()             006AF1FD0 ? 0054A67A0 ?
                                                   7FFFA35CB048 ? 000000000 ?
                                                   7FFFA35CAB48 ? 7FFFA35CB1B0 ?
opiino()+1024        CALL     opitsk()             000000003 ? 000000000 ?
                                                   7FFFA35CB048 ? 000000001 ?
                                                   000000000 ? 6AF000900000001 ?
opiodr()+1184        CALL     opiino()             00000003C ? 000000004 ?
                                                   7FFFA35CC248 ? 000000001 ?
                                                   000000000 ? 6AF000900000001 ?
opidrv()+548         CALL     opiodr()             00000003C ? 000000004 ?
                                                   7FFFA35CC248 ? 000000000 ?
                                                   005BEB860 ? 6AF000900000001 ?
sou2o()+114          CALL     opidrv()             00000003C ? 000000004 ?
                                                   7FFFA35CC248 ? 000000000 ?
                                                   005BEB860 ? 6AF000900000001 ?
opimai_real()+163    CALL     sou2o()              7FFFA35CC220 ? 00000003C ?
                                                   000000004 ? 7FFFA35CC248 ?
                                                   005BEB860 ? 6AF000900000001 ?
main()+116           CALL     opimai_real()        000000002 ? 7FFFA35CC2B0 ?
                                                   000000004 ? 7FFFA35CC248 ?
                                                   005BEB860 ? 6AF000900000001 ?
__libc_start_main()  CALL     main()               000000002 ? 7FFFA35CC2B0 ?
+253                                               000000004 ? 7FFFA35CC248 ?
                                                   005BEB860 ? 6AF000900000001 ?
_start()+41          CALL     __libc_start_main()  00072D134 ? 000000002 ?
                                                   7FFFA35CC408 ? 000000000 ?
                                                   005BEB860 ? 6AF000900000001 ?
--------------------- Binary Stack Dump ---------------------

可以看到,报错发生在删除一个表上。而这个表在数据库中应该是不存在的,或者说即使这个表存在,也不可能建立在这个表空间。那么导致问题的原因应该只有一个,数据字典出现了不一致的状态。

SQL> SELECT ts#, COUNT(*) FROM seg$ GROUP BY ts#;
       TS#   COUNT(*)
---------- ----------
         1         11
         6         20
         2       1126
         4         57
         5          5
         0       1429
6 ROWS selected.
SQL> SELECT ts#, COUNT(*) FROM tab$ GROUP BY ts#;
       TS#   COUNT(*)
---------- ----------
         6        429
        11         97
         2        247
         5          5
         4         48
         8         30
         7          2
         0        736
         9         10
        12          2
10 ROWS selected.

可以看到,在SEG$中不存在表空间编号大于6的对象,但是在TAB$中确存在很多。

SQL> SELECT ts#, name FROM ts$ WHERE name = 'ALA_TEST';
       TS# NAME
---------- ------------------------------
         9 ALA_TEST
SQL> SELECT ts#, name FROM ts$;
       TS# NAME
---------- ------------------------------
         0 SYSTEM
         1 UNDOTBS
         2 SYSAUX
         3 TEMP
         4 TBS_1
         5 GGX
         6 USERS
         7 PERFSTAT
         8 STATSPACK
         9 ALA_TEST
10 ROWS selected.

可以看到,表空间目前编号只到9,而TAB$中记录的表空间的编号以及到了12。显然这就是导致ORA-600错误的原因。Oracle在删除表空间的时候发现有表存在,但是根据段信息又找不到对应的记录,导致删除无法完成,从而引发了这个错误。
至于数据字典不一致的产生应该是测试环境通过imp导入tab$引起的,对于测试环境而言,这个错误很容易清除掉,直接删除不一致的记录即可,但是对于产品环境而言,不推荐这种方式:

SQL> SELECT ts#, COUNT(*) FROM ind$ GROUP BY ts#;
       TS#   COUNT(*)
---------- ----------
         6        239
        11         61
        13        103
         2        332
         4          9
         8         31
         0        858
7 ROWS selected.
SQL> DELETE tab$ WHERE ts# = 9;
10 ROWS deleted.
SQL> commit;
Commit complete.
SQL> DROP tablespace ala_test including contents AND datafiles;
Tablespace dropped.
Posted in ORACLE | Tagged , , , , | Leave a comment

数据泵造成的数据损失

客户尝试导入一个分区表的个别几个分区,结果却损失了整个分区表的数据。
下面通过一个例子来在线这个问题:
SQL> CREATE TABLE T_PART PARTITION BY RANGE (CREATED)
2 (PARTITION P1 VALUES LESS THAN (TO_DATE(‘2012-1-1’, ‘YYYY-MM-DD’)),
3 PARTITION P2 VALUES LESS THAN (TO_DATE(‘2012-2-1’, ‘YYYY-MM-DD’)),
4 PARTITION P3 VALUES LESS THAN (TO_DATE(‘2012-3-1’, ‘YYYY-MM-DD’)),
5 PARTITION P4 VALUES LESS THAN (TO_DATE(‘2012-4-1’, ‘YYYY-MM-DD’)),
6 PARTITION PMAX VALUES LESS THAN (MAXVALUE))
7 AS SELECT * FROM DBA_OBJECTS;
Table created.
SQL> host expdp test directory=d_output dumpfile=t_part.dp logfile=t_part.log tables=t_part:p3, t_part:p4
Export: Release 11.2.0.3.0 – Production on Tue Mar 13 12:17:00 2012
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Password:
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – 64bit Production
With the Partitioning, Oracle Label Security and Real Application Testing options
Starting “TEST”.”SYS_EXPORT_TABLE_01″: test/******** directory=d_output dumpfile=t_part.dp logfile=t_part.log tables=t_part:p3, t_part:p4
Estimate in progress using BLOCKS method…
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 16 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
. . exported “TEST”.”T_PART”:”P3″ 29.89 KB 193 rows
. . exported “TEST”.”T_PART”:”P4″ 65 KB 425 rows
Master table “TEST”.”SYS_EXPORT_TABLE_01″ successfully loaded/unloaded
******************************************************************************
Dump file set for TEST.SYS_EXPORT_TABLE_01 is:
/home/oracle/t_part.dp
Job “TEST”.”SYS_EXPORT_TABLE_01″ successfully completed at 12:17:17
客户尝试导入几个分区,但是这些分区并不为空,于是在导入时指定了错误的TABLE_EXISTS_ACTION参数为REPLACE:
SQL> host impdp test directory=d_output dumpfile=t_part.dp logfile=t_part_imp.log tables=t_part:p3 table_exists_action=replace
Import: Release 11.2.0.3.0 – Production on Tue Mar 13 14:51:55 2012
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Password:
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – 64bit Production
With the Partitioning, Oracle Label Security and Real Application Testing options
Master table “TEST”.”SYS_IMPORT_TABLE_01″ successfully loaded/unloaded
Starting “TEST”.”SYS_IMPORT_TABLE_01″: test/******** directory=d_output dumpfile=t_part.dp logfile=t_part_imp.log tables=t_part:p3 table_exists_action=replace
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
. . imported “TEST”.”T_PART”:”P3″ 29.89 KB 193 rows
Job “TEST”.”SYS_IMPORT_TABLE_01″ successfully completed at 14:52:00
虽然导入的时候仅指定了一个分区,但是Oracle并不会对这一个分区执行TRUNCATE操作,而是将整个表DROP掉,然后利用数据泵中的源数据重建。
SQL> SELECT COUNT(*) FROM T_PART PARTITION (P1);
COUNT(*)
———-
0
SQL> SELECT COUNT(*) FROM T_PART PARTITION (P2);
COUNT(*)
———-
0
SQL> SELECT COUNT(*) FROM T_PART PARTITION (P3);
COUNT(*)
———-
193
SQL> SELECT COUNT(*) FROM T_PART PARTITION (P4);
COUNT(*)
———-
0
SQL> SELECT COUNT(*) FROM T_PART PARTITION (PMAX);
COUNT(*)
———-
0
这个表中只有导入的分区记录存在,原则上impdp对应的数据泵导出文件中的记录也都是可以恢复的,但是表中其他的分区数据则完全丢失。
而且Oracle在删除的时候还使用PURGE选项,使得分区表在删除的时候并没有被放到回收站中,而是直接从数据库中被清除。从这一点上讲,Oracle应该去掉PURGE语句,或者至少给出一个选项,毕竟DROP TABLE对于系统来说有风险的。
最近碰到两三次的问题都是和imp以及impdp有关,因此只要是涉及到数据库修改的,还是要谨慎处之。

Posted in ORACLE | Tagged , , , | Leave a comment

利用SCHEDULER调用shell脚本

10g的SCHEDULER的一个重要的功能是可以调用操作系统命令或SHELL脚本。而在10g以前,这个功能只能通过外部存储过程来实现。
下面是一个简单的例子,首先编辑一个test.sh脚本:

#!/usr/bin/ksh
echo abc >> /home/xxx/a.txt

注意,#!/usr/bin/ksh是必须的,否则JOB运行会出现ORA-27369错误。
给这个shell设置执行权限:

$ chmod 744 test.sh

下面就可以在数据库中建立PROGRAM:

SQL> BEGIN
  2     DBMS_SCHEDULER.CREATE_JOB(
  3             JOB_NAME => 'J_TEST', 
  4             JOB_TYPE => 'EXECUTABLE', 
  5             JOB_ACTION => '/home/xxx/test.sh', 
  6             ENABLED => TRUE, 
  7             AUTO_DROP => TRUE);
  8  END;
  9  /
PL/SQL PROCEDURE successfully completed.
SQL> SELECT * FROM DBA_SCHEDULER_JOBS WHERE JOB_NAME = 'J_TEST';
no ROWS selected
SQL> COL JOB_NAME FORMAT A30
SQL> SELECT TO_CHAR(LOG_DATE, 'YYYY-MM-DD HH24:MI:SS.FF TZH:TZM'), 
  2     JOB_NAME, 
  3     STATUS
  4  FROM DBA_SCHEDULER_JOB_RUN_DETAILS
  5  WHERE JOB_NAME = 'J_TEST';
TO_CHAR(LOG_DATE,'YYYY-MM-DDHH24:MI: JOB_NAME                       STATUS
------------------------------------ ------------------------------ ----------------------
2012-03-12 18:41:16.275361 +08:00    J_TEST                         SUCCEEDED

检查shell运行信息:

SQL> EXIT
Disconnected FROM Oracle DATABASE 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
WITH the Partitioning, OLAP, DATA Mining AND REAL Application Testing options
[xxx@hpserver2 ~]$ more a.txt 
abc
[xxx@hpserver2 ~]$ ls -l a.txt 
-rw-r--r-- 1 xxx oinstall 4 Mar 12 18:41 a.txt

通过这个简单的方法,就可以实现操作系统命令的调用。

Posted in ORACLE | Tagged , , , , | Leave a comment

分区表相关视图没有过滤删除表

一直认为,表被删除后,普通的视图中是查询不到表信息的,只有从RECYCLBIN相关视图才能看到被删除的表,没想到最近发现分区表的相关视图中仍然保留被删除表的信息。
看一个简单的例子:

SQL> CREATE TABLE T_PART (ID NUMBER, NAME VARCHAR2(30))
  2  PARTITION BY HASH (ID)
  3  PARTITIONS 4;
表已创建。
SQL> SELECT TABLE_NAME, PARTITIONING_TYPE, PARTITION_COUNT, STATUS
  2  FROM USER_PART_TABLES;
TABLE_NAME                     PARTITION PARTITION_COUNT STATUS
------------------------------ --------- --------------- --------
T_PART                         HASH                    4 VALID
SQL> SELECT TABLE_NAME, PARTITION_NAME, PARTITION_POSITION 
  2  FROM USER_TAB_PARTITIONS;
TABLE_NAME                     PARTITION_NAME                 PARTITION_POSITION
------------------------------ ------------------------------ ------------------
T_PART                         SYS_P61                                         1
T_PART                         SYS_P62                                         2
T_PART                         SYS_P63                                         3
T_PART                         SYS_P64                                         4
SQL> SELECT TABLE_NAME, STATUS, DROPPED
  2  FROM USER_TABLES
  3  WHERE TABLE_NAME = 'T_PART';
TABLE_NAME                     STATUS   DRO
------------------------------ -------- ---
T_PART                         VALID    NO

建立一个分区表,可以从分区相关的数据字典中查询到这个分区表的信息,下面删除这个表:

SQL> DROP TABLE T_PART;
表已删除。
SQL> SELECT TABLE_NAME, PARTITIONING_TYPE, PARTITION_COUNT, STATUS
  2  FROM USER_PART_TABLES;
TABLE_NAME                     PARTITION PARTITION_COUNT STATUS
------------------------------ --------- --------------- --------
BIN$dtFNG1JsS26GLpXZhpv56A==$0 HASH                    4 VALID
SQL> SELECT TABLE_NAME, PARTITION_NAME, PARTITION_POSITION 
  2  FROM USER_TAB_PARTITIONS;
TABLE_NAME                     PARTITION_NAME                 PARTITION_POSITION
------------------------------ ------------------------------ ------------------
BIN$dtFNG1JsS26GLpXZhpv56A==$0 SYS_P61                                         1
BIN$dtFNG1JsS26GLpXZhpv56A==$0 SYS_P62                                         2
BIN$dtFNG1JsS26GLpXZhpv56A==$0 SYS_P63                                         3
BIN$dtFNG1JsS26GLpXZhpv56A==$0 SYS_P64                                         4
SQL> SELECT TABLE_NAME, STATUS, DROPPED
  2  FROM USER_TABLES
  3  WHERE TABLE_NAME = 'T_PART';
未选定行
SQL> SELECT TABLE_NAME, STATUS, DROPPED
  2  FROM USER_TABLES
TABLE_NAME                     STATUS   DRO
------------------------------ -------- ---
SERVICES                       VALID    NO
COMPANIES                      VALID    NO
SERVICE_USERS                  VALID    NO
SERVICE_USAGE                  VALID    NO
SERVICE_RATES                  VALID    NO
T_DEFER                        VALID    NO
T                              VALID    NO
T_P                            VALID    NO
T_F                            VALID    NO
MLOG$_T_P                      VALID    NO
MLOG$_T_F                      VALID    NO
MV_T_ORACLE                    VALID    NO
T_LOAD_LOB                     VALID    NO
已选择13行。

显然表被删除后,记录从USER_TABLES表中被过滤,但是在USER_PART_TABLES和USER_TAB_PARTITIONS中记录仍然存在。这说明Oracle的视图的一致性工作做的并不好,导致视图中包含的记录性质不统一。
那么查询分区相关视图,需要关联*_TABLES视图,才能获取到当前没有被删除的分区表或表分区信息。当然如果对于严谨性要求不是那么高的话,通过表名进行过滤也是一个比较准确且简单易行的方法。
最后提一下,*_TABLES视图中的DROPPED列似乎并没有什么意义,因为被删除的对象已经从视图中被过滤,怀疑Oracle在设计字段和视图的时候并没有考虑好,从DBA_TABLES视图中可以发现,所有的记录这个值都是NO。

Posted in ORACLE | Tagged , , , , | Leave a comment

调用者存储过程访问权限问题

一直认为调用者权限过程所参考的对象是调用者用户下的对象,所依赖的权限也是调用者拥有的权限,但是现在发现事实并非完全如此。
事实上这个观点对于表、视图是正确的,调用者权限存储过程访问的表和视图是调用者当前用户下的,且权限都来自调用者本身。
但是对于存储过程,却是个例外,调用者权限存储过程中调用的OWNER用户下的存储过程,对于调用者而言,即使没有权限也是可以访问的。
描述有点抽象,看一个简单的例子:

SQL> CONN SYSTEM/MANAGER
已连接。
SQL> CREATE USER U1 IDENTIFIED BY U1 DEFAULT TABLESPACE USERS;
用户已创建。
SQL> GRANT CREATE SESSION TO U1;
授权成功。
SQL> CONN TEST/TEST 
已连接。
SQL> CREATE OR REPLACE PROCEDURE P_1 AUTHID CURRENT_USER AS
2 BEGIN
3 NULL;
4 END;
5 /
过程已创建。
SQL> CREATE OR REPLACE PROCEDURE P_2 AUTHID CURRENT_USER AS
2 BEGIN
3 P_1;
4 END;
5 /
过程已创建。
SQL> GRANT EXECUTE ON P_2 TO U1;
授权成功。
SQL> CONN U1/U1
已连接。
SQL> EXEC TEST.P_1
BEGIN TEST.P_1; END;
*1 行出现错误:
ORA-06550: 第 1,7 列:
PLS-00201: 必须声明标识符 'TEST.P_1'
ORA-06550: 第 1,7 列:
PL/SQL: Statement ignored
 
SQL> EXEC TEST.P_2
PL/SQL 过程已成功完成。

创建一个只有CREATE SESSION权限的用户。在另外一个用户下,创建两个调用者权限的存储过程,其中P_2调用P_1。将存储过程P_2的执行权限授权给P_1。用新创建的用户登录,可以看到,用户没有权限执行P_1,但是用户可以执行P_2,这意味着用户通过P_2的执行权限,调用了存储过程OWNER用户下调用者本没有权限的存储过程P_1。
这对于定义者权限存储过程是顺理成章的,但是对于调用者存储过程而言,对象的访问权限是在调用时判断,因此由调用者确定访问的对象和权限。没有想到,Oracle这里的对象的范围并不包含存储过程。
第一个过程是否是调用者权限并不重要,将其改为定义者权限,结果依旧:

SQL> CONN TEST/TEST
已连接。
SQL> CREATE OR REPLACE PROCEDURE P_1 AS
2 BEGIN
3 NULL;
4 END;
5 /
过程已创建。
SQL> CREATE OR REPLACE PROCEDURE P_2 AUTHID CURRENT_USER AS
2 BEGIN
3 P_1;
4 END;
5 /
过程已创建。
SQL> CONN U1/U1
已连接。
SQL> EXEC TEST.P_2
PL/SQL 过程已成功完成。
Posted in ORACLE | Tagged , , | Leave a comment

中止进程导致系统HANG住

以前在11g上碰到过一次类似的情况,由于ALTER SYSTEM KILL SESSION导致资源被完全占用,在一段时间内数据库处于HANG住状态。这次又碰到类似的问题。
简单描述一下问题产生的环境,用户在进行测试,在很短的时间内连续启动了多个应用服务器,导致大量的并发进程同时连接到数据库中,致使数据库服务器CPU利用率一下冲到100%。
由于数据库的这种状态,用户决定中止一些进程来释放服务器上的资源。但是通过kill -9和alter system kill session杀掉大量的会话后,数据库服务器反而处于HANG死状态,这时连sqlplus / as sysdba都无非正常登录。
于是用户继续通过kill -9清除所有非本地连接,到最后所有连接到数据库的非本地连接已经完全被杀掉,而服务器上的CPU资源已经下降,只有Oracle的PMON进程占用了单CPU的50%左右,其他CPU完全空闲。可是此时数据库仍然无非正常登录。
此时只能通过sqlplus –prelim “/ as sysdba”方式登录,然后利用oradebug执行systemstate的dump,查看导致数据库HANG死的原因。
检查SYSTEMSTATE的DUMP文件,发现PMON进程和大量的DEAD进程都在经历library cache: mutex X等待事件。而整个DUMP文件中library cache: mutex X等待事件出现了3000多次。这个等待事件是不正常的。
查询MOS发现果然是bug:Bug 9312879 “library cache: mutex x” waits after killing sessions / PMON slow to clean up。在11.1中,如果会话在KILL,那么PMON进程可能在清除进程会话是出现异常,导致清除进程失败后不断尝试,并最终产生这个问题。
这个BUG在11.2.0.1和11.1.0.7.7中被fixed,而Oracle对于这个问题的临时解决方案是不要kill会话。看来11.1版本和11.2相比确实是问题更多一些。

Posted in BUG | Tagged , , , | Leave a comment

ORA-7445(dbgrmqmqpk_query_pick_key)错误

客户数据库上出现了ORA-7445[dbgrmqmqpk_query_pick_key]错误。
详细错误信息为:

Sun Nov 13 00:20:24 2010
Exception [TYPE: SIGSEGV, Address NOT mapped TO object] [ADDR:0xB38F0000000049] [PC:0x100213D08, dbgrmqmqpk_query_pick_key()+0f88]
Errors IN file /opt/diag/rdbms/orcl/orcl/trace/orcl_m001_272500.trc (incident=137962):
ORA-07445: 出现异常错误: 核心转储 [dbgrmqmqpk_query_pick_key()+0f88] [SIGSEGV] [ADDR:0xB38F0000000049] [PC:0x100213D08] [Address NOT mapped TO object] []
Incident details IN: /opt/diag/rdbms/orcl/orcl/incident/incdir_137962/orcl_m001_272500_i137962.trc
Sun Nov 13 00:20:34 2010
Trace dumping IS performing id=[cdmp_20101113002034]
Sun Nov 13 00:20:24 2010
Sweep Incident[137962]: completed

显然错误信息与Mnnn进程有关,也就是说和AWR相关。查询MOS发现,和dbgrmqmqpk_query_pick_key有关的已知bug只有一个,Oracle在文档Bug 9390347 – ADR purge may dump (DIA-48457 [11]) [ID 9390347.8]描述了这个问题,当ADR在进行PURGE时,可能会导致这个错误的产生。
这个bug在11.2.0.2中被fixed,而当前的11.1.0.6正是当前被影响的版本。
由于错误并非每次重现,因此也可以选择忽略,如果错误频繁出现,那么应该考虑升级到11.1.0.7后应用bug对应的补丁,或者直接升级到11.2.0.2以上。

Posted in BUG | Tagged , , , | Leave a comment

ORA-600(2758)错误

客户数据库异常DOWN机,在启动过程中出现了这个错误。
由于掉电导致了客户数据库出现了控制文件的不一致,尝试启动报错如下:

SQL> startup
ORACLE instance started.
Total System Global Area 126950956 bytes
Fixed SIZE 454188 bytes
Variable SIZE 92274688 bytes
DATABASE Buffers 33554432 bytes
Redo Buffers 667648 bytes
*
ORA-00214: controlfile 'D:\ORACLE\ORADATA\SXXHDTS\CONTROL03.CTL' version 2623 inconsistent WITH file 'D:\ORACLE\ORADATA\SXXHDTS\CONTROL02.CTL' version 2619

显然这是控制文件不一致导致的,将最新的控制文件CONTROL03.CTL覆盖较旧的CONTROL02.CTL和CONTROL01.CTL,再次尝试打开,报错数据库需要恢复:

SQL> recover DATABASE;
Media recovery complete.
然后尝试打开数据库碰到ORA-600[2758]错误:
SQL> ALTER DATABASE OPEN;
ALTER DATABASE OPEN
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [2758], [1], [4294967295], [204800], [10], [], [], []

检查告警日志,可以发现更多的信息:

Mon Jan 09 12:02:08 2012
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
SCN scheme 2
USING log_archive_dest parameter DEFAULT VALUE
LICENSE_MAX_USERS = 0
SYS auditing IS disabled
Starting up ORACLE RDBMS Version: 9.2.0.1.0.
System parameters WITH non-DEFAULT VALUES:
processes = 150
timed_statistics = TRUE
shared_pool_size = 50331648
large_pool_size = 8388608
java_pool_size = 33554432
control_files = D:\oracle\oradata\sxxhdts\CONTROL01.CTL, D:\oracle\oradata\sxxhdts\CONTROL02.CTL, D:\oracle\oradata\sxxhdts\CONTROL03.CTL
db_block_size = 8192
db_cache_size = 25165824
compatible = 9.2.0.0.0
db_file_multiblock_read_count= 16
fast_start_mttr_target = 300
undo_management = AUTO
undo_tablespace = UNDOTBS1
undo_retention = 10800
remote_login_passwordfile= EXCLUSIVE
db_domain = 
instance_name = sxxhdts
dispatchers = (PROTOCOL=TCP) (SERVICE=sxxhdtsXDB)
job_queue_processes = 10
hash_join_enabled = TRUE
background_dump_dest = D:\oracle\admin\sxxhdts\bdump
user_dump_dest = D:\oracle\admin\sxxhdts\udump
core_dump_dest = D:\oracle\admin\sxxhdts\cdump
sort_area_size = 524288
db_name = sxxhdts
open_cursors = 300
star_transformation_enabled= FALSE
query_rewrite_enabled = FALSE
pga_aggregate_target = 25165824
aq_tm_processes = 1
PMON started WITH pid=2
DBW0 started WITH pid=3
LGWR started WITH pid=4
CKPT started WITH pid=5
SMON started WITH pid=6
RECO started WITH pid=7
CJQ0 started WITH pid=8
QMN0 started WITH pid=9
Mon Jan 09 12:02:11 2012
starting up 1 shared server(s) ...
starting up 1 dispatcher(s) FOR network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
Mon Jan 09 12:02:19 2012
ALTER DATABASE mount
Mon Jan 09 12:02:22 2012
ORA-214 signalled during: ALTER DATABASE mount...
Mon Jan 09 12:13:28 2012
ALTER DATABASE mount
Mon Jan 09 12:13:31 2012
ORA-214 signalled during: ALTER DATABASE mount...
Mon Jan 09 12:39:11 2012
ALTER DATABASE mount
Mon Jan 09 12:39:14 2012
ORA-214 signalled during: ALTER DATABASE mount
...
Mon Jan 09 12:41:17 2012
ALTER DATABASE mount
Mon Jan 09 12:41:21 2012
Successful mount OF redo thread 1, WITH mount id 633859757.
Mon Jan 09 12:41:21 2012
DATABASE mounted IN Exclusive Mode.
Completed: ALTER DATABASE mount
Mon Jan 09 12:41:27 2012
ALTER DATABASE OPEN
ORA-1113 signalled during: ALTER DATABASE OPEN...
Mon Jan 09 12:43:21 2012
Shutting down instance: further logons disabled
Shutting down instance (immediate)
License high water mark = 4
Waiting FOR dispatcher 'D000' TO shutdown
ALL dispatchers AND shared servers shutdown
Mon Jan 09 12:43:23 2012
ALTER DATABASE CLOSE NORMAL
ORA-1109 signalled during: ALTER DATABASE CLOSE NORMAL...
Mon Jan 09 12:43:23 2012
ALTER DATABASE DISMOUNT
Completed: ALTER DATABASE DISMOUNT
ARCH: Archiving IS disabled
Shutting down archive processes
Archiving IS disabled
Archive process shutdown avoided: 0 active
ARCH: Archiving IS disabled
Shutting down archive processes
Archiving IS disabled
Archive process shutdown avoided: 0 active
Mon Jan 09 12:45:57 2012
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
SCN scheme 2
USING log_archive_dest parameter DEFAULT VALUE
LICENSE_MAX_USERS = 0
SYS auditing IS disabled
Starting up ORACLE RDBMS Version: 9.2.0.1.0.
System parameters WITH non-DEFAULT VALUES:
processes = 150
timed_statistics = TRUE
shared_pool_size = 50331648
large_pool_size = 8388608
java_pool_size = 33554432
.
.
.
aq_tm_processes = 1
PMON started WITH pid=2
DBW0 started WITH pid=3
LGWR started WITH pid=4
CKPT started WITH pid=5
SMON started WITH pid=6
RECO started WITH pid=7
CJQ0 started WITH pid=8
QMN0 started WITH pid=9
Mon Jan 09 12:46:00 2012
starting up 1 shared server(s) ...
starting up 1 dispatcher(s) FOR network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
Mon Jan 09 12:46:00 2012
ALTER DATABASE MOUNT
Mon Jan 09 12:46:04 2012
Successful mount OF redo thread 1, WITH mount id 633842120.
Mon Jan 09 12:46:04 2012
DATABASE mounted IN Exclusive Mode.
Completed: ALTER DATABASE MOUNT
Mon Jan 09 12:52:21 2012
ALTER DATABASE RECOVER DATABASE 
Mon Jan 09 12:52:21 2012
Media Recovery START
WARNING! Recovering DATA file 2 FROM a fuzzy backup. It might be an online
backup taken WITHOUT entering the BEGIN backup command.
WARNING! Recovering DATA file 4 FROM a fuzzy backup. It might be an online
backup taken WITHOUT entering the BEGIN backup command.
WARNING! Recovering DATA file 5 FROM a fuzzy backup. It might be an online
backup taken WITHOUT entering the BEGIN backup command.
WARNING! Recovering DATA file 6 FROM a fuzzy backup. It might be an online
backup taken WITHOUT entering the BEGIN backup command.
WARNING! Recovering DATA file 8 FROM a fuzzy backup. It might be an online
backup taken WITHOUT entering the BEGIN backup command.
WARNING! Recovering DATA file 9 FROM a fuzzy backup. It might be an online
backup taken WITHOUT entering the BEGIN backup command.
WARNING! Recovering DATA file 10 FROM a fuzzy backup. It might be an online
backup taken WITHOUT entering the BEGIN backup command.
Starting datafile 2 recovery IN thread 1 SEQUENCE 317
Datafile 2: 'D:\ORACLE\ORADATA\SXXHDTS\UNDOTBS01.DBF'
Starting datafile 4 recovery IN thread 1 SEQUENCE 317
Datafile 4: 'D:\ORACLE\ORADATA\SXXHDTS\DRSYS01.DBF'
Starting datafile 5 recovery IN thread 1 SEQUENCE 317
Datafile 5: 'D:\ORACLE\ORADATA\SXXHDTS\EXAMPLE01.DBF'
Starting datafile 6 recovery IN thread 1 SEQUENCE 317
Datafile 6: 'D:\ORACLE\ORADATA\SXXHDTS\INDX01.DBF'
Starting datafile 8 recovery IN thread 1 SEQUENCE 317
Datafile 8: 'D:\ORACLE\ORADATA\SXXHDTS\TOOLS01.DBF'
Starting datafile 9 recovery IN thread 1 SEQUENCE 317
Datafile 9: 'D:\ORACLE\ORADATA\SXXHDTS\USERS01.DBF'
Starting datafile 10 recovery IN thread 1 SEQUENCE 317
Datafile 10: 'D:\ORACLE\ORADATA\SXXHDTS\XDB01.DBF'
Media Recovery Log 
Recovery OF Online Redo Log: Thread 1 GROUP 1 Seq 317 Reading mem 0
Mem# 0 errs 0: D:\ORACLE\ORADATA\SXXHDTS\REDO01.LOG
Media Recovery Complete
Completed: ALTER DATABASE RECOVER DATABASE 
Mon Jan 09 12:52:32 2012
ALTER DATABASE OPEN
Mon Jan 09 12:52:32 2012
Thread 1 opened at log SEQUENCE 317
CURRENT log# 1 seq# 317 mem# 0: D:\ORACLE\ORADATA\SXXHDTS\REDO01.LOG
Successful OPEN OF redo thread 1.
Mon Jan 09 12:52:32 2012
Errors IN file d:\oracle\admin\sxxhdts\bdump\sxxhdts_lgwr_5956.trc:
ORA-00345: redo log WRITE error block 4294967295 COUNT 1
ORA-00312: online log 1 thread 1: 'D:\ORACLE\ORADATA\SXXHDTS\REDO01.LOG'
ORA-27069: skgfdisp: attempt TO do I/O beyond the range OF the file
OSD-04026: 传递的参数无效。 (OS 4294967295)
Mon Jan 09 12:52:33 2012
Errors IN file d:\oracle\admin\sxxhdts\bdump\sxxhdts_lgwr_5956.trc:
ORA-00340: IO error processing online log 1 OF thread 1
ORA-00345: redo log WRITE error block 4294967295 COUNT 1
ORA-00312: online log 1 thread 1: 'D:\ORACLE\ORADATA\SXXHDTS\REDO01.LOG'
ORA-27069: skgfdisp: attempt TO do I/O beyond the range OF the file
OSD-04026: 传递的参数无效。 (OS 4294967295)
LGWR: terminating instance due TO error 340
Mon Jan 09 12:52:33 2012
Errors IN file d:\oracle\admin\sxxhdts\bdump\sxxhdts_pmon_4752.trc:
ORA-00340: IO error processing online log OF thread 
Instance TERMINATED BY LGWR, pid = 5956
Mon Jan 09 13:02:04 2012
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
SCN scheme 2
USING log_archive_dest parameter DEFAULT VALUE
LICENSE_MAX_USERS = 0
SYS auditing IS disabled
Starting up ORACLE RDBMS Version: 9.2.0.1.0.
System parameters WITH non-DEFAULT VALUES:
processes = 150
timed_statistics = TRUE
shared_pool_size = 50331648
.
.
.
aq_tm_processes = 1
PMON started WITH pid=2
DBW0 started WITH pid=3
LGWR started WITH pid=4
CKPT started WITH pid=5
SMON started WITH pid=6
RECO started WITH pid=7
CJQ0 started WITH pid=8
QMN0 started WITH pid=9
Mon Jan 09 13:02:07 2012
starting up 1 shared server(s) ...
starting up 1 dispatcher(s) FOR network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
Mon Jan 09 13:02:08 2012
ALTER DATABASE MOUNT
Mon Jan 09 13:02:12 2012
Successful mount OF redo thread 1, WITH mount id 633838736.
Mon Jan 09 13:02:12 2012
DATABASE mounted IN Exclusive Mode.
Completed: ALTER DATABASE MOUNT
Mon Jan 09 13:02:24 2012
ALTER DATABASE OPEN
Mon Jan 09 13:02:25 2012
Beginning crash recovery OF 1 threads
Mon Jan 09 13:02:25 2012
Started FIRST pass scan
Mon Jan 09 13:02:25 2012
Completed FIRST pass scan
0 redo blocks READ, 0 DATA blocks need recovery
Mon Jan 09 13:02:25 2012
Started recovery at
Thread 1: logseq 317, block 4294967295, scn 0.22473385
Recovery OF Online Redo Log: Thread 1 GROUP 1 Seq 317 Reading mem 0
Mem# 0 errs 0: D:\ORACLE\ORADATA\SXXHDTS\REDO01.LOG
Mon Jan 09 13:02:25 2012
Ended recovery at
Thread 1: logseq 317, block 4294967295, scn 0.22493386
0 DATA blocks READ, 0 DATA blocks written, 0 redo blocks READ
Mon Jan 09 13:02:25 2012
Errors IN file d:\oracle\admin\sxxhdts\udump\sxxhdts_ora_4844.trc:
ORA-00600: 内部错误代码,参数: [2758], [1], [4294967295], [204800], [8], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN
...

告警日志中事实上记录了更完整的信息,在第一次尝试打开数据库的时候,报错信息更将详尽,指出Oracle尝试在写REDO01.LOG日志时,返回了操作系统错误。而Oracle尝试写的REDO的位置恰好是4G-1,如果对比第二次打开时ORA-600错误的信息,就可以看到,事实上两次报错描述的是相同的问题。2758错误的第二个参数1指的就是GROUP 1,而4294967295就是前面报错的信息。
显然导致错误的原因是由于Oracle在写日志时出现了错误。由于数据库没有打开,无法清除日志文件,所以常规手段不起作用。
除了通过BBED修改控制文件中的信息外,还可以通过更常规一点的手段,通过添加隐含参数来解决:

Mon Jan 09 14:05:43 2012
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
SCN scheme 2
USING log_archive_dest parameter DEFAULT VALUE
LICENSE_MAX_USERS = 0
SYS auditing IS disabled
Starting up ORACLE RDBMS Version: 9.2.0.1.0.
System parameters WITH non-DEFAULT VALUES:
processes = 150
timed_statistics = TRUE
shared_pool_size = 50331648
large_pool_size = 8388608
java_pool_size = 33554432
control_files = D:\oracle\oradata\sxxhdts\CONTROL01.CTL, D:\oracle\oradata\sxxhdts\CONTROL02.CTL, D:\oracle\oradata\sxxhdts\CONTROL03.CTL
db_block_size = 8192
db_cache_size = 25165824
compatible = 9.2.0.0.0
db_file_multiblock_read_count= 16
fast_start_mttr_target = 300
_allow_resetlogs_corruption= TRUE
undo_management = AUTO
.
.
.
aq_tm_processes = 1
PMON started WITH pid=2
DBW0 started WITH pid=3
LGWR started WITH pid=4
CKPT started WITH pid=5
SMON started WITH pid=6
RECO started WITH pid=7
CJQ0 started WITH pid=8
QMN0 started WITH pid=9
Mon Jan 09 14:05:46 2012
starting up 1 shared server(s) ...
starting up 1 dispatcher(s) FOR network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
Mon Jan 09 14:05:47 2012
ALTER DATABASE MOUNT
Mon Jan 09 14:05:51 2012
Successful mount OF redo thread 1, WITH mount id 633898875.
Mon Jan 09 14:05:51 2012
DATABASE mounted IN Exclusive Mode.
Completed: ALTER DATABASE MOUNT
Mon Jan 09 14:07:06 2012
ALTER DATABASE RECOVER DATABASE until cancel 
Mon Jan 09 14:07:06 2012
Media Recovery START
Media Recovery NOT Required
Completed: ALTER DATABASE RECOVER DATABASE until cancel 
Mon Jan 09 14:10:06 2012
ALTER DATABASE RECOVER DATABASE USING backup controlfile until cancel 
Mon Jan 09 14:10:06 2012
Media Recovery START
Starting datafile 1 recovery IN thread 1 SEQUENCE 317
Datafile 1: 'D:\ORACLE\ORADATA\SXXHDTS\SYSTEM01.DBF'
Starting datafile 2 recovery IN thread 1 SEQUENCE 317
Datafile 2: 'D:\ORACLE\ORADATA\SXXHDTS\UNDOTBS01.DBF'
Starting datafile 3 recovery IN thread 1 SEQUENCE 317
Datafile 3: 'D:\ORACLE\ORADATA\SXXHDTS\CWMLITE01.DBF'
Starting datafile 4 recovery IN thread 1 SEQUENCE 317
Datafile 4: 'D:\ORACLE\ORADATA\SXXHDTS\DRSYS01.DBF'
Starting datafile 5 recovery IN thread 1 SEQUENCE 317
Datafile 5: 'D:\ORACLE\ORADATA\SXXHDTS\EXAMPLE01.DBF'
Starting datafile 6 recovery IN thread 1 SEQUENCE 317
Datafile 6: 'D:\ORACLE\ORADATA\SXXHDTS\INDX01.DBF'
Starting datafile 7 recovery IN thread 1 SEQUENCE 317
Datafile 7: 'D:\ORACLE\ORADATA\SXXHDTS\ODM01.DBF'
Starting datafile 8 recovery IN thread 1 SEQUENCE 317
Datafile 8: 'D:\ORACLE\ORADATA\SXXHDTS\TOOLS01.DBF'
Starting datafile 9 recovery IN thread 1 SEQUENCE 317
Datafile 9: 'D:\ORACLE\ORADATA\SXXHDTS\USERS01.DBF'
Starting datafile 10 recovery IN thread 1 SEQUENCE 317
Datafile 10: 'D:\ORACLE\ORADATA\SXXHDTS\XDB01.DBF'
Media Recovery Log 
ORA-279 signalled during: ALTER DATABASE RECOVER DATABASE USING backup con...
Mon Jan 09 14:10:13 2012
ALTER DATABASE RECOVER CANCEL 
Media Recovery Cancelled
Completed: ALTER DATABASE RECOVER CANCEL 
Mon Jan 09 14:10:27 2012
ALTER DATABASE OPEN resetlogs
Mon Jan 09 14:10:27 2012
RESETLOGS IS being done WITHOUT consistancy checks. This may RESULT
IN a corrupted DATABASE. The DATABASE should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 22473386
Resetting resetlogs activation ID 615377539 (0x24adea83)
Mon Jan 09 14:10:30 2012
Assigning activation ID 633898875 (0x25c8877b)
Thread 1 opened at log SEQUENCE 1
CURRENT log# 1 seq# 1 mem# 0: D:\ORACLE\ORADATA\SXXHDTS\REDO01.LOG
Successful OPEN OF redo thread 1.
Mon Jan 09 14:10:30 2012
SMON: enabling cache recovery
Mon Jan 09 14:10:31 2012
Undo Segment 1 Onlined
Undo Segment 2 Onlined
Undo Segment 3 Onlined
Undo Segment 4 Onlined
Undo Segment 5 Onlined
Undo Segment 6 Onlined
Undo Segment 7 Onlined
Undo Segment 8 Onlined
Undo Segment 9 Onlined
Undo Segment 10 Onlined
Successfully onlined Undo Tablespace 1.
Dictionary CHECK beginning
Dictionary CHECK complete
Mon Jan 09 14:10:31 2012
SMON: enabling tx recovery
Mon Jan 09 14:10:31 2012
DATABASE Characterset IS ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: ALTER DATABASE OPEN resetlogs

可以看到,通过设置隐含参数_allow_resetlogs_corruption为TRUE,并执行不完全恢复,成功打开了数据库。
其实这种方式并没有造成数据的损失,只是避免了数据库在打开的时候尝试在错误的偏移量上去对REDO进行写操作。

Posted in ORACLE | Tagged , , , , , , , , , | Leave a comment