缺少归档导致不完整恢复

简单介绍一次客户由于缺少归档造成数据丢失的案例。
分享一个案例:客户的一个测试数据库要迁移到另一台服务器上,客户DBA在将数据库备份到带库后,对原数据库服务器进行了格式化。
当DBA在目标服务器上进行数据库的恢复时发现了问题。数据库的备份虽然成功完成,但是随后的归档日志的备份由于报错失败了。现在只有一个全库的热备份,而没有任何归档日志的备份。目前已经不是丢失数据的问题了,而是缺少一致性的备份,数据库根本无法打开。
当然这个案例的恢复对于我们来说不算什么困难的事情,全库恢复后,直接RECOVER UNTIL CANCEL,尝试直接OPEN RESETLOGS,会出现错误:

SQL> conn / AS sysdba
Connected.
SQL> ALTER DATABASE OPEN READ ONLY;
ALTER DATABASE OPEN READ ONLY
*
ERROR at line 1:
ORA-16004: backup DATABASE requires recovery
ORA-01152: file 1 was NOT restored FROM a sufficiently OLD backup
ORA-01110: DATA file 1: '/oradata/orcl/system01.dbf'

通过添加“_allow_resetlogs_corruption”=true,然后RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL;最后通过ALTER DATABASE OPEN RESETLOGS打开。
数据库打开后,很快就会由于尝试恢复UNDO中的信息导致数据库的再次DOWN掉:

Mon Jan 16 14:27:05 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Opening WITH internal Resource Manager plan
WHERE NUMA PG = 1, CPUs = 48
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started WITH pid=40, OS id=40239468
Mon Jan 16 14:27:06 2012
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering TRANSACTION (9, 7).
Mon Jan 16 14:27:06 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:27:07 2012
ORA-01555 caused BY SQL statement below (SQL ID: 5wc2915k44m38, Query Duration=0 sec, SCN: 0x0000.003bd268):
Mon Jan 16 14:27:07 2012
SELECT USER#,TYPE# FROM USER$ WHERE name=:1
Mon Jan 16 14:27:07 2012
LOGSTDBY: Validating controlfile WITH logical metadata
Mon Jan 16 14:27:07 2012
LOGSTDBY: Validation complete
Completed: ALTER DATABASE OPEN resetlogs
Mon Jan 16 14:32:12 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:32:13 2012
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering TRANSACTION (9, 7).
Mon Jan 16 14:32:13 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:37:15 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:37:16 2012
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering TRANSACTION (9, 7).
Mon Jan 16 14:37:16 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:42:17 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:42:18 2012
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering TRANSACTION (9, 7).
Mon Jan 16 14:42:18 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:47:19 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:47:20 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_pmon_58655160.trc:
ORA-00474: SMON process TERMINATED WITH error
Mon Jan 16 14:47:20 2012
PMON: terminating instance due TO error 474
Instance TERMINATED BY PMON, pid = 58655160

对于这个错误,可以通过设置EVENTS 10513来避免后台PMON进程进行数据库的事务回滚,也可以通过设置_CORRUPTED_ROLLBACK_SEGMENTS来避免这个错误的产生。
再次重启后,通过EXP导出全库,重建数据库后导入即可。
其实这里想要探讨的并非是恢复技术本身,而是备份策略以及备份有效性检查的重要性。DBA在备份完成后,只需要简单的看一下备份输出的LOG文件,就可以马上判断备份是否完整,可惜的是,就是这个简单的确认操作没有进行,导致了最终重启数据库且丢失数据的损失。

Posted in ORACLE | Tagged , , , , | Leave a comment

ORA-600(krhpfh_03)错误

测试服务器上出现这个错误信息。
这个错误是发生在表空间OFFLINE的时候:

2012-01-07 00:42:13.026000 +08:00
CREATE tablespace test datafile SIZE 10m
2012-01-07 00:42:15.619000 +08:00
Completed: CREATE tablespace test datafile SIZE 10m
2012-01-07 00:46:25.448000 +08:00
Errors IN file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_13951.trc:
ORA-01116: error IN opening DATABASE file 5
ORA-01110: DATA file 5: '/u01/app/oracle/oradata/ORCL/datafile/o1_mf_test_7jg93510_.dbf'
ORA-27041: unable TO OPEN file
Solaris-AMD64 Error: 2: No such file OR directory
Additional information: 3
Checker run found 1 NEW persistent DATA failures
2012-01-07 00:46:26.593000 +08:00
Starting background process SMCO
SMCO started WITH pid=22, OS id=14017
2012-01-07 00:50:24.669000 +08:00
ALTER DATABASE datafile 5 offline
ORA-1145 signalled during: ALTER DATABASE datafile 5 offline...
2012-01-07 00:50:36.918000 +08:00
ALTER tablespace test offline
READ OF datafile '/u01/app/oracle/oradata/ORCL/datafile/o1_mf_test_7jg93510_.dbf' (fno 5) header failed WITH ORA-01208
Rereading datafile 5 header failed WITH ORA-01208
Errors IN file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_13965.trc (incident=3337):
ORA-00600: internal error code, arguments: [krhpfh_03-1208], [fno =], [5], [fecpc =], [4], [fhcpc =], [3], [], [], [], [], []
ORA-01110: DATA file 5: '/u01/app/oracle/oradata/ORCL/datafile/o1_mf_test_7jg93510_.dbf'
ORA-01122: DATABASE file 5 failed verification CHECK
ORA-01110: DATA file 5: '/u01/app/oracle/oradata/ORCL/datafile/o1_mf_test_7jg93510_.dbf'
ORA-01208: DATA file IS an OLD version - NOT accessing CURRENT version
Incident details IN: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_3337/orcl_ora_13965_i3337.trc
2012-01-07 00:50:38.613000 +08:00
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.
Dumping diagnostic DATA IN directory=[cdmp_20120107005038], requested BY (instance=1, osid=13965), summary=[incident=3337].
ORA-600 signalled during: ALTER tablespace test offline...

这个错误产生的原因是文件头被损坏,其中krhpfh_03-1208错误信息中包含了两部分信息,前一部分krhpfh_03是ORA-600错误的错误号,而后面的1208是导致这个ORA-600错误的错误号,也就是ORA-01208: data file is an old version – not accessing current version。导致这个ORA-600出现的其他常见错误还包括ORA-01202: wrong incarnation of this file – wrong creation time;ORA-01204: file number is string rather than string – wrong file ;ORA-01206: file is not part of this database – wrong database id ;ORA-01209: data file is from before the last RESETLOGS以及ORA-01210: data file header is media corrupt等等。
总的来说就是Oracle发现数据文件的头与控制文件中的不符所致。一般而言,不是由于磁盘损坏了头块,就是由于误操作所致。
当前这个ORA-1208错误比较特别,是BBED修改数据文件的文件头所致,解决方案除了常规的恢复之外,就只有通过BBED修改文件头这个方法了。

Posted in ORACLE | Tagged , , , , , , , , , , , | Leave a comment

修改数据字典表导致ORA-600(ktecgsc:kcbz_objdchk)错误

公司的测试服务环境被整崩溃了,简单记录一下错误。
同事发现一个测试数据库无法登录,对数据库执行SHUTDOWN ABORT后,重新启动bootstrap报错:
[ora11g@hpserver2 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Sun Mar 4 00:37:58 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 2555744256 bytes
Fixed Size 2230912 bytes
Variable Size 637535616 bytes
Database Buffers 1862270976 bytes
Redo Buffers 53706752 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-08103: object no longer exists
Process ID: 20964
Session ID: 125 Serial number: 5
数据库怎么会突然出现这么严重的问题,检查告警日志发现了如下的错误:
Errors in file /u02/app/oracle/diag/rdbms/orcl11g/orcl11g/trace/orcl11g_ora_31804.trc (incident=127465):
ORA-00600: internal error code, arguments: [ktecgsc:kcbz_objdchk], [0], [0], [1], [], [], [], [], [], [], [], []
Incident details in: /u02/app/oracle/diag/rdbms/orcl11g/orcl11g/incident/incdir_127465/orcl11g_ora_31804_i127465.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Dumping diagnostic data in directory=[cdmp_20120302133841], requested by (instance=1, osid=31804), summary=[incident=127465].
Sweep [inc][127465]: completed
Sweep [inc2][127465]: completed
Errors in file /u02/app/oracle/diag/rdbms/orcl11g/orcl11g/trace/orcl11g_ora_31900.trc (incident=127449):
ORA-00600: internal error code, arguments: [kkdlGetBaseUser2:authIdType], [0], [27], [GLOBAL_AQ_USER_ROLE], [], [], [], [], [], [], [], []
Incident details in: /u02/app/oracle/diag/rdbms/orcl11g/orcl11g/incident/incdir_127449/orcl11g_ora_31900_i127449.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Dumping diagnostic data in directory=[cdmp_20120302134009], requested by (instance=1, osid=31900), summary=[incident=127449].
Sweep [inc][127449]: completed
Sweep [inc2][127449]: completed
检查第一个ORA-600对应的TRACE,发现出现错误的会话在执行导入:
[ora11g@hpserver2 ~]$ more /u02/app/oracle/diag/rdbms/orcl11g/orcl11g/trace/orcl11g_ora_31804.trc
Trace file /u02/app/oracle/diag/rdbms/orcl11g/orcl11g/trace/orcl11g_ora_31804.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – 64bit Production
With the Partitioning, Oracle Label Security and Real Application Testing options
ORACLE_HOME = /u02/app/oracle/product/11.2.0/db_1
System name: Linux
Node name: hpserver2.enmotech.com
Release: 2.6.32-100.28.5.el6.x86_64
Version: #1 SMP Wed Feb 2 18:40:23 EST 2011
Machine: x86_64
Instance name: orcl11g
Redo thread mounted by this instance: 1
Oracle process number: 29
Unix process pid: 31804, image: oracle@hpserver2.enmotech.com (TNS V1-V3)
*** 2012-03-02 13:38:34.271
*** SESSION ID:(141.3959) 2012-03-02 13:38:34.271
*** CLIENT ID:() 2012-03-02 13:38:34.271
*** SERVICE NAME:(SYS$USERS) 2012-03-02 13:38:34.271
*** MODULE NAME:(imp@hpserver2.enmotech.com (TNS V1-V3)) 2012-03-02 13:38:34.271
*** ACTION NAME:() 2012-03-02 13:38:34.271
*** SESSION ID:(141.3959) 2012-03-02 13:38:34.270
DATA seg.obj=-2, on-disk obj=27, dsflg=0, dsobj=74, cls=4
Incident 127465 created, dump file: /u02/app/oracle/diag/rdbms/orcl11g/orcl11g/incident/incdir_127465/orcl11g_ora_31804_i127465.trc
ORA-00600: internal error code, arguments: [ktecgsc:kcbz_objdchk], [0], [0], [1], [], [], [], [], [], [], [], []
*** SESSION ID:(141.3959) 2012-03-02 13:38:40.815
DATA seg.obj=0, on-disk obj=27, dsflg=0, dsobj=74, cls=4
和同事确认了一下,确实有个同事在导入数据字典表,而且本来应该导入到单独的用户下,但是导入时忘了添加TOUSER,导致数据被导入到SYS用户下,造成了数据字典的不一致,从而引发了这个问题。
这里并不是要解决这个错误,事实上如果没有完整的备份,数据库已经很难恢复了。这里主要是记录这个问题,希望给其他人引以为戒。
DBA总要做一些高风险的操作,而DBA又有足够的知识和能力导致问题一般情况下难以产生的前提要求得以实现。
比如当前这次事故,如果不是DBA,没有一定专业的知识,无法使用IMP方式来导入数据字典表。而对于缺少了TOUSER参数的情况下,表会导入到SYS用户下,但是这会出现表已经存在的错误,而使得导入失败。而对于DBA来说,这种常见错误并不会引发DBA的足够重视,一个IGNORE=Y参数的使用,最终酿成了这次事故。

Posted in ORACLE | Tagged , , , , , , , | Leave a comment

Database Firewall安全手册

前不久看完安全的管理员手册,发现Firewall的功能确实是其他产品难以代替的。
如果说管理员手册关注的是如何操作、设置和维护Firewall,那么安全手册主要描述的是Firewall的主要功能,以及在安全方面所带来的审计、分析、报表功能。
随着最近众多安全事情的不断产生,估计Firewall在国内的市场很快就会发展起来,和Oracle的其他技术以及产品一样,把Firewall配置的合理且高效同样不是一间很容易的事情,如果策略配置的太严格,可能造成正常的连接被误杀,而如果配置的太松,又可能造成恶意的操作被放行。
和VM一样,Firewall的文档位置也相对比较隐蔽:http://docs.oracle.com/cd/E26572_01/doc/doc.51/e22688/toc.htm

Posted in BOOKS | Leave a comment

ORA-600(kjrpar:noalloc)错误

由于ORA-4031引发的ORA-600错误。
告警日志中错误如下:

Sat Jan 28 06:07:49 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_j000_21874.trc:
ORA-12012: error ON auto EXECUTE OF job 66
ORA-04031: unable TO allocate 6592 bytes OF shared memory ("shared pool","STANDARDSYS","PL/SQL MPCODE","PKR: PL_UEP")
Sat Jan 28 06:08:15 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_j000_21874.trc:
ORA-12012: error ON auto EXECUTE OF job 87
ORA-04031: unable TO allocate 6592 bytes OF shared memory ("shared pool","STANDARDSYS","PL/SQL MPCODE","PKR: PL_UEP")
Sat Jan 28 06:08:50 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_j000_21874.trc:
ORA-12012: error ON auto EXECUTE OF job 66
ORA-04031: unable TO allocate 6592 bytes OF shared memory ("shared pool","STANDARDSYS","PL/SQL MPCODE","PKR: PL_UEP")
Sat Jan 28 06:09:50 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_j000_21874.trc:
ORA-12012: error ON auto EXECUTE OF job 66
ORA-04031: unable TO allocate 6592 bytes OF shared memory ("shared pool","STANDARDSYS","PL/SQL MPCODE","PKR: PL_UEP")
Sat Jan 28 06:10:15 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_j000_21874.trc:
ORA-12012: error ON auto EXECUTE OF job 87
ORA-04031: unable TO allocate 6592 bytes OF shared memory ("shared pool","STANDARDSYS","PL/SQL MPCODE","PKR: PL_UEP")
Sat Jan 28 06:10:51 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_j000_21874.trc:
ORA-12012: error ON auto EXECUTE OF job 66
ORA-04031: unable TO allocate 6592 bytes OF shared memory ("shared pool","STANDARDSYS","PL/SQL MPCODE","PKR: PL_UEP")
Sat Jan 28 06:11:34 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_lmd0_24470.trc:
ORA-00600: internal error code, arguments: [kjrpar:noalloc], [1], [385], [0x1066F8850], [0x956F0C4C0], [0x94A9BE408], [], []
Sat Jan 28 06:11:35 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_lmd0_24470.trc:
ORA-00600: internal error code, arguments: [kjrpar:noalloc], [1], [385], [0x1066F8850], [0x956F0C4C0], [0x94A9BE408], [], []
Sat Jan 28 06:11:35 2012
Trace dumping IS performing id=[cdmp_20120128061135]
Sat Jan 28 06:11:41 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_lmd0_24470.trc:
ORA-00600: internal error code, arguments: [kjrpar:noalloc], [1], [385], [0x1066F8850], [0x956F0C4C0], [0x94A9BE408], [], []
Sat Jan 28 06:11:41 2012
LMD0: terminating instance due TO error 482
Sat Jan 28 06:11:41 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_pmon_24462.trc:
ORA-00482: LMD* process TERMINATED WITH error
Sat Jan 28 06:11:41 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_lms2_24480.trc:
ORA-00482: LMD* process TERMINATED WITH error
Sat Jan 28 06:11:41 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_lms5_24505.trc:
ORA-00482: LMD* process TERMINATED WITH error
Sat Jan 28 06:11:41 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_lmon_24468.trc:
ORA-00482: LMD* process TERMINATED WITH error
Sat Jan 28 06:11:41 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_lms3_24492.trc:
ORA-00482: LMD* process TERMINATED WITH error
Sat Jan 28 06:11:41 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_lms0_24472.trc:
ORA-00482: LMD* process TERMINATED WITH error
Sat Jan 28 06:11:41 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_lms4_24496.trc:
ORA-00482: LMD* process TERMINATED WITH error
Sat Jan 28 06:11:41 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_lms1_24476.trc:
ORA-00482: LMD* process TERMINATED WITH error
Sat Jan 28 06:11:41 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_cjq0_24533.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00482: LMD* process TERMINATED WITH error
Sat Jan 28 06:11:41 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_ckpt_24527.trc:
ORA-00482: LMD* process TERMINATED WITH error
Sat Jan 28 06:11:41 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_mman_24509.trc:
ORA-00482: LMD* process TERMINATED WITH error
Sat Jan 28 06:11:42 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_dbw0_24511.trc:
ORA-00482: LMD* process TERMINATED WITH error
Sat Jan 28 06:11:42 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_lck0_24724.trc:
ORA-00482: LMD* process TERMINATED WITH error
Sat Jan 28 06:11:42 2012
Errors IN file /export/home/OracleHomes/admin/jykfdb/bdump/jykfdb1_lgwr_24517.trc:
ORA-00482: LMD* process TERMINATED WITH error

根据MOS描述,这个ORA-600的错误源于ORA-4031错误:ORA-600[kjrpar:noalloc] Error Raised in the LMD Process [ID 864377.1]。升级到11.2、11.1.0.7和10.2.0.5可以避免这个错误。
虽然导致错误的本质是ORA-4031,但是当前的这个错误引发了LMD进程的异常退出,并最终导致数据库的崩溃,因此对于当前的数据库而言,这个错误的严重程度甚至更甚于ORA-4031。
从数据库的初始化参数进行分析,这个数据库的共享池配置确实不足,因此在短时间无法升级的情况下,增大共享池的内存设置是目前避免这个问题的最好办法。

Posted in BUG | Tagged , , , , , | 1 Comment

ORA-600(kjbclose:L)错误

数据库关闭过程中出现这个错误。
错误信息如下:

Sat May 14 20:12:58 EAT 2011
Shutting down instance (immediate)
Sat May 14 20:12:58 EAT 2011
Shutting down instance: further logons disabled
Sat May 14 20:13:00 EAT 2011
Stopping background process CJQ0
Sat May 14 20:13:00 EAT 2011
Stopping background process QMNC
Sat May 14 20:13:01 EAT 2011
Stopping background process MMNL
Sat May 14 20:13:02 EAT 2011
Stopping background process MMON
License high water mark = 537
Sat May 14 20:13:04 EAT 2011
Job queue slave processes stopped
Waiting FOR dispatcher 'D000' TO shutdown
ALL dispatchers AND shared servers shutdown
Sat May 14 20:13:07 EAT 2011
ALTER DATABASE CLOSE NORMAL
Sat May 14 20:13:07 EAT 2011
SMON: disabling tx recovery
SMON: disabling cache recovery
Sat May 14 20:13:07 EAT 2011
Block CHANGE tracking service stopping.
Sat May 14 20:13:07 EAT 2011
Stopping background process CTWR
Sat May 14 20:13:12 EAT 2011
Shutting down archive processes
Archiving IS disabled
Sat May 14 20:13:17 EAT 2011
ARCH shutting down
ARC1: Archival stopped
Sat May 14 20:13:22 EAT 2011
ARCH shutting down
ARC0: Archival stopped
Sat May 14 20:13:23 EAT 2011
Thread 2 closed at log SEQUENCE 58833
Successful close OF redo thread 2
Sat May 14 20:14:49 EAT 2011
Completed: ALTER DATABASE CLOSE NORMAL
Sat May 14 20:14:49 EAT 2011
ALTER DATABASE DISMOUNT
Completed: ALTER DATABASE DISMOUNT
ARCH: Archival disabled due TO shutdown: 1089
Shutting down archive processes
Archiving IS disabled
Archive process shutdown avoided: 0 active
ARCH: Archival disabled due TO shutdown: 1089
Shutting down archive processes
Archiving IS disabled
Archive process shutdown avoided: 0 active
Sat May 14 20:14:52 EAT 2011
Errors IN file /oracle/app/admin/orcl/bdump/orcl2_lms4_27651.trc:
ORA-00600: internal error code, arguments: [kjbclose:L], [0xC00000091DF8B308], [], [], [], [], [], []
Sat May 14 20:14:54 EAT 2011
Errors IN file /oracle/app/admin/orcl/bdump/orcl2_lms5_27653.trc:
ORA-00600: internal error code, arguments: [kjbclose:L], [0xC000001124FB8998], [], [], [], [], [], []
Sat May 14 20:14:59 EAT 2011
Errors IN file /oracle/app/admin/orcl/bdump/orcl2_lms4_27651.trc:
ORA-00600: internal error code, arguments: [kjbclose:L], [0xC00000091DF8B308], [], [], [], [], [], []
Sat May 14 20:14:59 EAT 2011
LMS4: terminating instance due TO error 484
Sat May 14 20:15:01 EAT 2011
Shutting down instance (abort)
License high water mark = 537
Sat May 14 20:15:05 EAT 2011
Instance TERMINATED BY LMS4, pid = 27651
Sat May 14 20:15:06 EAT 2011
Instance TERMINATED BY USER, pid = 15724

虽然这个错误发生在系统进程LMS上,但是由于是在关闭数据库的阶段,因此可能是由于数据库关闭状态导致。
在MOS文档Bug 10209426 RAC instance crash due to ORA-600 [kjbclose:L] in LMS描述了这个bug,确认影响版本为10.2.0.4,而Oracle目前还没有确认解决这个bug的版本。而这个bug的产生可能与关闭远端并行服务通信产生的锁有关。
而对于当前的环境而言,本身就是在数据库的关闭阶段,因此即使产生了CRASH也并不会造成太严重的后果,无非是将SHUTDOWN IMMEDIATE升级为SHUTDOWN ABORT。
如果在其他情况下碰到这个错误,需要随时关注Oracle是否提供专门的PATCH解决这个问题。

Posted in BUG | Tagged , , | Leave a comment

Oracle SecureBackup管理员手册

和其他备份管理工具比较类似,绝大部分工作都是通过图形化界面配置。
如果对比其他几个备份工具看,SecureBackup和他们相比确实比较类似,实现的功能大同小异,菜单和界面的组织形式也非常类似。
可能对于日常监控、管理带库的操作者而言,这种图形化的工具相对比较方便,但是对于关注备份策略、备份存储以及备份效率的DBA来说,这种工具总感觉不是很顺手。当然还没有具体用过SecureBackup,应该是没有发言权的,不过从文档介绍的功能和图形化方式来看,似乎并没有比其他厂商提供的产品有什么显著的不同。

Posted in BOOKS | Leave a comment

一次客户数据库恢复的过程

前一段时间给一个客户恢复了一个数据库。
事情其实已经过去一段时间了,而且整个过程无非是一个数据库的基于时间的不完全恢复,从技术角度讲,没有什么太多可以值得描述的,而且由于所有的操作都是在客户的主机上进行,因此没有留下任何的痕迹,因此这件事情很难写成一篇技术相关的文章,所以也就一直没有写出来。
今天打算简单描述一下这个问题,主要是受Eygle的影响,Eygle最近刚刚写完一本新书《数据安全警示录》,我在帮他进行一些文字上的检查工作。在这本书中,Eygle用他帮别人恢复数据库的案例来警示大家,希望大家能利用别人的失败教训来警醒我们自己。
虽然我的这个案例在技术上并没有太多值得描述的,但是这个案例本身却很有借鉴意义,在这里和大家分享一下。
我们接到客户的请求是周五的下午,据说客户的数据库在周四夜里DOWN机,随后负责维护的人员对数据库进行了恢复,但是数据库丢失了部分数据,客户对于恢复的程度不认可,于是找到了我们。
开始以为只是由于不完全恢复的RESETLOGS导致了少量的数据丢失,可能需要通过类似LOGMINER的方法来看看能否多恢复一些数据,但是到了现场进行了简单的沟通后,发现我们之前得到的信息非常的不准确。
客户的数据库在周四夜里11点左右DOWN机,随后负责维护的人员对数据库进行了恢复,但是数据库最终只恢复到了周四上午10点左右,客户对于这种程度的恢复当然是不认可的,以致于最终对于现场负责维护的技术人员的不信任,最终找到了我们。
而对于现场环境的检查后,发现了很多的问题。
数据库DOWN机的原因是由于当前日志损坏,由于数据库的日志每组只有一个,这个当前日志的损坏不但导致数据库的DOWN机,而且会造成数据的丢失。但是损失仅此而已,只是当前日志中少部分已经COMMIT但是还没有写到数据文件的数据。对于这种情况,最差的情况是丢失一个小时的数据,也就是说到上一个日志切换时刻数据都是正常的。而对于当前日志丢失的情况,完全没有必要也没有理由去执行全库的恢复,只要清除当前损坏的日志就可以打开数据库。执行全库的恢复说明维护人员根本不理解Oracle的备份恢复机制,只是发现数据库无法正常打开,就去盲目的还原并恢复数据库,导致更多的问题产生。
维护人员水平不高,因此备份策略的不严谨以及备份和恢复过程的混乱也是意料之内的事情。首先,Oracle的备份策略存在比较严重的问题,客户的环境中有存储备份的带库,这对于数据库备份而言,应该是更安全的保障,但是备份策略的设置不但没有提高备份的安全性,反而使得备份的可用性下降。
客户的数据库每周执行一次全库备份,每天进行一次增量的备份,此外每天要进行多次的归档日志的备份。除了全库备份一定存储在带库上之外,增量备份和归档日志的备份有可能存储在带库上,也有可能存储在磁盘上,且没有任何一个位置上的备份是齐备的。以这次的问题为例,进行数据库的恢复首先需要恢复4天前带库上的全备,然后需要应用3天前带库上的增量,2天前磁盘上的增量,1天前带库上的增量,以及带库和本地磁盘上最近一天的归档备份。这种备份策略导致的一个问题就是,如果磁盘或带库任意出现一点损坏,就会导致数据库无法完全恢复。带库设备本来可以增加备份的可靠性,但是这种备份策略的设置使得备份的安全性和可靠性被降到最低。此外这种备份的存储和恢复也会给备份恢复的性能带来很大的影响。
除了备份策略的问题之外,操作过程也存在非常严重的问题。其中一点就是,数据库恢复过程的RMAN输出日志被删除。由于缺少当时恢复过程的日志,无法确定维护人员当时具体执行的操作是什么,所有的信息只能来自当事人的口述,本来口述这个方式就不靠谱,何况还是一个概念本身就不很清晰的人,而且问题产生的后果已经比较严重,很难说他在描述的过程中是否为自己开脱。总之,按照当事人的描述,恢复步骤应该是没有问题的,但是恢复后数据库中的数据只到早上10点左右,缺少了最后一天12个小时的数据。
最后维护人员还犯了一个更加严重的错误,第一次在进行数据库恢复的过程中,有一些带库上的归档日志文件被恢复到本地磁盘上。在数据库恢复完成后,出于空间的考虑,所有硬盘上恢复出来的归档日志被删除。但是由于删除的时候没有进行限定,删除操作删除了数据库硬盘上所有的归档,不仅包括恢复过程中从带库恢复到硬盘上的,还包括最新的几个还没有进行过备份的归档文件。这个删除的操作最终导致了数据库再次进行恢复时,丢失了最后两个小时的数据。
可以看到,原本一个只是损失几分钟数据,造成半个小时左右停机时间的故障,由于维护人员的错误,导致问题一步步放大,最终造成了损失2个小时左右的数据,且数据库停机一天半以上的重大事故。

Posted in ORACLE | Tagged , , , | Leave a comment

Oracle Database 9i/10g/11g编程艺术——深入数据库体系结构

周三的ACOUG大会上得到了Tom签名的这本书。
这本书的第一版《Oracle9i&10g编程艺术》在08年就看过了,不过Tom的这本第二版是11年1月出版的,而我居然在拿到手之后才知道,可谓是后知后觉了。对于Tom的书来说根本没有必要多做介绍,每一本都是经典,只要静下心去读就好了。
编程艺术给人感觉面向开发人员,不过里面包含的大部分内容对于DBA而言也是同样应该了解的,至少我当时就从第一版中获益匪浅。
《Export One on One Oracle》第一版是8i到9i版本,到第二版变成9i到10g版本。而这本书第一版是9i到10g的版本,到这一版已经是9i、10g和11g的版本了。这说明Tom本身也在不断学习,不断将最新的特性纳入到他的著作中。在12c很快就要面世的情况下,有一些国内的从业者却还没有用过11g,如果再不抓紧时间学习,那么技术真的是很快就会落伍了。

Posted in BOOKS | Leave a comment

DDL触发器设置导致DDL无法执行(二)

公司测试数据库发现执行DDL报错。
继续描述问题的诊断和解决。
DDL触发器设置导致DDL无法执行(一):http://yangtingkun.itpub.net/post/468/526410
从对象定义是找不到问题的原因的,只能通过10046跟踪一下:
SQL> ALTER SESSION SET EVENTS ‘10046 TRACE NAME CONTEXT FOREVER, LEVEL 12’;
Session altered.
SQL> CREATE TABLE T_CREATE (ID NUMBER);
CREATE TABLE T_CREATE (ID NUMBER)
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-30511: invalid DDL operation in system triggers
ORA-06512: at line 24
ORA-06512: at line 24
SQL> ALTER SESSION SET EVENTS ‘10046 TRACE NAME CONTEXT OFF’;
Session altered.

查询得到的TRACE文件:

*** ACTION NAME:() 2012-01-31 12:15:24.811
*** MODULE NAME:(sqlplus@hpserver2.enmotech.com (TNS V1-V3)) 2012-01-31 12:15:24.811
*** SERVICE NAME:(SYS$USERS) 2012-01-31 12:15:24.811
*** SESSION ID:(146.138) 2012-01-31 12:15:24.811
Skipped error 604 during the execution OF DBFW_CONSOLE_ACCESS.TRIGGER_LOGIN 
*** 2012-01-31 12:15:24.811
ksedmp: internal OR fatal error
ORA-00604: error occurred at recursive SQL level 1
ORA-30511: invalid DDL operation IN system triggers
ORA-06512: at line 3
*** 2012-01-31 12:26:10.929
*** ACTION NAME:() 2012-01-31 12:26:10.929
*** MODULE NAME:(SQL*Plus) 2012-01-31 12:26:10.929
A deadlock among DDL AND parse locks IS detected.
This deadlock IS usually due TO USER errors IN
the design OF an application OR FROM issuing a SET
OF concurrent statements which can cause a deadlock.
This should NOT be reported TO Oracle Support.
The following information may aid IN finding
the errors which cause the deadlock:
ORA-04020: deadlock detected while trying TO LOCK object EYGLE.BIN$trcEn8qthIjgQKjAEwAm+g==$0
--------------------------------------------------------
  object   waiting  waiting       blocking blocking
  handle   SESSION     LOCK mode   SESSION     LOCK mode
--------  -------- -------- ----  -------- -------- ----
0xb88364d8  0xb8f713c8 0xb56ca598    X  0xb8f713c8 0xb56ca7c0    X
--------------------------------------------------------
---------- DUMP OF WAITING AND BLOCKING LOCKS ----------
--------------------------------------------------------
------------- WAITING LOCK -------------
----------------------------------------
SO: 0xb56ca598, TYPE: 53, owner: 0xb73c2990, flag: INIT/-/-/0x00
LIBRARY OBJECT LOCK: LOCK=b56ca598 handle=b88364d8 request=X
CALL pin=(nil) SESSION pin=(nil) hpc=0000 hlc=0000
htl=0xb56ca618[0xb56ca840,0xb53b1e18] htb=0xb53b1e18 ssga=0xb53b13b0
USER=b8f713c8 SESSION=b8f713c8 COUNT=0 flags=[0000] SAVEPOINT=0x1494
LIBRARY OBJECT HANDLE: handle=b88364d8 mtx=0xb8836608(0) cdp=0
name=EYGLE.BIN$trcEn8qthIjgQKjAEwAm+g==$0 
hash=41d7b6fc85abfd51d2baef487c92dc95 TIMESTAMP=01-17-2012 17:33:09
namespace=TABL flags=KGHP/TIM/FUL/FUP/SML/[0e000000]
kkkk-dddd-llll=0000-0701-0701 LOCK=X pin=X latch#=3 hpc=01d6 hlc=01d6
lwt=0xb8836580[0xb56ca5c8,0xb56ca5c8] ltm=0xb8836590[0xb8836590,0xb8836590]
pwt=0xb8836548[0xb8836548,0xb8836548] ptm=0xb8836558[0xb8836558,0xb8836558]
REF=0xb88365b0[0xb88365b0,0xb88365b0] lnd=0xb88365c8[0xb88365c8,0xb88365c8]
  LIBRARY OBJECT: object=b48a8d18
  TYPE=TABL flags=EXS/LOC/UPD/PRG[0985] pflags=[0040] STATUS=VALD LOAD=0
  DATA BLOCKS:
  DATA#     heap  pointer    STATUS pins CHANGE whr
  ----- -------- -------- --------- ---- ------ ---
      0 b8836418 b48a8e30 I/P/A/-/-    0 NONE   00 
      8 b48a88a8 b03cdbf8 I/P/A/-/-    1 UPDATE 00 
      9 b48a8978 b03cd428 I/P/A/-/-    1 NONE   00 
     10 b48a8a00 b03cd040 I/P/A/-/-    1 UPDATE 00 
------------- BLOCKING LOCK ------------
----------------------------------------
SO: 0xb56ca7c0, TYPE: 53, owner: 0xb73c20f0, flag: INIT/-/-/0x00
LIBRARY OBJECT LOCK: LOCK=b56ca7c0 handle=b88364d8 mode=X
CALL pin=(nil) SESSION pin=(nil) hpc=0000 hlc=0000
htl=0xb56ca840[0xb53b1e18,0xb56ca618] htb=0xb53b1e18 ssga=0xb53b13b0
USER=b8f713c8 SESSION=b8f82a10 COUNT=2 flags=[0000] SAVEPOINT=0xd70
LIBRARY OBJECT HANDLE: handle=b88364d8 mtx=0xb8836608(0) cdp=0
name=EYGLE.BIN$trcEn8qthIjgQKjAEwAm+g==$0 
hash=41d7b6fc85abfd51d2baef487c92dc95 TIMESTAMP=01-17-2012 17:33:09
namespace=TABL flags=KGHP/TIM/FUL/FUP/SML/[0e000000]
kkkk-dddd-llll=0000-0701-0701 LOCK=X pin=X latch#=3 hpc=01d6 hlc=01d6
lwt=0xb8836580[0xb56ca5c8,0xb56ca5c8] ltm=0xb8836590[0xb8836590,0xb8836590]
pwt=0xb8836548[0xb8836548,0xb8836548] ptm=0xb8836558[0xb8836558,0xb8836558]
REF=0xb88365b0[0xb88365b0,0xb88365b0] lnd=0xb88365c8[0xb88365c8,0xb88365c8]
  LIBRARY OBJECT: object=b48a8d18
  TYPE=TABL flags=EXS/LOC/UPD/PRG[0985] pflags=[0040] STATUS=VALD LOAD=0
  DATA BLOCKS:
  DATA#     heap  pointer    STATUS pins CHANGE whr
  ----- -------- -------- --------- ---- ------ ---
      0 b8836418 b48a8e30 I/P/A/-/-    0 NONE   00 
      8 b48a88a8 b03cdbf8 I/P/A/-/-    1 UPDATE 00 
      9 b48a8978 b03cd428 I/P/A/-/-    1 NONE   00 
     10 b48a8a00 b03cd040 I/P/A/-/-    1 UPDATE 00 
--------------------------------------------------------
This LOCK request was aborted.
*** 2012-01-31 17:25:14.137
A deadlock among DDL AND parse locks IS detected.
This deadlock IS usually due TO USER errors IN
the design OF an application OR FROM issuing a SET
OF concurrent statements which can cause a deadlock.
This should NOT be reported TO Oracle Support.
The following information may aid IN finding
the errors which cause the deadlock:
ORA-04020: deadlock detected while trying TO LOCK object EYGLE.BIN$trcI7ykLAu7gQKjAEwAnkA==$0
--------------------------------------------------------
  object   waiting  waiting       blocking blocking
  handle   SESSION     LOCK mode   SESSION     LOCK mode
--------  -------- -------- ----  -------- -------- ----
0xb209bfc0  0xb8f713c8 0xb56c3aa0    X  0xb8f713c8 0xb56756f0    X
--------------------------------------------------------
---------- DUMP OF WAITING AND BLOCKING LOCKS ----------
--------------------------------------------------------
------------- WAITING LOCK -------------
----------------------------------------
SO: 0xb56c3aa0, TYPE: 53, owner: 0xb77d5078, flag: INIT/-/-/0x00
LIBRARY OBJECT LOCK: LOCK=b56c3aa0 handle=b209bfc0 request=X
CALL pin=(nil) SESSION pin=(nil) hpc=0000 hlc=0000
htl=0xb56c3b20[0xb5675770,0xb53b1b08] htb=0xb53b1b08 ssga=0xb53b13b0
USER=b8f713c8 SESSION=b8f713c8 COUNT=0 flags=[0000] SAVEPOINT=0x386e
LIBRARY OBJECT HANDLE: handle=b209bfc0 mtx=0xb209c0f0(0) cdp=0
name=EYGLE.BIN$trcI7ykLAu7gQKjAEwAnkA==$0 
hash=f2ab522fb35d982d1fbc28237e665464 TIMESTAMP=01-17-2012 17:34:17
namespace=TABL flags=KGHP/TIM/FUP/SML/[0a000000]
kkkk-dddd-llll=0000-0701-0701 LOCK=X pin=X latch#=2 hpc=0038 hlc=0038
lwt=0xb209c068[0xb56c3ad0,0xb56c3ad0] ltm=0xb209c078[0xb209c078,0xb209c078]
pwt=0xb209c030[0xb209c030,0xb209c030] ptm=0xb209c040[0xb209c040,0xb209c040]
REF=0xb209c098[0xb209c098,0xb209c098] lnd=0xb209c0b0[0xb209c0b0,0xb209c0b0]
  LIBRARY OBJECT: object=b0c49fd8
  TYPE=TABL flags=EXS/LOC/UPD/PRG[0985] pflags=[0040] STATUS=VALD LOAD=0
  DATA BLOCKS:
  DATA#     heap  pointer    STATUS pins CHANGE whr
  ----- -------- -------- --------- ---- ------ ---
      0 b209bf00 b0c4a0f0 I/P/A/-/-    0 NONE   00 
      8 b0c49b68 b0201480 I/P/A/-/-    1 UPDATE 00 
      9 b0c49c38 b0200cb0 I/P/A/-/-    1 NONE   00 
     10 b0c49cc0 b01af398 I/P/A/-/-    1 UPDATE 00 
------------- BLOCKING LOCK ------------
----------------------------------------
SO: 0xb56756f0, TYPE: 53, owner: 0xb77d47d8, flag: INIT/-/-/0x00
LIBRARY OBJECT LOCK: LOCK=b56756f0 handle=b209bfc0 mode=X
CALL pin=(nil) SESSION pin=(nil) hpc=0000 hlc=0000
htl=0xb5675770[0xb53b1b08,0xb56c3b20] htb=0xb53b1b08 ssga=0xb53b13b0
USER=b8f713c8 SESSION=b8f6fe60 COUNT=2 flags=[0000] SAVEPOINT=0x35eb
LIBRARY OBJECT HANDLE: handle=b209bfc0 mtx=0xb209c0f0(0) cdp=0
name=EYGLE.BIN$trcI7ykLAu7gQKjAEwAnkA==$0 
hash=f2ab522fb35d982d1fbc28237e665464 TIMESTAMP=01-17-2012 17:34:17
namespace=TABL flags=KGHP/TIM/FUP/SML/[0a000000]
kkkk-dddd-llll=0000-0701-0701 LOCK=X pin=X latch#=2 hpc=0038 hlc=0038
lwt=0xb209c068[0xb56c3ad0,0xb56c3ad0] ltm=0xb209c078[0xb209c078,0xb209c078]
pwt=0xb209c030[0xb209c030,0xb209c030] ptm=0xb209c040[0xb209c040,0xb209c040]
REF=0xb209c098[0xb209c098,0xb209c098] lnd=0xb209c0b0[0xb209c0b0,0xb209c0b0]
  LIBRARY OBJECT: object=b0c49fd8
  TYPE=TABL flags=EXS/LOC/UPD/PRG[0985] pflags=[0040] STATUS=VALD LOAD=0
  DATA BLOCKS:
  DATA#     heap  pointer    STATUS pins CHANGE whr
  ----- -------- -------- --------- ---- ------ ---
      0 b209bf00 b0c4a0f0 I/P/A/-/-    0 NONE   00 
      8 b0c49b68 b0201480 I/P/A/-/-    1 UPDATE 00 
      9 b0c49c38 b0200cb0 I/P/A/-/-    1 NONE   00 
     10 b0c49cc0 b01af398 I/P/A/-/-    1 UPDATE 00 
--------------------------------------------------------
This LOCK request was aborted.
*** 2012-01-31 17:39:16.132
WAIT #2: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=37 tim=1296877691535577
*** 2012-01-31 17:40:12.165
WAIT #2: nam='SQL*Net message from client' ela= 54719914 driver id=1650815232 #bytes=1 p3=0 obj#=37 tim=1296877746255669
XCTEND rlbk=0, rd_only=1
=====================
PARSING IN CURSOR #10 len=33 dep=0 uid=60 oct=1 lid=60 tim=1296877746256995 hv=4051323229 ad='b2088510'
CREATE TABLE T_CREATE (ID NUMBER)
END OF STMT
PARSE #10:c=0,e=885,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=1296877746256954
BINDS #10:
=====================
.
.
.
=====================
PARSING IN CURSOR #13 len=753 dep=1 uid=58 oct=47 lid=58 tim=1296877746275168 hv=2905514213 ad='b8852970'
DECLARE
  li ora_name_list_t;
  n INTEGER;
  cSQL CLOB;
BEGIN
  IF dbfw_console_access.is_local THEN
    IF ora_dict_obj_type = 'TABLE' AND
       ora_dict_obj_owner = 'DBFW_CONSOLE_ACCESS' AND
       ora_dict_obj_name = 'EVENT' THEN
      NULL;
    ELSE
      n := ora_sql_txt(li);
      FOR i IN 1 .. n loop
        cSQL := cSQL || li(i);
      END loop;
      IF cSQL IS NOT NULL THEN
        INSERT INTO dbfw_console_access.event(id,username,sessionid,event,text)
          SELECT dbfw_console_access.event_seq.nextval,
                 sys_context('USERENV','SESSION_USER'),
                 sys_context('USERENV','SESSIONID'),
                 'DDL',
                 cSQL
            FROM dual;
        END IF;
    END IF;
  END IF;
END;
END OF STMT
PARSE #13:c=0,e=20,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1296877746275166
BINDS #13:
=====================
PARSING IN CURSOR #5 len=213 dep=2 uid=58 oct=2 lid=58 tim=1296877746275611 hv=1466247469 ad='b88468c0'
INSERT INTO DBFW_CONSOLE_ACCESS.EVENT(ID,USERNAME,SESSIONID,EVENT,TEXT) SELECT DBFW_CONSOLE_ACCESS.EVENT_SEQ.NEXTVAL, SYS_CONTEXT('USERENV','SESSION_USER'), 
SYS_CONTEXT('USERENV','SESSIONID'), 'DDL', :B1 FROM DUAL
END OF STMT
PARSE #5:c=0,e=17,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=1,tim=1296877746275608
BINDS #5:
kkscoacd
 Bind#0
  oacdty=112 mxl=4000(4000) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=206001 frm=01 csi=873 siz=4000 off=0
  kxsbbbfp=7fa2b3cf99c0  bln=4000  avl=40  flg=05
  VALUE=
Dump OF memory FROM 0x00007FA2B3CF99C0 TO 0x00007FA2B3CF99E8
7FA2B3CF99C0 01002600 03800802 00000200 0092008A  [.&..............]
7FA2B3CF99D0 08000000 17006903 01000000 00000000  [.....i..........]
7FA2B3CF99E0 F9D69D36 5422DE81                    [6....."T]        
=====================
.
.
.
=====================
PARSING IN CURSOR #4 len=57 dep=3 uid=0 oct=12 lid=0 tim=1296877746278076 hv=0 ad='b40eead8'
drop table "EYGLE"."BIN$trcI7ykLAu7gQKjAEwAnkA==$0" purge
END OF STMT
PARSE #4:c=0,e=158,p=0,cr=0,cu=0,mis=1,r=0,dep=3,og=4,tim=1296877746278074
BINDS #4:
=====================
.
.
.
=====================
PARSING IN CURSOR #7 len=753 dep=4 uid=58 oct=47 lid=58 tim=1296877746356839 hv=2905514213 ad='b8852970'
declare
  li ora_name_list_t;
  n integer;
  cSQL clob;
begin
  if dbfw_console_access.is_local then
    if ora_dict_obj_type = 'TABLE' and
       ora_dict_obj_owner = 'DBFW_CONSOLE_ACCESS' and
       ora_dict_obj_name = 'EVENT' then
      null;
    else
      n := ora_sql_txt(li);
      for i in 1 .. n loop
        cSQL := cSQL || li(i);
      end loop;
      if cSQL is not null then
        insert into dbfw_console_access.event(id,username,sessionid,event,text)
          select dbfw_console_access.event_seq.nextval,
                 sys_context('USERENV','SESSION_USER'),
                 sys_context('USERENV','SESSIONID'),
                 'DDL',
                 cSQL
            from dual;
        end if;
    end if;
  end if;
end;
END OF STMT
PARSE #7:c=0,e=26,p=0,cr=0,cu=0,mis=0,r=0,dep=4,og=1,tim=1296877746356836
BINDS #7:
=====================
PARSING IN CURSOR #14 len=213 dep=5 uid=58 oct=2 lid=58 tim=1296877746357237 hv=1466247469 ad='b88468c0'
INSERT INTO DBFW_CONSOLE_ACCESS.EVENT(ID,USERNAME,SESSIONID,EVENT,TEXT) SELECT DBFW_CONSOLE_ACCESS.EVENT_SEQ.NEXTVAL, SYS_CONTEXT('USERENV','SESSION_USER'), 
SYS_CONTEXT('USERENV','SESSIONID'), 'DDL', :B1 FROM DUAL
END OF STMT
PARSE #14:c=0,e=15,p=0,cr=0,cu=0,mis=0,r=0,dep=5,og=1,tim=1296877746357234
BINDS #14:
kkscoacd
 Bind#0
  oacdty=112 mxl=4000(4000) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=206001 frm=01 csi=873 siz=4000 off=0
  kxsbbbfp=7fa2b3d2d680  bln=4000  avl=40  flg=05
  value=
Dump of memory from 0x00007FA2B3D2D680 to 0x00007FA2B3D2D6A8
7FA2B3D2D680 01002600 03800802 00000200 0092008A  [.&..............]
7FA2B3D2D690 0A000000 31006903 01000000 00000000  [.....i.1........]
7FA2B3D2D6A0 30937991 2DB5F5E9                    [.y.0...-]        
=====================
.
.
.
=====================
PARSING IN CURSOR #11 len=51 dep=7 uid=0 oct=10 lid=0 tim=1296877746421849 hv=0 ad='b40e3b50'
drop index "EYGLE"."BIN$trcI7ykNAu7gQKjAEwAnkA==$0"
END OF STMT
PARSE #11:c=5000,e=19651,p=1,cr=66,cu=0,mis=1,r=0,dep=7,og=4,tim=1296877746421846
BINDS #11:
EXEC #1:c=10999,e=55067,p=2,cr=154,cu=18,mis=0,r=0,dep=6,og=4,tim=1296877746421937
ERROR #1:err=30511 tim=793245673
STAT #12 id=1 cnt=1 pid=0 pos=1 obj=0 op='NESTED LOOPS  (cr=12 pr=0 pw=0 time=184 us)'
STAT #12 id=2 cnt=1 pid=1 pos=1 obj=32 op='TABLE ACCESS BY INDEX ROWID CCOL$ (cr=4 pr=0 pw=0 time=33 us)'
STAT #12 id=3 cnt=1 pid=2 pos=1 obj=54 op='INDEX RANGE SCAN I_CCOL1 (cr=3 pr=0 pw=0 time=19 us)'
STAT #12 id=4 cnt=1 pid=1 pos=2 obj=21 op='TABLE ACCESS BY INDEX ROWID COL$ (cr=8 pr=0 pw=0 time=149 us)'
STAT #12 id=5 cnt=1 pid=4 pos=1 obj=47 op='INDEX UNIQUE SCAN I_COL3 (cr=4 pr=0 pw=0 time=79 us)'
STAT #2 id=1 cnt=2 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=0 pw=0 time=67 us)'
STAT #2 id=2 cnt=3 pid=1 pos=1 obj=702 op='TABLE ACCESS FULL RECYCLEBIN$ (cr=3 pr=0 pw=0 time=40 us)'
EXEC #14:c=17998,e=65066,p=2,cr=373,cu=57,mis=0,r=0,dep=5,og=1,tim=1296877746422333
ERROR #14:err=604 tim=793245673
EXEC #7:c=17998,e=65535,p=2,cr=374,cu=82,mis=0,r=0,dep=4,og=1,tim=1296877746422463
ERROR #7:err=30511 tim=793245673
XCTEND rlbk=1, rd_only=1
STAT #3 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=0 pw=0 time=75 us)'
STAT #3 id=2 cnt=3 pid=1 pos=1 obj=702 op='TABLE ACCESS FULL RECYCLEBIN$ (cr=3 pr=0 pw=0 time=48 us)'
EXEC #5:c=73990,e=147097,p=5,cr=882,cu=104,mis=0,r=0,dep=2,og=1,tim=1296877746422739
ERROR #5:err=604 tim=793245673
EXEC #13:c=73990,e=147555,p=5,cr=883,cu=130,mis=0,r=0,dep=1,og=1,tim=1296877746422827
ERROR #13:err=30511 tim=793245673
WAIT #10: nam='log file sync' ela= 76283 buffer#=2642 p2=4987314 p3=0 obj#=-1 tim=1296877746499472
WAIT #10: nam='SQL*Net break/reset to client' ela= 3 driver id=1650815232 break?=1 p3=0 obj#=-1 tim=1296877746499549
WAIT #10: nam='SQL*Net break/reset to client' ela= 92 driver id=1650815232 break?=0 p3=0 obj#=-1 tim=1296877746499674
WAIT #10: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1296877746499688
*** 2012-01-31 17:40:22.304
WAIT #10: nam='SQL*Net message from client' ela= 9656896 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1296877756156629
=====================
PARSING IN CURSOR #19 len=55 dep=0 uid=60 oct=42 lid=60 tim=1296877756157111 hv=524428051 ad='0'
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF'
END OF STMT
PARSE #19:c=0,e=198,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=1296877756157104
EXEC #19:c=0,e=83,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=1296877756157288

不难看出,导致问题产生的原因是由于在执行了CREATE TABLE语句后,触发器引发了删除回收站中的索引和表的DDL,而这个DDL又会再次引发触发器的触发,从而在一个会话中引入了死锁。
不过为什么触发器会导致回收站中的索引和表的DROP动作呢,查询后发现,原来触发器的触发动作是一个INSERT语句,而当前表中数据已满,需要扩展新的空间,而恰好这个对象所在的表空间也没有剩余空间了,所以Oracle不得以要清除RECYCLEBIN中的对象来释放新的空间:

SQL> SELECT TABLE_NAME, tablespace_name FROM dba_tables 
2 WHERE TABLE_NAME = 'EVENT'
3 AND OWNER = 'DBFW_CONSOLE_ACCESS';
TABLE_NAME                     TABLESPACE_NAME
------------------------------ ------------------------------
EVENT                          USERS
SQL> SELECT SUM(BYTES)/1024/1024 FROM DBA_FREE_SPACE WHERE TABLESPACE_NAME = 'USERS';
SUM(BYTES)/1024/1024
--------------------

找到问题的原因解决起来就很简单了,不过由于DDL触发器会导致INSERT,因此除了PURGE之外,所有其他的尝试都会报错:

SQL> conn / AS sysdba
Connected.
SQL> purge TABLE eygle.t_pwd;
TABLE purged.
SQL> SELECT * FROM dba_recyclebin;
no ROWS selected
SQL> CREATE TABLE T_CREATE (ID NUMBER);
TABLE created.

好在PURGE语句没有导致触发器的触发,从而避免了错误的产生,如果PURGE同样导致触发器运行,或者RECYCLEBIN中没有对象可以清除,那么当前的问题可能就很难用常规手段解决了。
这也说明一个问题,在DDL触发器中执行的操作一定要谨慎,否则很容易导致系统的故障。

Posted in ORACLE | Tagged , , , | Leave a comment