缺少归档导致不完整恢复

简单介绍一次客户由于缺少归档造成数据丢失的案例。
分享一个案例:客户的一个测试数据库要迁移到另一台服务器上,客户DBA在将数据库备份到带库后,对原数据库服务器进行了格式化。
当DBA在目标服务器上进行数据库的恢复时发现了问题。数据库的备份虽然成功完成,但是随后的归档日志的备份由于报错失败了。现在只有一个全库的热备份,而没有任何归档日志的备份。目前已经不是丢失数据的问题了,而是缺少一致性的备份,数据库根本无法打开。
当然这个案例的恢复对于我们来说不算什么困难的事情,全库恢复后,直接RECOVER UNTIL CANCEL,尝试直接OPEN RESETLOGS,会出现错误:

SQL> conn / AS sysdba
Connected.
SQL> ALTER DATABASE OPEN READ ONLY;
ALTER DATABASE OPEN READ ONLY
*
ERROR at line 1:
ORA-16004: backup DATABASE requires recovery
ORA-01152: file 1 was NOT restored FROM a sufficiently OLD backup
ORA-01110: DATA file 1: '/oradata/orcl/system01.dbf'

通过添加“_allow_resetlogs_corruption”=true,然后RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL;最后通过ALTER DATABASE OPEN RESETLOGS打开。
数据库打开后,很快就会由于尝试恢复UNDO中的信息导致数据库的再次DOWN掉:

Mon Jan 16 14:27:05 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Opening WITH internal Resource Manager plan
WHERE NUMA PG = 1, CPUs = 48
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started WITH pid=40, OS id=40239468
Mon Jan 16 14:27:06 2012
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering TRANSACTION (9, 7).
Mon Jan 16 14:27:06 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:27:07 2012
ORA-01555 caused BY SQL statement below (SQL ID: 5wc2915k44m38, Query Duration=0 sec, SCN: 0x0000.003bd268):
Mon Jan 16 14:27:07 2012
SELECT USER#,TYPE# FROM USER$ WHERE name=:1
Mon Jan 16 14:27:07 2012
LOGSTDBY: Validating controlfile WITH logical metadata
Mon Jan 16 14:27:07 2012
LOGSTDBY: Validation complete
Completed: ALTER DATABASE OPEN resetlogs
Mon Jan 16 14:32:12 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:32:13 2012
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering TRANSACTION (9, 7).
Mon Jan 16 14:32:13 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:37:15 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:37:16 2012
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering TRANSACTION (9, 7).
Mon Jan 16 14:37:16 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:42:17 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:42:18 2012
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering TRANSACTION (9, 7).
Mon Jan 16 14:42:18 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:47:19 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_smon_53215672.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Mon Jan 16 14:47:20 2012
Errors IN file /oracle/admin/orcl/bdump/orcl_pmon_58655160.trc:
ORA-00474: SMON process TERMINATED WITH error
Mon Jan 16 14:47:20 2012
PMON: terminating instance due TO error 474
Instance TERMINATED BY PMON, pid = 58655160

对于这个错误,可以通过设置EVENTS 10513来避免后台PMON进程进行数据库的事务回滚,也可以通过设置_CORRUPTED_ROLLBACK_SEGMENTS来避免这个错误的产生。
再次重启后,通过EXP导出全库,重建数据库后导入即可。
其实这里想要探讨的并非是恢复技术本身,而是备份策略以及备份有效性检查的重要性。DBA在备份完成后,只需要简单的看一下备份输出的LOG文件,就可以马上判断备份是否完整,可惜的是,就是这个简单的确认操作没有进行,导致了最终重启数据库且丢失数据的损失。

This entry was posted in ORACLE and tagged , , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *