RAC环境启动单实例报错ORA-1105

客户的4节点RAC环境，其中一个节点实例出现故障，发现无法正常启动。
检查CLUSTER和告警日志信息，发现节点1心跳超时，被踢出集群。服务器重新启动后，数据库实例没有自动启动。
告警日志信息为：

Mon Apr 16 03:42:39 2012
Thread 1 advanced TO log SEQUENCE 22348 (LGWR switch)
CURRENT log# 16 seq# 22348 mem# 0: +DATA/orcl/onlinelog/group_16.291.766326571
CURRENT log# 16 seq# 22348 mem# 1: +DATA/orcl/onlinelog/group_16.293.766330969
Mon Apr 16 15:02:58 2012
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface TYPE 1 eth1 10.0.0.0 configured FROM OCR FOR USE AS a cluster interconnect
Interface TYPE 1 eth0 192.168.1.0 configured FROM OCR FOR USE AS a public interface
Picked latch-free SCN scheme 3
Autotune OF undo retention IS turned ON.
LICENSE_MAX_USERS = 0
SYS auditing IS disabled
ksdpec: called FOR event 13740 prior TO event GROUP initialization
Starting up ORACLE RDBMS Version: 10.2.0.4.0.
System parameters WITH non-DEFAULT VALUES:
processes = 1500
sessions = 1655
sga_max_size = 19830669312
pre_page_sga = FALSE
lock_sga = FALSE
__shared_pool_size = 3674210304
__large_pool_size = 16777216
__java_pool_size = 16777216
__streams_pool_size = 33554432
spfile = +DATA/orcl/spfileorcl.ora
sga_target = 19830669312
control_files = +DATA/orcl/controlfile/CURRENT.274.720740395
db_block_size = 8192
__db_cache_size = 16072572928
compatible = 10.2.0.3.0
log_archive_dest_1 = LOCATION=+DATA/
log_archive_format = %t_%s_%r.dbf
db_file_multiblock_read_count= 16
cluster_database = TRUE
cluster_database_instances= 4
db_create_file_dest = +DATA
_gc_affinity_time = 0
_gc_affinity_limit = 10000000
_gc_affinity_minimum = 10000000
thread = 1
instance_number = 1
undo_management = AUTO
undo_tablespace = UNDOTBS1
remote_login_passwordfile= EXCLUSIVE
db_domain =
dispatchers = (PROTOCOL=TCP) (SERVICE=orclXDB)
local_listener = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.21)(PORT = 1521))
remote_listener = LISTENERS_ORCL
job_queue_processes = 20
cursor_sharing = FORCE
background_dump_dest = /u01/app/oracle/admin/orcl/bdump
user_dump_dest = /u01/app/oracle/admin/orcl/udump
core_dump_dest = /u01/app/oracle/admin/orcl/cdump
audit_file_dest = /u01/app/oracle/admin/orcl/adump
db_name = orcl
open_cursors = 1000
pga_aggregate_target = 5872025600
Cluster communication IS configured TO USE the following interface(s) FOR this instance
10.0.0.11
Mon Apr 16 15:02:59 2012
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
PMON started WITH pid=2, OS id=20588
DIAG started WITH pid=3, OS id=20590
PSP0 started WITH pid=4, OS id=20592
LMON started WITH pid=5, OS id=20594
LMD0 started WITH pid=6, OS id=20596
LMS0 started WITH pid=7, OS id=20603
LMS1 started WITH pid=8, OS id=20607
LMS2 started WITH pid=9, OS id=20611
LMS3 started WITH pid=10, OS id=20615
MMAN started WITH pid=11, OS id=20619
DBW0 started WITH pid=12, OS id=20621
DBW1 started WITH pid=13, OS id=20623
LGWR started WITH pid=14, OS id=20625
CKPT started WITH pid=15, OS id=20627
SMON started WITH pid=16, OS id=20629
RECO started WITH pid=17, OS id=20631
CJQ0 started WITH pid=18, OS id=20633
MMON started WITH pid=19, OS id=20635
Mon Apr 16 15:03:00 2012
starting up 1 dispatcher(s) FOR network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started WITH pid=20, OS id=20637
Mon Apr 16 15:03:00 2012
starting up 1 shared server(s) ...
Mon Apr 16 15:03:02 2012
lmon registered WITH NM - instance id 1 (internal mem no 0)
Mon Apr 16 15:03:05 2012
Reconfiguration started (OLD inc 0, NEW inc 30)
List OF nodes:
0 1 2 3
Global Resource Directory frozen
* allocate DOMAIN 0, invalid = TRUE
Communication channels reestablished
* DOMAIN 0 valid according TO instance 3
* DOMAIN 0 valid = 1 according TO instance 1
Mon Apr 16 15:03:05 2012
Master broadcasted resource hash VALUE bitmaps
Non-LOCAL Process blocks cleaned OUT
Mon Apr 16 15:03:05 2012
LMS 0: 0 GCS shadows cancelled, 0 closed
Mon Apr 16 15:03:05 2012
LMS 1: 0 GCS shadows cancelled, 0 closed
Mon Apr 16 15:03:05 2012
LMS 2: 0 GCS shadows cancelled, 0 closed
Mon Apr 16 15:03:05 2012
LMS 3: 0 GCS shadows cancelled, 0 closed
SET master node info
Submitted ALL remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
ALL grantable enqueues GRANTED
Mon Apr 16 15:03:07 2012
LMS 2: 0 GCS shadows traversed, 0 replayed
Mon Apr 16 15:03:07 2012
LMS 1: 0 GCS shadows traversed, 0 replayed
Mon Apr 16 15:03:07 2012
LMS 3: 0 GCS shadows traversed, 0 replayed
Mon Apr 16 15:03:07 2012
LMS 0: 0 GCS shadows traversed, 0 replayed
Mon Apr 16 15:03:07 2012
Submitted ALL GCS remote-cache requests
Post SMON TO START 1st pass IR
Fix WRITE IN gcs resources
Reconfiguration complete
LCK0 started WITH pid=23, OS id=20699
Mon Apr 16 15:03:12 2012
ALTER DATABASE MOUNT
Mon Apr 16 15:03:12 2012
Starting background process ASMB
ASMB started WITH pid=25, OS id=20710
Starting background process RBAL
RBAL started WITH pid=26, OS id=20714
Mon Apr 16 15:03:17 2012
SUCCESS: diskgroup DATA was mounted
Mon Apr 16 15:03:21 2012
Setting recovery target incarnation TO 2
Mon Apr 16 15:03:21 2012
SUCCESS: diskgroup DATA was dismounted
Mon Apr 16 15:03:21 2012
ORA-1105 signalled during: ALTER DATABASE MOUNT...

Mon Apr 16 03:42:39 2012 Thread 1 advanced to log sequence 22348 (LGWR switch) Current log# 16 seq# 22348 mem# 0: +DATA/orcl/onlinelog/group_16.291.766326571 Current log# 16 seq# 22348 mem# 1: +DATA/orcl/onlinelog/group_16.293.766330969 Mon Apr 16 15:02:58 2012 Starting ORACLE instance (normal) LICENSE_MAX_SESSION = 0 LICENSE_SESSIONS_WARNING = 0 Interface type 1 eth1 10.0.0.0 configured from OCR for use as a cluster interconnect Interface type 1 eth0 192.168.1.0 configured from OCR for use as a public interface Picked latch-free SCN scheme 3 Autotune of undo retention is turned on. LICENSE_MAX_USERS = 0 SYS auditing is disabled ksdpec: called for event 13740 prior to event group initialization Starting up ORACLE RDBMS Version: 10.2.0.4.0. System parameters with non-default values: processes = 1500 sessions = 1655 sga_max_size = 19830669312 pre_page_sga = FALSE lock_sga = FALSE __shared_pool_size = 3674210304 __large_pool_size = 16777216 __java_pool_size = 16777216 __streams_pool_size = 33554432 spfile = +DATA/orcl/spfileorcl.ora sga_target = 19830669312 control_files = +DATA/orcl/controlfile/current.274.720740395 db_block_size = 8192 __db_cache_size = 16072572928 compatible = 10.2.0.3.0 log_archive_dest_1 = LOCATION=+DATA/ log_archive_format = %t_%s_%r.dbf db_file_multiblock_read_count= 16 cluster_database = TRUE cluster_database_instances= 4 db_create_file_dest = +DATA _gc_affinity_time = 0 _gc_affinity_limit = 10000000 _gc_affinity_minimum = 10000000 thread = 1 instance_number = 1 undo_management = AUTO undo_tablespace = UNDOTBS1 remote_login_passwordfile= EXCLUSIVE db_domain = dispatchers = (PROTOCOL=TCP) (SERVICE=orclXDB) local_listener = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.21)(PORT = 1521)) remote_listener = LISTENERS_ORCL job_queue_processes = 20 cursor_sharing = FORCE background_dump_dest = /u01/app/oracle/admin/orcl/bdump user_dump_dest = /u01/app/oracle/admin/orcl/udump core_dump_dest = /u01/app/oracle/admin/orcl/cdump audit_file_dest = /u01/app/oracle/admin/orcl/adump db_name = orcl open_cursors = 1000 pga_aggregate_target = 5872025600 Cluster communication is configured to use the following interface(s) for this instance 10.0.0.11 Mon Apr 16 15:02:59 2012 cluster interconnect IPC version:Oracle UDP/IP (generic) IPC Vendor 1 proto 2 PMON started with pid=2, OS id=20588 DIAG started with pid=3, OS id=20590 PSP0 started with pid=4, OS id=20592 LMON started with pid=5, OS id=20594 LMD0 started with pid=6, OS id=20596 LMS0 started with pid=7, OS id=20603 LMS1 started with pid=8, OS id=20607 LMS2 started with pid=9, OS id=20611 LMS3 started with pid=10, OS id=20615 MMAN started with pid=11, OS id=20619 DBW0 started with pid=12, OS id=20621 DBW1 started with pid=13, OS id=20623 LGWR started with pid=14, OS id=20625 CKPT started with pid=15, OS id=20627 SMON started with pid=16, OS id=20629 RECO started with pid=17, OS id=20631 CJQ0 started with pid=18, OS id=20633 MMON started with pid=19, OS id=20635 Mon Apr 16 15:03:00 2012 starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'... MMNL started with pid=20, OS id=20637 Mon Apr 16 15:03:00 2012 starting up 1 shared server(s) ... Mon Apr 16 15:03:02 2012 lmon registered with NM - instance id 1 (internal mem no 0) Mon Apr 16 15:03:05 2012 Reconfiguration started (old inc 0, new inc 30) List of nodes: 0 1 2 3 Global Resource Directory frozen * allocate domain 0, invalid = TRUE Communication channels reestablished * domain 0 valid according to instance 3 * domain 0 valid = 1 according to instance 1 Mon Apr 16 15:03:05 2012 Master broadcasted resource hash value bitmaps Non-local Process blocks cleaned out Mon Apr 16 15:03:05 2012 LMS 0: 0 GCS shadows cancelled, 0 closed Mon Apr 16 15:03:05 2012 LMS 1: 0 GCS shadows cancelled, 0 closed Mon Apr 16 15:03:05 2012 LMS 2: 0 GCS shadows cancelled, 0 closed Mon Apr 16 15:03:05 2012 LMS 3: 0 GCS shadows cancelled, 0 closed Set master node info Submitted all remote-enqueue requests Dwn-cvts replayed, VALBLKs dubious All grantable enqueues granted Mon Apr 16 15:03:07 2012 LMS 2: 0 GCS shadows traversed, 0 replayed Mon Apr 16 15:03:07 2012 LMS 1: 0 GCS shadows traversed, 0 replayed Mon Apr 16 15:03:07 2012 LMS 3: 0 GCS shadows traversed, 0 replayed Mon Apr 16 15:03:07 2012 LMS 0: 0 GCS shadows traversed, 0 replayed Mon Apr 16 15:03:07 2012 Submitted all GCS remote-cache requests Post SMON to start 1st pass IR Fix write in gcs resources Reconfiguration complete LCK0 started with pid=23, OS id=20699 Mon Apr 16 15:03:12 2012 ALTER DATABASE MOUNT Mon Apr 16 15:03:12 2012 Starting background process ASMB ASMB started with pid=25, OS id=20710 Starting background process RBAL RBAL started with pid=26, OS id=20714 Mon Apr 16 15:03:17 2012 SUCCESS: diskgroup DATA was mounted Mon Apr 16 15:03:21 2012 Setting recovery target incarnation to 2 Mon Apr 16 15:03:21 2012 SUCCESS: diskgroup DATA was dismounted Mon Apr 16 15:03:21 2012 ORA-1105 signalled during: ALTER DATABASE MOUNT...

这个ORA-1105错误只是说明当前实例的某些参数设置和RAC其他实例设置的不符，并不能说明导致错误的真正原因。
为了找到问题只有手工启动实例：

[oracle@rac1 ~]$ sqlplus / AS sysdba
SQL*Plus: Release 10.2.0.4.0 - Production ON 星期一 4月 16 17:04:54 2012
Copyright (c) 1982, 2007, Oracle. ALL Rights Reserved.
Connected TO an idle instance.
SQL> STARTUP MOUNT
ORACLE instance started.
Total System Global Area 1.9831E+10 bytes
Fixed SIZE 2119216 bytes
Variable SIZE 3741321680 bytes
DATABASE Buffers 1.6073E+10 bytes
Redo Buffers 14655488 bytes
ORA-01105: mount IS incompatible WITH mounts BY other instances
ORA-01606: gc_files_to_locks NOT identical TO that OF another mounted instance

通过手工执行，可以了解具体导致错误产生的原因。不过gc_files_to_locks并没有设置为不同的值：

SQL> SHOW parameter gc_files_to_locks
NAME                TYPE                VALUE
------------------- ------------------- -----------------
gc_files_to_locks   string
SQL> SELECT sid, name, VALUE FROM v$spparameter WHERE name = 'gc_files_to_locks'; 
SID        NAME                           VALUE
---------- ------------------------------ --------------------------------
*          gc_files_to_locks

不过导致问题产生的确实与GC设置有关，问题并非是gc_files_to_locks参数导致，而是SPFILE中设置的_gc_affinity_time参数。这个参数是静态参数，只有重启后才能生效，而在SPFILE中设置后，会导致重启的实例1生效了该参数，因此和没有重启过的其他实例产生了不兼容。
解决方法有两个，一个是重启所有的节点，另外一个是去掉SPFILE中这个参数的设置：

SQL> ALTER system reset "_gc_affinity_time" scope = spfile sid = '*';
System altered.
SQL> shutdown immediate
ORA-01507: DATABASE NOT mounted
 
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.
Total System Global Area 1.9831E+10 bytes
Fixed SIZE 2119216 bytes
Variable SIZE 3741321680 bytes
DATABASE Buffers 1.6073E+10 bytes
Redo Buffers 14655488 bytes
DATABASE mounted.
SQL> ALTER DATABASE OPEN;
DATABASE altered.

RAC环境启动单实例报错ORA-1105

Leave a Reply Cancel reply

Recent Posts

Recent Comments

Archives

Categories

Meta