一个和并行执行有关的bug。
虽然是并行执行相关,但是报错的并不是Pnnn进程,事实上,实在RAC环境中查询GV$表导致了这个错误。
Wed May 04 15:56:02 EAT 2011 Starting ORACLE instance (normal) LICENSE_MAX_SESSION = 0 LICENSE_SESSIONS_WARNING = 0 Interface TYPE 1 lan900 192.168.194.0 configured FROM OCR FOR USE AS a cluster interconnect Interface TYPE 1 lan901 10.142.194.0 configured FROM OCR FOR USE AS a public interface Picked latch-free SCN scheme 3 LICENSE_MAX_USERS = 0 SYS auditing IS disabled ksdpec: called FOR event 13740 prior TO event GROUP initialization Starting up ORACLE RDBMS Version: 10.2.0.5.0. System parameters WITH non-DEFAULT VALUES: processes = 2000 sessions = 3000 resource_limit = TRUE lock_sga = FALSE shared_pool_size = 2147483648 large_pool_size = 536870912 java_pool_size = 536870912 spfile = /oracle/spfileorcl.ora nls_language = AMERICAN nls_territory = AMERICA disk_asynch_io = FALSE sga_target = 75497472000 control_files = /oracle/control01.ctl, /oracle/control02.ctl, /oracle/control03.ctl db_block_size = 8192 db_cache_size = 47194308608 db_keep_cache_size = 2147483648 db_writer_processes = 8 compatible = 10.2.0.5.0 log_archive_dest_1 = location=/orcl03/arch db_file_multiblock_read_count= 16 cluster_database = TRUE cluster_database_instances= 2 db_recovery_file_dest = /orcl02/flash_recovery_area db_recovery_file_dest_size= 42949672960 thread = 2 fast_start_mttr_target = 300 recovery_parallelism = 6 instance_number = 2 undo_management = AUTO undo_tablespace = UNDOTBS2 undo_retention = 7200 remote_login_passwordfile= EXCLUSIVE db_domain = distributed_lock_timeout = 700 dispatchers = (PROTOCOL=TCP) (SERVICE=orclXDB) local_listener = (ADDRESS = (PROTOCOL = TCP)(HOST = 10.142.194.10) (PORT = 1521)) remote_listener = LISTENERS_ORCL job_queue_processes = 40 parallel_execution_message_size= 16384 _hash_join_enabled = TRUE background_dump_dest = /oracle/app/admin/orcl/bdump user_dump_dest = /oracle/app/admin/orcl/udump core_dump_dest = /oracle/app/admin/orcl/cdump audit_file_dest = /oracle/app/admin/orcl/adump db_name = orcl open_cursors = 2000 optimizer_index_cost_adj = 30 optimizer_index_caching = 88 query_rewrite_enabled = FALSE pga_aggregate_target = 8589934592 Wed May 04 15:56:07 EAT 2011 Oracle instance running WITH ODM: Veritas 5.0 ODM Library, Version 1.0 cluster interconnect IPC version: VERITAS IPC '5.0.31.5' 01:50:22 Jan 25 2010 IPC Vendor 86 proto 76 Version 1.0 PMON started WITH pid=2, OS id=23891 DIAG started WITH pid=3, OS id=23893 PSP0 started WITH pid=4, OS id=23895 LMON started WITH pid=5, OS id=23897 LMD0 started WITH pid=6, OS id=23899 LMS0 started WITH pid=7, OS id=23901 LMS1 started WITH pid=8, OS id=23903 LMS2 started WITH pid=9, OS id=23910 LMS3 started WITH pid=10, OS id=23912 MMAN started WITH pid=11, OS id=23914 DBW0 started WITH pid=12, OS id=23916 DBW1 started WITH pid=13, OS id=23918 DBW2 started WITH pid=14, OS id=23920 DBW3 started WITH pid=15, OS id=23922 DBW4 started WITH pid=16, OS id=23924 DBW5 started WITH pid=17, OS id=23926 DBW6 started WITH pid=18, OS id=23928 DBW7 started WITH pid=19, OS id=23930 LGWR started WITH pid=20, OS id=23932 CKPT started WITH pid=21, OS id=23934 SMON started WITH pid=22, OS id=23936 RECO started WITH pid=23, OS id=23938 CJQ0 started WITH pid=24, OS id=23940 MMON started WITH pid=25, OS id=23947 Wed May 04 15:56:08 EAT 2011 starting up 1 dispatcher(s) FOR network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'... MMNL started WITH pid=26, OS id=23949 Wed May 04 15:56:08 EAT 2011 starting up 1 shared server(s) ... Wed May 04 15:56:23 EAT 2011 lmon registered WITH NM - instance id 2 (internal mem no 1) Wed May 04 15:56:23 EAT 2011 Reconfiguration started (OLD inc 0, NEW inc 24) List OF nodes: 0 1 Global Resource Directory frozen * allocate DOMAIN 0, invalid = TRUE Communication channels reestablished * DOMAIN 0 valid = 1 according TO instance 0 Wed May 04 15:56:23 EAT 2011 Master broadcasted resource hash VALUE bitmaps Non-LOCAL Process blocks cleaned OUT Wed May 04 15:56:23 EAT 2011 LMS 0: 0 GCS shadows cancelled, 0 closed Wed May 04 15:56:23 EAT 2011 LMS 1: 0 GCS shadows cancelled, 0 closed Wed May 04 15:56:23 EAT 2011 LMS 3: 0 GCS shadows cancelled, 0 closed Wed May 04 15:56:23 EAT 2011 LMS 2: 0 GCS shadows cancelled, 0 closed SET master node info Submitted ALL remote-enqueue requests Dwn-cvts replayed, VALBLKs dubious ALL grantable enqueues GRANTED Wed May 04 15:56:23 EAT 2011 LMS 0: 0 GCS shadows traversed, 0 replayed Wed May 04 15:56:23 EAT 2011 LMS 1: 0 GCS shadows traversed, 0 replayed Wed May 04 15:56:23 EAT 2011 LMS 2: 0 GCS shadows traversed, 0 replayed Wed May 04 15:56:23 EAT 2011 LMS 3: 0 GCS shadows traversed, 0 replayed Wed May 04 15:56:23 EAT 2011 Submitted ALL GCS remote-cache requests Fix WRITE IN gcs resources Reconfiguration complete LCK0 started WITH pid=29, OS id=24102 Wed May 04 15:56:24 EAT 2011 ALTER DATABASE MOUNT Wed May 04 15:56:29 EAT 2011 Setting recovery target incarnation TO 1 Wed May 04 15:56:29 EAT 2011 Successful mount OF redo thread 2, WITH mount id 3143841947 Wed May 04 15:56:29 EAT 2011 DATABASE mounted IN Shared Mode (CLUSTER_DATABASE=TRUE) Completed: ALTER DATABASE MOUNT Wed May 04 15:56:29 EAT 2011 ALTER DATABASE OPEN Block CHANGE tracking file IS CURRENT. Picked broadcast ON commit scheme TO generate SCNs Wed May 04 15:56:29 EAT 2011 LGWR: STARTING ARCH PROCESSES ARC0 started WITH pid=31, OS id=24229 Wed May 04 15:56:30 EAT 2011 ARC0: Archival started ARC1: Archival started LGWR: STARTING ARCH PROCESSES COMPLETE ARC1 started WITH pid=32, OS id=24231 Wed May 04 15:56:30 EAT 2011 Thread 2 opened at log SEQUENCE 1086 CURRENT log# 4 seq# 1086 mem# 0: /oracle/redo04.log Wed May 04 15:56:30 EAT 2011 Successful OPEN OF redo thread 2 Wed May 04 15:56:30 EAT 2011 ARC0: Becoming the 'no FAL' ARCH Wed May 04 15:56:30 EAT 2011 ARC0: Becoming the 'no SRL' ARCH Wed May 04 15:56:30 EAT 2011 ARC1: Becoming the heartbeat ARCH Wed May 04 15:56:30 EAT 2011 Starting background process CTWR CTWR started WITH pid=33, OS id=24233 Block CHANGE tracking service IS active. Wed May 04 15:56:30 EAT 2011 SMON: enabling cache recovery Wed May 04 15:56:32 EAT 2011 Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_24245.trc: ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], [] Wed May 04 15:56:34 EAT 2011 Successfully onlined Undo Tablespace 4. Wed May 04 15:56:34 EAT 2011 SMON: enabling tx recovery Wed May 04 15:56:34 EAT 2011 DATABASE Characterset IS ZHS16GBK Opening WITH internal Resource Manager plan replication_dependency_tracking turned off (no async multimaster replication found) Starting background process QMNC QMNC started WITH pid=37, OS id=24328 Wed May 04 15:56:39 EAT 2011 Completed: ALTER DATABASE OPEN Wed May 04 15:56:39 EAT 2011 Errors IN file /oracle/app/admin/orcl/bdump/orcl2_m000_24393.trc: ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], [] Wed May 04 15:56:41 EAT 2011 Trace dumping IS performing id=[cdmp_20110504155641] Wed May 04 15:56:44 EAT 2011 Errors IN file /oracle/app/admin/orcl/bdump/orcl2_mmon_23947.trc: ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], [] Wed May 04 15:56:44 EAT 2011 Errors IN file /oracle/app/admin/orcl/bdump/orcl2_m000_24393.trc: ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], [] Wed May 04 15:56:48 EAT 2011 Errors IN file /oracle/app/admin/orcl/bdump/orcl2_mmon_23947.trc: ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], [] Wed May 04 15:57:41 EAT 2011 Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_25174.trc: ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], [] Wed May 04 15:57:43 EAT 2011 Trace dumping IS performing id=[cdmp_20110504155743] Wed May 04 15:58:44 EAT 2011 Errors IN file /oracle/app/admin/orcl/udump/orcl2_ora_25174.trc: ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], [] Wed May 04 15:59:46 EAT 2011 Reconfiguration started (OLD inc 24, NEW inc 26) List OF nodes: 1 Global Resource Directory frozen * dead instance detected - DOMAIN 0 invalid = TRUE Communication channels reestablished Master broadcasted resource hash VALUE bitmaps Non-LOCAL Process blocks cleaned OUT Wed May 04 15:59:46 EAT 2011 LMS 0: 0 GCS shadows cancelled, 0 closed Wed May 04 15:59:46 EAT 2011 LMS 3: 0 GCS shadows cancelled, 0 closed Wed May 04 15:59:46 EAT 2011 LMS 2: 0 GCS shadows cancelled, 0 closed Wed May 04 15:59:46 EAT 2011 LMS 1: 0 GCS shadows cancelled, 0 closed SET master node info Submitted ALL remote-enqueue requests Dwn-cvts replayed, VALBLKs dubious ALL grantable enqueues GRANTED Post SMON TO START 1st pass IR Wed May 04 15:59:46 EAT 2011 Instance recovery: looking FOR dead threads Instance recovery: LOCK DOMAIN invalid but no dead threads Wed May 04 15:59:46 EAT 2011 LMS 2: 1347 GCS shadows traversed, 0 replayed Wed May 04 15:59:46 EAT 2011 LMS 3: 1386 GCS shadows traversed, 0 replayed Wed May 04 15:59:46 EAT 2011 LMS 1: 1383 GCS shadows traversed, 0 replayed Wed May 04 15:59:46 EAT 2011 LMS 0: 1430 GCS shadows traversed, 0 replayed Wed May 04 15:59:46 EAT 2011 Submitted ALL GCS remote-cache requests Fix WRITE IN gcs resources Reconfiguration complete Wed May 04 16:01:40 EAT 2011 Reconfiguration started (OLD inc 26, NEW inc 28) List OF nodes: 0 1 Global Resource Directory frozen Communication channels reestablished * DOMAIN 0 valid = 1 according TO instance 0 Wed May 04 16:01:40 EAT 2011 Master broadcasted resource hash VALUE bitmaps Non-LOCAL Process blocks cleaned OUT Wed May 04 16:01:40 EAT 2011 LMS 0: 0 GCS shadows cancelled, 0 closed Wed May 04 16:01:40 EAT 2011 LMS 1: 0 GCS shadows cancelled, 0 closed Wed May 04 16:01:40 EAT 2011 LMS 3: 0 GCS shadows cancelled, 0 closed Wed May 04 16:01:40 EAT 2011 LMS 2: 0 GCS shadows cancelled, 0 closed SET master node info Submitted ALL remote-enqueue requests Dwn-cvts replayed, VALBLKs dubious ALL grantable enqueues GRANTED Wed May 04 16:01:40 EAT 2011 LMS 2: 2224 GCS shadows traversed, 1095 replayed Wed May 04 16:01:40 EAT 2011 LMS 3: 2255 GCS shadows traversed, 1073 replayed Wed May 04 16:01:40 EAT 2011 LMS 1: 2339 GCS shadows traversed, 1110 replayed Wed May 04 16:01:40 EAT 2011 LMS 0: 2300 GCS shadows traversed, 1216 replayed Wed May 04 16:01:40 EAT 2011 Submitted ALL GCS remote-cache requests Fix WRITE IN gcs resources Reconfiguration complete |
可以看到,数据库在还没有完全打开的时候就报出了这个错误,而在数据库实例打开之后,又由MMON进程引发了这个错误。
*** SERVICE NAME:(SYS$BACKGROUND) 2011-05-04 15:56:44.161 *** SESSION ID:(2977.1) 2011-05-04 15:56:44.161 *** 2011-05-04 15:56:44.161 ksedmp: internal OR fatal error ORA-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], [] CURRENT SQL statement FOR this SESSION: SELECT INSTANCE_NAME, HOST_NAME, NVL(GVI_STARTUP_TIME, SYSTIMESTAMP) - INTERVAL '1' SECOND AS SHUTDOWN_TIME FROM (SELECT RRI.INSTANCE_NAME AS INSTANCE_NAME, RRI.HOST_NAME AS HOST_NAME, FROM_TZ(RRI.STARTUP_TIME, '+00:00') AS RRI_STARTUP_TIME, DBMS_HA_ALERTS_PRVT.INSTANCE_STARTUP_TIMESTAMP_TZ(GVI.STARTUP_TIME) AS G VI_STARTUP_TIME FROM RECENT_RESOURCE_INCARNATIONS$ RRI LEFT OUTER JOIN GV$INSTANCE GVI ON GVI.INSTANCE_NAME = RRI.RESOURCE_NAME WHERE RRI.RESOURCE_TYPE = 'IN STANCE' AND :B2 = RRI.DB_UNIQUE_NAME AND :B1 = RRI.DB_DOMAIN) WHERE GVI_STARTUP_TIME IS NULL OR GVI_STARTUP_TIME > RRI_STARTUP_TIME GROUP BY INSTANCE_NAME, H OST_NAME, GVI_STARTUP_TIME ----- PL/SQL Call Stack ----- object line object handle NUMBER name c0000012906ce9b0 301 package body SYS.DBMS_HA_ALERTS_PRVT c0000012915b91f0 1 anonymous block ----- Call Stack Trace ----- calling CALL entry argument VALUES IN hex location TYPE point (? means dubious VALUE) -------------------- -------- -------------------- ---------------------------- ksedst()+64 CALL ksedst1() 000000000 ? 000000001 ? ksedmp()+2176 CALL ksedst() 000000000 ? C000000000000D20 ? 4000000004037940 ? 000000000 ? 000000000 ? 000000000 ? ksfdmp()+112 CALL ksedmp() 000000003 ? 9FFFFFFFFFFE5ED0 ? 60000000000BA290 ? 9FFFFFFFFFFE64A0 ? C000000000000999 ? 400000000407F9B0 ? kgerinv()+304 CALL ksfdmp() 9FFFFFFFFFFE6A30 ? 000000003 ? 9FFFFFFFFFFE64B0 ? 60000000000BA290 ? C000000000000612 ? 40000000098C38B0 ? kgeasnmierr()+144 CALL kgerinv() 60000000000318D0 ? 4000000001AD98A0 ? 6000000000032988 ? 4000000001AD98A0 ? 9FFFFFFFFFFE6A70 ? $cold_kgeade()+64 CALL kgeasnmierr() 60000000000318D0 ? 9FFFFFFFFD3B3438 ? 9FFFFFFFFD3B3448 ? 6000000000032D00 ? 9FFFFFFFFD45A598 ? 9FFFFFFFFFFE69B0 ? 9FFFFFFFFD460E98 ? C000001290883058 ? kgerev()+96 CALL $cold_kgeade() 60000000000318D0 ? 6000000000031A50 ? 9FFFFFFFFD3B3438 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? kserec0()+160 CALL kgerev() 60000000000318D0 ? 9FFFFFFFFD3B3438 ? 000000000 ? 6000000000032CF0 ? 9FFFFFFFFFFE6B38 ? |
根据MOS记录的信息,当查询GV$视图,Oracle在RAC的多个节点同时查询本地的V$视图出错。导致错误的原因有可能是各个节点上的parallel_execution_message_size参数或者parallel_automatic_tuning参数设置不一致所致。
检查当前的RAC节点,发现各个PARALLEL相关的参数的设置都是相同的。虽然不是参数所致,但是发现,在报出后不久,当前节点就检测到RAC的另外一个节点CRASH,那么产生这个ORA-600错误的原因就是因为远端节点无法启动,导致远端的V$视图查询进程无法启动,从而产生了这个问题。