20120930OOW大会开幕

虽然正式的大会要到10月1日,才算正式开始,但是今天晚上Larry的演讲才是大会开幕的标识。
上午去会场听了超大型数据库设计的主题,演讲者是Morpho,Oracle的ACE,他的主题围绕大数据量的数据库应该在设计和维护时应该考虑的方面。其内容涉及数据加载、分区、并行执行、备份恢复、统计信息收集、迁移等多方面的内容。在演讲的最后给出了一个真实的案例,数据库从11.2.0.2迁移到11.2.0.3,数据量达到40T左右,几乎是零DOWN机时间。虽然迁移方式仍然是利用DATA GUARD的功能,但是实现方法上却因为收到存储空间以及时间等多方面的限制和普通的备份恢复的方式有很大的不同。他利用了RMAN的INCREMENT UPDATABLE BACKUP功能,在本地生成DATAFILE COPY,然后利用DBMS_TRANSFER_FILE从本地的FRA直接复制到远端数据库的ASM中,然后利用归档在本地进行恢复。除了这个案例外,演讲中提到的11g对于分区表收集统计信息的新特性incremental statistics gathering和concurrent statistics gathering对于大数据量的统计信息收集是非常有帮助的。12c已经来了,而11g的新特性还有很多不了解,很有压力啊。
在会场碰到了FENNG,从他去了杭州之后,这些年几乎没见过几次,没想到这次OOW倒是给了我们难得的叙旧机会。由于我是第一次到旧金山,FENNG自告奋勇带我去渔人码头转转,结果路上发现居然他不记路,而且也没有导航,于是我对于能否顺利到达并返回产生了怀疑。不过事实证明FENNG还是相当靠谱的,不过旧金山的CABLE CAR真是不靠谱,效率低得一塌糊涂,来回路上一共一个小时,在渔人码头转了一个半小时,而来回等车用了两个半小时。还好最后紧赶慢赶总算是赶上了Larry的演讲。
Oracle这次的大会的主题是云,旧金山满大街到处都是Oracle的广告:Hardware and Software Engineered to Work Together。Oracle的云服务包含Saas、PaaS和IaaS。Oracle在SaaS领域主要竞争对手是Salesforce,而不是SAP;在IaaS领域的竞争对手是亚马逊而不是IBM。在PaaS领域Oracle在数据库、中间件等多个产品都做到了NO1,于是Oracle提都没提竞争对手的事情。此外,Larry还介绍了Oracle12c以及ExadataX3,可惜由于要参加ACE dinner这部分没有听到,看来只能过几天再补了。
Ace晚宴离在渔人码头和金门大桥直接的位置,早知道就不急着往回赶了。现场无数的大牛,可惜认识的不多。Kamus倒是意兴盎然的四处邀请人来参加明年的Otn China Tour活动,这里就体现出英语好与不好的区别了。

Posted in NEWS | Tagged , , | Leave a comment

20120929赴旧金山参加OOW

前年参加过北京的Oracle Open World,而到旧金山参加OOW还是第一次。
其实早在04年的时候北京就举办过OOW大会,只不过那个时候还没有能力参加;到了07年的时候在上海举行,很多朋友都参加了那次的盛会,可惜当时走不开也没有参加;到了10年北京的那次,其实才是第一次参加OOW。而美国旧金山Oracle总部的OOW就更不用说了。好在今年的ACE Director顺利的批了下来,这才使得这趟OOW之旅顺利成行。
巧合的是,飞机上旁边座位的兄弟也是去参加OOW大会的,他是Oracle的员工,而我坐的这架航班,Oracle公司的人大约有7、8个人之多。更巧的是,机场外等车的时候碰到了崔华,看来OOW使得很多熟人都聚集到了旧金山。
由于申请的是BLOGER方式的OOW门票,因此办理签到稍显麻烦,在酒店一层的签到处无法进行办理,只能跑到会场处的指定地点进行办理。好在Kamus下午刚刚办完,在他的帮助下,顺利的拿到了Oracle OpenWorld的入场券,明天就可以开始参加会议了。不过真正的重头戏还要明天下午Larry的演讲之后才会开始。
晚上和Kamus、Biti_rainy一起吃的晚饭,在Biti的要求下,找了一家中餐馆,味道很地道,几乎找不到一点旧金山的感觉,因为整个点菜过程都是用中文完成的。

Posted in NEWS | Tagged , , , | Leave a comment

ORA-600(kauxs_do_jou:3)错误

客户9.2.0.6数据库出现ORA-600(kauxs_do_jou:3)错误。
错误信息为:

Wed Mar 28 22:44:48 2012
Errors IN file /DATA/oracle/admin/orcl/udump/orcl_ora_19724.trc:
ORA-00600: 内部错误代码,参数: [kauxs_do_jou:3], [68], [], [], [], [], [], []
Wed Mar 28 22:46:38 2012
Errors IN file /DATA/oracle/admin/orcl/udump/orcl_ora_19856.trc:
ORA-00600: 内部错误代码,参数: [kauxs_do_jou:3], [68], [], [], [], [], [], []
Wed Mar 28 22:47:01 2012
Errors IN file /DATA/oracle/admin/orcl/udump/orcl_ora_19919.trc:
ORA-00600: 内部错误代码,参数: [kauxs_do_jou:3], [68], [], [], [], [], [], []
Wed Mar 28 22:48:33 2012
Errors IN file /DATA/oracle/admin/orcl/udump/orcl_ora_19987.trc:
ORA-00600: 内部错误代码,参数: [kauxs_do_jou:3], [68], [], [], [], [], [], []
Wed Mar 28 22:48:54 2012
Errors IN file /DATA/oracle/admin/orcl/udump/orcl_ora_19766.trc:
ORA-00600: 内部错误代码,参数: [kauxs_do_jou:3], [68], [], [], [], [], [], []

这个问题是索引重建导致的,详细文档可以参考Bug 2955004 OERI[kauxs_do_jou:3] / corrupt index from online index rebuild with concurrent DML,在ONLINE REBUILD索引时如果在基表上发生DML语句,就可能会导致索引的损坏。
这个问题确认的影响版本为9.2.0.7,而当前的9.2.0.6版本虽没有明确说明,但应该存在同样的问题。这个bug在10.1.0.5和9.2.0.8中被FIXED。
除了升级之外,对于当前版本而言,应该避免使用ONLINE REBUILD索引。对于已经存在的索引,采用DROP后重建的方式。

Posted in BUG | Tagged , , , , , , | Leave a comment

ITPUB SQL大赛第四期

SQL大赛第四期解法的最终答案。
第四期题目参考:http://www.itpub.net/thread-1411495-1-1.html版主newkid点评参考:http://www.itpub.net/thread-1417576-1-1.html
最近整理BLOG文章时发现,这篇文章当时没有贴出来。由于当时发现最后一题出现了错误,且手头的事情比较多,本打算搞清楚之后再把结果贴出来,没想到后来忘记了。虽然已经过去了一年了,不过信守自己的承诺,还是把最终结果贴出来。
我的代码为:

WITH DAY AS
(
	SELECT TO_DATE(:P_START_DATE, 'YYYYMMDD') + ROWNUM - 1 DAY 
	FROM DUAL
	CONNECT BY ROWNUM <= TO_DATE(:P_END_DATE, 'YYYYMMDD') - TO_DATE(:P_START_DATE, 'YYYYMMDD') + 1
), 
COMPANY_INFO AS 
(
	SELECT A.USER_ID, SERVICE_ID, B.COMPANY_ID, 
		GREATEST(START_DATE, TO_DATE(:P_START_DATE, 'YYYYMMDD')) START_DATE, 
		LEAST(END_DATE, TO_DATE(:P_END_DATE, 'YYYYMMDD')) END_DATE
	FROM SERVICE_USAGE A, SERVICE_USERS B 
	WHERE A.USER_ID = B.USER_ID
	AND START_DATE <= TO_DATE(:P_END_DATE, 'YYYYMMDD')
	AND END_DATE >= TO_DATE(:P_START_DATE, 'YYYYMMDD')
	AND TO_DATE(:P_END_DATE, 'YYYYMMDD') - TO_DATE(:P_START_DATE, 'YYYYMMDD') >= 60
), 
COMPANY_DAY1 AS 
(
	SELECT SERVICE_ID, COMPANY_ID, DAY, COUNT(DISTINCT USER_ID) CN
	FROM DAY D, COMPANY_INFO C
	WHERE D.DAY BETWEEN C.START_DATE AND C.END_DATE
	GROUP BY SERVICE_ID, COMPANY_ID, DAY
), 
COMPANY_DAY2 AS 
(
	SELECT SERVICE_ID, COMPANY_ID, DAY, COUNT(DISTINCT A.USER_ID) CN
	FROM 
	(
		SELECT USER_ID, SERVICE_ID, START_DATE, END_DATE, DAY
		FROM SERVICE_USAGE 
		WHERE START_DATE <= TO_DATE(:P_END_DATE, 'YYYYMMDD')
		AND END_DATE >= TO_DATE(:P_START_DATE, 'YYYYMMDD')
		AND TO_DATE(:P_END_DATE, 'YYYYMMDD') - TO_DATE(:P_START_DATE, 'YYYYMMDD') < 60
		MODEL UNIQUE SINGLE REFERENCE
		PARTITION BY (USER_ID, SERVICE_ID, 
			GREATEST(START_DATE, TO_DATE(:P_START_DATE, 'YYYYMMDD')) START_DATE, 
			LEAST(END_DATE, TO_DATE(:P_END_DATE, 'YYYYMMDD')) END_DATE)
		DIMENSION BY (CAST(NULL AS DATE) DAY)
		MEASURES (0 V)
		RULES
		(
		V[
		FOR DAY 
			FROM TO_DATE(:P_START_DATE, 'YYYYMMDD')
			TO TO_DATE(:P_END_DATE, 'YYYYMMDD')
			INCREMENT 1
		] = 1
		) 
	) A, SERVICE_USERS B 
	WHERE A.USER_ID = B.USER_ID
	AND DAY BETWEEN START_DATE AND END_DATE 
	GROUP BY SERVICE_ID, COMPANY_ID, DAY
), 
COMPANY_DAY AS
(
	SELECT SERVICE_ID, COMPANY_ID, DAY, CN
	FROM COMPANY_DAY1
	UNION ALL
	SELECT SERVICE_ID, COMPANY_ID, DAY, CN
	FROM COMPANY_DAY2
)
SELECT R.COMPANY_ID, COMPANY_NAME, R.SERVICE_ID, SERVICE_NAME, FEE1, FEE2, 
	FEE3, FEE4, NVL(FEE1, 0) + NVL(FEE2, 0) + NVL(FEE3, 0) + NVL(FEE4, 0) TOTAL_FEE 
FROM
(
	SELECT CD.SERVICE_ID, CD.COMPANY_ID, CATEGORY_ID, RATE*CN
	FROM COMPANY_DAY CD, SERVICE_RATES SR
	WHERE CD.SERVICE_ID = SR.SERVICE_ID
	AND CD.COMPANY_ID = SR.COMPANY_ID
	AND CN BETWEEN USER_COUNT_MIN AND USER_COUNT_MAX
) 
PIVOT 
(
	SUM(RATE) FOR CATEGORY_ID IN (1 FEE1, 2 FEE2, 3 FEE3, 4 FEE4)
) R, SERVICES S, COMPANIES C
WHERE R.SERVICE_ID = S.SERVICE_ID
AND R.COMPANY_ID = C.COMPANY_ID
ORDER BY 1, 3;

这是改正之后的代码,与之前错误的代码相比,在SELECT CD.SERVICE_ID, CD.COMPANY_ID, CATEGORY_ID, RATE这一行,将RATE改为了RATE*CN。题目里面其实对这个问题进行了说明,不过后续在处理SQL的过程中漏掉了,于是导致了SQL的错误。
下面对SQL的实现进行简单的说明:
整个SQL由两大部分组成,第一个部分通过构造日期的方式获取每天每个SERVICE_ID和COMPANY_ID有多少不同的用户使用。第一部分的功能都通过WITH子查询方式实现,第二部分主要是利用11G的PIVOT实现列到行的转换。
出于性能方面的考虑,第一部分的逻辑通过等价的两种方式实现:分别是利用CONNECT BY方式构造日期和利用MODEL语句来构造日期,这两种方式分别对应COMPANY_DAY1和COMPANY_DAY2两个WITH语句。
在我的测试环境中,当:P_START_DATE与:P_END_DATE的时间间隔小于2个月的时候,第二种MODEL方式,也就是COMPANY_DAY2的效率更高,而当:P_START_DATE与:P_END_DATE的时间间隔较长,比如大于1年,则第一种方式,也就是COMPANY_DAY1的效率要远高于第二种方式。由于第一种方式线性更好,因此在不确定数据分布的情况下,将二者的阈值设置为60天,在两个WITH中加上了互斥的常量条件,分别为日期间隔大于等于60和间隔小于60。通过UNION ALL将两个WITH查询合并,为第二部分构造了统一的接口COMPANY_DAY,使得SQL根据输入间隔的不同自动选择更优的代码执行。
子查询COMPANY_DAY1的思路很简单:DAY子查询用于构造输入变量:P_START_DATE和:P_END_DATE之间所有的日期信息。关联SERVICE_USAGE和SERVICE_USERS表,获取COMPANY_ID和SERVICE_ID的对应关系,同时获取对应的时间范围,开始时间由SERVICE_USAGE中START_DATE和输入变量:P_START_DATE二者中大的值确定,结束时间由SERVICE_USAGE中END_DATE和输入变量:P_END_DATE中小的确定。
通过DAY子查询和COMPANY_INFO关联并聚集,使用COUNT(DISTINCT USER_ID)是为了避免同一个用户一天内多次使用该服务而被重复计算。同样用户配置使用服务的日期区间的重叠问题也会被DISTINCT消除掉。子查询COMPANY_DAY2的思路其实和COMPANY_DAY1没有区别,只不过利用了MODEL来自动生成日期维度,由于无论是利用FROM TO方式还是利用子查询方式,都无法带入当前记录的START_DATE和END_DATE。因此只能使用输入的变量:P_START_DATE和:P_END_DATE构造日期维度,随后构造的维度DAY和记录的START_DATE和END_DATE过滤数据,随后的思路和COMPANY_DAY1相同。
COMPANY_DAY主要是提供一个统一接口,关联COMPANY_DAY和SERVICE_RATES表,根据每天的人数和SERVICE_RATES的人数范围进行匹配,汇总对应的RATE并利用PIVOT进行行转列操作,最后关联SERVICES和COMPANIES维度表获取名称,并进行汇总值的计算

Posted in ORACLE | Tagged , , | Leave a comment

ORA-600(504)(row cache objects)错误

客户数据库出现ORA-600(504)错误,其中第5个错误参数为row cache objects。
以前碰到过一个shared pool的相关latch:http://yangtingkun.itpub.net/post/468/526079
错误信息为:

Wed Apr 18 06:03:41 2012
Errors IN file /DATA/oracle/admin/orcl/udump/orcl_ora_19038.trc:
ORA-00600: 内部错误代码,参数: [504], [0x4628BD7A0], [32], [4], [ROW cache objects], [10], [2], [0x000000000]

数据库版本为9.2.0.6,根据MOS文档,导致问题的原因是查询了V$SQL_PLAN视图,且当前会话已经持有了一个不兼容的library cache latch。
Oracle给出的解决方法只有升级到10.2.0.1,对于10.2以下的版本,可以考虑设置隐含参数_cursor_plan_unparse_enabled的值为false来解决该问题。

Posted in BUG | Tagged , , , , | Leave a comment

ORA-7445(kpufdesc)错误

客户10.2.0.4 RAC数据库出现ORA-7445(kpufdesc)错误。
错误信息如下:

Thu DEC 22 15:29:07 2011
Errors IN file /home/oracle/product/10.2.0/admin/orcl/bdump/orcl2_mmon_16893.trc:
ORA-07445: exception encountered: core dump [kpufdesc()+634] [SIGSEGV] [Address NOT mapped TO object] [0x000000065] [] []
ORA-22303: TYPE "SYS"."ALERT_TYPE" NOT found

出现错误的进程是MMON进程,而报错信息中显示SYS用户的对象ALERT_TYPE无法找到,显然Oracle数据库中对象的状态不正常。根据MOS文档ORA-07445 [kpufdesc()+872] ORA-22303 and ORA-1403 [ID 1380614.1],导致错误产生的原因是ALERT_QUE队列的状态不正常。
接近方法就是重建ALTER_QUE队列。在11g中可以通过catmwin.sql来进行重建,对于当前的10.2版本,重建步骤为将系统至于restricted session状态,运行catnoalr.sql、dbmsslrt.sql和catalrt.sql脚本,并利用utlrp.sql进行编译,如果操作导致了DBSNMP.MGMT_BSLN_INTERNAL状态的异常,还需要执行catsnmp.sql,然后关闭restricted session既可。

 

 

Posted in BUG | Tagged , , , , , , , , | Leave a comment

系统出现cursor: mutex X等待导致实例HANG死

客户数据库出现明显的cursor: mutex X等待,尝试通过ALTER SYSTEM KILL SESSION的方式释放等待的会话,结果导致整个实例出现HANG死的情况。

数据库版本为10.2.0.4,检查数据库问题发生时刻的AWR报告,发现cursor: mutex X等待在AWR报告中并不明显,不但没有出现在TOP 5里,而在整个等待事件的排名中也排在非常靠后的位置,且该事件的总等待时间为0,这与客户反馈的大约70多个会话持续等待cursor: mutex X事件的描述极为不符。
导致这个现象的原因可能有两种,一种是虽然从V$SESSION中看到当前的等待是cursor :mutex X事件,但是该事件已经结束,因此会话的等待时间并未记录并汇总到这个事件的等待时间中;另外一种情况是Oracle对于这个事件的统计值存在错误。
继续分析AWR报告,可以发现VERSION COUNT报告中,最高的两个SQL已经远远超过了正常的范围,达到了13W以上,毫无疑问这肯定不正常的情况所致:

Version Count

Executions

SQL Id

SQL Module

SQL Text

150,649

7q1tdx7ysmg6f

** SQL Text Not Available **

134,787

fa3vfvs859n5n

** SQL Text Not Available **

10,732

gr0wk1tf0npm4

select count(:”SYS_B_0″) as ro…

6,547

6r91qu3knm3vz

** SQL Text Not Available **

5,888

36ft4hmuru5d2

** SQL Text Not Available **

排在第3和第4位的VERSION COUNT达到5000以上就已经不太正常了,何况最高的两条语句的值已经超过了13W,不用说这两条语句会占用大量的共享池空间:

Sharable Mem (b)

Executions

% Total

SQL Id

SQL Module

SQL Text

3,082,390,224

28.84

7q1tdx7ysmg6f

** SQL Text Not Available **

2,902,320,992

27.16

fa3vfvs859n5n

** SQL Text Not Available **

174,014,440

1.63

gr0wk1tf0npm4

select count(:”SYS_B_0″) as ro…

126,804,120

1.19

36ft4hmuru5d2

** SQL Text Not Available **

106,961,832

1.00

6r91qu3knm3vz

** SQL Text Not Available **

果然不出所料,每个问题SQL占用的共享池空间都达到了3G左右。
显然导致当前cursor: mutex X的问题肯定和这两个SQL直接相关,最后通过ASH视图验证一下这个判断:

SQL&gt; SELECT to_char(sample_time, 'yyyy-mm-dd hh24:mi:ss.ff'), session_id, event_name, sql_id
  2  FROM wrh$_active_session_history ash, wrh$_event_name en
  3  WHERE ash.dbid = 1944025784
  4  AND en.dbid = 1944025784
  5  AND ash.event_id = en.event_id(+)
  6  AND event_name LIKE '%mutex%'
  7  AND sample_timeselect sql_id, COUNT(*)
  2  FROM wrh$_active_session_history ash, wrh$_event_name en
  3  WHERE ash.dbid = 1944025784
  4  AND en.dbid = 1944025784
  5  AND ash.event_id = en.event_id(+)
  6  AND event_name LIKE '%mutex%'
  7  AND sample_time

很明显,导致系统出现cursor: mutex X等待的就是AWR报告中看到的两个语句,正是这两个语句不断的产生新的VERSION,导致了cursor: mutex X事件,并占用越来越多的共享池空间,最终导致整个数据库出现了HANG死的情况。
根据分析导致这个问题的最直接的原因是初始化参数CURSOR_SHARING设置为SIMILAR。而CURSOR_SHARING设置为SIMILAR从而导致VERSION COUNT过大的问题已经有过很多案例了,对于当前的数据库而言,要想彻底避免这个问题,除了升级到最高的10.2.0.5.6以外,就是通过改变CURSOR_SHARING的设置。显然EXACT是最佳方案,但是这必须同时修改程序使用绑定变量,才能避免由于参数修改带来的硬解析的问题。此外修改为FORCE也是一个选择,不过这需要在测试环境中进行验证,避免此修改带来的性能问题和其他BUG的风险。

Posted in ORACLE | Tagged , , , , | Leave a comment

ORA-7445(lxsCntChar)错误

使用正则表达式导致的错误。
错误信息为:

Fri DEC 16 15:37:51 2011
Exception [TYPE: SIGSEGV, Invalid permissions FOR mapped object] [ADDR:0x2AFEA42A4000] [PC:0x9016222, lxsCntChar()+210] [flags: 0x0, COUNT: 1]
Errors IN file /u01/app/oracle/diag/rdbms/orcl/ods1/trace/orcl1_ora_12150.trc  (incident=41375):
ORA-07445: exception encountered: core dump [lxsCntChar()+210] [SIGSEGV] [ADDR:0x2AFEA42A4000] [PC:0x9016222] [Invalid permissions FOR mapped object] []
Incident details IN: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_41375/orcl1_ora_12150_i41375.trc
USE ADRCI OR Support Workbench TO package the incident.
See Note 411.1 at My Oracle Support FOR error AND packaging details.

数据库的版本为11.2.0.2 RAC on Exadata v2,查询MOS发现,这个问题和正则表达式有关。详细文档为Bug 14173447 – ORA-7445 [lxsCntChar] under lxregmatch using REGEXP_REPLACE [ID 14173447.8],这个问题确认影响的版本为11.2.0.3,Oracle计划在11.2.0.4和12.1中解决该错误。
除了通过补丁集解决该问题外,Oracle还提供了HP-UX和Linux X86-64平台下的补丁来解决这个问题。

Posted in BUG | Tagged , , , | Leave a comment

SCHEDULER调用XDB程序导致性能问题

客户数据库出现SCHEDULER后台作业频繁调用XDB包,导致系统性能受到影响,数据库版本为10.2.0.3。
导致性能问题的SQL为:

BEGIN BEGIN IF (xdb.DBMS_XDBZ0.is_hierarchy_enabled_internal(sys.dictionary_obj_owner, sys.dictionary_obj_name, sys.dictionary_obj_owner)) THEN xdb.XDB_PITRIG_PKG.pitrig_truncate(sys.dictionary_obj_owner, sys.dictionary_obj_name); END IF; EXCEPTION WHEN OTHERS THEN NULL; END; BEGIN IF (xdb.DBMS_XDBZ0.is_hierarchy_enabled_internal(sys.dictionary_obj_owner, sys.dictionary_obj_name, sys.dictionary_obj_owner, xdb.DBMS_XDBZ.IS_ENABLED_RESMETADATA)) THEN xdb.XDB_PITRIG_PKG.pitrig_dropmetadata(sys.dictionary_obj_owner, sys.dictionary_obj_name); END IF; EXCEPTION WHEN OTHERS THEN NULL; END; END;

这个SQL在一小时的时间内执行了120多万次,平均每秒要执行160次,难怪会导致性能问题。Oracle在文档SQL IN XDB.DBMS_XDBZ0 LINE 637 IS MAJOR CONSUMER OF CPU [ID 972671.1]对这个问题进行了说明,导致这个问题的原因是XDB用户调用了SYS.DBMS_SYS_SQL过程,Oracle提供了补丁Patch 4766344可以解决这个问题。

Posted in BUG | Tagged , , , | Leave a comment

ORA-600(kfgFinalize_2)错误

11.2.0.1的ASM出现ORA-600[kfgFinalize_2]错误。
错误信息为:

Fri Jun 17 09:59:51 2011
NOTE: No asm libraries found IN the system
* instance_number obtained FROM CSS = 2, checking FOR the existence OF node 0... 
* node 0 does NOT exist. instance_number = 2 
Starting ORACLE instance (normal)
sskgpgetexecname failed TO GET name
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface TYPE 1 en1 192.168.138.0 configured FROM GPnP Profile FOR USE AS a cluster interconnect
Interface TYPE 1 en0 10.121.128.0 configured FROM GPnP Profile FOR USE AS a public interface
Picked latch-free SCN scheme 3
USING LOG_ARCHIVE_DEST_1 parameter DEFAULT VALUE AS /u01/app/grid/dbs/arch
Autotune OF undo retention IS turned ON. 
LICENSE_MAX_USERS = 0
SYS auditing IS disabled
Starting up:
Oracle DATABASE 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
WITH the REAL Application Clusters AND Automatic Storage Management options.
USING parameter settings IN server-side spfile +CRS/jhzw-cluster/asmparameterfile/registry.253.732844175
System parameters WITH non-DEFAULT VALUES:
  large_pool_size          = 12M
  instance_type            = "asm"
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskgroups           = "DATA"
  asm_diskgroups           = "FRA"
  asm_power_limit          = 1
  diagnostic_dest          = "/u01/grid"
Cluster communication IS configured TO USE the following interface(s) FOR this instance
  192.168.138.187
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Fri Jun 17 09:59:59 2011
PMON started WITH pid=2, OS id=258330 
Fri Jun 17 09:59:59 2011
VKTM started WITH pid=3, OS id=311588 at elevated priority
VKTM running at (10)millisec PRECISION WITH DBRM quantum (100)ms
Fri Jun 17 09:59:59 2011
GEN0 started WITH pid=4, OS id=319784 
Fri Jun 17 10:00:00 2011
DIAG started WITH pid=5, OS id=282666 
Fri Jun 17 10:00:00 2011
PING started WITH pid=6, OS id=323882 
Fri Jun 17 10:00:00 2011
PSP0 started WITH pid=7, OS id=327980 
Fri Jun 17 10:00:00 2011
DIA0 started WITH pid=8, OS id=286766 
Fri Jun 17 10:00:00 2011
LMON started WITH pid=9, OS id=332080 
Fri Jun 17 10:00:02 2011
LMD0 started WITH pid=10, OS id=278586 
Fri Jun 17 10:00:03 2011
LMS0 started WITH pid=11, OS id=290872 at elevated priority
Fri Jun 17 10:00:03 2011
LMHB started WITH pid=12, OS id=299072 
Fri Jun 17 10:00:03 2011
MMAN started WITH pid=13, OS id=336182 
Fri Jun 17 10:00:03 2011
DBW0 started WITH pid=14, OS id=303170 
Fri Jun 17 10:00:03 2011
LGWR started WITH pid=15, OS id=348478 
Fri Jun 17 10:00:03 2011
CKPT started WITH pid=16, OS id=352578 
Fri Jun 17 10:00:03 2011
SMON started WITH pid=17, OS id=311366 
Fri Jun 17 10:00:03 2011
RBAL started WITH pid=18, OS id=307272 
Fri Jun 17 10:00:03 2011
GMON started WITH pid=19, OS id=315464 
Fri Jun 17 10:00:04 2011
MMON started WITH pid=20, OS id=340278 
Fri Jun 17 10:00:04 2011
MMNL started WITH pid=21, OS id=319562 
lmon registered WITH NM - instance NUMBER 2 (internal mem no 1)
Reconfiguration started (OLD inc 0, NEW inc 12)
ASM instance 
List OF instances:
 1 2 (myinst: 2) 
 Global Resource Directory frozen
* allocate DOMAIN 0, invalid = TRUE 
 Communication channels reestablished
 * DOMAIN 0 valid = 1 according TO instance 1 
* allocate DOMAIN 1, invalid = TRUE 
 * DOMAIN 1 valid = 1 according TO instance 1 
* allocate DOMAIN 2, invalid = TRUE 
 * DOMAIN 2 valid = 1 according TO instance 1 
* allocate DOMAIN 3, invalid = TRUE 
 * DOMAIN 3 valid = 1 according TO instance 1 
 Master broadcasted resource hash VALUE bitmaps
 Non-LOCAL Process blocks cleaned OUT
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 SET master node info 
 Submitted ALL remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 ALL grantable enqueues GRANTED
 Submitted ALL GCS remote-cache requests
 Fix WRITE IN gcs resources
Reconfiguration complete
Fri Jun 17 10:00:06 2011
LCK0 started WITH pid=22, OS id=327758 
Fri Jun 17 10:00:09 2011
ORACLE_BASE NOT SET IN environment. It IS recommended
that ORACLE_BASE be SET IN the environment
Fri Jun 17 10:00:12 2011
SQL> ALTER DISKGROUP ALL MOUNT /* asm agent */ 
NOTE: Diskgroups listed IN ASM_DISKGROUPS are
 	 DATA
 	 FRA
NOTE: Diskgroup used FOR Voting files IS:
 	 CRS
Diskgroup WITH spfile:CRS
Diskgroup used FOR OCR IS:CRS
NOTE: cache registered GROUP CRS NUMBER=1 incarn=0xfdaebf4e
NOTE: cache began mount (NOT FIRST) OF GROUP CRS NUMBER=1 incarn=0xfdaebf4e
NOTE: cache registered GROUP DATA NUMBER=2 incarn=0xfe4ebf4f
NOTE: cache began mount (NOT FIRST) OF GROUP DATA NUMBER=2 incarn=0xfe4ebf4f
NOTE: cache registered GROUP FRA NUMBER=3 incarn=0xfe5ebf50
NOTE: cache began mount (NOT FIRST) OF GROUP FRA NUMBER=3 incarn=0xfe5ebf50
NOTE: Assigning NUMBER (1,2) TO disk (/dev/rhdisk10)
NOTE: Assigning NUMBER (1,0) TO disk (/dev/rhdisk8)
NOTE: Assigning NUMBER (1,1) TO disk (/dev/rhdisk9)
NOTE: Assigning NUMBER (2,1) TO disk (/dev/rhdisk12)
NOTE: Assigning NUMBER (2,0) TO disk (/dev/rhdisk6)
NOTE: Assigning NUMBER (3,0) TO disk (/dev/rhdisk7)
kfdp_query(CRS): 4 
Fri Jun 17 10:00:14 2011
kfdp_queryBg(): 4 
NOTE: cache opening disk 0 OF grp 1: CRS_0000 path:/dev/rhdisk8
NOTE: F1X0 found ON disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 OF grp 1: CRS_0001 path:/dev/rhdisk9
NOTE: F1X0 found ON disk 1 au 2 fcn 0.0
NOTE: cache opening disk 2 OF grp 1: CRS_0002 path:/dev/rhdisk10
NOTE: F1X0 found ON disk 2 au 2 fcn 0.0
NOTE: cache mounting (NOT FIRST) normal redundancy GROUP 1/0xFDAEBF4E (CRS)
Fri Jun 17 10:00:14 2011
kjbdomatt send TO inst 1
Fri Jun 17 10:00:14 2011
NOTE: attached TO recovery DOMAIN 1
Fri Jun 17 10:00:14 2011
NOTE: LGWR attempting TO mount thread 2 FOR diskgroup 1 (CRS)
NOTE: LGWR found thread 2 OPEN - signalling ORA-15133
WARNING: instance recovery required during mount
kfdp_query(DATA): 5 
kfdp_queryBg(): 5 
NOTE: cache opening disk 0 OF grp 2: DATA_0000 path:/dev/rhdisk6
NOTE: F1X0 found ON disk 0 au 2 fcn 0.61290
NOTE: cache opening disk 1 OF grp 2: DATA_0001 path:/dev/rhdisk12
NOTE: cache mounting (NOT FIRST) external redundancy GROUP 2/0xFE4EBF4F (DATA)
kjbdomatt send TO inst 1
NOTE: attached TO recovery DOMAIN 2
NOTE: LGWR attempting TO mount thread 2 FOR diskgroup 2 (DATA)
NOTE: LGWR found thread 2 closed at ABA 12.1279
NOTE: LGWR mounted thread 2 FOR diskgroup 2 (DATA)
NOTE: LGWR opening thread 2 at fcn 0.1600788 ABA 13.1280
NOTE: cache mounting GROUP 2/0xFE4EBF4F (DATA) succeeded
NOTE: cache ending mount (success) OF GROUP DATA NUMBER=2 incarn=0xfe4ebf4f
kfdp_query(FRA): 6 
kfdp_queryBg(): 6 
NOTE: cache opening disk 0 OF grp 3: FRA_0000 path:/dev/rhdisk7
NOTE: F1X0 found ON disk 0 au 2 fcn 0.0
NOTE: cache mounting (NOT FIRST) external redundancy GROUP 3/0xFE5EBF50 (FRA)
kjbdomatt send TO inst 1
NOTE: attached TO recovery DOMAIN 3
NOTE: LGWR attempting TO mount thread 2 FOR diskgroup 3 (FRA)
NOTE: LGWR found thread 2 closed at ABA 12.91
NOTE: LGWR mounted thread 2 FOR diskgroup 3 (FRA)
NOTE: LGWR opening thread 2 at fcn 0.2439 ABA 13.92
NOTE: cache mounting GROUP 3/0xFE5EBF50 (FRA) succeeded
NOTE: cache ending mount (success) OF GROUP FRA NUMBER=3 incarn=0xfe5ebf50
Fri Jun 17 10:00:17 2011
kfdp_query(DATA): 7 
kfdp_queryBg(): 7 
NOTE: Instance updated compatible.asm TO 11.2.0.0.0 FOR grp 2
SUCCESS: diskgroup DATA was mounted
kfdp_query(FRA): 8 
kfdp_queryBg(): 8 
NOTE: Instance updated compatible.asm TO 11.2.0.0.0 FOR grp 3
SUCCESS: diskgroup FRA was mounted
kfdp_query(CRS): 10 
kfdp_queryBg(): 10 
NOTE: cache opening disk 0 OF grp 1: CRS_0000 path:/dev/rhdisk8
NOTE: F1X0 found ON disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 OF grp 1: CRS_0001 path:/dev/rhdisk9
NOTE: F1X0 found ON disk 1 au 2 fcn 0.0
NOTE: cache opening disk 2 OF grp 1: CRS_0002 path:/dev/rhdisk10
NOTE: F1X0 found ON disk 2 au 2 fcn 0.0
NOTE: cache mounting (NOT FIRST) (retry) normal redundancy GROUP 1/0xFDAEBF4E (CRS)
NOTE: LGWR attempting TO mount thread 2 FOR diskgroup 1 (CRS)
NOTE: LGWR found thread 2 OPEN - signalling ORA-15133
WARNING: instance recovery required during mount
.
.
.
Fri Jun 17 10:06:55 2011
kfdp_query(CRS): 406 
kfdp_queryBg(): 406 
NOTE: cache opening disk 0 OF grp 1: CRS_0000 path:/dev/rhdisk8
NOTE: F1X0 found ON disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 OF grp 1: CRS_0001 path:/dev/rhdisk9
NOTE: F1X0 found ON disk 1 au 2 fcn 0.0
NOTE: cache opening disk 2 OF grp 1: CRS_0002 path:/dev/rhdisk10
NOTE: F1X0 found ON disk 2 au 2 fcn 0.0
NOTE: cache mounting (NOT FIRST) (retry) normal redundancy GROUP 1/0xFDAEBF4E (CRS)
NOTE: LGWR attempting TO mount thread 2 FOR diskgroup 1 (CRS)
NOTE: LGWR found thread 2 OPEN - signalling ORA-15133
WARNING: instance recovery required during mount
Errors IN file /u01/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_340052.trc  (incident=11385):
ORA-00600: internal error code, arguments: [kfgFinalize_2], [], [], [], [], [], [], [], [], [], [], []
Incident details IN: /u01/grid/diag/asm/+asm/+ASM2/incident/incdir_11385/+ASM2_ora_340052_i11385.trc
Fri Jun 17 10:07:01 2011
Trace dumping IS performing id=[cdmp_20110617100701]
ORA-00600: internal error code, arguments: [kfgFinalize_2], [], [], [], [], [], [], [], [], [], [], []
ERROR: ALTER DISKGROUP ALL MOUNT /* asm agent */
NOTE: cache dismounting (clean) GROUP 1/0xFDAEBF4E (CRS) 
NOTE: lgwr NOT being msg'd to dismount
Fri Jun 17 10:07:02 2011
kjbdomdet send to inst 1
detach from dom 1, sending detach message to inst 1
Fri Jun 17 10:07:02 2011
NOTE: detached from domain 1
NOTE: cache dismounted group 1/0xFDAEBF4E (CRS) 
NOTE: cache ending mount (fail) of group CRS number=1 incarn=0xfdaebf4e
kfdp_dismount(): 407 
Fri Jun 17 10:07:02 2011
kfdp_dismountBg(): 407 
NOTE: De-assigning number (1,0) from disk (/dev/rhdisk8)
NOTE: De-assigning number (1,1) from disk (/dev/rhdisk9)
NOTE: De-assigning number (1,2) from disk (/dev/rhdisk10)
ERROR: diskgroup CRS was not mounted
NOTE: cache deleting context for group CRS 1/-38879410
Fri Jun 17 10:07:02 2011
Shutting down instance (abort)
License high water mark = 2
USER (ospid: 307518): terminating the instance
Instance terminated by USER, pid = 307518
Fri Jun 17 10:07:03 2011
Instance shutdown complete

在实例异常崩溃后,尝试启动ASM实例时,ASM尝试进行实例回复,并导致了错误,Oracle在不断的进行多次尝试后,出现了这个ORA-600的错误,并最终导致了实例的异常关闭。
查询MOS,发现存在几个和当前类似的错误信息,错误函数都是kfgFinalize_2,不过绝大部分错误都明确在11.2.0.1中被FIXED。经排除和文档Bug 6771400 – ASM fails to mount a disk group with OERI[kfgfinalize_2] [ID 6771400.8]描述的最为解决,该错误确认影响的版本是10.2.0.3和10.2.0.4,在10.2.0.5和11.1.0.6中被FIXED,该bug并没有提及11.2版本,因此不确认当前问题是否在11.2.0.1版本出现,不过从其他表相上看,当前问题属于这个bug的可能性很大。
这个问题发生在ASM异常崩溃后,可能会导致ASM实例第一次重启出现异常,随后的手工启动并未发生异常,且该问题出现概率很低,可以忽略掉该错误。

Posted in BUG | Tagged , , , , , | Leave a comment