ORA-7445 (dbgrlWriteAlertDetail_int) and ORA-4030 cause an instance crash

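One instance of a customer's 11.2.0.2 RAC database on Solaris 10 SPARC hit ORA-7445 and ORA-4030 errors, and the instance crashed.
The errors are fairly serious. The alert log shows: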

2012-05-04 02:02:26.403000 +08:00
Archived Log entry 949 added for thread 1 sequence 518 ID 0x70a64e83 dest 1:
Archived Log entry 950 added for thread 1 sequence 519 ID 0x70a64e83 dest 1:
Thread 1 advanced to log sequence 521 (after internal thread enable)
Thread 2 opened at log sequence 441
Current log# 4 seq# 441 mem# 0: /orcldata1/orcl/redo04.log
Current log# 4 seq# 441 mem# 1: /orcldata2/orcl/redo04.log
Successful open of redo thread 2
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
2012-05-04 02:02:27.570000 +08:00
[374] Successfully onlined Undo Tablespace 5.
Undo initialization finished serial:0 start:903149023 end:903149480 diff:457 (4 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
Redo thread 1 internally disabled at seq 521 (CKPT)
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Archived Log entry 951 added for thread 1 sequence 520 ID 0x70a64e83 dest 1:
ARC3: Archiving disabled thread 1 sequence 521
Archived Log entry 952 added for thread 1 sequence 521 ID 0x70a64e83 dest 1:
No Resource Manager plan active
minact-scn: Inst 2 is now the master inc#:2 mmon proc-id:326 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
2012-05-04 02:02:29.806000 +08:00
Starting background process GTX0
GTX0 started with pid=72, OS id=660
Starting background process RCBG
RCBG started with pid=73, OS id=662
replication_dependency_tracking turned off (no async multimaster replication found)
Thread 2 advanced to log sequence 442 (LGWR switch)
Current log# 3 seq# 442 mem# 0: /orcldata1/orcl/redo03.log
Current log# 3 seq# 442 mem# 1: /orcldata2/orcl/redo03.log
Archived Log entry 953 added for thread 2 sequence 441 ID 0x70a64e83 dest 1:
2012-05-04 02:02:31.733000 +08:00
Starting background process QMNC
2012-05-04 02:02:37.268000 +08:00
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xFFFFFFFF7FFF5EF8] [PC:0x108068724, dbgrlWriteAlertDetail_int()+132] [flags: 0x0, count: 1]
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xFFFFFFFF7FFF5FF4] [PC:0xFFFFFFFF7BD00F34, _memset()+52] [flags: 0x0, count: 1]
2012-05-04 02:02:38.594000 +08:00
Errors in file /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_m000_666.trc (incident=156594):
ORA-07445: exception encountered: core dump [dbgrlWriteAlertDetail_int()+132] [SIGSEGV] [ADDR:0xFFFFFFFF7FFF5EF8] [PC:0x108068724] [Address not mapped to object] []
ORA-04030: out of process memory when trying to allocate 67108896 bytes (qesmmCheckPgaL,qesmmCheckPgaLimit:mem)
Incident details in: /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/incident/incdir_156594/orcl2_m000_666_i156594.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_psp0_266.trc (incident=156026):
ORA-07445: exception encountered: core dump [_memset()+52] [SIGSEGV] [ADDR:0xFFFFFFFF7FFF5FF4] [PC:0xFFFFFFFF7BD00F34] [Address not mapped to object] []
Incident details in: /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/incident/incdir_156026/orcl2_psp0_266_i156026.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2012-05-04 02:02:49.732000 +08:00
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xFFFFFFFF7FFF2000] [PC:0x107D04724, dbgemdGetCallStackWFlag()+100] [flags: 0x0, count: 1]
Errors in file /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_dia0_282.trc:
ORA-07445: exception encountered: core dump [dbgemdGetCallStackWFlag()+100] [SIGSEGV] [ADDR:0xFFFFFFFF7FFF2000] [PC:0x107D04724] [Address not mapped to object] []
ORA-04030: out of process memory when trying to allocate 816 bytes (ksdhngmemctx_h,ksdhng:enod)
2012-05-04 02:02:55.200000 +08:00
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xFFFFFFFF7FFF2000] [PC:0x107D04724, dbgemdGetCallStackWFlag()+100] [flags: 0x0, count: 1]
Errors in file /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_diag_274.trc:
ORA-07445: exception encountered: core dump [dbgemdGetCallStackWFlag()+100] [SIGSEGV] [ADDR:0xFFFFFFFF7FFF2000] [PC:0x107D04724] [Address not mapped to object] []
ORA-04030: out of process memory when trying to allocate 32128 bytes (pga heap,grpsvc msg)
2012-05-04 02:02:58.724000 +08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2012-05-04 02:04:01.919000 +08:00
PMON (ospid: 264): terminating the instance due to error 490
2012-05-04 02:04:05.129000 +08:00
Instance terminated by PMON, pid = 264

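The database had not even finished starting when it hit ORA-7445 [dbgrlWriteAlertDetail_int], followed by ORA-04030, then ORA-07445 [_memset], and finally ORA-7445 [dbgemdGetCallStackWFlag]. This chain of errors ultimately led PMON to terminate the instance.
The error messages point to memory allocation. But the database had only just started, so how could it fail to allocate even 67 MB (the 67108896 bytes in the ORA-04030 message)? A search on MOS showed that the cause was exhausted swap space. For details, see Instance crash ORA-7445 [_memset()+120] and ORA-4030 (QERHJ hash-joi,kllcqas:kllsltba) [ID 1071033.1].
Checking the system messages log: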
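As a quick sanity check on the size in the ORA-04030 message, the following Python sketch converts the requested byte count into binary and decimal units. It only does unit arithmetic on the number taken from the alert log; the variable names are illustrative.

```python
# Requested allocation size, copied from the ORA-04030 line in the alert log.
requested = 67108896  # bytes

MIB = 1024 * 1024  # one mebibyte

# Split the request into whole MiB plus a remainder in bytes.
whole_mib, remainder = divmod(requested, MIB)

print(f"{requested} bytes = {whole_mib} MiB + {remainder} bytes")
print(f"{requested / 1_000_000:.1f} MB in decimal units")
```

The request is essentially one 64 MiB chunk, a trivial amount for a freshly started instance on a healthy host, which is why the failure points at the operating system rather than at Oracle's own memory settings.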

May 4 02:02:14 orcl2 Had[5187]: [ID 702911 daemon.notice] VCS CRITICAL V-16-1-50086 Swap usage on orcl2 is 97%
May 4 02:02:33 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 666 (oracle)
May 4 02:02:33 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 674 (oracle)
May 4 02:02:33 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 668 (oracle)
May 4 02:02:34 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 266 (oracle)
May 4 02:02:35 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 666 (oracle)
May 4 02:02:35 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 668 (oracle)
May 4 02:02:35 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 674 (oracle)
May 4 02:02:35 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 266 (oracle)
May 4 02:02:36 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 666 (oracle)
May 4 02:02:36 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 674 (oracle)
May 4 02:02:36 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 668 (oracle)
May 4 02:02:36 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 266 (oracle)
May 4 02:02:37 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 666 (oracle)
May 4 02:02:37 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 668 (oracle)

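Sure enough, the log is full of warnings about insufficient swap space, starting with a VCS alert that swap usage had reached 97% just seconds before the first ORA-7445. On Solaris, /tmp is normally a tmpfs file system backed by virtual memory, so files left in /tmp consume swap. After cleaning up /tmp and restarting the database, the problem did not recur.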
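The two logs can be tied together through the OS pid: the pids in the swap warnings (666, 266) also appear in the names of the Oracle trace files that recorded the ORA-7445 errors (orcl2_m000_666.trc, orcl2_psp0_266.trc). The Python sketch below illustrates that cross-check. It is not part of the original diagnosis: the function names are hypothetical, the regular expressions assume the line and file-name formats seen in this post, and the input is a four-line excerpt from the messages log plus the two trace file paths from the alert log.

```python
import re
from collections import Counter

# Solaris kernel warning for a process whose stack could not grow.
# Case-insensitive so capitalization differences in pasted logs do not matter.
SWAP_WARNING = re.compile(
    r"no swap space to grow stack for pid (\d+) \((\w+)\)", re.IGNORECASE
)

# Trace files in this post are named <instance>_<process>_<ospid>.trc.
TRACE_NAME = re.compile(r"([^/_]+)_([^/_]+)_(\d+)\.trc$")


def count_swap_warnings(lines):
    """Return a Counter mapping OS pid -> number of swap warnings seen."""
    counts = Counter()
    for line in lines:
        match = SWAP_WARNING.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def pids_from_trace_names(paths):
    """Return {ospid: process_name} parsed from trace file paths."""
    result = {}
    for path in paths:
        match = TRACE_NAME.search(path)
        if match:
            result[int(match.group(3))] = match.group(2)
    return result


# Excerpt from the messages log (first four kernel warnings).
syslog_excerpt = [
    "May 4 02:02:33 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 666 (oracle)",
    "May 4 02:02:33 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 674 (oracle)",
    "May 4 02:02:33 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 668 (oracle)",
    "May 4 02:02:34 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 266 (oracle)",
]

# Trace files named in the alert log next to the ORA-07445 incidents.
trace_files = [
    "/opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_m000_666.trc",
    "/opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_psp0_266.trc",
]

warnings_by_pid = count_swap_warnings(syslog_excerpt)
crashed = pids_from_trace_names(trace_files)

# Report, for each crashed Oracle process, how many swap warnings it received.
for pid, name in sorted(crashed.items()):
    hits = warnings_by_pid.get(pid, 0)
    print(f"{name} (ospid {pid}): {hits} swap warning(s) in excerpt")
```

Both crashed processes show up in the kernel warnings a few seconds before their ORA-7445 entries, which supports reading the SIGSEGV in each case as a failed stack growth rather than an Oracle code defect.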
