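One instance of a customer's 11.2.0.2 RAC database on Solaris 10 SPARC hit ORA-7445 and ORA-4030 errors, which crashed the instance.
The errors were fairly serious. The alert log shows: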
2012-05-04 02:02:26.403000 +08:00
Archived Log entry 949 added for thread 1 sequence 518 ID 0x70a64e83 dest 1:
Archived Log entry 950 added for thread 1 sequence 519 ID 0x70a64e83 dest 1:
Thread 1 advanced to log sequence 521 (after internal thread enable)
Thread 2 opened at log sequence 441
  Current log# 4 seq# 441 mem# 0: /orcldata1/orcl/redo04.log
  Current log# 4 seq# 441 mem# 1: /orcldata2/orcl/redo04.log
Successful open of redo thread 2
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
2012-05-04 02:02:27.570000 +08:00
[374] Successfully onlined Undo Tablespace 5.
Undo initialization finished serial:0 start:903149023 end:903149480 diff:457 (4 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
Redo thread 1 internally disabled at seq 521 (CKPT)
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Archived Log entry 951 added for thread 1 sequence 520 ID 0x70a64e83 dest 1:
ARC3: Archiving disabled thread 1 sequence 521
Archived Log entry 952 added for thread 1 sequence 521 ID 0x70a64e83 dest 1:
No Resource Manager plan active
minact-scn: Inst 2 is now the master inc#:2 mmon proc-id:326 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
2012-05-04 02:02:29.806000 +08:00
Starting background process GTX0
GTX0 started with pid=72, OS id=660
Starting background process RCBG
RCBG started with pid=73, OS id=662
replication_dependency_tracking turned off (no async multimaster replication found)
Thread 2 advanced to log sequence 442 (LGWR switch)
  Current log# 3 seq# 442 mem# 0: /orcldata1/orcl/redo03.log
  Current log# 3 seq# 442 mem# 1: /orcldata2/orcl/redo03.log
Archived Log entry 953 added for thread 2 sequence 441 ID 0x70a64e83 dest 1:
2012-05-04 02:02:31.733000 +08:00
Starting background process QMNC
2012-05-04 02:02:37.268000 +08:00
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xFFFFFFFF7FFF5EF8] [PC:0x108068724, dbgrlWriteAlertDetail_int()+132] [flags: 0x0, count: 1]
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xFFFFFFFF7FFF5FF4] [PC:0xFFFFFFFF7BD00F34, _memset()+52] [flags: 0x0, count: 1]
2012-05-04 02:02:38.594000 +08:00
Errors in file /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_m000_666.trc (incident=156594):
ORA-07445: exception encountered: core dump [dbgrlWriteAlertDetail_int()+132] [SIGSEGV] [ADDR:0xFFFFFFFF7FFF5EF8] [PC:0x108068724] [Address not mapped to object] []
ORA-04030: out of process memory when trying to allocate 67108896 bytes (qesmmCheckPgaL,qesmmCheckPgaLimit:mem)
Incident details in: /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/incident/incdir_156594/orcl2_m000_666_i156594.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_psp0_266.trc (incident=156026):
ORA-07445: exception encountered: core dump [_memset()+52] [SIGSEGV] [ADDR:0xFFFFFFFF7FFF5FF4] [PC:0xFFFFFFFF7BD00F34] [Address not mapped to object] []
Incident details in: /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/incident/incdir_156026/orcl2_psp0_266_i156026.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2012-05-04 02:02:49.732000 +08:00
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xFFFFFFFF7FFF2000] [PC:0x107D04724, dbgemdGetCallStackWFlag()+100] [flags: 0x0, count: 1]
Errors in file /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_dia0_282.trc:
ORA-07445: exception encountered: core dump [dbgemdGetCallStackWFlag()+100] [SIGSEGV] [ADDR:0xFFFFFFFF7FFF2000] [PC:0x107D04724] [Address not mapped to object] []
ORA-04030: out of process memory when trying to allocate 816 bytes (ksdhngmemctx_h,ksdhng:enod)
2012-05-04 02:02:55.200000 +08:00
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xFFFFFFFF7FFF2000] [PC:0x107D04724, dbgemdGetCallStackWFlag()+100] [flags: 0x0, count: 1]
Errors in file /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_diag_274.trc:
ORA-07445: exception encountered: core dump [dbgemdGetCallStackWFlag()+100] [SIGSEGV] [ADDR:0xFFFFFFFF7FFF2000] [PC:0x107D04724] [Address not mapped to object] []
ORA-04030: out of process memory when trying to allocate 32128 bytes (pga heap,grpsvc msg)
2012-05-04 02:02:58.724000 +08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2012-05-04 02:04:01.919000 +08:00
PMON (ospid: 264): terminating the instance due to error 490
2012-05-04 02:04:05.129000 +08:00
Instance terminated by PMON, pid = 264
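Before the database had even finished starting up, it hit ORA-7445 [dbgrlWriteAlertDetail_int], followed by ORA-04030, then ORA-07445 [_memset], and finally ORA-7445 [dbgemdGetCallStackWFlag]. This chain of errors ended with PMON terminating the database instance.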
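The error messages point to memory allocation. But the database had only just started, so how could it fail to allocate even 67 MB (the 67108896 bytes in the first ORA-04030)? A search on MOS showed that the cause was exhausted swap space. For details, see Instance crash ORA-7445 [_memset()+120] and ORA-4030 (QERHJ hash-joi,kllcqas:kllsltba) [ID 1071033.1].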
Checking the system messages log:
May 4 02:02:14 orcl2 Had[5187]: [ID 702911 daemon.notice] VCS CRITICAL V-16-1-50086 Swap usage on orcl2 is 97%
May 4 02:02:33 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 666 (oracle)
May 4 02:02:33 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 674 (oracle)
May 4 02:02:33 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 668 (oracle)
May 4 02:02:34 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 266 (oracle)
May 4 02:02:35 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 666 (oracle)
May 4 02:02:35 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 668 (oracle)
May 4 02:02:35 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 674 (oracle)
May 4 02:02:35 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 266 (oracle)
May 4 02:02:36 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 666 (oracle)
May 4 02:02:36 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 674 (oracle)
May 4 02:02:36 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 668 (oracle)
May 4 02:02:36 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 266 (oracle)
May 4 02:02:37 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 666 (oracle)
May 4 02:02:37 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 668 (oracle)
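Sure enough, the log contains a large number of warnings about insufficient swap space. On Solaris, /tmp is a tmpfs file system backed by swap, so files left in /tmp consume swap space. After cleaning up /tmp and restarting the database, the problem did not recur.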
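The operating system pids in the swap warnings can be matched against the pids embedded in the trace file names from the alert log: pid 666 corresponds to orcl2_m000_666.trc and pid 266 to orcl2_psp0_266.trc. Below is a minimal Python sketch of that cross-check. The function names and the two sample strings are hypothetical helpers written for illustration; the sample lines are a shortened copy of the logs above, and the regular expressions assume the message and file name formats shown there.

```python
import re
from collections import Counter

# Solaris kernel warning as it appears in the messages log above.
# Case-insensitive, in case the casing of a pasted excerpt was altered.
NO_SWAP = re.compile(
    r"no swap space to grow stack for pid (\d+) \((\w+)\)", re.IGNORECASE
)

# Trace file names such as .../trace/orcl2_m000_666.trc -> ("m000", "666").
# Incident traces live under /incident/ and are deliberately not matched.
TRACE = re.compile(r"/trace/[^/_]+_([a-z0-9]+)_(\d+)\.trc")


def swap_warnings_by_pid(messages_text):
    """Count 'no swap space' warnings per OS process id."""
    return Counter(int(m.group(1)) for m in NO_SWAP.finditer(messages_text))


def trace_processes(alert_text):
    """Map OS pid -> Oracle process name, taken from trace file names."""
    return {int(pid): name for name, pid in TRACE.findall(alert_text)}


def correlate(messages_text, alert_text):
    """Return {pid: (oracle_process_name_or_'unknown', warning_count)}."""
    names = trace_processes(alert_text)
    counts = swap_warnings_by_pid(messages_text)
    return {pid: (names.get(pid, "unknown"), n) for pid, n in counts.items()}


# Shortened sample input (hypothetical variable names, lines taken from the logs above).
SAMPLE_MESSAGES = """\
May 4 02:02:33 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 666 (oracle)
May 4 02:02:33 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 674 (oracle)
May 4 02:02:34 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 266 (oracle)
May 4 02:02:35 orcl2 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 666 (oracle)
"""

SAMPLE_ALERT = """\
Errors in file /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_m000_666.trc (incident=156594):
Errors in file /opt/oracle/app/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_psp0_266.trc (incident=156026):
"""

if __name__ == "__main__":
    for pid, (name, count) in sorted(correlate(SAMPLE_MESSAGES, SAMPLE_ALERT).items()):
        print(f"pid {pid} ({name}): {count} swap warning(s)")
```

A pid reported as "unknown" simply means no trace file for it appears in the alert log text that was passed in; it does not imply the process was unaffected.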