The database reported ORA-27300, ORA-27301 and ORA-27302 errors, followed by ORA-29702, which crashed the database instance.
The database is 10.2.0.4 RAC on HP-UX. The full error output was:
Wed Jun 27 04:31:07 2012
Process startup failed, error stack:
Wed Jun 27 04:31:07 2012
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl2_psp0_1943.trc:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn3
Wed Jun 27 04:31:08 2012
Process J001 died, see its trace file
Wed Jun 27 04:31:08 2012
kkjcre1p: unable to spawn jobq slave process
Wed Jun 27 04:31:08 2012
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl2_cjq0_1965.trc:
Wed Jun 27 04:41:55 2012
Process startup failed, error stack:
Wed Jun 27 04:41:55 2012
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl2_psp0_1943.trc:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn3
Wed Jun 27 04:41:56 2012
Process q001 died, see its trace file
Wed Jun 27 04:41:56 2012
ksvcreate: Process(q001) creation failed
Wed Jun 27 04:53:16 2012
Process startup failed, error stack:
. . .
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl2_psp0_1943.trc:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn3
Wed Jun 27 05:05:06 2012
Process q001 died, see its trace file
Wed Jun 27 05:05:06 2012
ksvcreate: Process(q001) creation failed
Wed Jun 27 05:13:04 2012
Error: KGXGN aborts the instance (6)
Wed Jun 27 05:13:04 2012
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl2_lmon_1945.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON: terminating instance due to error 29702
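Status 11 ("Resource temporarily unavailable") combined with the failing fork call indicates that the operating system could not create a new process.
The system dmesg log was full of the following message: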
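When an alert log is long, it can help to tally the ORA-27300 entries by operation and status before drawing a conclusion. The following is a minimal sketch, not an Oracle-provided tool: the function name `count_os_failures` and the sample text are illustrative assumptions, and the pattern is matched case-insensitively against the message format shown in the log above.

```python
import re
from collections import Counter

# Pattern for the ORA-27300 line as it appears in the alert log above.
# Case-insensitive, because log excerpts are sometimes re-cased when pasted.
ORA_27300 = re.compile(
    r"ORA-27300: OS system dependent operation:\s*(\w+) failed with status:\s*(\d+)",
    re.IGNORECASE,
)


def count_os_failures(alert_log_text):
    """Count ORA-27300 entries grouped by (operation, status)."""
    return Counter(
        (match.group(1).lower(), int(match.group(2)))
        for match in ORA_27300.finditer(alert_log_text)
    )


# Hypothetical sample built from the messages quoted in this post.
sample = (
    "ORA-27300: OS system dependent operation:fork failed with status: 11\n"
    "ORA-27301: OS failure message: Resource temporarily unavailable\n"
    "ORA-27302: failure occurred at: skgpspawn3\n"
)

if __name__ == "__main__":
    for (operation, status), hits in count_os_failures(sample).items():
        print(f"{operation} failed with status {status}: {hits} time(s)")
```

If every entry is a fork failure with status 11, as in this case, the process limits on the operating system are the first thing to check.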
proc: table is full
proc: table is full
proc: table is full
proc: table is full
proc: table is full
proc: table is full
proc: table is full
proc: table is full
proc: table is full
proc: table is full
proc: table is full
proc: table is full
proc: table is full
proc: table is full
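This confirms the cause: the number of running processes had exceeded the limit set by the kernel parameters.
The fix is to reset nproc and maxuprc on the operating system. According to the Oracle installation guide, nproc should be at least 4096 and maxuprc should be nproc*9/10. If the current number of processes already exceeds these values, raise both parameters to match the actual workload.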
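The sizing rule above can be sketched as a small check. This is only an illustration under stated assumptions: `maxuprc_for` and `check_limits` are hypothetical helper names, integer division is assumed for nproc*9/10, and the current values have to be read from the system by whatever means is available on the host.

```python
MIN_NPROC = 4096  # minimum nproc cited from the Oracle installation guide


def maxuprc_for(nproc):
    """Return maxuprc as nproc*9/10, rounded down (assumed integer division)."""
    return nproc * 9 // 10


def check_limits(nproc, maxuprc, current_processes):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if nproc < MIN_NPROC:
        problems.append(f"nproc={nproc} is below the minimum of {MIN_NPROC}")
    if maxuprc != maxuprc_for(nproc):
        problems.append(
            f"maxuprc={maxuprc} does not equal nproc*9/10 ({maxuprc_for(nproc)})"
        )
    if current_processes >= nproc:
        problems.append(
            f"{current_processes} processes are running, nproc={nproc} is too low"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    for problem in check_limits(nproc=2048, maxuprc=1843, current_processes=2100):
        print(problem)
```

With nproc at 4096 the rule gives a maxuprc of 3686. When the process count is already close to nproc, both values should be raised together so the 9/10 ratio is preserved.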