11g告警日志中大量ORA-12170错误

在11g数据库的告警日志中,经常可以看到ORA-12170错误信息。
如果这个错误是偶然出现,问题可能是网络问题所致,但是如果这个错误短时间内频繁出现,那么问题就不一定是网络不畅那么简单了。
客户的数据库出现了应用无法连接的错误,而应用程序日志出现下面的错误信息:

ORA-12519, TNS:no appropriate service handler found

在告警日志中比较频繁的出现了下面的错误:

Fatal NI CONNECT error 12170.
Sat Sep 03 12:43:43 2011
VERSION INFORMATION:
TNS FOR Linux: Version 11.2.0.2.0 - Production
Oracle Bequeath NT Protocol Adapter FOR Linux: Version 11.2.0.2.0 - Production
TCP/IP NT Protocol Adapter FOR Linux: Version 11.2.0.2.0 - Production
TIME: 03-SEP-2011 12:43:43
TIME: 03-SEP-2011 12:43:43
Tns error struct:
Tns error struct:
Tracing NOT turned ON.
Tracing NOT turned ON.
ns main err code: 12535
ns main err code: 12535
ns main err code: 12535
TIME: 03-SEP-2011 12:43:43
Tns error struct:
Tns error struct:
ns main err code: 12535
Tracing NOT turned ON.
ns main err code: 12535
TNS-12535: TNS:operation timed OUT
TNS-12535: TNS:operation timed OUT
 
Tns error struct:
TNS-12535: TNS:operation timed OUT
ns secondary err code: 12606
ns secondary err code: 12606
TNS-12535: TNS:operation timed OUT
TNS-12535: TNS:operation timed OUT
nt main err code: 0
nt main err code: 0
ns secondary err code: 12606
ns secondary err code: 12606
ns main err code: 12535
ns secondary err code: 12606
nt secondary err code: 0
nt secondary err code: 0
nt main err code: 0
nt main err code: 0
nt main err code: 0
nt OS err code: 0
nt OS err code: 0
nt secondary err code: 0
nt secondary err code: 0
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.90.102)(PORT=33196))
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.90.102)(PORT=33191))
nt OS err code: 0
nt secondary err code: 0
nt OS err code: 0
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.90.102)(PORT=33201))
TNS-12535: TNS:operation timed OUT
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.90.102)(PORT=33195))
nt OS err code: 0
ns secondary err code: 12606
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.90.102)(PORT=33174))
nt main err code: 0
WARNING: inbound connection timed OUT (ORA-3136)
Sat Sep 03 12:48:28 2011
WARNING: inbound connection timed OUT (ORA-3136)
Sat Sep 03 12:48:28 2011
WARNING: inbound connection timed OUT (ORA-3136)
Sat Sep 03 12:48:28 2011
WARNING: inbound connection timed OUT (ORA-3136)
Sat Sep 03 12:48:30 2011
Active SESSION History (ASH) performed an emergency FLUSH. This may mean that ASH IS undersized. IF emergency flushes are a recurring issue, you may consider increasing ASH SIZE BY setting the VALUE OF _ASH_SIZE TO a sufficiently LARGE VALUE. Currently, ASH SIZE IS 33554432 bytes. BOTH ASH SIZE AND the total NUMBER OF emergency flushes since instance startup can be monitored BY running the following query:
SELECT total_size,awr_flush_emergency_count FROM v$ash_info;
Sat Sep 03 12:48:32 2011
Sweep [inc][48817]: completed
Sweep [inc2][48817]: completed

检查了监听日志,在12:43的时刻出现了大量的ORA-12518错误,这个错误信息是:TNS:listener could not hand off client connection,随后不到1秒的时间,大量的连接错误导致了ORA-12519 TNS:no appropriate service handler found错误,这说明数据库的对连接的响应能力已经跟不上了。
而导致这两个问题出现的原因是大量的会话在短时间内连接到数据库,根据监听日志,仅12:42分这一分钟,连接数据库的会话就建立了超过2000个连接。而正常情况下,这个数据库一天的总连接数量也不过13000个左右。
Oracle在metalink文档ID 12535.1中,描述了11g告警日志中出现ORA-12170以及ORA-12535错误的原因,由于大量的客户端连接到服务器,导致数据库无法在短时间内处理连接风暴,从而引发了连接超时的错误。
最终发现,可能的中间件的连接重试策略配置存在一定的问题,导致当通信或其他问题引发连接中断后,会短时间内产生大量的连接重试,并不断的增加连接数量,最终引发了签的ORA-12519错误。

This entry was posted in ORACLE and tagged , , , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *