In a 10.2.0.4 RAC environment, a change to the service name caused a large number of sessions to be killed.
The alert log shows the following:
Wed Oct 24 20:06:16 2012
ALTER SYSTEM SET service_names='' SCOPE=MEMORY SID='orcl2';
Wed Oct 24 20:06:16 2012
ALTER SYSTEM SET service_names='orcl' SCOPE=MEMORY SID='orcl2';
Wed Oct 24 20:06:16 2012
Immediate KILL SESSION#: 1418, Serial#: 22066
Immediate KILL SESSION: sess: 0x18dc79b70 OS pid: 4879
Immediate KILL SESSION#: 1424, Serial#: 108
Immediate KILL SESSION: sess: 0x18dc81be0 OS pid: 15110
Immediate KILL SESSION#: 1425, Serial#: 22
Immediate KILL SESSION: sess: 0x18dc83148 OS pid: 15112
Immediate KILL SESSION#: 1426, Serial#: 9
Immediate KILL SESSION: sess: 0x18dc846b0 OS pid: 15157
Immediate KILL SESSION#: 1427, Serial#: 17
Immediate KILL SESSION: sess: 0x18dc85c18 OS pid: 15119
Immediate KILL SESSION#: 1429, Serial#: 24221
Immediate KILL SESSION: sess: 0x18dc886e8 OS pid: 1044
Immediate KILL SESSION#: 1430, Serial#: 9
Immediate KILL SESSION: sess: 0x18dc89c50 OS pid: 15126
. . .
Immediate KILL SESSION#: 1605, Serial#: 60258
Immediate KILL SESSION: sess: 0x18dd73e68 OS pid: 11966
Immediate KILL SESSION#: 1606, Serial#: 18413
Immediate KILL SESSION: sess: 0x18dd753d0 OS pid: 11999
Immediate KILL SESSION#: 1607, Serial#: 18517
Immediate KILL SESSION: sess: 0x18dd76938 OS pid: 15378
Immediate KILL SESSION#: 1608, Serial#: 57825
Immediate KILL SESSION: sess: 0x18dd77ea0 OS pid: 1035
Wed Oct 24 20:06:27 2012
Immediate KILL SESSION#: 1616, Serial#: 30253
Immediate KILL SESSION: sess: 0x18dd829e0 OS pid: 11977
Immediate KILL SESSION#: 1626, Serial#: 34413
Immediate KILL SESSION: sess: 0x18dd8fff0 OS pid: 4863
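Clearly, the large number of KILL SESSION entries is directly related to the ALTER SYSTEM SET service_names statements issued in the same second. According to the MOS note Sessions Get Killed if Connection Use Default Service name (Same as db_name) [ID 730315.1], this is the unpublished Bug 6955040 ALL THE SESSIONS LOST CONNECTION AFTER KILLING CRSD.BIN.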
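When the CRSD process is killed or crashes on its own, the cluster can no longer detect that the VIP resource is running. As a result, the database removes the default service name and disconnects every session that connected through it.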
Oracle fixed this problem in 10.2.0.5 and 11.1.0.7. If you have no plans to upgrade, do not connect through a service name that is the same as DB_NAME.
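To check whether an alert log shows the same pattern, the correlation between service_names changes and session kills can be counted mechanically. The following is only a minimal sketch in Python: the function name `summarize` is hypothetical, and the regular expressions assume the line layout of the excerpt quoted in this post (a timestamp line followed by the messages logged at that time), which may differ in other versions.

```python
import re
from collections import Counter

# Patterns modelled on the alert-log lines quoted in this post (an assumption,
# not an official description of the alert-log format).
TIMESTAMP = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$"
)
SERVICE_CHANGE = re.compile(r"^ALTER SYSTEM SET service_names=", re.IGNORECASE)
KILL = re.compile(r"^Immediate KILL SESSION#: (\d+), Serial#: (\d+)")


def summarize(lines):
    """Return (timestamps of service_names changes, kill count per timestamp)."""
    current = None          # most recent timestamp line seen
    changes = []            # one timestamp per ALTER SYSTEM SET service_names
    kills = Counter()       # timestamp -> number of killed sessions
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP.match(line):
            current = line
        elif SERVICE_CHANGE.match(line):
            changes.append(current)
        elif KILL.match(line):
            # Only the "SESSION#: ..., Serial#: ..." line is counted, so each
            # killed session is counted once even though it logs two lines.
            kills[current] += 1
    return changes, kills
```

It could be used by passing an open alert log file, for example `changes, kills = summarize(open(path))`, where `path` points at the instance's alert log. Kill counts that cluster on or just after a timestamp in `changes` suggest the behaviour described in the MOS note.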