测试控制全局死锁的隐含参数_lm_dd_interval时,突然想到这个问题。
RAC全局死锁检测时间:https://yangtingkun.net/?p=955
Oracle的死锁判断是没有优先级的,也就是说,当两个或多个会话发生死锁的时候,无法指定牺牲哪个会话,而是由Oracle随机决定。
不过对于RAC环境而言,死锁的检查不在是内部的随机实现,Oracle通过隐含参数_lm_dd_interval来控制死锁的检测时间。更重要的是,对于RAC环境而言,Oracle允许不同实例设置不同的值。而不同实例的检测死锁间隔不同,就意味着优先级的出现。
如果实例1上设置该值为默认值60秒,而实例2设置为30秒,那么当发生死锁后,永远是实例2上先检测到死锁,也就是说,实例2上会话会被牺牲掉。
这是两个实例上设置该参数相同的情况,两个会话分别连接到两个实例,产生死锁。实例1上的会话1:
SQL> SELECT name FROM v$database; NAME --------- ORCL SQL> SELECT instance_number, instance_name FROM v$instance; INSTANCE_NUMBER INSTANCE_NAME --------------- ---------------- 1 orcl1 SQL> SET sqlp 'I1S1> ' I1S1> SHOW parameter _lm NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ _lm_dd_interval INTEGER 30 I1S1> SET timing ON I1S1> UPDATE t_deadlock SET name = 'a1' WHERE id = 1; 1 ROW updated. Elapsed: 00:00:00.07 |
在实例2上连接会话2:
SQL> SELECT name FROM v$database; NAME --------- ORCL SQL> SELECT instance_number, instance_name FROM v$instance; INSTANCE_NUMBER INSTANCE_NAME --------------- ---------------- 2 orcl2 SQL> SET sqlp 'I2S2> ' I2S2> SHOW parameter _lm NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ _lm_dd_interval INTEGER 30 I2S2> SET timing ON I2S2> UPDATE t_deadlock SET name = 'b2' WHERE id = 2; 1 ROW updated. Elapsed: 00:00:00.04 I2S2> UPDATE t_deadlock SET name = 'a2' WHERE id = 1; |
会话1上锁定记录2,产生死锁:
I1S1> UPDATE t_deadlock SET name = 'b1' WHERE id = 2; |
第一次是实例2上的会话2被牺牲报错:
UPDATE t_deadlock SET name = 'a2' WHERE id = 1 * ERROR at line 1: ORA-00060: deadlock detected while waiting FOR resource Elapsed: 00:00:32.15 I2S2> UPDATE t_deadlock SET name = 'a2' WHERE id = 1; |
可以看到,会话2等待30秒后报错,此时会话2执行同样的语句再次引发死锁:
UPDATE t_deadlock SET name = 'b1' WHERE id = 2 * ERROR at line 1: ORA-00060: deadlock detected while waiting FOR resource Elapsed: 00:01:00.39 I1S1> UPDATE t_deadlock SET name = 'b1' WHERE id = 2; |
这次变成实例1上的会话1被牺牲报错,可以看到,会话1经历了两次死锁检测,因此执行时间为1分钟。会话1再次引入死锁:
UPDATE t_deadlock SET name = 'a2' WHERE id = 1 * ERROR at line 1: ORA-00060: deadlock detected while waiting FOR resource Elapsed: 00:01:01.69 I2S2> |
被牺牲的又变成了会话2。
上面这个测试是在两个实例的_lm_dd_interval参数设置相同的情况下,下面修改实例2上的参数设置为5秒:
I2S2> ALTER system SET "_lm_dd_interval" = 5 scope = spfile sid = 'orcl2'; System altered. Elapsed: 00:00:00.09 I2S2> shutdown immediate DATABASE closed. DATABASE dismounted. ORACLE instance shut down. I2S2> startup ORACLE instance started. Total System Global Area 281018368 bytes Fixed SIZE 2095672 bytes Variable SIZE 121636296 bytes DATABASE Buffers 150994944 bytes Redo Buffers 6291456 bytes DATABASE mounted. DATABASE opened. I2S2> SHOW parameter _lm NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ _lm_dd_interval INTEGER 5 I2S2> UPDATE t_deadlock SET name = 'b2' WHERE id = 2; 1 ROW updated. Elapsed: 00:00:00.06 |
实例2参数生效后连接会话更新该表,实例1上的会话1取消之前的修改,重新进行更新:
1 ROW updated. Elapsed: 00:10:08.98 I1S1> ROLLBACK; ROLLBACK complete. Elapsed: 00:00:00.00 I1S1> UPDATE t_deadlock SET name = 'a1' WHERE id = 1; 1 ROW updated. Elapsed: 00:00:00.01 I1S1> UPDATE t_deadlock SET name = 'b1' WHERE id = 2; |
下面在实例2上的会话2,引入死锁:
I2S2> UPDATE t_deadlock SET name = 'a2' WHERE id = 1; UPDATE t_deadlock SET name = 'a2' WHERE id = 1 * ERROR at line 1: ORA-00060: deadlock detected while waiting FOR resource Elapsed: 00:00:06.07 I2S2> UPDATE t_deadlock SET name = 'a2' WHERE id = 1; UPDATE t_deadlock SET name = 'a2' WHERE id = 1 * ERROR at line 1: ORA-00060: deadlock detected while waiting FOR resource Elapsed: 00:00:05.95 I2S2> UPDATE t_deadlock SET name = 'a2' WHERE id = 1; UPDATE t_deadlock SET name = 'a2' WHERE id = 1 * ERROR at line 1: ORA-00060: deadlock detected while waiting FOR resource Elapsed: 00:00:06.63 I2S2> UPDATE t_deadlock SET name = 'a2' WHERE id = 1; UPDATE t_deadlock SET name = 'a2' WHERE id = 1 * ERROR at line 1: ORA-00060: deadlock detected while waiting FOR resource Elapsed: 00:00:05.89 |
显然由于不同实例的_lm_dd_interval参数的值设置不同,现在每次死锁都会在设置值更小的实例2上被检测,实例2上的会话每次都会被死锁牺牲掉。尝试设置不同的参数值在不同实例上设置死锁检测优先级获得成功。