客户的数据库出现VIP地址无法访问的情况。
这一篇描述问题的诊断。
客户的RAC环境重建后,发现两个节点的VIP都无法访问,开始认为是网络问题,但是随后发现整个RAC重启后,其中一个节点的VIP可以访问,而另一个节点的VIP仍然无法访问,因此判断可能是RAC本身的问题。
听到问题描述后,第一个判断是否VIP没有启动,登录数据库服务器后进行了检查:
oracle@racdb1 $ lsnrctl STATUS LSNRCTL FOR Solaris: Version 10.2.0.4.0 - Production ON 18-APR-2012 10:22:02 Copyright (c) 1991, 2007, Oracle. ALL rights reserved. Connecting TO (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521)) STATUS OF the LISTENER ------------------------ Alias LISTENER_RACDB1 Version TNSLSNR FOR Solaris: Version 10.2.0.4.0 - Production START DATE 25-MAR-2012 17:16:47 Uptime 23 days 17 hr. 5 MIN. 15 sec Trace Level off Security ON: LOCAL OS Authentication SNMP OFF Listener Parameter File /u01/oracle/product/10.2/DATABASE/network/admin/listener.ora Listener Log File /u01/oracle/product/10.2/DATABASE/network/log/listener_racdb1.log Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.8.60.201)(PORT=1521))) (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.8.60.1)(PORT=1521))) Services Summary... Service "+ASM" has 1 instance(s). Instance "+ASM1", STATUS BLOCKED, has 1 handler(s) FOR this service... Service "+ASM_XPT" has 1 instance(s). Instance "+ASM1", STATUS BLOCKED, has 1 handler(s) FOR this service... Service "PLSExtProc" has 1 instance(s). Instance "PLSExtProc", STATUS UNKNOWN, has 1 handler(s) FOR this service... Service "racdb" has 2 instance(s). Instance "racdb1", STATUS READY, has 2 handler(s) FOR this service... Instance "racdb2", STATUS READY, has 1 handler(s) FOR this service... Service "racdbXDB" has 2 instance(s). Instance "racdb1", STATUS READY, has 1 handler(s) FOR this service... Instance "racdb2", STATUS READY, has 1 handler(s) FOR this service... Service "racdb_XPT" has 2 instance(s). Instance "racdb1", STATUS READY, has 2 handler(s) FOR this service... Instance "racdb2", STATUS READY, has 1 handler(s) FOR this service... The command completed successfully |
节点2:
oracle@racdb2 $ lsnrctl STATUS LSNRCTL FOR Solaris: Version 10.2.0.4.0 - Production ON 18-APR-2012 10:19:00 Copyright (c) 1991, 2007, Oracle. ALL rights reserved. Connecting TO (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521)) STATUS OF the LISTENER ------------------------ Alias LISTENER_RACDB2 Version TNSLSNR FOR Solaris: Version 10.2.0.4.0 - Production START DATE 25-MAR-2012 17:14:28 Uptime 23 days 17 hr. 4 MIN. 31 sec Trace Level off Security ON: LOCAL OS Authentication SNMP OFF Listener Parameter File /u01/oracle/product/10.2/DATABASE/network/admin/listener.ora Listener Log File /u01/oracle/product/10.2/DATABASE/network/log/listener_racdb2.log Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.8.60.202)(PORT=1521))) (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.8.60.3)(PORT=1521))) Services Summary... Service "+ASM" has 1 instance(s). Instance "+ASM2", STATUS BLOCKED, has 1 handler(s) FOR this service... Service "+ASM_XPT" has 1 instance(s). Instance "+ASM2", STATUS BLOCKED, has 1 handler(s) FOR this service... Service "PLSExtProc" has 1 instance(s). Instance "PLSExtProc", STATUS UNKNOWN, has 1 handler(s) FOR this service... Service "racdb" has 2 instance(s). Instance "racdb1", STATUS READY, has 1 handler(s) FOR this service... Instance "racdb2", STATUS READY, has 2 handler(s) FOR this service... Service "racdbXDB" has 2 instance(s). Instance "racdb1", STATUS READY, has 1 handler(s) FOR this service... Instance "racdb2", STATUS READY, has 1 handler(s) FOR this service... Service "racdb_XPT" has 2 instance(s). Instance "racdb1", STATUS READY, has 1 handler(s) FOR this service... Instance "racdb2", STATUS READY, has 2 handler(s) FOR this service... The command completed successfully |
两个节点的监听都是正常的:
oracle@racdb1 $ sqlplus / AS sysdba SQL*Plus: Release 10.2.0.4.0 - Production ON Wed Apr 18 10:20:06 2012 Copyright (c) 1982, 2007, Oracle. ALL Rights Reserved. Connected TO: Oracle DATABASE 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production WITH the Partitioning, REAL Application Clusters, OLAP, DATA Mining AND REAL Application Testing options SQL> SHOW parameter service_name NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ service_names string racdb SQL> SHOW parameter listener NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ local_listener string remote_listener string LISTENERS_RACDB oracle@racdb1 $ cd $ORACLE_HOME/network/admin oracle@racdb1 $ more listener.ora # listener.ora.racdb1 Network Configuration File: /u01/oracle/product/10.2/DATABASE/network/admin/listener.ora.racdb1 # Generated BY Oracle configuration tools. LISTENER_RACDB1 = (DESCRIPTION_LIST = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1-vip)(PORT = 1521)(IP = FIRST)) (ADDRESS = (PROTOCOL = TCP)(HOST = 10.8.60.1)(PORT = 1521)(IP = FIRST)) ) ) SID_LIST_LISTENER_RACDB1 = (SID_LIST = (SID_DESC = (SID_NAME = PLSExtProc) (ORACLE_HOME = /u01/oracle/product/10.2/DATABASE) (PROGRAM = extproc) ) ) oracle@racdb1 $ more tnsnames.ora # tnsnames.ora Network Configuration File: /u01/oracle/product/10.2/DATABASE/network/admin/tnsnames.ora # Generated BY Oracle configuration tools. LISTENERS_RACDB = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1-vip)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = racdb2-vip)(PORT = 1521)) ) RACDB2 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = racdb2-vip)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = racdb) (INSTANCE_NAME = racdb2) ) ) RACDB1 = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1-vip)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = racdb) (INSTANCE_NAME = racdb1) ) ) RACDB = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1-vip)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = racdb2-vip)(PORT = 1521)) (LOAD_BALANCE = yes) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = racdb) ) ) EXTPROC_CONNECTION_DATA = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC0)) ) (CONNECT_DATA = (SID = PLSExtProc) (PRESENTATION = RO) ) ) oracle@racdb1 $ tnsping racdb TNS Ping Utility FOR Solaris: Version 10.2.0.4.0 - Production ON 18-APR-2012 10:24:08 Copyright (c) 1997, 2007, Oracle. ALL rights reserved. Used parameter files: Used TNSNAMES adapter TO resolve the alias Attempting TO contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1-vip)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = racdb2-vip)(PORT = 1521)) (LOAD_BALANCE = yes) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = racdb))) OK (0 msec) oracle@racdb1 $ tnsping racdb1 TNS Ping Utility FOR Solaris: Version 10.2.0.4.0 - Production ON 18-APR-2012 10:24:11 Copyright (c) 1997, 2007, Oracle. ALL rights reserved. Used parameter files: Used TNSNAMES adapter TO resolve the alias Attempting TO contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1-vip)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = racdb) (INSTANCE_NAME = racdb1))) OK (10 msec) oracle@racdb1 $ tnsping racdb2 TNS Ping Utility FOR Solaris: Version 10.2.0.4.0 - Production ON 18-APR-2012 10:24:12 Copyright (c) 1997, 2007, Oracle. ALL rights reserved. Used parameter files: Used TNSNAMES adapter TO resolve the alias Attempting TO contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = racdb2-vip)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = racdb) (INSTANCE_NAME = racdb2))) OK (0 msec)<pre> 数据库中设置了REMOTE_LISTENERS参数,而且TNSNAMES.ORA中的REMOTE_LISTENERS的配置也没有异常。通过tnsping命令检查配置同样没有发现异常。 <pre lang='SQL'>oracle@racdb2 $ sqlplus /nolog SQL*Plus: Release 10.2.0.4.0 - Production ON Wed Apr 18 10:19:18 2012 Copyright (c) 1982, 2007, Oracle. ALL Rights Reserved. SQL> conn a/a@10.8.60.3/racdb ERROR: ORA-01017: invalid username/password; logon denied SQL> conn a/a@10.8.60.1/racdb ERROR: ORA-01017: invalid username/password; logon denied SQL> conn a/a@10.8.60.202/racdb ERROR: ORA-01017: invalid username/password; logon denied SQL> conn a/a@10.8.60.201/racdb ERROR: ORA-01017: invalid username/password; logon denied SQL> exit |
节点2上通过PUBLIC IP和VIP分别连接两个实例,发现服务名配置没有问题,通过VIP也可以访问数据库,并没有出现VIP不同的情况。
oracle@racdb1 $ more /etc/hosts # # Internet host TABLE # ::1 localhost 127.0.0.1 localhost 10.8.60.1 racdb1 racdb1.com loghost 10.8.60.201 racdb1-vip 192.168.60.201 racdb1-priv 10.8.60.3 racdb2 10.8.60.202 racdb2-vip 192.168.60.202 racdb2-priv #BackupServer 10.8.48.26 gz-bak oracle@racdb2 $ more /etc/hosts # # Internet host TABLE # ::1 localhost 127.0.0.1 localhost 10.8.60.1 racdb1 10.8.60.201 racdb1-vip 192.168.60.201 racdb1-priv 10.8.60.3 racdb2 racdb2.com loghost 10.8.60.202 racdb2-vip 192.168.60.202 racdb2-priv #BackupServer 10.8.48.26 gz-bak |
分别在两个节点上检查hosts文件的配置,结果同样正常。
root@racdb1 # ./crs_stat -t Name TYPE Target State Host ------------------------------------------------------------ ora....b1.inst application ONLINE ONLINE racdb1 ora....b2.inst application ONLINE ONLINE racdb2 ora.racdb.db application ONLINE ONLINE racdb2 ora....SM1.asm application ONLINE ONLINE racdb1 ora....B1.lsnr application ONLINE ONLINE racdb1 ora.racdb1.gsd application ONLINE ONLINE racdb1 ora.racdb1.ons application ONLINE ONLINE racdb1 ora.racdb1.vip application ONLINE ONLINE racdb1 ora....SM2.asm application ONLINE ONLINE racdb2 ora....B2.lsnr application ONLINE ONLINE racdb2 ora.racdb2.gsd application ONLINE ONLINE racdb2 ora.racdb2.ons application ONLINE ONLINE racdb2 ora.racdb2.vip application ONLINE ONLINE racdb2 |
RAC的相同服务都正常启动。
root@racdb1 # ./oifcfg getif aggr1 192.168.60.0 global cluster_interconnect aggr2 10.8.60.0 global public root@racdb2 # ./oifcfg getif aggr1 192.168.60.0 global cluster_interconnect aggr2 10.8.60.0 global public |
使用oifcfg检查网卡配置也未发现异常。
root@racdb1 # ifconfig -a lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 INDEX 1 inet 127.0.0.1 netmask ff000000 aggr1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 INDEX 2 inet 192.168.60.201 netmask ffffff00 broadcast 192.168.60.255 ether 0:21:28:1a:89:43 aggr1:1: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 INDEX 2 inet 10.8.60.201 netmask ffffff00 broadcast 10.8.60.255 aggr2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 INDEX 3 inet 10.8.60.1 netmask ffffff00 broadcast 10.8.60.255 ether 0:21:28:1a:89:42 root@racdb2 # ifconfig -a lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 INDEX 1 inet 127.0.0.1 netmask ff000000 aggr1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 INDEX 2 inet 192.168.60.202 netmask ffffff00 broadcast 192.168.60.255 ether 0:21:28:1a:89:6b aggr2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 INDEX 3 inet 10.8.60.3 netmask ffffff00 broadcast 10.8.60.255 ether 0:21:28:1a:89:6a aggr2:1: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 INDEX 3 inet 10.8.60.202 netmask ffffff00 broadcast 10.8.60.255 |
通过检查当前的IP地址信息,终于发现了问题所在,节点2上的VIP是正常的,但是节点1的VIP漂到了PRIVATE IP对应的网卡上。
对于这种情况,当前RAC的各个节点都可以正常访问VIP,因为无论是PUBLIC网卡还是PRIVATE网卡,两个节点都可以访问。但是对于数据库外的其他服务器而言,是不可能访问RAC的PRIVATE网络的,因此造成了节点2的VIP可以访问,而节点1的VIP不可访问。
那么为什么Oracle的VIP会漂到PRIVATE网卡上呢:
root@racdb1 # exit oracle@racdb1 $ srvctl config nodeapps -n racdb1 racdb1 racdb1 /u01/oracle/product/10.2/DATABASE oracle@racdb1 $ srvctl config nodeapps -n racdb1 -a VIP EXISTS.: /racdb1-vip/10.8.60.201/255.255.255.0/aggr1:aggr2 oracle@racdb1 $ srvctl config nodeapps -n racdb2 -a VIP EXISTS.: /racdb2-vip/10.8.60.202/255.255.255.0/aggr1:aggr2 |
显然是RAC安装过程中配置错误导致的,在设置VIP使用的网卡信息时,应该选择PUBLIC网络,而当前将两个网卡都选择上了。这正好说明了为什么有时VIP是不能访问的,而重启RAC节点后,一个节点的VIP就可以访问了。
如果VIP启动的时候选择了PUBLIC网卡,那么VIP地址对外就是可见的,而如果VIP选择了PRIVATE网卡,那么VIP地址对外就不可见了。
Pingback: 数据库VIP地址无法访问(二) | yangtingkun