数据库VIP地址无法访问(一)

客户的数据库出现VIP地址无法访问的情况。
这一篇描述问题的诊断。
客户的RAC环境重建后,发现两个节点的VIP都无法访问,开始认为是网络问题,但是随后发现整个RAC重启后,其中一个节点的VIP可以访问,而另一个节点的VIP仍然无法访问,因此判断可能是RAC本身的问题。
听到问题描述后,第一个判断是否VIP没有启动,登录数据库服务器后进行了检查:

oracle@racdb1 $ lsnrctl STATUS
LSNRCTL FOR Solaris: Version 10.2.0.4.0 - Production ON 18-APR-2012 10:22:02
Copyright (c) 1991, 2007, Oracle.  ALL rights reserved.
Connecting TO (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS OF the LISTENER
------------------------
Alias                     LISTENER_RACDB1
Version                   TNSLSNR FOR Solaris: Version 10.2.0.4.0 - Production
START DATE                25-MAR-2012 17:16:47
Uptime                    23 days 17 hr. 5 MIN. 15 sec
Trace Level               off
Security                  ON: LOCAL OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/oracle/product/10.2/DATABASE/network/admin/listener.ora
Listener Log File         /u01/oracle/product/10.2/DATABASE/network/log/listener_racdb1.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.8.60.201)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.8.60.1)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", STATUS BLOCKED, has 1 handler(s) FOR this service...
Service "+ASM_XPT" has 1 instance(s).
  Instance "+ASM1", STATUS BLOCKED, has 1 handler(s) FOR this service...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", STATUS UNKNOWN, has 1 handler(s) FOR this service...
Service "racdb" has 2 instance(s).
  Instance "racdb1", STATUS READY, has 2 handler(s) FOR this service...
  Instance "racdb2", STATUS READY, has 1 handler(s) FOR this service...
Service "racdbXDB" has 2 instance(s).
  Instance "racdb1", STATUS READY, has 1 handler(s) FOR this service...
  Instance "racdb2", STATUS READY, has 1 handler(s) FOR this service...
Service "racdb_XPT" has 2 instance(s).
  Instance "racdb1", STATUS READY, has 2 handler(s) FOR this service...
  Instance "racdb2", STATUS READY, has 1 handler(s) FOR this service...
The command completed successfully

节点2:

oracle@racdb2 $ lsnrctl STATUS
LSNRCTL FOR Solaris: Version 10.2.0.4.0 - Production ON 18-APR-2012 10:19:00
Copyright (c) 1991, 2007, Oracle.  ALL rights reserved.
Connecting TO (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS OF the LISTENER
------------------------
Alias                     LISTENER_RACDB2
Version                   TNSLSNR FOR Solaris: Version 10.2.0.4.0 - Production
START DATE                25-MAR-2012 17:14:28
Uptime                    23 days 17 hr. 4 MIN. 31 sec
Trace Level               off
Security                  ON: LOCAL OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/oracle/product/10.2/DATABASE/network/admin/listener.ora
Listener Log File         /u01/oracle/product/10.2/DATABASE/network/log/listener_racdb2.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.8.60.202)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.8.60.3)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM2", STATUS BLOCKED, has 1 handler(s) FOR this service...
Service "+ASM_XPT" has 1 instance(s).
  Instance "+ASM2", STATUS BLOCKED, has 1 handler(s) FOR this service...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", STATUS UNKNOWN, has 1 handler(s) FOR this service...
Service "racdb" has 2 instance(s).
  Instance "racdb1", STATUS READY, has 1 handler(s) FOR this service...
  Instance "racdb2", STATUS READY, has 2 handler(s) FOR this service...
Service "racdbXDB" has 2 instance(s).
  Instance "racdb1", STATUS READY, has 1 handler(s) FOR this service...
  Instance "racdb2", STATUS READY, has 1 handler(s) FOR this service...
Service "racdb_XPT" has 2 instance(s).
  Instance "racdb1", STATUS READY, has 1 handler(s) FOR this service...
  Instance "racdb2", STATUS READY, has 2 handler(s) FOR this service...
The command completed successfully

两个节点的监听都是正常的:

oracle@racdb1 $ sqlplus / AS sysdba
SQL*Plus: Release 10.2.0.4.0 - Production ON Wed Apr 18 10:20:06 2012
Copyright (c) 1982, 2007, Oracle.  ALL Rights Reserved.
Connected TO:
Oracle DATABASE 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
WITH the Partitioning, REAL Application Clusters, OLAP, DATA Mining
AND REAL Application Testing options
SQL> SHOW parameter service_name
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
service_names                        string      racdb
SQL> SHOW parameter listener
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
local_listener                       string
remote_listener                      string      LISTENERS_RACDB
oracle@racdb1 $ cd $ORACLE_HOME/network/admin
oracle@racdb1 $ more listener.ora
# listener.ora.racdb1 Network Configuration File: /u01/oracle/product/10.2/DATABASE/network/admin/listener.ora.racdb1
# Generated BY Oracle configuration tools.
LISTENER_RACDB1 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1-vip)(PORT = 1521)(IP = FIRST))
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.8.60.1)(PORT = 1521)(IP = FIRST))
    )
  )
SID_LIST_LISTENER_RACDB1 =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /u01/oracle/product/10.2/DATABASE)
      (PROGRAM = extproc)
    )
  )
oracle@racdb1 $ more tnsnames.ora
# tnsnames.ora Network Configuration File: /u01/oracle/product/10.2/DATABASE/network/admin/tnsnames.ora
# Generated BY Oracle configuration tools.
LISTENERS_RACDB =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = racdb2-vip)(PORT = 1521))
  )
RACDB2 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = racdb2-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb)
      (INSTANCE_NAME = racdb2)
    )
  )
RACDB1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb)
      (INSTANCE_NAME = racdb1)
    )
  )
RACDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = racdb2-vip)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb)
    )
  )
EXTPROC_CONNECTION_DATA =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC0))
    )
    (CONNECT_DATA =
      (SID = PLSExtProc)
      (PRESENTATION = RO)
    )
  )
oracle@racdb1 $ tnsping racdb
TNS Ping Utility FOR Solaris: Version 10.2.0.4.0 - Production ON 18-APR-2012 10:24:08
Copyright (c) 1997,  2007, Oracle.  ALL rights reserved.
Used parameter files:
Used TNSNAMES adapter TO resolve the alias
Attempting TO contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1-vip)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = racdb2-vip)(PORT = 1521)) (LOAD_BALANCE = yes) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = racdb)))
OK (0 msec)
oracle@racdb1 $ tnsping racdb1
TNS Ping Utility FOR Solaris: Version 10.2.0.4.0 - Production ON 18-APR-2012 10:24:11
Copyright (c) 1997,  2007, Oracle.  ALL rights reserved.
Used parameter files:
Used TNSNAMES adapter TO resolve the alias
Attempting TO contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1-vip)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = racdb) (INSTANCE_NAME = racdb1)))
OK (10 msec)
oracle@racdb1 $ tnsping racdb2
TNS Ping Utility FOR Solaris: Version 10.2.0.4.0 - Production ON 18-APR-2012 10:24:12
Copyright (c) 1997,  2007, Oracle.  ALL rights reserved.
Used parameter files:
Used TNSNAMES adapter TO resolve the alias
Attempting TO contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = racdb2-vip)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = racdb) (INSTANCE_NAME = racdb2)))
OK (0 msec)<pre>
数据库中设置了REMOTE_LISTENERS参数,而且TNSNAMES.ORA中的REMOTE_LISTENERS的配置也没有异常。通过tnsping命令检查配置同样没有发现异常。
<pre lang='SQL'>oracle@racdb2 $ sqlplus /nolog
SQL*Plus: Release 10.2.0.4.0 - Production ON Wed Apr 18 10:19:18 2012
Copyright (c) 1982, 2007, Oracle.  ALL Rights Reserved.
SQL> conn a/a@10.8.60.3/racdb
ERROR:
ORA-01017: invalid username/password; logon denied
 
SQL> conn a/a@10.8.60.1/racdb
ERROR:
ORA-01017: invalid username/password; logon denied
 
SQL> conn a/a@10.8.60.202/racdb
ERROR:
ORA-01017: invalid username/password; logon denied
 
SQL> conn a/a@10.8.60.201/racdb
ERROR:
ORA-01017: invalid username/password; logon denied
 
SQL> exit

节点2上通过PUBLIC IP和VIP分别连接两个实例,发现服务名配置没有问题,通过VIP也可以访问数据库,并没有出现VIP不同的情况。

oracle@racdb1 $ more /etc/hosts
#
# Internet host TABLE
#
::1     localhost       
127.0.0.1       localhost       
10.8.60.1       racdb1  racdb1.com      loghost
10.8.60.201     racdb1-vip
192.168.60.201  racdb1-priv
10.8.60.3       racdb2
10.8.60.202     racdb2-vip
192.168.60.202  racdb2-priv
#BackupServer
10.8.48.26      gz-bak
oracle@racdb2 $ more /etc/hosts
#
# Internet host TABLE
#
::1     localhost       
127.0.0.1       localhost       
10.8.60.1       racdb1
10.8.60.201     racdb1-vip
192.168.60.201  racdb1-priv
10.8.60.3       racdb2  racdb2.com      loghost
10.8.60.202     racdb2-vip
192.168.60.202  racdb2-priv
#BackupServer
10.8.48.26      gz-bak

分别在两个节点上检查hosts文件的配置,结果同样正常。

root@racdb1 # ./crs_stat -t   
Name           TYPE           Target    State     Host        
------------------------------------------------------------
ora....b1.inst application    ONLINE    ONLINE    racdb1      
ora....b2.inst application    ONLINE    ONLINE    racdb2      
ora.racdb.db   application    ONLINE    ONLINE    racdb2      
ora....SM1.asm application    ONLINE    ONLINE    racdb1      
ora....B1.lsnr application    ONLINE    ONLINE    racdb1      
ora.racdb1.gsd application    ONLINE    ONLINE    racdb1      
ora.racdb1.ons application    ONLINE    ONLINE    racdb1      
ora.racdb1.vip application    ONLINE    ONLINE    racdb1      
ora....SM2.asm application    ONLINE    ONLINE    racdb2      
ora....B2.lsnr application    ONLINE    ONLINE    racdb2      
ora.racdb2.gsd application    ONLINE    ONLINE    racdb2      
ora.racdb2.ons application    ONLINE    ONLINE    racdb2      
ora.racdb2.vip application    ONLINE    ONLINE    racdb2

RAC的相同服务都正常启动。

root@racdb1 # ./oifcfg getif  
aggr1  192.168.60.0  global  cluster_interconnect
aggr2  10.8.60.0  global  public
root@racdb2 # ./oifcfg getif
aggr1  192.168.60.0  global  cluster_interconnect
aggr2  10.8.60.0  global  public

使用oifcfg检查网卡配置也未发现异常。

root@racdb1 # ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 INDEX 1
        inet 127.0.0.1 netmask ff000000 
aggr1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 INDEX 2
        inet 192.168.60.201 netmask ffffff00 broadcast 192.168.60.255
        ether 0:21:28:1a:89:43 
aggr1:1: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 INDEX 2
        inet 10.8.60.201 netmask ffffff00 broadcast 10.8.60.255
aggr2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 INDEX 3
        inet 10.8.60.1 netmask ffffff00 broadcast 10.8.60.255
        ether 0:21:28:1a:89:42
root@racdb2 # ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 INDEX 1
        inet 127.0.0.1 netmask ff000000 
aggr1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 INDEX 2
        inet 192.168.60.202 netmask ffffff00 broadcast 192.168.60.255
        ether 0:21:28:1a:89:6b 
aggr2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 INDEX 3
        inet 10.8.60.3 netmask ffffff00 broadcast 10.8.60.255
        ether 0:21:28:1a:89:6a 
aggr2:1: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 INDEX 3
        inet 10.8.60.202 netmask ffffff00 broadcast 10.8.60.255

通过检查当前的IP地址信息,终于发现了问题所在,节点2上的VIP是正常的,但是节点1的VIP漂到了PRIVATE IP对应的网卡上。
对于这种情况,当前RAC的各个节点都可以正常访问VIP,因为无论是PUBLIC网卡还是PRIVATE网卡,两个节点都可以访问。但是对于数据库外的其他服务器而言,是不可能访问RAC的PRIVATE网络的,因此造成了节点2的VIP可以访问,而节点1的VIP不可访问。
那么为什么Oracle的VIP会漂到PRIVATE网卡上呢:

root@racdb1 # exit
oracle@racdb1 $ srvctl config nodeapps -n racdb1
racdb1 racdb1 /u01/oracle/product/10.2/DATABASE
oracle@racdb1 $ srvctl config nodeapps -n racdb1 -a
VIP EXISTS.: /racdb1-vip/10.8.60.201/255.255.255.0/aggr1:aggr2
oracle@racdb1 $ srvctl config nodeapps -n racdb2 -a
VIP EXISTS.: /racdb2-vip/10.8.60.202/255.255.255.0/aggr1:aggr2

显然是RAC安装过程中配置错误导致的,在设置VIP使用的网卡信息时,应该选择PUBLIC网络,而当前将两个网卡都选择上了。这正好说明了为什么有时VIP是不能访问的,而重启RAC节点后,一个节点的VIP就可以访问了。
如果VIP启动的时候选择了PUBLIC网卡,那么VIP地址对外就是可见的,而如果VIP选择了PRIVATE网卡,那么VIP地址对外就不可见了。

This entry was posted in ORACLE and tagged , , , , , , . Bookmark the permalink.

One Response to 数据库VIP地址无法访问(一)

  1. Pingback: 数据库VIP地址无法访问(二) | yangtingkun

Leave a Reply

Your email address will not be published. Required fields are marked *