并行进程出现ORA-27090错误

客户的RAC数据库出现ORA-27090错误。
详细的错误信息如下:

Wed May  2 22:13:14 2012
Completed instance recovery at
 Thread 1: logseq 22430, block 190818, scn 5430629987
 4110 DATA blocks READ, 4503 DATA blocks written, 43017 redo blocks READ
Wed May  2 22:13:15 2012
Thread 1 advanced TO log SEQUENCE 22431 (thread recovery)
Wed May  2 22:13:16 2012
Errors IN file /u01/app/oracle/admin/orcl/bdump/orcl2_p007_15968.trc:
ORA-27090: Message 27090 NOT found;  product=RDBMS; facility=ORA
Linux-x86_64 Error: 4: Interrupted system CALL
Additional information: 3
Additional information: 128
Additional information: 65536
Wed May  2 22:13:16 2012
Errors IN file /u01/app/oracle/admin/orcl/bdump/orcl2_p008_15970.trc:
ORA-27090: Message 27090 NOT found;  product=RDBMS; facility=ORA
Linux-x86_64 Error: 4: Interrupted system CALL
Additional information: 3
Additional information: 128
Additional information: 65536
Wed May  2 22:13:16 2012
Errors IN file /u01/app/oracle/admin/orcl/bdump/orcl2_p004_15962.trc:
ORA-27090: Message 27090 NOT found;  product=RDBMS; facility=ORA
Linux-x86_64 Error: 4: Interrupted system CALL
Additional information: 3
Additional information: 128
Additional information: 65536
.
.
.
Wed May  2 22:13:16 2012
Errors IN file /u01/app/oracle/admin/orcl/bdump/orcl2_p014_15995.trc:
ORA-27090: Message 27090 NOT found;  product=RDBMS; facility=ORA
Linux-x86_64 Error: 4: Interrupted system CALL
Additional information: 3
Additional information: 128
Additional information: 65536
Wed May  2 22:13:29 2012
SMON: Parallel TRANSACTION recovery tried

问题发生在RAC一个节点崩溃后,所有的错误都发生在并行进程PNNN上。查看了一下MOS,发现这个问题是由于系统参数配置不当所致:ORA-27090: MESSAGE 27090 NOT FOUND; [ID 579108.1]。
内核参数的值:aio-max-nr设置太低,推荐设置为fs.aio-max-nr= 3145728。修改参数使用sysctl –p重新加载参数后,重启数据库即可。

Posted in ORACLE | Tagged , , , | Leave a comment

对象相关开发手册

这篇文档汇总了与对象和类型相关的知识。
按道理来说,应该先看PL/SQL文档,然后对于对象相关的问题,可以进一步阅读这篇文档。如果对于PL/SQL很熟悉,那么可以跳过PL/SQL文档,毕竟那篇文档中介绍的都是基础的内容和语法。
这篇文档中的对象并不特指TYPE类型,RECORD、COLLECTION以及TYPE都是这篇文档关注的,此外和这些类型相关的函数以及访问方式、程序开发语句都包含在这篇文档之内。
无论是PL/SQL的高级开发,还是TYPE对象的使用,这篇文档都应该细读,这篇文档的在线阅读地址:http://www.oracle.com/pls/db112/to_toc?pathname=appdev.112%2Fe11822%2Ftoc.htm&remark=portal+%28Books%29

Posted in BOOKS | Leave a comment

ORA-600(ktspfupdst-1)错误

客户9208数据库出现ORA-600[ktspfupdst-1]错误。
错误信息为:

Sat Mar 31 02:50:48 2012
Errors IN file /oracle/app/admin/orcl/udump/orcl1_ora_1896900.trc:
ORA-00600: internal error code, arguments: [ktspfupdst-1], [], [], [], [], [], [], []
ORA-00604: error occurred at recursive SQL level 1
ORA-01013: USER requested cancel OF CURRENT operation

其实从错误信息本身就可以初步判断,问题应该和用户取消当前的操作有关。问题多半是由于用户取消当前操作后,Oracle在进行恢复处理时,引发了异常。
详细TRACE信息为:

*** 2012-03-31 02:50:48.509
ksedmp: internal OR fatal error
ORA-00600: internal error code, arguments: [ktspfupdst-1], [], [], [], [], [], [], []
ORA-00604: error occurred at recursive SQL level 1
ORA-01013: USER requested cancel OF CURRENT operation
CURRENT SQL statement FOR this SESSION:
INSERT INTO TAB001_3
(
  COL_CODE      ,
  A_CODE          ,
  C_CODE        ,
  S_ID            ,
  SER_NUM        ,
  S_TYPE          ,
  C_ID            ,
  A_ID            ,
  N_TYPE           ,
  B_ID           ,
  P_ID            , 
  P_ID_N       ,
  S_STAT        ,
  U_TIME        ,
  P_MODE           ,
  S_KIND          ,
  V_TYPE         ,
  O_MODE          ,
  A_TIME       ,
  C_TIME      ,
  AC_TIME        ,
  A_R_TIME   ,
  S_MODE          ,
  T_CODE      ,
  S_NUMBER         ,
  P_CODE       ,
  D_CODE      ,
  S_FLAG      ,
  S_TIME      ,
  DE_CODE     ,
  D_C_CODE  ,
  D_STAFF         ,
  CI_TYPE       ,
  O_TIME          ,
  SU_FLAG          ,
  SU_FLAG       ,
  D_A_CODE,
  CH_CODE          ,
  GR_ID           ,
  RE_CODE        )
SELECT '30',c.b_region,'',a.id_no,TRIM(a.p_no),'',a.c_id,a.c_no,'',a.s_code,'',
       '',decode(substr(a.r_code,2,1),'A','1','K','1','C','2','I','4','J','4','a','5','b','5','3'),
       to_char(a.r_time,'yyyymmddhh24miss'),decode(b.c_flag,'A','2','B','1'),
       '','01','1',to_char(a.o_time,'yyyymmddhh24miss'),to_char(a.o_time,'yyyymmddhh24miss'),'',to_char(a.o_time,'yyyymmddhh24miss'),
       'S','','','','','2','',
       '','','','',to_char(a.o_time,'yyyymmddhh24miss'),'U',decode(substr(a.r_code,2,1),'A','0','K','0','1'),
       c.b_region,'',a.g_id,c.r_code
  FROM tab2 a,tab3 b, tab4 c
 WHERE a.s_code = b.s_code
   AND substr(a.b_code,0,2) = b.r_code
   AND b.r_code = c.r_code
   AND MOD(a.id_no,10) = 3
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex
location             TYPE     point                (? means dubious VALUE)
-------------------- -------- -------------------- ----------------------------
ksedmp+0148          bl       ksedst               1029746FC ?
ksfdmp+0018          bl       01FD4014
kgerinv+00e8         bl       _ptrgl
kgeasnmierr+004c     bl       kgerinv              000000001 ? 000000000 ?
                                                   000000005 ? 000000001 ?
                                                   000000001 ?
ktspfupdst+0540      bl       kgeasnmierr          110006308 ? 1103994E8 ?
                                                   102A9239C ? 000000000 ?
                                                   000000005 ? 000000010 ?
                                                   000000020 ? 000000006 ?
ktspstchg+00e4       bl       ktspfupdst           000000060 ? 300000004 ?
                                                   FFFFFFFFFFF6E48 ?
                                                   50601CE000000ED ?
                                                   3B401B34C5D02F2A ?
                                                   B92000004000020 ?
kdoiur+062c          bl       ktspstchg            000000000 ? 700000C39D779E8 ?
                                                   000000000 ?
kcoubk+00e4          bl       _ptrgl
ktundo+0988          bl       kcoubk               1010CCD80 ? FFFFFFFFFFF76C0 ?
                                                   100ED51C0 ? FFFFFFFFFFF7150 ?
                                                   1101FAF78 ? 1102567C0 ?
                                                   700000C396A1300 ? 000000002 ?
ktubko+03bc          bl       ktundo               1840DFB30 ?
                                                   3B401B3400000002 ?
                                                   000000000 ? 000000000 ?
                                                   FFFFFFFFFFF85D8 ?
                                                   700000C80A1E880 ? 2FFFF8540 ?
                                                   FFFFFFFFFFF8780 ?
ktuabt+0638          bl       ktubko               DF000000DF ?
                                                   FFFFFFFFFFF8690 ? 000000000 ?
                                                   FFFFFFFFFFF85D8 ? 102973880 ?
                                                   700000C844FA418 ?
ktcrab+02b4          bl       ktuabt               700000C80A1E840 ? 200017CD8 ?
ktcrsp+026c          bl       ktcrab               100F698E4 ? 000000001 ?
ksures+0074          bl       ktcrsp               700000C844FA448 ?
opiexe+3380          bl       01FD4138
opiall0+102c         bl       opiexe               400000000 ? 110002A48 ?
                                                   FFFFFFFFFFFA0A0 ?
kpoal8+0a78          bl       opiall0              5EFFFFBED4 ? 22103A43F8 ?
                                                   FFFFFFFFFFFA5B8 ? 000000000 ?
                                                   FFFFFFFFFFFA508 ? 1103A4B00 ?
                                                   6FF00000738 ?
                                                   24000000007FFF ?
opiodr+08cc          bl       _ptrgl
ttcpip+0cc4          bl       _ptrgl
opitsk+0d60          bl       ttcpip               11000CF90 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
opiino+0758          bl       opitsk               000000000 ? 000000000 ?
opiodr+08cc          bl       _ptrgl
opidrv+032c          bl       opiodr               3C00000018 ? 4101FAF78 ?
                                                   FFFFFFFFFFFF7B0 ? 0A000F350 ?
sou2o+0028           bl       opidrv               3C0C000000 ? 4A00E8B50 ?
                                                   FFFFFFFFFFFF7B0 ?
main+0138            bl       01FD3A28
__start+0098         bl       main                 000000000 ? 000000000 ?

可以看到,当前执行的是一条插入语句。查询MOS发现,问题和ORA-00600 [ktspfupdst-1] During DML on ASSM Segment [ID 388599.1]描述的完全一致。
首先当前插入的表所在表空间使用的是ASSM,其次这个插入操作被取消,而且所有堆栈信息与bug 3370468描述的完全一致。
虽然这个bug在9.2.0.5被fixed,但是不排除在9.2.0.8被重新引入的可能性。Oracle给出的解决方案除了升级到10.2以及10.1.0.3以上外,还可以尝试用dbms_repair.segment_fix_status过程来修正问题对象。
如果只是单次的问题,且无法重现,最简单的方法莫过于直接MOVE,确保ASSM空间管理的异常不会导致表中后续数据的插入和读取。

Posted in BUG | Tagged , , , , , , | Leave a comment

Database Firewall安全手册总结

Firewall安全手册更多的内容偏重于图形化工具的配置。
文档内容并不算太多,而且绝大部分操作都是基于控制台的操作,看来FireWall的使用偏向于OEM类型,将更多的设置和实现通过图形化完成,而DBA习惯的命令行方式的配置并没有介绍。
除了介绍管理控制台之外,文档还描述了如何配置和使用分析器、审计过程和角色、访问和查看流量日志以及产生防火墙报告等。

Posted in BOOKS | Leave a comment

数据库VIP地址无法访问(二)

客户的数据库出现VIP地址无法访问的情况。
这一篇描述问题的解决。
数据库VIP地址无法访问(一):https://yangtingkun.net/?p=798

上一篇通过诊断找到了问题的原因,由于VIP绑定网卡的错误,导致VIP可能漂移到PRIVATE网卡上,从而导致VIP地址对于外部服务器不可见。
找到原因后,解决其实很简单,只需要改正VIP错误的配置即可,简单的代码类似下面的示例:

oracle@racdb1 $ srvctl config nodeapps -n racdb1 –a
VIP EXISTS.: /racdb1-vip/10.8.60.201/255.255.255.0/aggr1:aggr2
oracle@racdb1 $ srvctl config nodeapps -n racdb2 –a
VIP EXISTS.: /racdb2-vip/10.8.60.202/255.255.255.0/aggr1:aggr2

获取当前VIP配置信息。
停止VIP服务

$ srvctl stop nodeapps -n racdb1
$ srvctl stop nodeapps -n racdb2

停止数据库实例:

$ srvctl stop instance -d racdb -i racdb1
$ srvctl stop instance -d racdb -i racdb2

停止ASM实例:

$ srvctl stop asm -n racdb1
$ srvctl stop asm -n racdb2

以root修改VIP配置信息:

# srvctl MODIFY nodeapps -n racdb1 -A racdb1-vip/255.255.255.0/aggr2
# srvctl MODIFY nodeapps -n racdb2 -A racdb2-vip/255.255.255.0/aggr2

确认修改生效:

$ srvctl config nodeapps -n racdb1 -a
VIP EXISTS.: /racdb1-vip/10.8.60.201/255.255.255.0/aggr2
$ srvctl config nodeapps -n racdb2 -a
VIP EXISTS.: /racdb2-vip/10.8.60.202/255.255.255.0/aggr2

重启VIP服务

$ srvctl START nodeapps -n racdb1
$ srvctl START nodeapps -n racdb2

重启ASM实例:

$ srvctl START asm -n racdb1
$ srvctl START asm -n racdb2

重启数据库实例:

$ srvctl START instance -d racdb -i racdb1
$ srvctl START instance -d racdb -i racdb2

最后通过ipconfig验证,VIP地址绑定至PUBLIC网卡,确认修改成功。

Posted in ORACLE | Tagged , , , , | Leave a comment

数据库VIP地址无法访问(一)

客户的数据库出现VIP地址无法访问的情况。
这一篇描述问题的诊断。
客户的RAC环境重建后,发现两个节点的VIP都无法访问,开始认为是网络问题,但是随后发现整个RAC重启后,其中一个节点的VIP可以访问,而另一个节点的VIP仍然无法访问,因此判断可能是RAC本身的问题。
听到问题描述后,第一个判断是否VIP没有启动,登录数据库服务器后进行了检查:

oracle@racdb1 $ lsnrctl STATUS
LSNRCTL FOR Solaris: Version 10.2.0.4.0 - Production ON 18-APR-2012 10:22:02
Copyright (c) 1991, 2007, Oracle.  ALL rights reserved.
Connecting TO (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS OF the LISTENER
------------------------
Alias                     LISTENER_RACDB1
Version                   TNSLSNR FOR Solaris: Version 10.2.0.4.0 - Production
START DATE                25-MAR-2012 17:16:47
Uptime                    23 days 17 hr. 5 MIN. 15 sec
Trace Level               off
Security                  ON: LOCAL OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/oracle/product/10.2/DATABASE/network/admin/listener.ora
Listener Log File         /u01/oracle/product/10.2/DATABASE/network/log/listener_racdb1.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.8.60.201)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.8.60.1)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", STATUS BLOCKED, has 1 handler(s) FOR this service...
Service "+ASM_XPT" has 1 instance(s).
  Instance "+ASM1", STATUS BLOCKED, has 1 handler(s) FOR this service...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", STATUS UNKNOWN, has 1 handler(s) FOR this service...
Service "racdb" has 2 instance(s).
  Instance "racdb1", STATUS READY, has 2 handler(s) FOR this service...
  Instance "racdb2", STATUS READY, has 1 handler(s) FOR this service...
Service "racdbXDB" has 2 instance(s).
  Instance "racdb1", STATUS READY, has 1 handler(s) FOR this service...
  Instance "racdb2", STATUS READY, has 1 handler(s) FOR this service...
Service "racdb_XPT" has 2 instance(s).
  Instance "racdb1", STATUS READY, has 2 handler(s) FOR this service...
  Instance "racdb2", STATUS READY, has 1 handler(s) FOR this service...
The command completed successfully

节点2:

oracle@racdb2 $ lsnrctl STATUS
LSNRCTL FOR Solaris: Version 10.2.0.4.0 - Production ON 18-APR-2012 10:19:00
Copyright (c) 1991, 2007, Oracle.  ALL rights reserved.
Connecting TO (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS OF the LISTENER
------------------------
Alias                     LISTENER_RACDB2
Version                   TNSLSNR FOR Solaris: Version 10.2.0.4.0 - Production
START DATE                25-MAR-2012 17:14:28
Uptime                    23 days 17 hr. 4 MIN. 31 sec
Trace Level               off
Security                  ON: LOCAL OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/oracle/product/10.2/DATABASE/network/admin/listener.ora
Listener Log File         /u01/oracle/product/10.2/DATABASE/network/log/listener_racdb2.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.8.60.202)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.8.60.3)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM2", STATUS BLOCKED, has 1 handler(s) FOR this service...
Service "+ASM_XPT" has 1 instance(s).
  Instance "+ASM2", STATUS BLOCKED, has 1 handler(s) FOR this service...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", STATUS UNKNOWN, has 1 handler(s) FOR this service...
Service "racdb" has 2 instance(s).
  Instance "racdb1", STATUS READY, has 1 handler(s) FOR this service...
  Instance "racdb2", STATUS READY, has 2 handler(s) FOR this service...
Service "racdbXDB" has 2 instance(s).
  Instance "racdb1", STATUS READY, has 1 handler(s) FOR this service...
  Instance "racdb2", STATUS READY, has 1 handler(s) FOR this service...
Service "racdb_XPT" has 2 instance(s).
  Instance "racdb1", STATUS READY, has 1 handler(s) FOR this service...
  Instance "racdb2", STATUS READY, has 2 handler(s) FOR this service...
The command completed successfully

两个节点的监听都是正常的:

oracle@racdb1 $ sqlplus / AS sysdba
SQL*Plus: Release 10.2.0.4.0 - Production ON Wed Apr 18 10:20:06 2012
Copyright (c) 1982, 2007, Oracle.  ALL Rights Reserved.
Connected TO:
Oracle DATABASE 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
WITH the Partitioning, REAL Application Clusters, OLAP, DATA Mining
AND REAL Application Testing options
SQL> SHOW parameter service_name
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
service_names                        string      racdb
SQL> SHOW parameter listener
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
local_listener                       string
remote_listener                      string      LISTENERS_RACDB
oracle@racdb1 $ cd $ORACLE_HOME/network/admin
oracle@racdb1 $ more listener.ora
# listener.ora.racdb1 Network Configuration File: /u01/oracle/product/10.2/DATABASE/network/admin/listener.ora.racdb1
# Generated BY Oracle configuration tools.
LISTENER_RACDB1 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1-vip)(PORT = 1521)(IP = FIRST))
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.8.60.1)(PORT = 1521)(IP = FIRST))
    )
  )
SID_LIST_LISTENER_RACDB1 =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /u01/oracle/product/10.2/DATABASE)
      (PROGRAM = extproc)
    )
  )
oracle@racdb1 $ more tnsnames.ora
# tnsnames.ora Network Configuration File: /u01/oracle/product/10.2/DATABASE/network/admin/tnsnames.ora
# Generated BY Oracle configuration tools.
LISTENERS_RACDB =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = racdb2-vip)(PORT = 1521))
  )
RACDB2 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = racdb2-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb)
      (INSTANCE_NAME = racdb2)
    )
  )
RACDB1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb)
      (INSTANCE_NAME = racdb1)
    )
  )
RACDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = racdb2-vip)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb)
    )
  )
EXTPROC_CONNECTION_DATA =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC0))
    )
    (CONNECT_DATA =
      (SID = PLSExtProc)
      (PRESENTATION = RO)
    )
  )
oracle@racdb1 $ tnsping racdb
TNS Ping Utility FOR Solaris: Version 10.2.0.4.0 - Production ON 18-APR-2012 10:24:08
Copyright (c) 1997,  2007, Oracle.  ALL rights reserved.
Used parameter files:
Used TNSNAMES adapter TO resolve the alias
Attempting TO contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1-vip)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = racdb2-vip)(PORT = 1521)) (LOAD_BALANCE = yes) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = racdb)))
OK (0 msec)
oracle@racdb1 $ tnsping racdb1
TNS Ping Utility FOR Solaris: Version 10.2.0.4.0 - Production ON 18-APR-2012 10:24:11
Copyright (c) 1997,  2007, Oracle.  ALL rights reserved.
Used parameter files:
Used TNSNAMES adapter TO resolve the alias
Attempting TO contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1-vip)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = racdb) (INSTANCE_NAME = racdb1)))
OK (10 msec)
oracle@racdb1 $ tnsping racdb2
TNS Ping Utility FOR Solaris: Version 10.2.0.4.0 - Production ON 18-APR-2012 10:24:12
Copyright (c) 1997,  2007, Oracle.  ALL rights reserved.
Used parameter files:
Used TNSNAMES adapter TO resolve the alias
Attempting TO contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = racdb2-vip)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = racdb) (INSTANCE_NAME = racdb2)))
OK (0 msec)<pre>
数据库中设置了REMOTE_LISTENERS参数,而且TNSNAMES.ORA中的REMOTE_LISTENERS的配置也没有异常。通过tnsping命令检查配置同样没有发现异常。
<pre lang='SQL'>oracle@racdb2 $ sqlplus /nolog
SQL*Plus: Release 10.2.0.4.0 - Production ON Wed Apr 18 10:19:18 2012
Copyright (c) 1982, 2007, Oracle.  ALL Rights Reserved.
SQL> conn a/a@10.8.60.3/racdb
ERROR:
ORA-01017: invalid username/password; logon denied
 
SQL> conn a/a@10.8.60.1/racdb
ERROR:
ORA-01017: invalid username/password; logon denied
 
SQL> conn a/a@10.8.60.202/racdb
ERROR:
ORA-01017: invalid username/password; logon denied
 
SQL> conn a/a@10.8.60.201/racdb
ERROR:
ORA-01017: invalid username/password; logon denied
 
SQL> exit

节点2上通过PUBLIC IP和VIP分别连接两个实例,发现服务名配置没有问题,通过VIP也可以访问数据库,并没有出现VIP不同的情况。

oracle@racdb1 $ more /etc/hosts
#
# Internet host TABLE
#
::1     localhost       
127.0.0.1       localhost       
10.8.60.1       racdb1  racdb1.com      loghost
10.8.60.201     racdb1-vip
192.168.60.201  racdb1-priv
10.8.60.3       racdb2
10.8.60.202     racdb2-vip
192.168.60.202  racdb2-priv
#BackupServer
10.8.48.26      gz-bak
oracle@racdb2 $ more /etc/hosts
#
# Internet host TABLE
#
::1     localhost       
127.0.0.1       localhost       
10.8.60.1       racdb1
10.8.60.201     racdb1-vip
192.168.60.201  racdb1-priv
10.8.60.3       racdb2  racdb2.com      loghost
10.8.60.202     racdb2-vip
192.168.60.202  racdb2-priv
#BackupServer
10.8.48.26      gz-bak

分别在两个节点上检查hosts文件的配置,结果同样正常。

root@racdb1 # ./crs_stat -t   
Name           TYPE           Target    State     Host        
------------------------------------------------------------
ora....b1.inst application    ONLINE    ONLINE    racdb1      
ora....b2.inst application    ONLINE    ONLINE    racdb2      
ora.racdb.db   application    ONLINE    ONLINE    racdb2      
ora....SM1.asm application    ONLINE    ONLINE    racdb1      
ora....B1.lsnr application    ONLINE    ONLINE    racdb1      
ora.racdb1.gsd application    ONLINE    ONLINE    racdb1      
ora.racdb1.ons application    ONLINE    ONLINE    racdb1      
ora.racdb1.vip application    ONLINE    ONLINE    racdb1      
ora....SM2.asm application    ONLINE    ONLINE    racdb2      
ora....B2.lsnr application    ONLINE    ONLINE    racdb2      
ora.racdb2.gsd application    ONLINE    ONLINE    racdb2      
ora.racdb2.ons application    ONLINE    ONLINE    racdb2      
ora.racdb2.vip application    ONLINE    ONLINE    racdb2

RAC的相同服务都正常启动。

root@racdb1 # ./oifcfg getif  
aggr1  192.168.60.0  global  cluster_interconnect
aggr2  10.8.60.0  global  public
root@racdb2 # ./oifcfg getif
aggr1  192.168.60.0  global  cluster_interconnect
aggr2  10.8.60.0  global  public

使用oifcfg检查网卡配置也未发现异常。

root@racdb1 # ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 INDEX 1
        inet 127.0.0.1 netmask ff000000 
aggr1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 INDEX 2
        inet 192.168.60.201 netmask ffffff00 broadcast 192.168.60.255
        ether 0:21:28:1a:89:43 
aggr1:1: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 INDEX 2
        inet 10.8.60.201 netmask ffffff00 broadcast 10.8.60.255
aggr2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 INDEX 3
        inet 10.8.60.1 netmask ffffff00 broadcast 10.8.60.255
        ether 0:21:28:1a:89:42
root@racdb2 # ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 INDEX 1
        inet 127.0.0.1 netmask ff000000 
aggr1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 INDEX 2
        inet 192.168.60.202 netmask ffffff00 broadcast 192.168.60.255
        ether 0:21:28:1a:89:6b 
aggr2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 INDEX 3
        inet 10.8.60.3 netmask ffffff00 broadcast 10.8.60.255
        ether 0:21:28:1a:89:6a 
aggr2:1: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 INDEX 3
        inet 10.8.60.202 netmask ffffff00 broadcast 10.8.60.255

通过检查当前的IP地址信息,终于发现了问题所在,节点2上的VIP是正常的,但是节点1的VIP漂到了PRIVATE IP对应的网卡上。
对于这种情况,当前RAC的各个节点都可以正常访问VIP,因为无论是PUBLIC网卡还是PRIVATE网卡,两个节点都可以访问。但是对于数据库外的其他服务器而言,是不可能访问RAC的PRIVATE网络的,因此造成了节点2的VIP可以访问,而节点1的VIP不可访问。
那么为什么Oracle的VIP会漂到PRIVATE网卡上呢:

root@racdb1 # exit
oracle@racdb1 $ srvctl config nodeapps -n racdb1
racdb1 racdb1 /u01/oracle/product/10.2/DATABASE
oracle@racdb1 $ srvctl config nodeapps -n racdb1 -a
VIP EXISTS.: /racdb1-vip/10.8.60.201/255.255.255.0/aggr1:aggr2
oracle@racdb1 $ srvctl config nodeapps -n racdb2 -a
VIP EXISTS.: /racdb2-vip/10.8.60.202/255.255.255.0/aggr1:aggr2

显然是RAC安装过程中配置错误导致的,在设置VIP使用的网卡信息时,应该选择PUBLIC网络,而当前将两个网卡都选择上了。这正好说明了为什么有时VIP是不能访问的,而重启RAC节点后,一个节点的VIP就可以访问了。
如果VIP启动的时候选择了PUBLIC网卡,那么VIP地址对外就是可见的,而如果VIP选择了PRIVATE网卡,那么VIP地址对外就不可见了。

Posted in ORACLE | Tagged , , , , , , | 1 Comment

数据泵的跨版本问题

数据泵在10g刚推出的时候,跨版本的问题并不明显,而等到11.2推出之后,数据泵同样存在处理跨版本的问题。
Oracle原始的EXP/IMP导出、导入工具的扩版本特性对于从8/9版本就接触数据库的人来说,应该是基本常识了——低版本导出,目标版本导入。
而对于数据泵来说,很多人并没有意识到这一点,事实上数据泵如果从低版本导出,高版本导入是不需要额外处理的,而大多数的需求不是版本相同,就是版本升级,因此很多人没有注意到数据泵的版本问题。
不过如果数据泵从高版本导入大低版本,如果不加处理,就会出现下面的错误:

[orat1@hpserver2 ~]$ expdp test/test dumpfile=d_tmp:tab1.dp TABLES=tab1 
Export: Release 11.2.0.3.0 - Production ON Mon Apr 27 09:50:10 2012
Copyright (c) 1982, 2011, Oracle AND/OR its affiliates. ALL rights reserved.
Connected TO: Oracle DATABASE 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
WITH the Partitioning, OLAP, DATA Mining AND REAL Application Testing options
Starting "TEST"."SYS_EXPORT_TABLE_01": test/******** dumpfile=d_tmp:tab1.dp tables=tab1 
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 64 KB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
. . exported "TEST"."TAB1" 6.593 KB 63 rows
Master table "TEST"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for TEST.SYS_EXPORT_TABLE_01 is:
/tmp/tab1.dp
Job "TEST"."SYS_EXPORT_TABLE_01" successfully completed at 09:50:17

11.2导入的文件,10.2直接导入报错:

[orat3@hpserver2 ~]$ impdp test/test directory=d_tmp dumpfile=tab1.dp
Import: Release 10.2.0.4.0 - 64bit Production ON Monday,27 April, 2012 9:50:25
Copyright (c) 2003, 2007, Oracle. ALL rights reserved.
Connected TO: Oracle DATABASE 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
WITH the Partitioning, OLAP, DATA Mining AND REAL Application Testing options
ORA-39001: invalid argument VALUE
ORA-39000: bad dump file specification
ORA-39142: incompatible version NUMBER 3.1 IN dump file "/tmp/tab1.dp"

正确的方式是在高版本导出的时候指定导出的版本,比如导入版本为10204,那么在11.2上导出时通过VERSION=10.2.0.4,确保导出的数据和结构和低版本兼容。

[orat1@hpserver2 ~]$ expdp test/test dumpfile=d_tmp:tab110204.dp TABLES=tab1 version=10.2.0.4
Export: Release 11.2.0.3.0 - Production ON Mon Apr 27 09:52:03 2012
Copyright (c) 1982, 2011, Oracle AND/OR its affiliates. ALL rights reserved.
Connected TO: Oracle DATABASE 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
WITH the Partitioning, OLAP, DATA Mining AND REAL Application Testing options
Starting "TEST"."SYS_EXPORT_TABLE_01": test/******** dumpfile=d_tmp:tab110204.dp tables=tab1 version=10.2.0.4 
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 64 KB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
. . exported "TEST"."TAB1" 6.406 KB 63 rows
Master table "TEST"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for TEST.SYS_EXPORT_TABLE_01 is:
/tmp/tab110204.dp
Job "TEST"."SYS_EXPORT_TABLE_01" successfully completed at 09:52:09

这样在10.2.0.4数据库上就可以直接导入了。

[orat3@hpserver2 ~]$ impdp test/test directory=d_tmp dumpfile=tab110204.dp
Import: Release 10.2.0.4.0 - 64bit Production ON Monday, 27 April, 2012 9:54:35
Copyright (c) 2003, 2007, Oracle. ALL rights reserved.
Connected TO: Oracle DATABASE 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
WITH the Partitioning, OLAP, DATA Mining AND REAL Application Testing options
Master TABLE "TEST"."SYS_IMPORT_FULL_01" successfully loaded/unloaded
Starting "TEST"."SYS_IMPORT_FULL_01": test/******** directory=d_tmp dumpfile=tab110204.dp 
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
. . imported "TEST"."TAB1" 6.406 KB 63 rows
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Job "TEST"."SYS_IMPORT_FULL_01" successfully completed at 09:54:36
Posted in ORACLE | Tagged , , , , , | Leave a comment

号码连续分段问题

客户有一个判断号码连续的问题,这种问题用分析函数解决是比较有代表性的。
表结构和数据为:

SQL> SELECT * FROM tab1;
        ID CA REG
---------- -- ---
       100 11 S
       101 11 S
       102 11 N
       103 11 N
       104 11 N
       105 0  S
       106 0  S
       107 0  S
       108 0  N
       109 0  N
       110 0  N
       111 11 S
       112 11 S
       113 11 N
       114 11 N
       115 11 N
       116 0  S
       117 0  S
       118 0  N
       119 0  N
       120 0  N
       121 11 S
       122 11 S
       123 11 S
       124 11 S
       125 11 S
       126 11 S
       127 11 S
       128 11 S
       129 11 S
       130 11 S
       200 11 S
       201 11 S
       202 11 N
       203 11 N
       204 11 N
       205 0  S
       206 0  S
       207 0  S
       208 0  N
       209 0  N
       210 0  N
       211 11 S
       212 11 S
       213 11 N
       214 11 N
       215 11 N
       300 11 S
       301 11 S
       302 11 N
       303 11 N
       304 11 N
       305 0  S
       306 0  S
       307 0  S
       308 0  N
       309 0  N
       310 0  N
       311 11 S
       312 11 S
       313 11 N
       314 11 N
       315 11 N
63 ROWS selected.

需求是找到一组连续ID的最小值和最大值,要求除了数值连续外,另外两个字段也必须相同,否则认为是新的号段。
简单的说,最终需要的结果类似:

100-101 11 S
102-104 11 N
105-107 00 S
108-110 00 N
... ...
200-201 11 S
202-204 11 N
205-207 00 S
208-210 00 N

其实这个问题使用分析函数并不复杂,第一步是通过构造列判断是否连续,然后进一步构造出分组的依据,这里使用了一个小技巧,将1变成0,而从保证一个号段内SUM汇总的结果是相同的,最后GROUP BY就可以得到最终结果:

SQL> SELECT MIN(id) || '-' || MAX(id) flag,
  2     card_type,
  3     region
  4  FROM
  5  (
  6     SELECT id, card_type, region, SUM(flag) OVER(ORDER BY id) flag
  7     FROM (
  8             SELECT id, card_type, region,
  9                     nvl(id-lag(id, 1) OVER(ORDER BY id)
 10                             + CASE WHEN card_type = lag(card_type, 1) OVER(ORDER BY id) THEN 0 ELSE 10 END
 11                             + CASE WHEN region = lag(region, 1) OVER(ORDER BY id) THEN 0 ELSE 10 END
 12                             , 1) -1 flag
 13             FROM tab1
 14     )
 15  )
 16  GROUP BY card_type, region, flag
 17  ORDER BY 1;
FLAG                 CA REG
-------------------- -- ---
100-101              11 S
102-104              11 N
105-107              0  S
108-110              0  N
111-112              11 S
113-115              11 N
116-117              0  S
118-120              0  N
121-130              11 S
200-201              11 S
202-204              11 N
205-207              0  S
208-210              0  N
211-212              11 S
213-215              11 N
300-301              11 S
302-304              11 N
305-307              0  S
308-310              0  N
311-312              11 S
313-315              11 N
21 ROWS selected.
Posted in ORACLE | Tagged , , , | Leave a comment

ORA-8103错误

最近碰到两次ORA-8103错误,简单总结一下。
一次是客户的10.2数据库出现了ORA-600[6002]错误,导致的问题是索引出现了逻辑损坏,本来问题很简单,只需要删除索引并重建,或者通过ONLINE REBUILD方式就可以了。但是索引删除后,扫描这张表出现了ORA-8103错误,这说明错误不仅出现在索引上,在数据块上同样存在逻辑错误,从而导致了前面的ORA-600[6002]错误。
第二个问题是11.2.0.2环境中出现的ORA-8103错误,错误发生在统计信息收集过程中:

Fri Mar 30 02:00:00 2012
DBMS_STATS: GATHER_STATS_JOB encountered errors.  CHECK the trace file.
Errors IN file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_j000_25932.trc:
ORA-20011: Approximate NDV failed: ORA-08103: object no longer EXISTS

TRACE文件的详细信息内容为:

*** 2012-03-31 14:08:25.269
*** SESSION ID:(485.61983) 2012-03-31 14:08:25.269
*** CLIENT ID:() 2012-03-31 14:08:25.269
*** SERVICE NAME:(SYS$USERS) 2012-03-31 14:08:25.269
*** MODULE NAME:(DBMS_SCHEDULER) 2012-03-31 14:08:25.269
*** ACTION NAME:(ORA$AT_OS_OPT_SY_1501) 2012-03-31 14:08:25.269
----- Call Stack Trace -----
calling              CALL     entry                argument VALUES IN hex      
location             TYPE     point                (? means dubious VALUE)     
-------------------- -------- -------------------- ----------------------------
skdstdst()+36        CALL     kgdsdst()            000000000 ? 000000000 ?
                                                   7FFF88CE8488 ? 000000001 ?
                                                   000000001 ? 000000002 ?
ksedst1()+98         CALL     skdstdst()           000000000 ? 000000000 ?
                                                   7FFF88CE8488 ? 000000001 ?
                                                   000000000 ? 000000002 ?
ksedst()+34          CALL     ksedst1()            000000000 ? 000000001 ?
                                                   7FFF88CE8488 ? 000000001 ?
                                                   000000000 ? 000000002 ?
kteinmap1()+4287     CALL     ksedst()             000000000 ? 000000001 ?
                                                   7FFF88CE8488 ? 000000001 ?
                                                   000000000 ? 000000002 ?
kteinmap()+6         CALL     kteinmap1()          2B55FE027010 ? 000000014 ?
                                                   7FFF88CE8488 ? 000000001 ?
                                                   000000000 ? 000000002 ?
kdselget()+13        CALL     kteinmap()           2B55FE027010 ? 000000014 ?
                                                   7FFF88CE8488 ? 000000001 ?
                                                   000000000 ? 000000002 ?
kdstsnb()+644        CALL     kdselget()           2B55FE027010 ? 000000014 ?
                                                   7FFF88CE8488 ? 000000001 ?
                                                   000000000 ? 000000002 ?
kdst_fetch()+234     CALL     kdstsnb()            2B55FE026E08 ? 000000014 ?
                                                   7FFF88CE8488 ? 000000001 ?
                                                   000000000 ? 000000002 ?
kdstf00001010000km(  CALL     kdst_fetch()         000000001 ? 2B55FE026E08 ?
)+3681                                             7FFF88CED870 ? 000000000 ?
                                                   000000000 ? 000000002 ?
kdsttgr()+110246     CALL     kdstf00001010000km(  000000032 ? 000000000 ?
                              )                    7FFF88CED870 ? 2B55FE026CF0 ?
                                                   008FD6CF8 ? 000000002 ?
qertbFetch()+2023    CALL     kdsttgr()            2B55FE026E08 ? 000000000 ?
                                                   96C2800B8 ? 2B55FE026CF0 ?
                                                   96C280128 ? 008FD6CF8 ?
qerandvFetch()+153   CALL     qertbFetch()         96C2800B8 ? 2B55FE026CF0 ?
                                                   008FD6CF8 ? 7FFF88CEE0D0 ?
                                                   100007FFF ? 2B55FE026D38 ?
qergsFetch()+821     CALL     qerandvFetch()       96B765418 ? 2B55FE027630 ?
                                                   008E6EDAE ? 7FFF88CEE1E0 ?
                                                   000007FFF ? 2B55FE026D38 ?
.
.
.
******************* END OF process map dump ************
*** 2012-03-31 14:08:29.159
ORA-20011: Approximate NDV failed: ORA-08103: object no longer EXISTS
*** 2012-03-31 14:08:29.159
DBMS_STATS: GATHER_STATS_JOB: GATHER_TABLE_STATS('"MMS"','"DBMS_TABCOMP_TEMP_UNCMP"','""', ...)
DBMS_STATS: ORA-20011: Approximate NDV failed: ORA-08103: object no longer EXISTS

导致问题的原因在于收集一张表的统计信息。
其实这两次碰到的ORA-8103问题相差不多,基本上都是表中存在逻辑坏块导致问题的产生。导致这个错误出现的原因是,Oracle根据块头的定位到一个数据块,但是发现这个数据块出现逻辑错误,可能是当前块的类型没有被标记为数据块,也有可能是当前块的DATA_OBJECT_ID存在错误。
如果这个报错发生在索引上,那么非常简单,只需要ONLINE REBUILD索引,或者将索引删除后重建即可。
当前碰到的两次错误,问题都发生在表上。那么在尝试解决问题之前,应该首先清除DB_CACHE,防止是内存中的错误导致问题的产生。
如果问题依旧,如果存在合适的备份可以进行恢复。比如通过RMAN的BLOCKRECOVER或者通过一个最新的逻辑备份进行恢复。
如果没有备份,则可以考虑利用DBMS_REPAIR包,将错误的BLOCK设置标识,防止随后的查询读取该BLOCK。如果想要直接读取,也可以考虑利用PL/SQL的方式将错误块以外的记录读出来。
关于ORA-8103错误,在MOS文档上ORA-8103 “object no longer exists” / Troubleshooting, Diagnostic and Solution [ID 8103.1]有非常详细的问题诊断和解决的错误,可以进行参考。

Posted in BUG | Tagged , , , , | Leave a comment

备份导致ORA-245错误

客户的11.2 RAC数据库在备份是出现ORA-245错误。
从告警日志中可以看到错误信息为:

Errors IN file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_18335.trc:
ORA-00245: control file backup operation failed

详细TRACE信息为:

Trace file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_18335.trc
Oracle DATABASE 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
WITH the Partitioning, REAL Application Clusters, Automatic Storage Management, OLAP,
DATA Mining AND REAL Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0
System name:	Linux
Node name:	node-1
Release:	2.6.18-274.el5
Version:	#1 SMP Fri Jul 8 17:36:59 EDT 2011
Machine:	x86_64
Instance name: orcl1
Redo thread mounted BY this instance: 1
Oracle process NUMBER: 157
Unix process pid: 18335, image: oracle@node-1 (TNS V1-V3)
*** 2012-03-30 00:01:45.301
*** SESSION ID:(633.29287) 2012-03-30 00:01:45.301
*** CLIENT ID:() 2012-03-30 00:01:45.301
*** SERVICE NAME:(SYS$USERS) 2012-03-30 00:01:45.301
*** MODULE NAME:(rman@node-1 (TNS V1-V3)) 2012-03-30 00:01:45.301
*** ACTION NAME:(0000012 STARTED114) 2012-03-30 00:01:45.301
ORA-00245: control file backup operation failed

查询MOS发现是Oracle 11.2上Oracle改变了控制文件镜像的访问方式。11.2之前,RAC的各个节点上的ORACLE_HOME/dbs目录分别保存镜像。而11.2之后,Oracle要求这个镜像控制文件在两个节点上都可以访问到,因此可以通过设置控制文件镜像到共享磁盘的方式解决这个问题,语法类似:

RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO+DATA/orcl/snapcf_orcl.f’;
Posted in BUG | Tagged , , , , | Leave a comment