Author: nkj827 · 2020-04-05 19:58
Project Manager · 长春长信华天

Summary of RAC Installation Issues on AIX 6.1


AIX 6.1, RAC
I recently installed a RAC cluster on AIX 6.1. I have not done many of these deployments and ran into quite a few problems, so here are my notes.

OS version:

```
HOST_NAM_1:/#oslevel -s
6100-04-02-1007
```

HA version:

```
HOST_NAM_1:/#lslpp -l cluster.*
  Fileset                               Level    State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  cluster.adt.es.client.include         5.5.0.0  COMMITTED  ES Client Include Files
  cluster.adt.es.client.samples.clinfo  5.5.0.0  COMMITTED  ES Client CLINFO Samples
  cluster.adt.es.client.samples.clstat  5.5.0.1  COMMITTED  ES Client Clstat Samples
```

### RSH error

```
#rsh HOST_NAM_2 date
rshd: 0826-813 Permission is denied.
```

Relevant configuration files:

```
#cat .rhosts
HOST_NAM_1 root
HOST_NAM_2 root
HOST_NAM_1 oracle
HOST_NAM_2 oracle
#cat /etc/hosts.equiv
HOST_NAM_1 root
HOST_NAM_2 root
HOST_NAM_1 oracle
HOST_NAM_2 oracle
```

Here HOST_NAM_1 and HOST_NAM_2 are the hostnames.

The problem lies in /etc/hosts: the hostname must not appear there only as an alias. Alternatively, ".rhosts" and "hosts.equiv" must not list aliases. This is related to name resolution, presumably because rshd resolves the caller's address to the first name on its /etc/hosts line and compares that name with the entries in these files.

Original hosts configuration:

```
175.16.1.11 HOST_NAM_1_boot1 HOST_NAM_1
175.16.1.12 HOST_NAM_2_boot1 HOST_NAM_2
192.168.10.17 HOST_NAM_1_boot2
192.168.10.18 HOST_NAM_2_boot2
192.168.10.16 HOST_NAM_2_vip
192.168.10.15 HOST_NAM_1_vip
```

Changed to:

```
175.16.1.11 HOST_NAM_1_boot1
175.16.1.12 HOST_NAM_2_boot1
192.168.10.17 HOST_NAM_1 HOST_NAM_1_boot2
192.168.10.18 HOST_NAM_2 HOST_NAM_2_boot2
192.168.10.16 HOST_NAM_2_vip
192.168.10.15 HOST_NAM_1_vip
```

### rootpre.sh error

I had seen this in the documentation before the installation; it is recorded here as a reminder:

The Oracle 10gR2 OUI and configuration assistant programs do not recognize AIX 6 V6.1 as a supported release.

Running rootpre.sh reports:

```
Configuring Asynchronous I/O....
Asynchronous I/O is not installed on this system.
You will need to install it, and either configure it yourself using 'smit aio' or rerun the Oracle root installation procedure.
Configuring POSIX Asynchronous I/O....
Posix Asynchronous I/O is not installed on this system.
You will need to install it, and either configure it yourself using 'smit aio' or rerun the Oracle root installation procedure.
```
Solution: download patch 6718715 and run the rootpre.sh it contains.

Reference document: 282036.1

### VIPCA error

While running VIPCA, the VIP would not start. The log showed:

```
Interface en4 checked failed (host=HOST_NAM_1)
Invalid parameters, or failed to bring up VIP (host=HOST_NAM_1)
```

Cause: the VIP is bound to the server's integrated network adapter, a Logical Host Ethernet Port (lp-hea). The entstat output for LHEA is different from a regular adapter.

Solution: edit the racgvip script, find the line containing `$ENTSTAT -d $_IF`, and change it to:

```
$ENTSTAT -d $_IF | $GREP -iEq '.*lan.*state.*:.*operational.*|.*link.*status.*:.*up.*|.*port.*operational.*state.*:.*up.*|.*driver.*flags.*:.*up.*'
```

Reference document: 959746.1

### ONS does not start

Errors in the log:

```
Failed to get IP for localhost (0)
Failed to get IP for localhost (0)
Failed to get IP for localhost (0)
onsctl: ons failed to start
```

Solution: the original hosts file had no entry for localhost:

```
127.0.0.1 loopback
```

Changed to:

```
127.0.0.1 loopback localhost
```

### Error upgrading CRS to 10.2.0.4

Running root102.sh after the upgrade reports:

```
# ./root102.sh
Error : Please change the CRS_ORACLE_USER id oracle to have the following OS capabilities :
```

Solution:

```
#chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE,CAP_NUMA_ATTACH oracle
#lsuser -f oracle | grep capabilities
capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE,CAP_NUMA_ATTACH
```

I had hit this error before, and it is also mentioned in the upgrade documentation.

### VIP does not start after upgrading CRS to 10.2.0.4

This one was harder. The log held little information, just one line:

```
Invalid parameters, or failed to bring up VIP (host=HOST_NAM_1)
```

I then used crsctl to turn on debug logging for the VIP resource and collect more information:

```
#crsctl debug log res "ora.host_nam_2.vip:5"
Set Resource Debug Module: ora.host_nam_2.vip Level: 5
#srvctl start nodeapps -n host_nam_2
CRS-0233: Resource or relatives are currently involved with another operation.
```
```
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:25 GMT+08:00 2010 [360824] Checking interface existance
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:25 GMT+08:00 2010 [360824] Calling getifbyip
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:25 GMT+08:00 2010 [360824] getifbyip: started for 192.168.10.16
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:26 GMT+08:00 2010 [360824] getifbyip: checking if failover is happening()
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:26 GMT+08:00 2010 [360824] getifbyip: failover is not happening()
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:26 GMT+08:00 2010 [360824] Completed getifbyip
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:26 GMT+08:00 2010 [360824] ping_vip 192.168.10.16 started
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:26 GMT+08:00 2010 [360824] About to execute : /usr/sbin/ping -c 1 -w 1 192.168.10.16
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:27 GMT+08:00 2010 [360824] ping_vip: 192.168.10.16 is not pingable, _count=1
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:27 GMT+08:00 2010 [360824] Completed with initial interface test
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:27 GMT+08:00 2010 [360824] Broadcast=192.168.10.255
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:27 GMT+08:00 2010 [360824] Interface tests
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:27 GMT+08:00 2010 [360824] checkIf: start for if=en4
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:27 GMT+08:00 2010 [360824] IsIfAlive: start for if=en4
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:27 GMT+08:00 2010 [360824] defaultgw: started
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:27 GMT+08:00 2010 [360824] defaultgw: completed with 192.168.10.254
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:27 GMT+08:00 2010 [360824] About to execute command: /usr/sbin/ping -S 192.168.10.18 -c 1 -w 1 192.168.10.254
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:28 GMT+08:00 2010 [360824] About to execute command: /usr/sbin/ping -S 192.168.10.18 -c 1 -w 1 192.168.10.254
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:29 GMT+08:00 2010 [360824] IsIfAlive: RX packets checked if=en4 failed
```
```
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:29 GMT+08:00 2010 [360824] Interface en4 checked failed (host=HOST_NAM_2)
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:29 GMT+08:00 2010 [360824] IsIfAlive: end for if=en4
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:29 GMT+08:00 2010 [360824] checkIf: end for if=en4
host_nam_2:ora.host_nam_2.vip:Invalid parameters, or failed to bring up VIP (host=HOST_NAM_2)
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:30 GMT+08:00 2010 [307376] Checking interface existance
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:30 GMT+08:00 2010 [307376] Calling getifbyip
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:30 GMT+08:00 2010 [307376] getifbyip: started for 192.168.10.16
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:30 GMT+08:00 2010 [307376] getifbyip: checking if failover is happening()
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:30 GMT+08:00 2010 [307376] getifbyip: failover is not happening()
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:30 GMT+08:00 2010 [307376] Completed getifbyip
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:30 GMT+08:00 2010 [307376] ping_vip 192.168.10.16 started
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:30 GMT+08:00 2010 [307376] About to execute : /usr/sbin/ping -c 1 -w 1 192.168.10.16
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:31 GMT+08:00 2010 [307376] ping_vip: 192.168.10.16 is not pingable, _count=1
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:31 GMT+08:00 2010 [307376] Completed with initial interface test
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:31 GMT+08:00 2010 [307376] Broadcast=192.168.10.255
CRS-1006: No more members to consider
CRS-0215: Could not start resource 'ora.host_nam_2.vip'.
CRS-0210: Could not find resource ora.host_nam_2.LISTENER_HOST_NAM_2.lsnr.
```

Only with this output was I able to find the relevant information:

- Bug 8413088: VIP CANNOT START ON AIX 6.1 BECAUSE NETSTAT HAS A NEW COLUMN.
- Bug 9157855: DURING RESTART OR WHEN ONE OF THE TWO NODE CLUSTER IS DOWN, VIP RESOURCE FAILS

The problem can still be present after applying the CRS PSU. It can be fixed by editing the racgvip script.

In 10.2.0.4:

```
_O1=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
_O2=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
```

After applying the PSU:

```
_O1=`$NETSTAT -n -I $_IF -p tcp | $GREP -iE ".*packets received$" | $AWK "{print \\$1; exit}"`
_O2=`$NETSTAT -n -I $_IF -p tcp | $GREP -iE ".*packets received$" | $AWK "{print \\$1; exit}"`
```

What I finally changed it to (`-p tcp` becomes `-p ip`):

```
_O1=`$NETSTAT -n -I $_IF -p ip | $GREP -iE ".*packets received$" | $AWK "{print \\$1; exit}"`
_O2=`$NETSTAT -n -I $_IF -p ip | $GREP -iE ".*packets received$" | $AWK "{print \\$1; exit}"`
```

This one was painful and cost a lot of time. I had noticed this bug while reading the documentation before the installation, and both times the VIP failed to start I compared the symptoms against it. Even so, I did not recognize it until the debug output made it clear.

### Upgrading RDBMS to 10.2.0.4 reports that a java process is still running

```
Oracle Universal Installer has detected that there are processes running in the currently selected Oracle Home. The following processes need to be shutdown before continuing:
java
```

At first CRS processes were listed as well. After stopping CRS, one java process still blocked the installer. I used fuser to find the processes using the $ORACLE_HOME directory and killed them all, then killed every process shown by `ps -ef | grep java` except the installer itself, and it still failed.

I finally found a workaround on Metalink. Before the upgrade:

```
cd /usr/sbin/
mv fuser fuser.orig
touch /usr/sbin/fuser
chmod +x /usr/sbin/fuser
```

After the upgrade completes, put it back:

```
cd /usr/sbin/
cp fuser.orig fuser
```

A sneaky trick.

Reference document: 975597.1

### Error starting CRS after applying the database PSU

After CRS and the database had both been upgraded and patched, starting CRS, the VIP, and the other resources failed:

```
HOST_NAM_1:/#crsctl start crs
exec(): 0509-036 Cannot load program /app/oracle/product/10204/db_1/bin/crsctl.bin because of the following errors:
        0509-150 Dependent module libhasgen10.a(shr_hasgen10.o) could not be loaded.
        0509-022 Cannot load module libhasgen10.a(shr_hasgen10.o).
        0509-026 System error: A file or directory in the path name does not exist.
HOST_NAM_1:/app/oracle/product/10204/db_1/bin#./srvctl stop nodeapps -n host_nam_1
./srvctl[187]: %s_jreLocation%/bin/java: not found.
HOST_NAM_1:/app/oracle/product/10204/crs_1/lib#srvctl stop nodeapps -n host_nam_1
/app/oracle/product/10204/db_1/bin/srvctl[187]: %s_jreLocation%/bin/java: not found.
```
Solution: the commands were being picked up from the database home (db_1/bin). Change the environment variables so that they run from the crs_/bin/ directory instead.

### DBCA error

DBCA failed at the instance creation step:

```
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 59
ORA-27301: OS failure message: Message too long
ORA-27302: failure occurred at: sskgxpsnd1
ORA-27303: additional information: MTU verification failed to send msg
```

Cause: The problem was caused by incorrect UDP and TCP packet settings.

Solution: change the following parameters:

```
no -o tcp_sendspace=262144
no -o tcp_recvspace=262144
no -o udp_sendspace=65536
no -o udp_recvspace=262144
no -o rfc1323=1
```

The previous values were too small:

```
tcp_sendspace 131072
tcp_recvspace 131072
```

Use `no -a` to view the current settings.

Reference document: 300956.1
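To make the comparison in the DBCA fix repeatable, here is a minimal sketch of a filter that reads tunables and reports any that are below the targets listed above. The function name `check_no_tunables` is hypothetical, and the sketch assumes input lines of the form `name = value`, which is how I understand `no -a` prints them; adjust the field positions if your output differs.

```shell
#!/bin/sh
# Sketch only: report network tunables that are below the targets from the fix above.
# Assumption: each input line looks like "name = value" (leading spaces are fine).

check_no_tunables() {
    awk '
        BEGIN {
            # Target values copied from the "no -o" commands above.
            want["tcp_sendspace"] = 262144
            want["tcp_recvspace"] = 262144
            want["udp_sendspace"] = 65536
            want["udp_recvspace"] = 262144
            want["rfc1323"]       = 1
        }
        # Only look at "name = value" lines for the tunables we care about.
        $2 == "=" && ($1 in want) {
            if ($3 + 0 < want[$1])
                printf "%s: current=%s target=%s\n", $1, $3, want[$1]
        }
    '
}

# Intended use on the AIX host (hypothetical, not executed here):
#   no -a | check_no_tunables
```

The function prints nothing when every listed tunable already meets its target, so it can be run on both nodes before starting DBCA. It only reads its input and never changes a setting.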
