HA manual switchover fails with an error. Could the experts here take a look?

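aix6108, HA 6.1: manually moving the resource group fails with an error, but failover works normally when I simulate an unexpected crash. The full hacmp log is in the attachment. Below is an excerpt I cut from the log that may show the problem: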
+TK_bas1_rg:clstop_wpar[clearWparName+15] wparDir=/var/hacmp/adm/wpar
+TK_bas1_rg:clstop_wpar[clearWparName+17] [[ ! -d /var/hacmp/adm/wpar ]]
+TK_bas1_rg:clstop_wpar[clearWparName+19] rm -f /var/hacmp/adm/wpar/TK_bas1_rg
+TK_bas1_rg:clstop_wpar[+58] exit 0
+TK_bas1_rg:node_down_local[505] RC=0
+TK_bas1_rg:node_down_local[506] : exit status of clstop_wpar is: 0
+TK_bas1_rg:node_down_local[508] (( 0 != 0 ))
+TK_bas1_rg:node_down_local[517] (( 1 != 0 ))
+TK_bas1_rg:node_down_local[519] set_resource_status ERROR
+TK_bas1_rg:node_down_local[4] set +u
+TK_bas1_rg:node_down_local[5] NOT_DOIT=''
+TK_bas1_rg:node_down_local[6] set -u
+TK_bas1_rg:node_down_local[8] [[ '' == CLEANUP ]]
+TK_bas1_rg:node_down_local[12] [[ '' != TRUE ]]
+TK_bas1_rg:node_down_local[14] [[ REAL == EMUL ]]
+TK_bas1_rg:node_down_local[19] clchdaemons -d clstrmgr_scripts -t resource_locator -n tkbas1 -o TK_bas1_rg -v ERROR
+TK_bas1_rg:node_down_local[28] [[ ERROR == RELEASING ]]
+TK_bas1_rg:node_down_local[38] [[ NONE == RELEASE_SECONDARY ]]
+TK_bas1_rg:node_down_local[39] [[ NONE == SECONDARY_BECOMES_PRIMARY ]]
+TK_bas1_rg:node_down_local[43] cl_RMupdate rg_error TK_bas1_rg node_down_local
2017-12-16T22:36:29.248546
2017-12-16T22:36:29.254379
Reference string: Sat.Dec.16.22:36:29.BEIST.2017.node_down_local.TK_bas1_rg.ref
+TK_bas1_rg:node_down_local[520] : exit status of set_resource_status is: 0
+TK_bas1_rg:node_down_local[521] exit 1
Dec 16 22:36:29 EVENT FAILED: 1: node_down_local 1

+TK_bas1_rg:rg_move[+241] [ 1 -ne 0 ]
+TK_bas1_rg:rg_move[+243] cl_log 650 rg_move: Failure occurred while processing Resource Group TK_bas1_rg. Manual intervention required. rg_move TK_bas1_rg
+TK_bas1_rg:cl_log[+50] version=1.10
+TK_bas1_rg:cl_log[+94] SYSLOG_FILE=/usr/es/adm/cluster.log


Dec 16 2017 22:36:29 !!!!!!!!!! ERROR !!!!!!!!!!


Dec 16 2017 22:36:29 rg_move: Failure occurred while processing Resource Group TK_bas1_rg. Manual intervention required.
+TK_bas1_rg:rg_move[+244] STATUS=1
+TK_bas1_rg:rg_move[+247] UPDATESTATD=1
+TK_bas1_rg:rg_move[+254] process_resources
:process_resources[2538] version=1.132.1.2
:process_resources[2541] STATUS=0
:process_resources[2542] sddsrv_off=FALSE
:process_resources[2544] true
:process_resources[2546] : call rgpa, and it will tell us what to do next
:process_resources[2548] set -a
:process_resources[2549] clRGPA
:clRGPA[+49] [[ high = high ]]
:clRGPA[+49] version=1.16
:clRGPA[+51] usingVer=clrgpa
:clRGPA[+56] clrgpa
2017-12-16T22:36:29.367049 clrgpa
:clRGPA[+57] exit 0
:process_resources[2549] eval JOB_TYPE=NONE
:process_resources[1] JOB_TYPE=NONE
:process_resources[2550] RC=0
:process_resources[2551] set +a
:process_resources[2553] (( 0 != 0 ))
:process_resources[2559] RESOURCE_GROUPS=TK_bas1_rg
+TK_bas1_rg:process_resources[2560] GROUPNAME=TK_bas1_rg
+TK_bas1_rg:process_resources[2560] export GROUPNAME
+TK_bas1_rg:process_resources[2864] break
+TK_bas1_rg:process_resources[2875] : If sddsrv was turned off above, turn it back on again
+TK_bas1_rg:process_resources[2877] [[ FALSE == TRUE ]]
+TK_bas1_rg:process_resources[2883] exit 0
+TK_bas1_rg:rg_move[+292] [ -f /tmp/.NFSSTOPPED ]
+TK_bas1_rg:rg_move[+312] [ -f /tmp/.RPCLOCKDSTOPPED ]
+TK_bas1_rg:rg_move[+337] exit 1
Dec 16 22:36:29 EVENT FAILED: 1: rg_move tkbas1 1 RELEASE 1

:rg_move_release[+68] exit 1
Dec 16 22:36:29 EVENT FAILED: 1: rg_move_release tkbas1 1 1

        HACMP Event Summary

Event: TE_RG_MOVE
Start time: Sat Dec 16 22:33:49 2017

End time: Sat Dec 16 22:36:29 2017

Action: Resource: Script Name:

Releasing resource group: TK_bas1_rg node_down_local
Search on: Sat.Dec.16.22:33:50.BEIST.2017.node_down_local.TK_bas1_rg.ref
Releasing resource: All_servers stop_server
Search on: Sat.Dec.16.22:33:50.BEIST.2017.stop_server.All_servers.TK_bas1_rg.ref
Error encountered with resource: TK_bas1_app stop_server
Search on: Sat.Dec.16.22:36:09.BEIST.2017.stop_server.TK_bas1_app.TK_bas1_rg.ref
Resource offline: All_nonerror_servers stop_server
Search on: Sat.Dec.16.22:36:09.BEIST.2017.stop_server.All_nonerror_servers.TK_bas1_rg.ref
Releasing resource: All_filesystems cl_deactivate_fs
Search on: Sat.Dec.16.22:36:14.BEIST.2017.cl_deactivate_fs.All_filesystems.TK_bas1_rg.ref
Resource offline: All_non_error_filesystems cl_deactivate_fs
Search on: Sat.Dec.16.22:36:20.BEIST.2017.cl_deactivate_fs.All_non_error_filesystems.TK_bas1_rg.ref
Releasing resource: All_volume_groups cl_deactivate_vgs
Search on: Sat.Dec.16.22:36:20.BEIST.2017.cl_deactivate_vgs.All_volume_groups.TK_bas1_rg.ref
Resource offline: All_nonerror_volume_groups cl_deactivate_vgs
Search on: Sat.Dec.16.22:36:27.BEIST.2017.cl_deactivate_vgs.All_nonerror_volume_groups.TK_bas1_rg.ref
Releasing resource: All_service_addrs release_service_addr
Search on: Sat.Dec.16.22:36:27.BEIST.2017.release_service_addr.All_service_addrs.TK_bas1_rg.ref
Resource offline: All_nonerror_service_addrs release_service_addr
Search on: Sat.Dec.16.22:36:29.BEIST.2017.release_service_addr.All_nonerror_service_addrs.TK_bas1_rg.ref
Error encountered with group: TK_bas1_rg node_down_local

Search on: Sat.Dec.16.22:36:29.BEIST.2017.node_down_local.TK_bas1_rg.ref

Dec 16 22:36:29 EVENT START: event_error 1 TE_RG_MOVE

:event_error[+52] [[ high = high ]]
:event_error[+52] version=1.13
:event_error[+53] :event_error[+53] cl_get_path
HA_DIR=es
:event_error[+55] EXIT_STATUS=1
:event_error[+56] RP_NAME=1 TE_RG_MOVE
:event_error[+59] [ 2 -ne 2 ]
:event_error[+65] set -u
:event_error[+68] RP_NAME=RG_MOVE
:event_error[+69] RP_NAME=RG
:event_error[+72] :event_error[+72] cllsclstr -c
:event_error[+72] cut -d : -f2
:event_error[+72] grep -v cname
CLUSTER=TK_bas_Cluster
:event_error[+77] [ -x /usr/lpp/ssp/bin/spget_syspar ]
:event_error[+84] dspmsg scripts.cat 9646 WARNING: Cluster TK_bas_Cluster Failed while running event [RG], exit status was 1\n TK_bas_Cluster RG 1
:event_error[+84] 1> /dev/console
:event_error[+85] dspmsg scripts.cat 9646 WARNING: Cluster TK_bas_Cluster Failed while running event [RG], exit status was 1\n TK_bas_Cluster RG 1
WARNING: Cluster TK_bas_Cluster Failed while running event [RG], exit status was 1
:event_error[+90] [[ tkbas1 = tkbas1 ]]
:event_error[+94] dspmsg scripts.cat 9648 Check hacmp.out on this node for errors.\n
Check hacmp.out on this node for errors.
:event_error[+94] [[ RG = reconfig_resource* ]]
:event_error[+120] :
:event_error[+121] ps -edf

Attachment: hacmp.txt (1.23 MB)


3 peer answers

wangmj, System Operations Engineer, CES
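Judging from your log, it looks like the HA script that stops the service failed, and that caused the error. When the switchover fails, check whether the corresponding service was in fact left running instead of being shut down.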
Banking · 2017-12-17
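To narrow down which resource failed, one option is to search the log for the failure markers that appear in the excerpt above. The sketch below is only an illustration: it builds a small sample file from lines quoted in this thread (the temp file name is made up), and on a real node you would point `LOG` at the actual hacmp.out instead.

```shell
#!/bin/sh
# Sketch: pull the failure markers out of a hacmp.out-style log.
# LOG is a sample built from lines quoted in this thread; the file name
# is an assumption for illustration, not a real HACMP path.
LOG="${TMPDIR:-/tmp}/hacmp_sample.$$"
cat > "$LOG" <<'EOF'
Releasing resource: All_servers stop_server
Error encountered with resource: TK_bas1_app stop_server
Resource offline: All_nonerror_servers stop_server
Dec 16 22:36:29 EVENT FAILED: 1: node_down_local 1
Dec 16 22:36:29 EVENT FAILED: 1: rg_move tkbas1 1 RELEASE 1
EOF

# First resource that the event summary reports as failing.
failed_resource=$(grep 'Error encountered with resource:' "$LOG" | head -n 1)
# Number of events that ended with a non-zero status.
failed_events=$(grep -c 'EVENT FAILED' "$LOG")

echo "$failed_resource"
echo "failed events: $failed_events"

rm -f "$LOG"
```

In the excerpt from this thread, the only resource flagged this way is TK_bas1_app under stop_server, which matches the suggestion that the application stop script is the place to look.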
  • The last statement in the stop script returns 1. I plan to add exit 0 as the last line of the script and test again. Many thanks!
    2017-12-18
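Why the extra line matters: a shell script without an explicit exit returns the status of its last command, so a final statement that returns 1 makes the whole stop script look failed to its caller. The following is a minimal sketch of that behaviour, not the asker's actual script (which is not shown in the thread); `stop_app_original` and `stop_app_patched` are hypothetical stand-ins, and `false` simply plays the role of a last statement that returns 1.

```shell
#!/bin/sh
# Hypothetical stand-ins for the application stop script; the real script
# is not shown in the thread. Each body runs in a subshell "( ... )" so
# that "exit" ends only that body, the way it would end a separate script.

stop_app_original() (
    # Assumed situation: the last command returns 1, so the whole
    # script reports failure to its caller.
    false
)

stop_app_patched() (
    false
    # The asker's planned change: force a success status at the end.
    exit 0
)

rc_original=0
stop_app_original || rc_original=$?
rc_patched=0
stop_app_patched || rc_patched=$?

echo "original: $rc_original, patched: $rc_patched"
```

One caveat with an unconditional exit 0 is that it also hides a stop that really did fail. A more careful variant might check whether the application is actually down and return non-zero only in that case.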

Asker

wjs_luna
Systems Engineer, an insurance company
Areas of expertise: servers, PowerHA, AIX

Question status

  • Posted: 2017-12-17
  • Latest answer: 2017-12-17