HA manual switchover reports an error; could anyone take a look?

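On AIX 6108 with HA 6.1, manually switching the resource group fails, while a simulated abnormal crash switches over normally. The full hacmp log is in the attachment. Below is an excerpt I captured from the log that may show the problem: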
+TK_bas1_rg:clstop_wpar[clearWparName+15] wparDir=/var/hacmp/adm/wpar
+TK_bas1_rg:clstop_wpar[clearWparName+17] [[ ! -d /var/hacmp/adm/wpar ]]
+TK_bas1_rg:clstop_wpar[clearWparName+19] rm -f /var/hacmp/adm/wpar/TK_bas1_rg
+TK_bas1_rg:clstop_wpar[+58] exit 0
+TK_bas1_rg:node_down_local[505] RC=0
+TK_bas1_rg:node_down_local[506] : exit status of clstop_wpar is: 0
+TK_bas1_rg:node_down_local[508] (( 0 != 0 ))
+TK_bas1_rg:node_down_local[517] (( 1 != 0 ))
+TK_bas1_rg:node_down_local[519] set_resource_status ERROR
+TK_bas1_rg:node_down_local[4] set +u
+TK_bas1_rg:node_down_local[5] NOT_DOIT=''
+TK_bas1_rg:node_down_local[6] set -u
+TK_bas1_rg:node_down_local[8] [[ '' == CLEANUP ]]
+TK_bas1_rg:node_down_local[12] [[ '' != TRUE ]]
+TK_bas1_rg:node_down_local[14] [[ REAL == EMUL ]]
+TK_bas1_rg:node_down_local[19] clchdaemons -d clstrmgr_scripts -t resource_locator -n tkbas1 -o TK_bas1_rg -v ERROR
+TK_bas1_rg:node_down_local[28] [[ ERROR == RELEASING ]]
+TK_bas1_rg:node_down_local[38] [[ NONE == RELEASE_SECONDARY ]]
+TK_bas1_rg:node_down_local[39] [[ NONE == SECONDARY_BECOMES_PRIMARY ]]
+TK_bas1_rg:node_down_local[43] cl_RMupdate rg_error TK_bas1_rg node_down_local
2017-12-16T22:36:29.248546
2017-12-16T22:36:29.254379
Reference string: Sat.Dec.16.22:36:29.BEIST.2017.node_down_local.TK_bas1_rg.ref
+TK_bas1_rg:node_down_local[520] : exit status of set_resource_status is: 0
+TK_bas1_rg:node_down_local[521] exit 1
Dec 16 22:36:29 EVENT FAILED: 1: node_down_local 1

+TK_bas1_rg:rg_move[+241] [ 1 -ne 0 ]
+TK_bas1_rg:rg_move[+243] cl_log 650 rg_move: Failure occurred while processing Resource Group TK_bas1_rg. Manual intervention required. rg_move TK_bas1_rg
+TK_bas1_rg:cl_log[+50] version=1.10
+TK_bas1_rg:cl_log[+94] SYSLOG_FILE=/usr/es/adm/cluster.log


Dec 16 2017 22:36:29 !!!!!!!!!! ERROR !!!!!!!!!!


Dec 16 2017 22:36:29 rg_move: Failure occurred while processing Resource Group TK_bas1_rg. Manual intervention required.
+TK_bas1_rg:rg_move[+244] STATUS=1
+TK_bas1_rg:rg_move[+247] UPDATESTATD=1
+TK_bas1_rg:rg_move[+254] process_resources
:process_resources[2538] version=1.132.1.2
:process_resources[2541] STATUS=0
:process_resources[2542] sddsrv_off=FALSE
:process_resources[2544] true
:process_resources[2546] : call rgpa, and it will tell us what to do next
:process_resources[2548] set -a
:process_resources[2549] clRGPA
:clRGPA[+49] [[ high = high ]]
:clRGPA[+49] version=1.16
:clRGPA[+51] usingVer=clrgpa
:clRGPA[+56] clrgpa
2017-12-16T22:36:29.367049 clrgpa
:clRGPA[+57] exit 0
:process_resources[2549] eval JOB_TYPE=NONE
:process_resources[1] JOB_TYPE=NONE
:process_resources[2550] RC=0
:process_resources[2551] set +a
:process_resources[2553] (( 0 != 0 ))
:process_resources[2559] RESOURCE_GROUPS=TK_bas1_rg
+TK_bas1_rg:process_resources[2560] GROUPNAME=TK_bas1_rg
+TK_bas1_rg:process_resources[2560] export GROUPNAME
+TK_bas1_rg:process_resources[2864] break
+TK_bas1_rg:process_resources[2875] : If sddsrv was turned off above, turn it back on again
+TK_bas1_rg:process_resources[2877] [[ FALSE == TRUE ]]
+TK_bas1_rg:process_resources[2883] exit 0
+TK_bas1_rg:rg_move[+292] [ -f /tmp/.NFSSTOPPED ]
+TK_bas1_rg:rg_move[+312] [ -f /tmp/.RPCLOCKDSTOPPED ]
+TK_bas1_rg:rg_move[+337] exit 1
Dec 16 22:36:29 EVENT FAILED: 1: rg_move tkbas1 1 RELEASE 1

:rg_move_release[+68] exit 1
Dec 16 22:36:29 EVENT FAILED: 1: rg_move_release tkbas1 1 1

        HACMP Event Summary

Event: TE_RG_MOVE
Start time: Sat Dec 16 22:33:49 2017

End time: Sat Dec 16 22:36:29 2017

Action: Resource: Script Name:

Releasing resource group: TK_bas1_rg node_down_local
Search on: Sat.Dec.16.22:33:50.BEIST.2017.node_down_local.TK_bas1_rg.ref
Releasing resource: All_servers stop_server
Search on: Sat.Dec.16.22:33:50.BEIST.2017.stop_server.All_servers.TK_bas1_rg.ref
Error encountered with resource: TK_bas1_app stop_server
Search on: Sat.Dec.16.22:36:09.BEIST.2017.stop_server.TK_bas1_app.TK_bas1_rg.ref
Resource offline: All_nonerror_servers stop_server
Search on: Sat.Dec.16.22:36:09.BEIST.2017.stop_server.All_nonerror_servers.TK_bas1_rg.ref
Releasing resource: All_filesystems cl_deactivate_fs
Search on: Sat.Dec.16.22:36:14.BEIST.2017.cl_deactivate_fs.All_filesystems.TK_bas1_rg.ref
Resource offline: All_non_error_filesystems cl_deactivate_fs
Search on: Sat.Dec.16.22:36:20.BEIST.2017.cl_deactivate_fs.All_non_error_filesystems.TK_bas1_rg.ref
Releasing resource: All_volume_groups cl_deactivate_vgs
Search on: Sat.Dec.16.22:36:20.BEIST.2017.cl_deactivate_vgs.All_volume_groups.TK_bas1_rg.ref
Resource offline: All_nonerror_volume_groups cl_deactivate_vgs
Search on: Sat.Dec.16.22:36:27.BEIST.2017.cl_deactivate_vgs.All_nonerror_volume_groups.TK_bas1_rg.ref
Releasing resource: All_service_addrs release_service_addr
Search on: Sat.Dec.16.22:36:27.BEIST.2017.release_service_addr.All_service_addrs.TK_bas1_rg.ref
Resource offline: All_nonerror_service_addrs release_service_addr
Search on: Sat.Dec.16.22:36:29.BEIST.2017.release_service_addr.All_nonerror_service_addrs.TK_bas1_rg.ref
Error encountered with group: TK_bas1_rg node_down_local

Search on: Sat.Dec.16.22:36:29.BEIST.2017.node_down_local.TK_bas1_rg.ref

Dec 16 22:36:29 EVENT START: event_error 1 TE_RG_MOVE

:event_error[+52] [[ high = high ]]
:event_error[+52] version=1.13
:event_error[+53] :event_error[+53] cl_get_path
HA_DIR=es
:event_error[+55] EXIT_STATUS=1
:event_error[+56] RP_NAME=1 TE_RG_MOVE
:event_error[+59] [ 2 -ne 2 ]
:event_error[+65] set -u
:event_error[+68] RP_NAME=RG_MOVE
:event_error[+69] RP_NAME=RG
:event_error[+72] :event_error[+72] cllsclstr -c
:event_error[+72] cut -d : -f2
:event_error[+72] grep -v cname
CLUSTER=TK_bas_Cluster
:event_error[+77] [ -x /usr/lpp/ssp/bin/spget_syspar ]
:event_error[+84] dspmsg scripts.cat 9646 WARNING: Cluster TK_bas_Cluster Failed while running event [RG], exit status was 1\n TK_bas_Cluster RG 1
:event_error[+84] 1> /dev/console
:event_error[+85] dspmsg scripts.cat 9646 WARNING: Cluster TK_bas_Cluster Failed while running event [RG], exit status was 1\n TK_bas_Cluster RG 1
WARNING: Cluster TK_bas_Cluster Failed while running event [RG], exit status was 1
:event_error[+90] [[ tkbas1 = tkbas1 ]]
:event_error[+94] dspmsg scripts.cat 9648 Check hacmp.out on this node for errors.\n
Check hacmp.out on this node for errors.
:event_error[+94] [[ RG = reconfig_resource* ]]
:event_error[+120] :
:event_error[+121] ps -edf

Attachment: hacmp.txt (1.23 MB)
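
With a log of this size it can help to jump straight to the failures. Below is a minimal sketch, not an official tool: `failed_events` and `find_reference` are hypothetical helper names, and the log path in the usage note is an assumption about where hacmp.out lives on this release. The first helper lists every `EVENT FAILED` line with its line number; the second looks up one of the `Search on:` reference strings from the Event Summary.

```shell
#!/bin/sh
# Hypothetical helpers for navigating a hacmp.out-style log.

# List every failed event together with its line number in the log.
# $1: path to the log file
failed_events() {
    grep -n 'EVENT FAILED' "$1"
}

# Find the trace lines that carry a given "Search on:" reference string.
# $1: path to the log file
# $2: reference string, matched literally (-F), not as a regular expression
find_reference() {
    grep -n -F -- "$2" "$1"
}
```

Usage, assuming the log is at /var/hacmp/log/hacmp.out: `failed_events /var/hacmp/log/hacmp.out`, then `find_reference /var/hacmp/log/hacmp.out 'Sat.Dec.16.22:36:09.BEIST.2017.stop_server.TK_bas1_app.TK_bas1_rg.ref'`. The earliest failed event is usually the one worth reading first, since later failures can simply be propagated from it.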


3 peer answers

wangmj, systems operations engineer, CES
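Judging from your log, it looks like the HA script that stops the service failed to execute. When the switchover fails, check whether the corresponding service has in fact not been shut down.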

Banking · 2017-12-17
  • The last statement in the stop script returns 1. I plan to add exit 0 as the last line of the script and test again. Thank you very much!
    2017-12-18
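
To illustrate the point about the return value: without an explicit `exit`, a shell script ends with the status of its last command, and that status is what the caller sees. The sketch below is only a demonstration and not the actual stop script; `last_step`, `stop_without_exit` and `stop_with_exit` are hypothetical names, and `last_step` stands in for whatever final command returns 1.

```shell
#!/bin/sh
# Hypothetical stand-in for the last command of a stop script: it returns
# non-zero even though the stop work itself has been done.
last_step() {
    return 1
}

# Variant 1: no explicit exit, so the status of the last command leaks out.
# The body runs in a subshell to mimic a separate script.
stop_without_exit() (
    last_step
)

# Variant 2: the script reports success explicitly on its last line.
stop_with_exit() (
    last_step
    exit 0
)

rc_without=0
stop_without_exit || rc_without=$?
rc_with=0
stop_with_exit || rc_with=$?

echo "without explicit exit: $rc_without"
echo "with explicit exit: $rc_with"
```

One caveat on the design choice: an unconditional `exit 0` also hides a genuine failure to stop the application, so it may be safer to exit 0 only after confirming that the processes are really gone.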
baochengchen, systems engineer, 华际
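Debugging in stages is one approach.

Test the HA function on its own first, then test your script. In most cases the problem is not serious.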
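
Testing the script on its own could look like the following minimal sketch. `run_stop_script` is a hypothetical helper name; it runs a script with tracing (`-x`) and prints its exit status, which makes it easy to see a non-zero status that the cluster would treat as a failure. It assumes the script can be run under `sh`; use `ksh -x` instead if the script depends on ksh features.

```shell
#!/bin/sh
# Run a script by hand with tracing and report its exit status.
# $1: path to the script under test
run_stop_script() {
    script=$1
    rc=0
    # -x prints each command to stderr as it runs, like the traces in hacmp.out.
    sh -x "$script" || rc=$?
    echo "exit status: $rc"
    return "$rc"
}
```

Usage, with the script path taken from the log in this thread: `run_stop_script /hacmp/TK_bas1_stop.sh`. Run it once while the application is up and once while it is already down, since a stop script often behaves differently in the two cases.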

Systems integration · 2017-12-17
wjs_luna, systems engineer, an insurance company
Here is another part of the log, showing an error when stopping the app:
/hacmp/TK_bas1_stop.sh[91]: kill: bad argument count
+TK_bas1_rg:stop_server[+133] [ 1 -ne 0 ]
+TK_bas1_rg:stop_server[+135] cl_log 312 Failed to stop TK_bas1_app. TK_bas1_app
+TK_bas1_rg:cl_log[+50] version=1.10
+TK_bas1_rg:cl_log[+94] SYSLOG_FILE=/usr/es/adm/cluster.log


Dec 16 2017 22:36:09 !!!!!!!!!! ERROR !!!!!!!!!!


Dec 16 2017 22:36:09 Failed to stop TK_bas1_app.
+TK_bas1_rg:stop_server[+136] STATUS=1
+TK_bas1_rg:stop_server[+138] cl_RMupdate resource_error TK_bas1_app stop_server
2017-12-16T22:36:09.468795
2017-12-16T22:36:09.475333
Reference string: Sat.Dec.16.22:36:09.BEIST.2017.stop_server.TK_bas1_app.TK_bas1_rg.ref
+TK_bas1_rg:stop_server[+161] ALLNOERRSERV=All_nonerror_servers
+TK_bas1_rg:stop_server[+162] [ REAL = EMUL ]
+TK_bas1_rg:stop_server[+167] cl_RMupdate resource_down All_nonerror_servers stop_server
2017-12-16T22:36:09.502670
2017-12-16T22:36:09.508333
Reference string: Sat.Dec.16.22:36:09.BEIST.2017.stop_server.All_nonerror_servers.TK_bas1_rg.ref
+TK_bas1_rg:stop_server[+170] exit 1
Dec 16 22:36:09 EVENT FAILED: 1: stop_server TK_bas1_app 1

+TK_bas1_rg:node_down_local[249] STATUS=1
+TK_bas1_rg:node_down_local[258] server_release_lpar_resources TK_bas1_app
+TK_bas1_rg:server_release_lpar_resources[831] [[ high == high ]]
+TK_bas1_rg:server_release_lpar_resources[831] version=1.14.5.3
+TK_bas1_rg:server_release_lpar_resources[833] typeset HOSTNAME
+TK_bas1_rg:server_release_lpar_resources[834] typeset MANAGED_SYSTEM
+TK_bas1_rg:server_release_lpar_resources[835] typeset HMC_IP
+TK_bas1_rg:server_release_lpar_resources[836] added_apps=''
+TK_bas1_rg:server_release_lpar_resources[836] typeset added_apps
+TK_bas1_rg:server_release_lpar_resources[837] APPLICATIONS=''
+TK_bas1_rg:server_release_lpar_resources[837] typeset APPLICATIONS
+TK_bas1_rg:server_release_lpar_resources[838] mem_release_type=''
+TK_bas1_rg:server_release_lpar_resources[838] typeset mem_release_type
+TK_bas1_rg:server_release_lpar_resources[840] mem_resource=0
+TK_bas1_rg:server_release_lpar_resources[840] typeset mem_resource
+TK_bas1_rg:server_release_lpar_resources[841] cpu_resource=0
+TK_bas1_rg:server_release_lpar_resources[841] typeset cpu_resource
+TK_bas1_rg:server_release_lpar_resources[842] cuod_mem_resource=0
+TK_bas1_rg:server_release_lpar_resources[842] typeset cuod_mem_resource
+TK_bas1_rg:server_release_lpar_resources[843] cuod_cpu_resource=0
+TK_bas1_rg:server_release_lpar_resources[843] typeset cuod_cpu_resource
+TK_bas1_rg:server_release_lpar_resources[845] display_event_summary=false
+TK_bas1_rg:server_release_lpar_resources[845] typeset display_event_summary
+TK_bas1_rg:server_release_lpar_resources[847] lmb_size=0
+TK_bas1_rg:server_release_lpar_resources[847] typeset lmb_size
+TK_bas1_rg:server_release_lpar_resources[849] typeset -i check_cuod
+TK_bas1_rg:server_release_lpar_resources[850] RC=0
+TK_bas1_rg:server_release_lpar_resources[850] typeset -i RC
+TK_bas1_rg:server_release_lpar_resources[854] : Look for any added application servers, beyond those running at the moment
+TK_bas1_rg:server_release_lpar_resources[856] getopts :g: opt
+TK_bas1_rg:server_release_lpar_resources[864] shift 0
+TK_bas1_rg:server_release_lpar_resources[866] APPLICATIONS=TK_bas1_app
+TK_bas1_rg:server_release_lpar_resources[869] : Set up values we are going to need to talk to the HMC, if they have not
+TK_bas1_rg:server_release_lpar_resources[870] : been set up before.
+TK_bas1_rg:server_release_lpar_resources[872] [[ -z '' ]]
+TK_bas1_rg:server_release_lpar_resources[873] hostname
+TK_bas1_rg:server_release_lpar_resources[873] HOSTNAME=tkbas1
+TK_bas1_rg:server_release_lpar_resources[876] [[ -z tkbas1 ]]
+TK_bas1_rg:server_release_lpar_resources[880] [[ -z '' ]]
+TK_bas1_rg:server_release_lpar_resources[881] clodmget -q name='tkbas1 and object=HMC_IP' -f value -n HACMPnode
+TK_bas1_rg:server_release_lpar_resources[881] HMC_IP=''
+TK_bas1_rg:server_release_lpar_resources[883] [[ -z '' ]]
+TK_bas1_rg:server_release_lpar_resources[885] : Node is not defined as an LPAR node if there is no HMC to talk to
+TK_bas1_rg:server_release_lpar_resources[887] exit 0
+TK_bas1_rg:node_down_local[259] : exit status of server_release_lpar_resources TK_bas1_app is: 0
+TK_bas1_rg:node_down_local[265] [[ -n '' ]]
+TK_bas1_rg:node_down_local[284] [[ -n '' ]]
+TK_bas1_rg:node_down_local[303] [[ -n '' ]]
+TK_bas1_rg:node_down_local[325] [[ -n '' ]]
+TK_bas1_rg:node_down_local[344] CROSSMOUNT=0
+TK_bas1_rg:node_down_local[344] typeset -i CROSSMOUNT
+TK_bas1_rg:node_down_local[345] export CROSSMOUNT
+TK_bas1_rg:node_down_local[347] [[ -n '' ]]
+TK_bas1_rg:node_down_local[387] (( 0 == 0 ))
+TK_bas1_rg:node_down_local[393] wc -l
+TK_bas1_rg:node_down_local[393] odmget HACMPnode
+TK_bas1_rg:node_down_local[393] sort
+TK_bas1_rg:node_down_local[393] uniq
+TK_bas1_rg:node_down_local[393] grep 'name ='
+TK_bas1_rg:node_down_local[393] (( 2 == 2 ))
+TK_bas1_rg:node_down_local[395] cut -f2 '-d"'
+TK_bas1_rg:node_down_local[395] odmget HACMPgroup
+TK_bas1_rg:node_down_local[395] grep 'group ='
+TK_bas1_rg:node_down_local[395] RESOURCE_GROUPS=TK_bas1_rg
+TK_bas1_rg:node_down_local[400] cut -f2 '-d"'
+TK_bas1_rg:node_down_local[399] odmget -q group='TK_bas1_rg AND name=EXPORT_FILESYSTEM' HACMPresource
+TK_bas1_rg:node_down_local[400] grep 'value ='
+TK_bas1_rg:node_down_local[399] EXPORTLIST=''
+TK_bas1_rg:node_down_local[400] [[ -n '' ]]
+TK_bas1_rg:node_down_local[423] [[ false == true ]]
+TK_bas1_rg:node_down_local[432] [[ -n '' ]]
+TK_bas1_rg:node_down_local[443] [[ '' != TRUE ]]
+TK_bas1_rg:node_down_local[444] clcallev release_vg_fs ALL bas1vg '' ''

Dec 16 22:36:09 EVENT START: release_vg_fs ALL bas1vg
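
The `kill: bad argument count` message at line 91 of /hacmp/TK_bas1_stop.sh suggests that `kill` was called without any PID, for example because the variable holding the PID list was empty. That is an assumption, since the script itself is not shown here. A minimal sketch of a guard against that case follows; `stop_pids` is a hypothetical helper name, and the 5-second wait is an arbitrary choice.

```shell
#!/bin/sh
# Stop the given processes, tolerating an empty PID list.
# $1: whitespace-separated PID list; may be empty if the app is already down
stop_pids() {
    pids=$1
    if [ -z "$pids" ]; then
        # Nothing is running, so calling kill without arguments is avoided.
        echo "nothing to stop"
        return 0
    fi
    # Word splitting of $pids is intentional; PIDs already gone are ignored.
    kill $pids 2>/dev/null || true
    # Give the processes up to 5 seconds to exit, then verify.
    tries=5
    while [ "$tries" -gt 0 ]; do
        alive=""
        for p in $pids; do
            kill -0 "$p" 2>/dev/null && alive="$alive $p"
        done
        [ -z "$alive" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    echo "still running:$alive"
    return 1
}
```

With a guard like this the stop script returns 0 when the application is already down, and still returns non-zero when a process survives, so a real stop failure is not hidden from the cluster.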

Insurance · 2017-12-17

Asked by: wjs_luna, systems engineer at an insurance company
Areas of expertise: servers, PowerHA, AIX

Published: 2017-12-17