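Urgent help needed: strange behavior when configuring HACMP 6.1
Hi all,
I recently ran into the following problem while implementing HACMP 6.1.
Environment: 2 x Power 720 + AIX 7.1 + HACMP 6.1 + DS5020
IP plan (vi /etc/hosts):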
16.0.0.1 aglzdb1_boot1
16.0.0.2 aglzdb2_boot1
15.0.0.1 aglzdb1_boot2
15.0.0.2 aglzdb2_boot2
10.11.31.1 aglzdb1_per aglzdb1
10.11.31.2 aglzdb2_per aglzdb2
10.11.31.3 aglzdb_svc
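For anyone checking the IP plan, here is a minimal Python sketch that groups the labels above by network. The post does not give the netmasks, so the /24 prefix is purely an assumption, and the function name group_by_subnet is made up for illustration; substitute the real mask from the interfaces.

```python
import ipaddress
from collections import defaultdict

# /etc/hosts entries copied from the IP plan above.
HOSTS = """\
16.0.0.1 aglzdb1_boot1
16.0.0.2 aglzdb2_boot1
15.0.0.1 aglzdb1_boot2
15.0.0.2 aglzdb2_boot2
10.11.31.1 aglzdb1_per aglzdb1
10.11.31.2 aglzdb2_per aglzdb2
10.11.31.3 aglzdb_svc
"""


def group_by_subnet(hosts_text, prefix_len=24):
    """Map each network to the first label of every address inside it.

    prefix_len=24 is an assumption; the original post does not state the mask.
    """
    groups = defaultdict(list)
    for line in hosts_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and lines without a label.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        iface = ipaddress.ip_interface(f"{fields[0]}/{prefix_len}")
        groups[str(iface.network)].append(fields[1])
    return dict(groups)


if __name__ == "__main__":
    for network, labels in sorted(group_by_subnet(HOSTS).items()):
        print(network, ", ".join(labels))
```

Under that assumed mask, the two boot networks and the persistent/service network come out as three separate subnets.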
After configuring HACMP, synchronization completed OK.
The displayed HACMP configuration is:
Cluster Name: aglz_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
There are 2 node(s) and 2 network(s) defined
NODE aglzdb1:
Network net_ether_01
aglzdb_svc 10.11.31.3
aglzdb1_boot2 15.0.0.1
aglzdb1_boot1 16.0.0.1
Network net_rs232_01
aglzdb1_tty0 /dev/tty0
NODE aglzdb2:
Network net_ether_01
aglzdb_svc 10.11.31.3
aglzdb2_boot1 16.0.0.2
aglzdb2_boot2 15.0.0.2
Network net_rs232_01
aglzdb2_tty0 /dev/tty0
Resource Group aglz_resource_group
Startup Policy Online On Home Node Only
Fallover Policy Fallover To Next Priority Node In The List
Fallback Policy Fallback To Higher Priority Node In The List
Participating Nodes aglzdb1 aglzdb2
Service IP Label aglzdb_svc
Total Heartbeats Missed: 0
Cluster Topology Start Time: 10/12/2013 02:22:07
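Symptoms when starting the cluster: after cluster services are started on both nodes, the startup output shows OK, with only the NFS warning. The service IP 10.11.31.3 (aglzdb_svc) comes up on the primary node, but in less than half a minute the VIP disappears and is present on neither node. In addition, datavg is never varied on at any point.
The log is below. Someone suggested the VG might be the problem, so I removed the app server and the VG from the resource group, leaving only aglzdb_svc. Startup was then normal and the VIP did not disappear, so it looked like a VG problem. I then carved a new LUN from the storage and created a new VG on it. With this new volume group added to the resource group, the VIP still disappeared after a short while and the new volume group was never varied on. With the VG removed, the VIP is fine. The app server was left unconfigured throughout these later tests.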
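To see which event the resource group processing reaches before the VIP is released, one option is to pull only the event markers out of hacmp.out. The sketch below is a rough Python filter, not an HACMP tool: the function name event_markers is made up, and only the EVENT START line in SAMPLE comes from the log in this post. The COMPLETED and FAILED lines are invented for illustration and are assumed to follow the same "EVENT <KIND>:" shape.

```python
import re

# Matches lines such as "Oct 12 09:57:27 EVENT START: node_up aglzdb1".
# COMPLETED/FAILED are assumed to share this layout (not shown in the post).
EVENT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+"
    r"EVENT\s+(?P<kind>START|COMPLETED|FAILED):\s*(?P<detail>.*)$"
)


def event_markers(lines):
    """Return (timestamp, kind, detail) for every event marker line."""
    found = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            found.append(
                (match.group("stamp"), match.group("kind"), match.group("detail"))
            )
    return found


SAMPLE = [
    "Oct 12 09:57:27 EVENT START: node_up aglzdb1",            # from the log in this post
    ":node_up[+137] [[ high = high ]]",                        # trace line, ignored
    "Oct 12 09:57:28 EVENT COMPLETED: node_up aglzdb1 0",      # hypothetical line
    "Oct 12 09:57:40 EVENT FAILED: 1: some_event aglzdb1 1",   # hypothetical line
]

if __name__ == "__main__":
    for stamp, kind, detail in event_markers(SAMPLE):
        print(f"{stamp}  {kind:<9}  {detail}")
```

Feeding the real file through the same function (for example `event_markers(open(path))`, with `path` pointing at the hacmp.out on the node) gives a short timeline that is easier to compare between the run with the VG and the run without it.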
Any suggestions would be appreciated.
Here are some of the startup entries from hacmp.out:
HACMP: Additional messages will be logged here as the cluster events are run
:check_for_site_up[+54] [[ high = high ]]
:check_for_site_up[+54] version=1.4
:check_for_site_up[+55] :check_for_site_up[+55] cl_get_path
HA_DIR=es
:check_for_site_up[+57] STATUS=0
:check_for_site_up[+59] set +u
:check_for_site_up[+61] [ ]
:check_for_site_up[+72] exit 0
Oct 12 09:57:27 EVENT START: node_up aglzdb1
:node_up[+137] [[ high = high ]]
:node_up[+137] version=1.10.11.24
:node_up[+139] NFS_CROSS_MOUNT_ENABLE_VAR=
:node_up[+141] export NODENAME=aglzdb1
:node_up[+143] HPS_CMD=/usr/es/sbin/cluster/events/utils/cl_HPS_init
:node_up[+144] typeset -i STATUS=0
:node_up[+145] typeset -i RC=0
:node_up[+148] [[ -z ]]
:node_up[+150] EMULATE=REAL
:node_up[+153] set -u
:node_up[+155] (( 1 < 1 ))
:node_up[+167] [[ aglzdb1 = aglzdb1 ]]
:node_up[+169] rm -f /usr/es/sbin/cluster/etc/ha_nodehalt.lock
:node_up[+173] [[ 1 -eq 2 ]]
:node_up[+187] [[ REAL == REAL ]]
:node_up[+187] /usr/sbin/rsct/bin/dms/startdms -s topsvcs
Dead Man Switch Enabled
:node_up[+192] [[ FALSE = FALSE ]]
:node_up[+194] echo RG_DEPENDENCY is set to FALSE
RG_DEPENDENCY is set to FALSE
:node_up[+199] set -a
:node_up[+200] clsetenvgrp aglzdb1 node_up
:clsetenvgrp[+50] [[ high = high ]]
:clsetenvgrp[+50] version=1.16
:clsetenvgrp[+52] usingVer=clSetenvgrp
:clsetenvgrp[+57] clSetenvgrp aglzdb1 node_up
executing clSetenvgrp
clSetenvgrp completed successfully
:clsetenvgrp[+58] exit 0
:node_up[+200] eval FORCEDOWN_GROUPS="" RESOURCE_GROUPS="" HOMELESS_GROUPS="" HOMELESS_FOLLOWER_GROUPS="" ERRSTATE_GROUPS="" PRINCIPAL_ACTIONS="" ASSOCIATE_ACTIONS="" AUXILLIARY_ACTIONS=""
:node_up[+200] FORCEDOWN_GROUPS= RESOURCE_GROUPS= HOMELESS_GROUPS= HOMELESS_FOLLOWER_GROUPS= ERRSTATE_GROUPS= PRINCIPAL_ACTIONS= ASSOCIATE_ACTIONS= AUXILLIARY_ACTIONS=
:node_up[+201] RC=0
:node_up[+202] set +a
:node_up[+203] : exit status of clsetenvgrp aglzdb1 node_up is: 0
:node_up[+204] (( 0 != 0 ))
:node_up[+212] rm -f /tmp/.RPCLOCKDSTOPPED
:node_up[+218] process_resources FENCE
:process_resources[2423] [[ high == high ]]
:process_resources[2423] version=1.125
:process_resources[2425] STATUS=0
:process_resources[2426] sddsrv_off=FALSE
:process_resources[2428] true
:process_resources[2430] : call rgpa, and it will tell us what to do next
:process_resources[2432] set -a
:process_resources[2433] clRGPA FENCE
:clRGPA[+49] [[ high = high ]]
:clRGPA[+49] version=1.16
:clRGPA[+51] usingVer=clrgpa
:clRGPA[+56] clrgpa FENCE
2013-10-12T09:57:27.669456 clrgpa
:clRGPA[+57] exit 0
:process_resources[2433] eval JOB_TYPE=NONE
:process_resources[1] JOB_TYPE=NONE
:process_resources[2434] RC=0
:process_resources[2435] set +a
:process_resources[2437] (( 0 != 0 ))
:process_resources[2443] RESOURCE_GROUPS=''
:process_resources[2444] GROUPNAME=''
:process_resources[2444] export GROUPNAME
:process_resources[2748] break
:process_resources[2759] : If sddsrv was turned off above, turn it back on again
:process_resources[2761] [[ FALSE == TRUE ]]
:process_resources[2767] exit 0
:node_up[+228] [[ aglzdb1 = aglzdb1 ]]
:node_up[+228] [[ REAL = EMUL ]]
:node_up[+240] rm -f /usr/es/sbin/cluster/etc/.hacmp_wlm_config_changed
:node_up[+243] cl_wlm_reconfig node_up
:node_up[+243] EMULATE=REAL
:cl_wlm_reconfig[+297] [[ high = high ]]
:cl_wlm_reconfig[+297] version=1.14
:cl_wlm_reconfig[+298] :cl_wlm_reconfig[+298] cl_get_path
HA_DIR=es
:cl_wlm_reconfig[+299] SCD=/usr/es/sbin/cluster/etc/objrepos/stage
:cl_wlm_reconfig[+300] ACD=/usr/es/sbin/cluster/etc/objrepos/active
:cl_wlm_reconfig[+302] EMULATE=REAL
:cl_wlm_reconfig[+304] CALLING_EVENT=node_up
:cl_wlm_reconfig[+306] HA_WLM_CLASSES=
:cl_wlm_reconfig[+308] :cl_wlm_reconfig[+308] awk BEGIN { FS = ":" } $1 !~ /^#.*/ { print $1 }
:cl_wlm_reconfig[+308] /usr/es/sbin/cluster/utilities/clwlmruntime -l -d /usr/es/sbin/cluster/etc/objrepos/active
HA_WLM_CONFIG=HA_WLM_config
:cl_wlm_reconfig[+309] [[ -z HA_WLM_config ]]
:cl_wlm_reconfig[+318] WLM_CONFIG_FILES=classes limits shares rules
:cl_wlm_reconfig[+321] [[ reconfig_resources = node_up ]]
:cl_wlm_reconfig[+326] build_class_list
:cl_wlm_reconfig[build_class_list+4] PRIMARY=
:cl_wlm_reconfig[build_class_list+5] SECONDARY=
:cl_wlm_reconfig[build_class_list+8] GROUP=
:cl_wlm_reconfig[build_class_list+9] NODES=
:cl_wlm_reconfig[build_class_list+10] STARTUP_PREF=
:cl_wlm_reconfig[build_class_list+11] FALLOVER_PREF=
:cl_wlm_reconfig[build_class_list+12] FALLBACK_PREF=
:cl_wlm_reconfig[build_class_list+13] /usr/es/sbin/cluster/utilities/clgetgrp -c
:cl_wlm_reconfig[build_class_list+14] read line
:cl_wlm_reconfig[build_class_list+13] grep -v -E ^#
:cl_wlm_reconfig[build_class_list+16] :cl_wlm_reconfig[build_class_list+16] cut -d: -f1
:cl_wlm_reconfig[build_class_list+16] echo aglz_resource_group::ignore:aglzdb1 aglzdb2:OHN:FNPN:FBHPN: :
GROUP=aglz_resource_group
:cl_wlm_reconfig[build_class_list+17] :cl_wlm_reconfig[build_class_list+17] cut -d: -f4
:cl_wlm_reconfig[build_class_list+17] echo aglz_resource_group::ignore:aglzdb1 aglzdb2:OHN:FNPN:FBHPN: :
NODES=aglzdb1 aglzdb2
:cl_wlm_reconfig[build_class_list+18] :cl_wlm_reconfig[build_class_list+18] cut -d: -f5
:cl_wlm_reconfig[build_class_list+18] echo aglz_resource_group::ignore:aglzdb1 aglzdb2:OHN:FNPN:FBHPN: :
STARTUP_PREF=OHN
:cl_wlm_reconfig[build_class_list+19] :cl_wlm_reconfig[build_class_list+19] cut -d: -f6
:cl_wlm_reconfig[build_class_list+19] echo aglz_resource_group::ignore:aglzdb1 aglzdb2:OHN:FNPN:FBHPN: :
FALLOVER_PREF=FNPN
:cl_wlm_reconfig[build_class_list+20] :cl_wlm_reconfig[build_class_list+20] cut -d: -f7
:cl_wlm_reconfig[build_class_list+20] echo aglz_resource_group::ignore:aglzdb1 aglzdb2:OHN:FNPN:FBHPN: :
FALLBACK_PREF=FBHPN
:cl_wlm_reconfig[build_class_list+20] [[ -z aglz_resource_groupaglzdb1 aglzdb2OHNFNPNFBHPN ]]
:cl_wlm_reconfig[build_class_list+20] [[ OHN = OHN ]]
:cl_wlm_reconfig[build_class_list+20] [[ aglzdb1 = aglzdb1 ]]
:cl_wlm_reconfig[build_class_list+35] PRIMARY= aglz_resource_group
:cl_wlm_reconfig[build_class_list+14] read line
:cl_wlm_reconfig[build_class_list+68] :cl_wlm_reconfig[build_class_list+68] odmget -q group = aglz_resource_group and name = 'WLM_PRIMARY' HACMPresource
:cl_wlm_reconfig[build_class_list+68] sed s/"//g
:cl_wlm_reconfig[build_class_list+68] awk $1 = /value/ { print $3 }
WLM_PRIMARY=
:cl_wlm_reconfig[build_class_list+68] [[ -n ]]
:cl_wlm_reconfig[+327] [[ -z ]]
:cl_wlm_reconfig[+329] exit 3
:node_up[+244] WLM_STATUS=3
:node_up[+247] (( 0 == 3 ))
:node_up[+265] :node_up[+265] cl_rrmethods2call ss_load
:cl_rrmethods2call[+49] [[ high = high ]]
:cl_rrmethods2call[+49] version=1.17
:cl_rrmethods2call[+50] :cl_rrmethods2call[+50] cl_get_path
HA_DIR=es
:cl_rrmethods2call[+76] RRMETHODS=
:cl_rrmethods2call[+77] NEED_RR_ENV_VARS=no
:cl_rrmethods2call[+79] [[ aglzdb1 = aglzdb1 ]]
:cl_rrmethods2call[+99] NEED_RR_ENV_VARS=yes
:cl_rrmethods2call[+114] [[ yes = yes ]]
:cl_rrmethods2call[+116] cllsres
:cl_rrmethods2call[+116] 2> /dev/null
:cl_rrmethods2call[+116] eval APPLICATIONS="aglz_server" FILESYSTEM="" FORCED_VARYON="false" FSCHECK_TOOL="fsck" FS_BEFORE_IPADDR="false" RECOVERY_METHOD="sequential" SERVICE_LABEL="aglzdb_svc" SSA_DISK_FENCING="false" VG_AUTO_IMPORT="false" VOLUME_GROUP="datavg"
:cl_rrmethods2call[+116] APPLICATIONS=aglz_server FILESYSTEM= FORCED_VARYON=false FSCHECK_TOOL=fsck FS_BEFORE_IPADDR=false RECOVERY_METHOD=sequential SERVICE_LABEL=aglzdb_svc SSA_DISK_FENCING=false VG_AUTO_IMPORT=false VOLUME_GROUP=datavg
:cl_rrmethods2call[+120] [[ -n ]]
:cl_rrmethods2call[+125] [[ -n ]]
:cl_rrmethods2call[+130] [[ -n ]]
:cl_rrmethods2call[+135] [[ -n ]]
:cl_rrmethods2call[+140] [[ -n ]]
:cl_rrmethods2call[+145] echo
:cl_rrmethods2call[+146] exit 0
METHODS=
:node_up[+281] :node_up[+281] odmget -qnodename = aglzdb1 HACMPadapter
:node_up[+281] grep hps
:node_up[+281] grep type
SP_SWITCH=