HACMP fault problem

Problem description: two 8203-E4A (520) servers, AIX 5.3, HACMP 5.2.0.5. The application is currently running normally, but the HACMP status is abnormal.


A:
# oslevel -s
5300-08-01-0819
# lssrc -s clstrmgrES
Subsystem         Group            PID          Status
clstrmgrES       cluster          213214       stopping

  cluster.es.server.rte      5.2.0.5  COMMITTED  ES Base Server Runtime
# lssrc -ls clstrmgrES
Current state: ST_BARRIER
i_local_nodeid 1, i_local_siteid -1, my_handle 2
ml_idx[1]=0     ml_idx[2]=1     
tp is 20185298
Events on event queue:
te_type 4, te_nodeid 1, te_network -1
te_type 4, te_nodeid 2, te_network -1
There are 0 events on the Ibcast queue
There are 0 events on the RM Ibcast queue
CLversion: 7
sccsid = "@(#)36   1.137   src/43haes/usr/sbin/cluster/hacmprd/main.C, hacmp.pe, 51haes_r520 12/9/04 14:52:34"
local node vrmf is 5204
cluster fix level is "4"
The following timer(s) are currently active:
Current DNP values
DNP Values for NodeId - 1  NodeName - TDMRAC1
    PgSpFree = 0  PvPctBusy = 0  PctTotalTimeIdle = 0.000000
DNP Values for NodeId - 2  NodeName - TDMRAC2
    PgSpFree = 0  PvPctBusy = 0  PctTotalTimeIdle = 0.000000


B:
# oslevel -s
5300-08-01-0819
# lssrc -s clstrmgrES
Subsystem         Group            PID          Status
clstrmgrES       cluster          315440       stopping
cluster.es.server.rte      5.2.0.5  COMMITTED  ES Base Server Runtime
# lssrc -ls clstrmgrES
Current state: ST_RP_FAILED
i_local_nodeid 0, i_local_siteid -1, my_handle 1
ml_idx[1]=0     ml_idx[2]=1     
tp is 20161b68
Events on event queue:
te_type 4, te_nodeid 1, te_network -1
There are 0 events on the Ibcast queue
There are 0 events on the RM Ibcast queue
CLversion: 7
sccsid = "@(#)36   1.137   src/43haes/usr/sbin/cluster/hacmprd/main.C, hacmp.pe, 51haes_r520 12/9/04 14:52:34"
local node vrmf is 5204
cluster fix level is "4"
The following timer(s) are currently active:
Current DNP values
DNP Values for NodeId - 1  NodeName - TDMRAC1
    PgSpFree = 0  PvPctBusy = 0  PctTotalTimeIdle = 0.000000
DNP Values for NodeId - 2  NodeName - TDMRAC2
    PgSpFree = 0  PvPctBusy = 0  PctTotalTimeIdle = 0.000000
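
The two nodes above report different states (ST_BARRIER and ST_RP_FAILED). To compare them quickly, the state line can be pulled out of saved `lssrc -ls clstrmgrES` output. This is only a minimal sketch: `cl_state` is a hypothetical helper name, not an HACMP command, and it assumes the output was first saved to a file.

```shell
# Print the cluster manager state from saved `lssrc -ls clstrmgrES` output.
# cl_state is a hypothetical helper name (an assumption), not an HACMP command.
cl_state() {
    # The state is on a line of the form "Current state: ST_xxx";
    # split on ": " and print the part after it, then stop reading.
    awk -F': ' '/^Current state:/ { print $2; exit }' "$1"
}
```

For example, after `lssrc -ls clstrmgrES > /tmp/clstrmgr.out` on each node, `cl_state /tmp/clstrmgr.out` would print that node's state (the file name is arbitrary).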

Attachment:

hacmp_log.zip (2.56 MB)

8 peer answers

北京荣歆咨询, System Architect, 北京荣歆咨询有限公司
Reply to #4 zaizai397

It looks like HA was never configured properly. If you need HA, reconfigure it.
IT Consulting Services · 2015-05-08
北京荣歆咨询, System Architect, 北京荣歆咨询有限公司
The hacmp.out.1 log shows that HA has not been running normally since October 4, 2013. Excerpt:
HACMP Event Summary
Event: node_down TDMRAC1 graceful
Start time: Fri Oct  4 09:36:36 2013

End time: Fri Oct  4 09:36:38 2013

Action:                Resource:                        Script Name:
----------------------------------------------------------------------------
Releasing resource group:        sg_tdmrac        process_resources
Search on: Fri.Oct.4.09:36:37.BEIST.2013.process_resources.sg_tdmrac.ref
Releasing resource:        All_volume_groups        cl_deactivate_vgs
Search on: Fri.Oct.4.09:36:37.BEIST.2013.cl_deactivate_vgs.All_volume_groups.sg_tdmrac.ref
Error encountered with resource:        rac2vg        cl_deactivate_vgs
Search on: Fri.Oct.4.09:36:37.BEIST.2013.cl_deactivate_vgs.rac2vg.sg_tdmrac.ref
Additional error information:        varyoffvg rac2vg failed with return code 1
......
WARNING: Cluster TDMRAC has been running recovery program '/usr/es/sbin/cluster/events/node_down.rp' for 360 seconds. Please check cluster status.
WARNING: Cluster TDMRAC has been running recovery program '/usr/es/sbin/cluster/events/node_down.rp' for 390 seconds. Please check cluster status.
WARNING: Cluster TDMRAC has been running recovery program '/usr/es/sbin/cluster/events/node_down.rp' for 420 seconds. Please check cluster status.
......
WARNING: Cluster TDMRAC has been running recovery program '/usr/es/sbin/cluster/events/node_down.rp' for 49980180 seconds. Please check cluster status.
WARNING: Cluster TDMRAC has been running recovery program '/usr/es/sbin/cluster/events/node_down.rp' for 49983780 seconds. Please check cluster status.

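On October 4, 2013, TDMRAC1 was stopped in graceful mode. Because rac2vg could not be released, the node_down event never completed, and the recovery program has kept running in the background for about 50 million seconds (roughly one year and seven months).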
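
As a rough check of that duration, the last value in the WARNING lines can be converted to days and hours. This is a minimal sketch that assumes a shell with POSIX `$(( ))` arithmetic; the variable names are illustrative only.

```shell
# Convert the last reported recovery-program run time into days and hours.
secs=49983780                        # last value in the hacmp.out.1 excerpt
days=$((secs / 86400))               # whole days (86400 seconds per day)
hours=$(( (secs % 86400) / 3600 ))   # whole hours left over after the days
echo "${days} days ${hours} hours"   # prints "578 days 12 hours"
```

578 days is one year plus about 213 days, which matches the span from the graceful stop on October 4, 2013 to early May 2015.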
IT Consulting Services · 2015-05-07

Asker

zaizai397
System Operations Engineer, 湖南三湘银行

Question status

  • Posted: 2015-05-07
  • Followers: 1
  • Views: 8623
  • Last answer: 2015-05-11