HA cluster log from an enterprise site: when the IP switch went down, both nodes halted. System level 7100-03-03, HA version 6.1 SP13.
Error description
In HACMP 6 with rsct.core.utils 3.1.4.9 or higher, if all
IP networks are lost and at least one non-IP network is
functioning, the Group Services subsystem will core dump when
trying to send packets to be routed through Topology Services
(across the non-IP connection). This will cause a node halt.
Customers with PowerHA 7, or HACMP 6 customers with no non-IP
networks (such as rs232 or disk) are not in danger. Also this
will not happen if only one node is still running, since there
will be no other cluster members to send messages to.
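The exposure conditions in the description above can be written as a small check. This is only a sketch: `parse_level` and `is_exposed` are hypothetical helper names, and the inputs are assumed to be gathered by hand on each node (for example, the fileset level from `lslpp -L rsct.core.utils`). It encodes only what the description states and does not know which later rsct level contains the fix.

```python
from typing import Tuple


def parse_level(level: str) -> Tuple[int, ...]:
    """Turn a dotted fileset level such as '3.1.4.9' into a comparable tuple."""
    return tuple(int(part) for part in level.strip().split("."))


def is_exposed(hacmp_major: int, rsct_utils_level: str,
               has_non_ip_network: bool, running_nodes: int) -> bool:
    """Return True when every condition named in the APAR text holds.

    Assumption: 'or higher' is treated as open-ended, because the
    description does not say at which level the problem is fixed.
    """
    return (
        hacmp_major == 6                                  # PowerHA 7 is not affected
        and parse_level(rsct_utils_level) >= parse_level("3.1.4.9")
        and has_non_ip_network                            # e.g. rs232 or disk heartbeat
        and running_nodes > 1                             # a lone node has no peers to message
    )
```

With illustrative inputs, `is_exposed(6, "3.1.4.9", True, 2)` evaluates to true, while the same call with `has_non_ip_network=False` or `running_nodes=1` does not. Even on an exposed cluster, the halt itself only occurs once all IP networks are lost.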
The log is as follows:
Nov 21 01:35:46 masterserv1 daemon:notice topsvcs[8192030]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6zV5DL.mbpHK/ONs/o.Ama/...................:::Reference ID: :::Template ID: 173c787f:::Details File: :::Location: rsct,nim_control.C,1.39.1.41,6717 :::TS_LOC_DOWN_ST Possible malfunction on local adapter Adapter interface name en2 Adapter offset 1 Adapter IP address 192.200.192.52
Nov 21 01:35:49 masterserv1 user:notice HACMP for AIX: EVENT START: fail_standby masterserv1 192.200.192.52
Nov 21 01:35:49 masterserv1 user:notice HACMP for AIX: EVENT COMPLETED: fail_standby masterserv1 192.200.192.52 0
Nov 21 01:35:51 masterserv1 user:notice HACMP for AIX: EVENT START: fail_standby masterserv2 192.200.192.53
Nov 21 01:35:51 masterserv1 user:notice HACMP for AIX: EVENT COMPLETED: fail_standby masterserv2 192.200.192.53 0
Nov 21 01:40:34 masterserv1 daemon:notice topsvcs[8192030]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6zV5DL.GgpHK/DLG.o.Ama/...................:::Reference ID: :::Template ID: 173c787f:::Details File: :::Location: rsct,nim_control.C,1.39.1.41,6717 :::TS_LOC_DOWN_ST Possible malfunction on local adapter Adapter interface name en0 Adapter offset 0 Adapter IP address 102.200.192.52
Nov 21 01:40:36 masterserv1 local0:crit clstrmgrES[15925314]: Sat Nov 21 01:40:36 announcementCb: Called, state=ST_UNSTABLE, provider token 1
Nov 21 01:40:36 masterserv1 local0:crit clstrmgrES[15925314]: Sat Nov 21 01:40:36 announcementCb: GsToken 2, AdapterToken 3, rm_GsToken 1
Nov 21 01:40:36 masterserv1 local0:crit clstrmgrES[15925314]: Sat Nov 21 01:40:36 announcementCb: GRPSVCS announcment code=512; exiting
Nov 21 01:40:36 masterserv1 local0:crit clstrmgrES[15925314]: Sat Nov 21 01:40:36 CHECK FOR FAILURE OF RSCT SUBSYSTEMS (topsvcs or grpsvcs)
Nov 21 01:40:36 masterserv1 daemon:err|error haemd[15204586]: LPP=PSSP,Fn=emd_gsi.c,SID=1.4.1.37,L#=1395, haemd: 2521-032 Cannot dispatch group services (1).
Nov 21 01:40:36 masterserv1 user:notice HACMP for AIX: clexit.rc : Unexpected termination of clstrmgrES.
Nov 21 01:40:36 masterserv1 user:notice HACMP for AIX: clexit.rc : Halting system immediately!!!
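The sequence in the log above (adapter down, Group Services exit, clstrmgrES termination, halt) can be searched for in a saved syslog file. The following is a minimal sketch: `MARKERS`, `scan_log` and `matches_halt_pattern` are hypothetical names, and the search strings are copied from this log, including its spelling of "announcment". A match is only a hint that the same pattern occurred, not proof that IV55293 is the cause.

```python
# Substrings taken verbatim from the log excerpt in this post.
MARKERS = {
    "adapter_down": "TS_LOC_DOWN_ST",
    "grpsvcs_exit": "GRPSVCS announcment code=512",  # spelled as in the log
    "clstrmgr_died": "Unexpected termination of clstrmgrES",
    "node_halt": "Halting system immediately",
}


def scan_log(lines):
    """Map each marker name to the 1-based line numbers where it appears."""
    hits = {name: [] for name in MARKERS}
    for number, line in enumerate(lines, start=1):
        for name, needle in MARKERS.items():
            if needle in line:
                hits[name].append(number)
    return hits


def matches_halt_pattern(lines):
    """Return True only if every marker appears at least once."""
    hits = scan_log(lines)
    return all(hits[name] for name in MARKERS)
```

A typical use would be to read the syslog file into a list of lines and pass it to `matches_halt_pattern`; `scan_log` then shows where each step of the sequence was logged.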
The cause is the defect tracked as APAR IV55293: HAGSD CORE DUMP WHEN IP NETWORKS LOST. The fix is to upgrade the rsct filesets.
Official explanation from IBM:
http://www-01.ibm.com/support/docview.wss?uid=isg1IV55293
From the community event "Online technical exchange: fault analysis and handling in day-to-day AIX operations"
Posted by community member qb306