Author: 崔增顺, 2017-02-19 17:31
Systems operations engineer, Minsheng Bank

[Case study] HACMP at an enterprise: nodes went down during a network switch change


HA cluster log from an enterprise: when the IP switch went down, both nodes halted. System level 7100-03-03, HA version 6.1 SP13.

Error description

In HACMP 6 with rsct.core.utils 3.1.4.9 or higher, if all IP networks are lost and at least one non-IP network is functioning, the Group Services subsystem will core dump when trying to send packets to be routed through Topology Services (across the non-IP connection). This will cause a node halt.

Customers with PowerHA 7, or HACMP 6 customers with no non-IP networks (such as rs232 or disk) are not in danger. Also this will not happen if only one node is still running, since there will be no other cluster members to send messages to.
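The conditions in the error description can be written down as a small check. The following is only a sketch: the `ClusterState` fields, the `is_exposed` name, and the `rsct_fix_applied` flag are hypothetical names introduced for illustration, and the example values are not taken from the cluster in this case. The description does not state the fileset level that contains the fix, so the sketch takes that as an explicit input instead of guessing it.

```python
from dataclasses import dataclass
from typing import Tuple


def parse_level(level: str) -> Tuple[int, ...]:
    """Turn a dotted fileset level such as '3.1.4.9' into a comparable tuple."""
    return tuple(int(part) for part in level.split("."))


@dataclass
class ClusterState:
    # All field names are hypothetical; they mirror the error description.
    ha_major: int                   # 6 for HACMP 6, 7 for PowerHA 7
    rsct_utils_level: str           # installed level of rsct.core.utils
    non_ip_networks: int            # number of rs232/disk heartbeat networks
    running_nodes: int              # nodes that are still running
    rsct_fix_applied: bool = False  # assumption: set by the operator after upgrading


def is_exposed(state: ClusterState) -> bool:
    """True if losing every IP network could trigger the Group Services core dump."""
    if state.rsct_fix_applied:
        return False
    return (
        state.ha_major == 6
        and parse_level(state.rsct_utils_level) >= (3, 1, 4, 9)
        and state.non_ip_networks >= 1
        and state.running_nodes >= 2
    )
```

For example, `is_exposed(ClusterState(6, "3.1.4.9", 1, 2))` evaluates to `True`, while the same state with `non_ip_networks=0` or `running_nodes=1` evaluates to `False`. Comparing levels as integer tuples avoids the string-comparison trap where `"3.1.4.10"` would sort below `"3.1.4.9"`.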

The log is as follows:

Nov 21 01:35:46 masterserv1 daemon:notice topsvcs[8192030]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6zV5DL.mbpHK/ONs/o.Ama/...................:::Reference ID: :::Template ID: 173c787f:::Details File: :::Location: rsct,nim_control.C,1.39.1.41,6717 :::TS_LOC_DOWN_ST Possible malfunction on local adapter Adapter interface name en2 Adapter offset 1 Adapter IP address 192.200.192.52

Nov 21 01:35:49 masterserv1 user:notice HACMP for AIX: EVENT START: fail_standby masterserv1 192.200.192.52

Nov 21 01:35:49 masterserv1 user:notice HACMP for AIX: EVENT COMPLETED: fail_standby masterserv1 192.200.192.52 0

Nov 21 01:35:51 masterserv1 user:notice HACMP for AIX: EVENT START: fail_standby masterserv2 192.200.192.53

Nov 21 01:35:51 masterserv1 user:notice HACMP for AIX: EVENT COMPLETED: fail_standby masterserv2 192.200.192.53 0

Nov 21 01:40:34 masterserv1 daemon:notice topsvcs[8192030]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6zV5DL.GgpHK/DLG.o.Ama/...................:::Reference ID: :::Template ID: 173c787f:::Details File: :::Location: rsct,nim_control.C,1.39.1.41,6717 :::TS_LOC_DOWN_ST Possible malfunction on local adapter Adapter interface name en0 Adapter offset 0 Adapter IP address 102.200.192.52

Nov 21 01:40:36 masterserv1 local0:crit clstrmgrES[15925314]: Sat Nov 21 01:40:36 announcementCb: Called, state=ST_UNSTABLE, provider token 1

Nov 21 01:40:36 masterserv1 local0:crit clstrmgrES[15925314]: Sat Nov 21 01:40:36 announcementCb: GsToken 2, AdapterToken 3, rm_GsToken 1

Nov 21 01:40:36 masterserv1 local0:crit clstrmgrES[15925314]: Sat Nov 21 01:40:36 announcementCb: GRPSVCS announcment code=512; exiting

Nov 21 01:40:36 masterserv1 local0:crit clstrmgrES[15925314]: Sat Nov 21 01:40:36 CHECK FOR FAILURE OF RSCT SUBSYSTEMS (topsvcs or grpsvcs)

Nov 21 01:40:36 masterserv1 daemon:err|error haemd[15204586]: LPP=PSSP,Fn=emd_gsi.c,SID=1.4.1.37,L#=1395, haemd: 2521-032 Cannot dispatch group services (1).

Nov 21 01:40:36 masterserv1 user:notice HACMP for AIX: clexit.rc : Unexpected termination of clstrmgrES.

Nov 21 01:40:36 masterserv1 user:notice HACMP for AIX: clexit.rc : Halting system immediately!!!
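To make the sequence in the log easier to see, the adapter-down messages and the final halt can be pulled out with a short script. This is a rough sketch, not a supported tool: the `summarize` function and the regular expressions are my own, the sample holds three lines from the log above with their wrapped pieces joined, and timestamps are reduced to seconds since midnight, so a log that crosses midnight would need extra handling.

```python
import re

# Three lines copied from the log above, with wrapped fragments joined.
SAMPLE = [
    "Nov 21 01:35:46 masterserv1 daemon:notice topsvcs[8192030]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6zV5DL.mbpHK/ONs/o.Ama/...................:::Reference ID: :::Template ID: 173c787f:::Details File: :::Location: rsct,nim_control.C,1.39.1.41,6717 :::TS_LOC_DOWN_ST Possible malfunction on local adapter Adapter interface name en2 Adapter offset 1 Adapter IP address 192.200.192.52",
    "Nov 21 01:40:34 masterserv1 daemon:notice topsvcs[8192030]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6zV5DL.GgpHK/DLG.o.Ama/...................:::Reference ID: :::Template ID: 173c787f:::Details File: :::Location: rsct,nim_control.C,1.39.1.41,6717 :::TS_LOC_DOWN_ST Possible malfunction on local adapter Adapter interface name en0 Adapter offset 0 Adapter IP address 102.200.192.52",
    "Nov 21 01:40:36 masterserv1 user:notice HACMP for AIX: clexit.rc : Halting system immediately!!!",
]

# Syslog prefix: month, day, HH:MM:SS, host, then the rest of the message.
LINE_RE = re.compile(r"^\w{3}\s+\d+\s+(\d{2}):(\d{2}):(\d{2})\s+(\S+)\s+(.*)$")
# Topology Services local-adapter-down message.
ADAPTER_RE = re.compile(r"TS_LOC_DOWN_ST .*?interface name (\S+) .*?IP address (\S+)")


def summarize(lines):
    """Return (seconds_since_midnight, kind, detail) for adapter-down and halt lines."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        hours, minutes, seconds, host, rest = match.groups()
        stamp = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        adapter = ADAPTER_RE.search(rest)
        if adapter:
            events.append((stamp, "adapter_down", adapter.group(1)))
        elif "Halting system immediately" in rest:
            events.append((stamp, "halt", host))
    return events
```

On the sample, `summarize(SAMPLE)` reports en2 going down first, en0 going down 288 seconds later, and the halt 2 seconds after that, which matches the pattern in the error description: the node stays up while one IP adapter remains and halts almost immediately once the last one is lost.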

The cause is the defect covered by APAR IV55293: HAGSD CORE DUMP WHEN IP NETWORKS LOST. The rsct filesets need to be upgraded.

Explanation on the official IBM site:

http://www-01.ibm.com/support/docview.wss?uid=isg1IV55293

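From the community exchange event "Online technical exchange on fault analysis and handling in day-to-day AIX system operations"
Posted by community member qb306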


