
HA network outage causes system shutdown

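PowerHA is configured with network heartbeating. I unplugged all network cables from node1, and the workload failed over to node2 normally. When I plugged the cables back in, node2 shut itself down automatically. The system error log is as follows: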
LABEL:          MUSENT_LINK_DOWN
IDENTIFIER:     F1814D51
Date/Time:       Thu Mar  7 13:21:10 GMT+08:00 2013
Sequence Number: 78
Machine Id:      00F7E46A4C00
Node Id:         hd02
Class:           H
Type:            TEMP
WPAR:            Global
Resource Name:   ent0            
Resource Class:  adapter
Resource Type:   e41457162004000
Location:        U78AA.001.WZSHV0Y-P1-C7-T1
VPD:            
      PCIe2 4-port 1GbE Adapter:
        FRU Number..................74Y4064
        EC Level....................D77125A
        Customer Card ID Number.....576F
        Part Number.................00E1681
        Feature Code/Marketing ID...5899
        Serial Number...............YL502025502B
        Manufacture ID..............5CF3FC5D61AC
        Network Address.............5CF3FC5D61AC
        ROM Level.(alterable).......10040150
Description
ETHERNET DOWN
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
FILE NAME
line: 315 file: musent_limbo.c
PCI ETHERNET STATISTICS
0000 1510 0463 0953 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000
DEVICE DRIVER INTERNAL STATE
0000 0000 0000 0000 0000 0000
SOURCE ADDRESS
0000 0000 0000
Diagnostic Analysis
Diagnostic Log sequence number: 29
Resource tested: ent0
Menu Number:  2E41902
Description:

No trouble was found with this adapter.  However
Error Log Analysis indicates that there recently may
have been a network problem.
If your Ethernet adapter is connected to a network,
and if you are experiencing problems with network
communications, check for a loose or defective
cable or connection.
If a switch or another system is directly attached
to the Ethernet adapter, verify it is powered up,
configured, and functioning correctly.
---------------------------------------------------------------------------
LABEL:          RMCD_INFO_0_ST
IDENTIFIER:     A6DF45AA
Date/Time:       Thu Mar  7 11:54:09 GMT+08:00 2013
Sequence Number: 77
Machine Id:      00F7E46A4C00
Node Id:         hd02
Class:           O
Type:            INFO
WPAR:            Global
Resource Name:   RMCdaemon      
Description
The daemon is started.
Probable Causes
The Resource Monitoring and Control daemon has been started.
User Causes
The startsrc -s ctrmc command has been executed or
the rmcctrl -s command has been executed.
Recommended Actions
Confirm that the daemon should be started.
Detail Data
DETECTING MODULE
RSCT,rmcd.c,1.75.1.2,229                     
ERROR ID
6eKora0Vx.CF/VY4/2gk09....................
REFERENCE CODE


---------------------------------------------------------------------------
LABEL:          REBOOT_ID
IDENTIFIER:     2BFA76F6
Date/Time:       Thu Mar  7 11:51:20 GMT+08:00 2013
Sequence Number: 75
Machine Id:      00F7E46A4C00
Node Id:         hd02
Class:           S
Type:            TEMP
WPAR:            Global
Resource Name:   SYSPROC         
Description
SYSTEM SHUTDOWN BY USER
Probable Causes
SYSTEM SHUTDOWN
Detail Data
USER ID
           0
0=SOFT IPL 1=HALT 2=TIME REBOOT
           1
TIME TO REBOOT (FOR TIMED REBOOT ONLY)
           0
---------------------------------------------------------------------------
LABEL:          ERRLOG_ON
IDENTIFIER:     9DBCFDEE
Date/Time:       Thu Mar  7 11:53:32 GMT+08:00 2013
Sequence Number: 74
Machine Id:      00F7E46A4C00
Node Id:         hd02
Class:           O
Type:            TEMP
WPAR:            Global
Resource Name:   errdemon        
Description
ERROR LOGGING TURNED ON
Probable Causes
ERRDEMON STARTED AUTOMATICALLY
User Causes
/USR/LIB/ERRDEMON COMMAND
Recommended Actions
NONE
---------------------------------------------------------------------------
LABEL:          TS_STOP_ST
IDENTIFIER:     6D19271E
Date/Time:       Thu Mar  7 11:47:33 GMT+08:00 2013
Sequence Number: 73
Machine Id:      00F7E46A4C00
Node Id:         hd02
Class:           O
Type:            INFO
WPAR:            Global
Resource Name:   topsvcs         
Description
Topology Services daemon stopped
Probable Causes
Daemon stopped by SRC
Daemon stopped by signal
User Causes
Daemon stopped by user
Recommended Actions
Confirm that this is desirable
Detail Data
DETECTING MODULE
rsct,comm.C,1.156,690                        
ERROR ID
6SQG4h/Jr.CF/IDM.2gk09....................
REFERENCE CODE
6UpNEL0Y5MBF/nKj/2gk09....................
Topology Services daemon stopped by:
Signal SIGTERM
---------------------------------------------------------------------------
LABEL:          OPMSG
IDENTIFIER:     AA8AB241
Date/Time:       Thu Mar  7 11:47:31 GMT+08:00 2013
Sequence Number: 72
Machine Id:      00F7E46A4C00
Node Id:         hd02
Class:           O
Type:            TEMP
WPAR:            Global
Resource Name:   OPERATOR        
Description
OPERATOR NOTIFICATION
User Causes
ERRLOGGER COMMAND
Recommended Actions
REVIEW DETAILED DATA
Detail Data
MESSAGE FROM ERRLOGGER COMMAND
clexit.rc : Unexpected termination of clstrmgrES
---------------------------------------------------------------------------
LABEL:          CORE_DUMP
IDENTIFIER:     A924A5FC
Date/Time:       Thu Mar  7 11:47:31 GMT+08:00 2013
Sequence Number: 71
Machine Id:      00F7E46A4C00
Node Id:         hd02
Class:           S
Type:            PERM
WPAR:            Global
Resource Name:   SYSPROC         
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
SOFTWARE PROGRAM
User Causes
USER GENERATED SIGNAL
Recommended Actions
CORRECT THEN RETRY
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
SIGNAL NUMBER
           6
USER'S PROCESS ID:
              10158178
FILE SYSTEM SERIAL NUMBER
           4
INODE NUMBER
           0       23085
CORE FILE NAME
/var/hacmp/core
PROGRAM NAME
clstrmgr
STACK EXECUTION DISABLED
           0
COME FROM ADDRESS REGISTER
fflush_un 7F0
PROCESSOR ID
  hw_fru_id: N/A
  hw_cpu_id: N/A
ADDITIONAL INFORMATION
pthread_k A0
??
_p_raise 4C
raise 44
abort C8
die__Fi 6F4
announcem 330
kill_grp_ 158
ha_gs_dis 2EB0
ha_gs_dis 4C
DoMainLoo 760
main 810
__start 9C
Symptom Data
REPORTABLE
1
INTERNAL ERROR
1
SYMPTOM CODE
PIDS/5765E6200 LVLS/520 PCSS/SPI2 FLDS/clstrmgr SIG/6 FLDS/die__Fi VALU/6f4
---------------------------------------------------------------------------
LABEL:          SRC_SVKO
IDENTIFIER:     BC3BE5A3
Date/Time:       Thu Mar  7 11:47:31 GMT+08:00 2013
Sequence Number: 70
Machine Id:      00F7E46A4C00
Node Id:         hd02
Class:           S
Type:            PERM
WPAR:            Global
Resource Name:   SRC            
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
MANUALLY RESTART SUBSYSTEM IF NEEDED
Detail Data
SYMPTOM CODE
      393222
SOFTWARE ERROR CODE
       -9017
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'376'
FAILING MODULE
clstrmgrES
---------------------------------------------------------------------------
LABEL:          SRC_RSTRT
IDENTIFIER:     CB4A951F
Date/Time:       Thu Mar  7 11:47:31 GMT+08:00 2013
Sequence Number: 69
Machine Id:      00F7E46A4C00
Node Id:         hd02
Class:           S
Type:            INFO
WPAR:            Global
Resource Name:   SRC            
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
VERIFY SUBSYSTEM RESTARTED AUTOMATICALLY
Detail Data
SYMPTOM CODE
           0
SOFTWARE ERROR CODE
       -9035
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'234'
FAILING MODULE
emsvcs
---------------------------------------------------------------------------
LABEL:          SRC_SVKO
IDENTIFIER:     BC3BE5A3
Date/Time:       Thu Mar  7 11:47:31 GMT+08:00 2013
Sequence Number: 68
Machine Id:      00F7E46A4C00
Node Id:         hd02
Class:           S
Type:            PERM
WPAR:            Global
Resource Name:   SRC            
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
MANUALLY RESTART SUBSYSTEM IF NEEDED
Detail Data
SYMPTOM CODE
        2560
SOFTWARE ERROR CODE
       -9017
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'376'
FAILING MODULE
grpsvcs
---------------------------------------------------------------------------
LABEL:          HA002_ER
IDENTIFIER:     12081DC6
Date/Time:       Thu Mar  7 11:47:31 GMT+08:00 2013
Sequence Number: 67
Machine Id:      00F7E46A4C00
Node Id:         hd02
Class:           S
Type:            PERM
WPAR:            Global
Resource Name:   haemd           
Description
SOFTWARE PROGRAM ERROR
Probable Causes
SUBSYSTEM
Failure Causes
SUBSYSTEM
Recommended Actions
REPORT DETAILED DATA
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
DETECTING MODULE
LPP=PSSP,Fn=emd_gsi.c,SID=1.4.1.37,L#=1395,                                    
DIAGNOSTIC EXPLANATION
haemd: 2521-032 Cannot dispatch group services (1).
---------------------------------------------------------------------------
LABEL:          GS_DOM_MERGE_ER
IDENTIFIER:     9DEC29E1
Date/Time:       Thu Mar  7 11:47:31 GMT+08:00 2013
Sequence Number: 66
Machine Id:      00F7E46A4C00
Node Id:         hd02
Class:           O
Type:            PERM
WPAR:            Global
Resource Name:   grpsvcs         
Description
Group Services daemon exit to merge domains
Probable Causes
Network between two node groups has repaired
Failure Causes
Network communication has been blocked.
Topology Services has been partitioned.
Recommended Actions
Check the network connection.
Check the Topology Services.
Verify that Group Services daemon has been restarted
Call IBM Service if problem persists
Detail Data
DETECTING MODULE
RSCT,NS.C,1.107.1.56,4755                     
ERROR ID
6Vb0vR0Hr.CF/54i.2gk09....................
REFERENCE CODE


DIAGNOSTIC EXPLANATION
NS::Ack(): The master requests to dissolve my domain because of the merge with other domain 1.2
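errpt prints entries newest first, so the chain of events above is easier to follow in ascending Sequence Number order. The Python sketch below is a hypothetical helper (`parse_errpt` and `ErrEntry` are names made up for illustration, not part of AIX or PowerHA). It assumes the text of `errpt -a` has been saved to a string, splits it on the dashed separator lines, and sorts the entries by Sequence Number. It sorts by sequence rather than by timestamp because, in this log, REBOOT_ID (sequence 75) carries an earlier time than ERRLOG_ON (sequence 74).

```python
import re
from dataclasses import dataclass


# Hypothetical container for the few errpt fields needed to build a timeline.
@dataclass
class ErrEntry:
    seq: int
    label: str
    timestamp: str
    resource: str


# errpt -a separates entries with a line of dashes.
SEPARATOR = re.compile(r"^-{20,}[ \t]*$", re.MULTILINE)
# Header fields of interest, written as "Key:   value" with arbitrary padding.
FIELD = re.compile(r"^(LABEL|Sequence Number|Date/Time|Resource Name):\s*(.*?)\s*$")


def parse_errpt(text: str) -> list[ErrEntry]:
    """Split saved `errpt -a` output into entries sorted by Sequence Number."""
    entries = []
    for block in SEPARATOR.split(text):
        fields = {}
        for line in block.splitlines():
            m = FIELD.match(line)
            # Keep only the first occurrence of each header field in a block.
            if m and m.group(1) not in fields:
                fields[m.group(1)] = m.group(2)
        if "LABEL" in fields and "Sequence Number" in fields:
            entries.append(ErrEntry(
                seq=int(fields["Sequence Number"]),
                label=fields["LABEL"],
                timestamp=fields.get("Date/Time", ""),
                resource=fields.get("Resource Name", ""),
            ))
    # errpt lists newest first; sequence order is the order entries were logged.
    return sorted(entries, key=lambda e: e.seq)


# A trimmed excerpt of the log in this post.
SAMPLE = """LABEL:          REBOOT_ID
Date/Time:       Thu Mar  7 11:51:20 GMT+08:00 2013
Sequence Number: 75
Resource Name:   SYSPROC
---------------------------------------------------------------------------
LABEL:          CORE_DUMP
Date/Time:       Thu Mar  7 11:47:31 GMT+08:00 2013
Sequence Number: 71
Resource Name:   SYSPROC
---------------------------------------------------------------------------
LABEL:          GS_DOM_MERGE_ER
Date/Time:       Thu Mar  7 11:47:31 GMT+08:00 2013
Sequence Number: 66
Resource Name:   grpsvcs
"""

if __name__ == "__main__":
    for e in parse_errpt(SAMPLE):
        print(f"{e.seq:>4}  {e.timestamp:<36}  {e.label:<16} {e.resource}")
```

Read in sequence order, the log above starts with GS_DOM_MERGE_ER (66), is followed by the Group Services, emsvcs and clstrmgr failures (67 to 73), and ends with the reboot and startup records (74 onward) and the later ent0 link-down entry (78).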

Could someone take a look at this for me? Thanks!

2 answers

xieyadong, Systems Engineer, China Southern Power Grid
Following this one.
System Integration · 2013-11-19

zwz99999, Systems Engineer, dcits
How is your network heartbeat configured? What OS version are you running? Which HA version? It could even be a bug.
System Integration · 2013-11-19

Asked by: wp35678q, Software Development Engineer, electric power industry

Posted: 2013-11-18
Last answered: 2013-11-19