
Logs from an HACMP node shutdown caused by a network outage: could someone walk through what the logs show?

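Hi all, I'm new to HACMP. I ran into a problem and can't make much sense of the logs. Could someone explain what the two machines were doing according to the logs?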

Attachment: 新建文件夹.rar (53.13 KB)

17 answers

zbc2602, Technical Director, Beijing Nantian Software Co., Ltd.
The problem was caused by the network.
System integration · 2009-12-09
zbc2602, Technical Director, Beijing Nantian Software Co., Ltd.
Unexpected termination of clstrmgrES.
How do I find out why it was terminated abnormally?
System integration · 2009-11-26
myciciy, IT Consultant, a fintech company
Nov 22 16:36:35 node83f884d user:notice HACMP for AIX: clexit.rc : Unexpected termination of clstrmgrES.
Nov 22 16:36:35 node83f884d user:notice HACMP for AIX: clexit.rc : Halting system immediately!!!
Banking · 2009-11-26
zbc2602, Technical Director, Beijing Nantian Software Co., Ltd.
Key information:
LABEL:          GS_DOM_MERGE_ER
IDENTIFIER:     9DEC29E1

Date/Time:       Sun Nov 22 16:36:34 CST 2009
Sequence Number: 1562
Machine Id:      00CF884D4C00
Node Id:         node83f884d
Class:           O
Type:            PERM
Resource Name:   grpsvcs         

Description
Group Services daemon exit to merge domains

Probable Causes
Network between two node groups has repaired

Failure Causes
Network communication has been blocked.
Topology Services has been partitioned.

        Recommended Actions
        Check the network connection.
Check the Topology Services.
Verify that Group Services daemon has been restarted
Call IBM Service if problem persists

Detail Data
DETECTING MODULE
RSCT,NS.C,1.107.1.35,4370                     
ERROR ID
6Vb0vR0mnP09/olj0C4.2g0...................
REFERENCE CODE
                                          
DIAGNOSTIC EXPLANATION
The master requests to dissolve my domain because of the merge with other domain 1.10


The errpt log follows:

LABEL:          ERRLOG_ON
IDENTIFIER:     9DBCFDEE

Date/Time:       Mon Nov 23 09:12:20 CST 2009
Sequence Number: 1568
Machine Id:      00CF884D4C00
Node Id:         localhost
Class:           O
Type:            TEMP
Resource Name:   errdemon        

Description
ERROR LOGGING TURNED ON

Probable Causes
ERRDEMON STARTED AUTOMATICALLY

User Causes
/USR/LIB/ERRDEMON COMMAND

        Recommended Actions
        NONE

---------------------------------------------------------------------------
LABEL:          OPMSG
IDENTIFIER:     AA8AB241

Date/Time:       Sun Nov 22 16:36:35 CST 2009
Sequence Number: 1567
Machine Id:      00CF884D4C00
Node Id:         node83f884d
Class:           O
Type:            TEMP
Resource Name:   OPERATOR        

Description
OPERATOR NOTIFICATION

User Causes
ERRLOGGER COMMAND

        Recommended Actions
        REVIEW DETAILED DATA

Detail Data
MESSAGE FROM ERRLOGGER COMMAND
clexit.rc : Unexpected termination of clstrmgrES
---------------------------------------------------------------------------
LABEL:          SRC_SVKO
IDENTIFIER:     BC3BE5A3

Date/Time:       Sun Nov 22 16:36:34 CST 2009
Sequence Number: 1566
Machine Id:      00CF884D4C00
Node Id:         node83f884d
Class:           S
Type:            PERM
Resource Name:   SRC            

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        MANUALLY RESTART SUBSYSTEM IF NEEDED

Detail Data
SYMPTOM CODE
        1024
SOFTWARE ERROR CODE
       -9017
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
clstrmgrES
---------------------------------------------------------------------------
LABEL:          SRC_RSTRT
IDENTIFIER:     BA431EB7

Date/Time:       Sun Nov 22 16:36:34 CST 2009
Sequence Number: 1565
Machine Id:      00CF884D4C00
Node Id:         node83f884d
Class:           S
Type:            PERM
Resource Name:   SRC            

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        VERIFY SUBSYSTEM RESTARTED AUTOMATICALLY

Detail Data
SYMPTOM CODE
           0
SOFTWARE ERROR CODE
       -9035
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'217'
FAILING MODULE
emsvcs
---------------------------------------------------------------------------
LABEL:          SRC_SVKO
IDENTIFIER:     BC3BE5A3

Date/Time:       Sun Nov 22 16:36:34 CST 2009
Sequence Number: 1564
Machine Id:      00CF884D4C00
Node Id:         node83f884d
Class:           S
Type:            PERM
Resource Name:   SRC            

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        MANUALLY RESTART SUBSYSTEM IF NEEDED

Detail Data
SYMPTOM CODE
        2560
SOFTWARE ERROR CODE
       -9017
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
grpsvcs
---------------------------------------------------------------------------
LABEL:          HA002_ER
IDENTIFIER:     12081DC6

Date/Time:       Sun Nov 22 16:36:34 CST 2009
Sequence Number: 1563
Machine Id:      00CF884D4C00
Node Id:         node83f884d
Class:           S
Type:            PERM
Resource Name:   haemd           

Description
SOFTWARE PROGRAM ERROR

Probable Causes
SUBSYSTEM

Failure Causes
SUBSYSTEM

        Recommended Actions
        REPORT DETAILED DATA
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
DETECTING MODULE
LPP=PSSP,Fn=emd_gsi.c,SID=1.4.1.36,L#=1361,                                    
DIAGNOSTIC EXPLANATION
haemd: 2521-032 Cannot dispatch group services (1).

---------------------------------------------------------------------------
LABEL:          GS_DOM_MERGE_ER
IDENTIFIER:     9DEC29E1

Date/Time:       Sun Nov 22 16:36:34 CST 2009
Sequence Number: 1562
Machine Id:      00CF884D4C00
Node Id:         node83f884d
Class:           O
Type:            PERM
Resource Name:   grpsvcs         

Description
Group Services daemon exit to merge domains

Probable Causes
Network between two node groups has repaired

Failure Causes
Network communication has been blocked.
Topology Services has been partitioned.

        Recommended Actions
        Check the network connection.
Check the Topology Services.
Verify that Group Services daemon has been restarted
Call IBM Service if problem persists

Detail Data
DETECTING MODULE
RSCT,NS.C,1.107.1.35,4370                     
ERROR ID
6Vb0vR0mnP09/olj0C4.2g0...................
REFERENCE CODE
                                          
DIAGNOSTIC EXPLANATION
The master requests to dissolve my domain because of the merge with other domain 1.10
---------------------------------------------------------------------------
LABEL:          GOENT_RCVRY_EXIT
IDENTIFIER:     F3931284

Date/Time:       Sun Nov 22 16:36:28 CST 2009
Sequence Number: 1561
Machine Id:      00CF884D4C00
Node Id:         node83f884d
Class:           H
Type:            INFO
Resource Name:   ent5            
Resource Class:  adapter
Resource Type:   14106902
Location:        U5791.001.99B0FGG-P2-C06-T1
VPD:            
        Product Specific.(  ).......10/100/1000 Base-TX PCI-X Adapter
        Part Number.................03N6524
        FRU Number..................03N6524
        EC Level....................H14006
        Manufacture ID..............YL1021
        Network Address.............001125C070AF
        ROM Level.(alterable).......GOL021

Description
ETHERNET NETWORK RECOVERY MODE

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
FILE NAME
line: 204 file: goent_intr.c
PCI ETHERNET STATISTICS
00C4 44C2 0063 0853 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 087E 3F51 0000 0003 3E47 3923 0000 0000 0805 323A 003C 57D5 070D C054
0000 0000 013C DCFB 0000 0000 B4FF 8862 0000 0000 0000 0000 0000 0001 0000 1493
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0018 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 BB83 00F0 0068 0C00 0000 0000 01A0 0000 0000
0000 0000 0000 0000 0000 C9E1 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000
DEVICE DRIVER INTERNAL STATE
5555 5555 0000 0000 0000 0000
SOURCE ADDRESS
0011 25C0 70AF
---------------------------------------------------------------------------
LABEL:          GOENT_RCVRY_EXIT
IDENTIFIER:     F3931284

Date/Time:       Sun Nov 22 16:36:28 CST 2009
Sequence Number: 1560
Machine Id:      00CF884D4C00
Node Id:         node83f884d
Class:           H
Type:            INFO
Resource Name:   ent3            
Resource Class:  adapter
Resource Type:   14106902
Location:        U5791.001.99B0FK5-P2-C06-T1
VPD:            
        Product Specific.(  ).......10/100/1000 Base-TX PCI-X Adapter
        Part Number.................03N6524
        FRU Number..................03N6524
        EC Level....................H14006
        Manufacture ID..............YL1021
        Network Address.............001125C092B9
        ROM Level.(alterable).......GOL021

Description
ETHERNET NETWORK RECOVERY MODE

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
FILE NAME
line: 204 file: goent_intr.c
PCI ETHERNET STATISTICS
00C4 44C2 0063 0853 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 336D C341 0000 001C 0E4E 0779 0000 0000 306D EEAB 003C 57D4 070D BF56
0000 0000 2BA6 E691 0000 0083 FE0A 110B 0000 0000 0000 0000 0000 0001 0000 1591
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0124 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 BB83 00F0 0068 0C00 0000 0000 01A0 0000 0000
0000 0000 0000 0000 0000 C9E1 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000
DEVICE DRIVER INTERNAL STATE
5555 5555 0000 0000 0000 0000
SOURCE ADDRESS
0011 25C0 92B9
---------------------------------------------------------------------------
LABEL:          TS_LOC_DOWN_ST
IDENTIFIER:     173C787F

Date/Time:       Sun Nov 22 16:24:02 CST 2009
Sequence Number: 1559
Machine Id:      00CF884D4C00
Node Id:         node83f884d
Class:           S
Type:            INFO
Resource Name:   topsvcs         

Description
Possible malfunction on local adapter

Probable Causes
Local adapter mal-functioned
Local adapter lost connection to network
Local adapter mis-configured

Failure Causes
Local adapter mal-functioned
Local adapter lost connection to network
Local adapter mis-configured

        Recommended Actions
        Verify adapter configuration
        Verify network connectivity

Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39.1.10,4865            
ERROR ID
6zV5DL.0cP09/Qqp/C4.2g0...................
REFERENCE CODE
                                          
Adapter interface name
en5
Adapter offset
           1
Adapter IP address
192.168.100.1
System integration · 2009-11-25
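The errpt listing above is printed newest entry first, which makes the sequence of events hard to follow. A minimal sketch of how it could be reordered is shown here; it assumes the detailed report has been saved as plain text, that entries are separated by dashed lines as in the listing, and the function name `parse_errpt` is purely illustrative. The sample is an abridged copy of two entries from the listing.

```python
import re

# Entries in the listing are separated by a line of dashes.
SEPARATOR = re.compile(r"^-{20,}[ \t]*$", re.MULTILINE)
# Only the fields needed for a timeline are extracted.
FIELD = re.compile(
    r"^(LABEL|Date/Time|Sequence Number|Resource Name):[ \t]*(.*?)[ \t]*$",
    re.MULTILINE,
)


def parse_errpt(text):
    """Split detailed error-report text into one dict per entry, oldest first."""
    entries = []
    for chunk in SEPARATOR.split(text):
        fields = dict(FIELD.findall(chunk))
        if "LABEL" in fields and "Sequence Number" in fields:
            fields["Sequence Number"] = int(fields["Sequence Number"])
            entries.append(fields)
    # The report lists the newest entry first; sort ascending to read in order.
    return sorted(entries, key=lambda e: e["Sequence Number"])


# Abridged from the listing above (only the header fields are kept).
sample = """\
LABEL:          GS_DOM_MERGE_ER
IDENTIFIER:     9DEC29E1

Date/Time:       Sun Nov 22 16:36:34 CST 2009
Sequence Number: 1562
Resource Name:   grpsvcs
---------------------------------------------------------------------------
LABEL:          TS_LOC_DOWN_ST
IDENTIFIER:     173C787F

Date/Time:       Sun Nov 22 16:24:02 CST 2009
Sequence Number: 1559
Resource Name:   topsvcs
"""

entries = parse_errpt(sample)
for e in entries:
    print(e["Sequence Number"], e["Date/Time"], e["LABEL"], e["Resource Name"])
```

Read in ascending sequence order, the listing starts with the topsvcs adapter warning on en5 at 16:24:02 and ends with the grpsvcs domain merge and the clstrmgrES termination at 16:36:34 to 16:36:35.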
aixp6, Software Development Engineer, 18m
Nov 22 16:24:08 node83f884d user:notice HACMP for AIX: EVENT COMPLETED: node_down_complete node8371e2d 0

Nov 22 16:36:33 node83f884d user:notice HACMP for AIX: EVENT COMPLETED: join_standby node83f884d 192.168.100.1 0

node8371e2d was already in standby, but node83f884d was in an abnormal state, so node83f884d was brought down. My guess is that the dead man switch was the cause. Do you still have the errpt output? Please post it.
Government · 2009-11-25
skyzqq, System Operations Engineer, China Unicom Henan Branch
Nov 22 16:36:35 node83f884d user:notice HACMP for AIX: clexit.rc : Unexpected termination of clstrmgrES.
Telecom operator · 2009-11-25
skyzqq, System Operations Engineer, China Unicom Henan Branch
If this is Oracle 9i RAC, HA should not actually be doing anything.
Telecom operator · 2009-11-25
zbc2602, Technical Director, Beijing Nantian Software Co., Ltd.
The key line is this one: announcementCb: Called, state=ST_UNSTABLE, provider token 1
System integration · 2009-11-25
zbc2602, Technical Director, Beijing Nantian Software Co., Ltd.
The log I posted earlier should show what caused the crash. Could someone explain it?
System integration · 2009-11-25
zbc2602, Technical Director, Beijing Nantian Software Co., Ltd.
I think I'm starting to see what happened, though. I'm posting it here so everyone can help analyze it.
Nov 22 16:36:33 node83f884d user:notice HACMP for AIX: EVENT START: join_standby node83f884d 192.168.100.1
Nov 22 16:36:33 node83f884d user:notice HACMP for AIX: EVENT COMPLETED: join_standby node83f884d 192.168.100.1 0
Nov 22 16:36:34 node83f884d local0:crit clstrmgrES[909374]: Sun Nov 22 16:36:34 announcementCb: Called, state=ST_UNSTABLE, provider token 1
Nov 22 16:36:34 node83f884d daemon:err|error haemd[1179652]: LPP=PSSP,Fn=emd_gsi.c,SID=1.4.1.36,L#=1361, haemd: 2521-032 Cannot dispatch group services (1).
Nov 22 16:36:34 node83f884d local0:crit clstrmgrES[909374]: Sun Nov 22 16:36:34 announcementCb: GsToken 2, AdapterToken 3, rm_GsToken 1
Nov 22 16:36:34 node83f884d local0:crit clstrmgrES[909374]: Sun Nov 22 16:36:34 announcementCb: GRPSVCS announcment code=512; exiting
Nov 22 16:36:34 node83f884d local0:crit clstrmgrES[909374]: Sun Nov 22 16:36:34  CHECK FOR FAILURE OF RSCT SUBSYSTEMS (topsvcs or grpsvcs)
Nov 22 16:36:35 node83f884d user:notice HACMP for AIX: clexit.rc : Unexpected termination of clstrmgrES.
Nov 22 16:36:35 node83f884d user:notice HACMP for AIX: clexit.rc : Halting system immediately!!!
Nov 23 09:12:41 node83f884d daemon:notice RMCdaemon[921796]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6eKora0dNe09/yXE0C4.2g0...................:::Reference ID:  :::Template ID: a6df45aa:::Details File:  :::Location: RSCT,rmcd.c,1.51,209 :::RMCD_INFO_0_ST The daemon is started.
System integration · 2009-11-25
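To see how the syslog messages quoted in this thread line up in time, they can be reduced to a short timeline. The following is only a sketch: the keyword list and the function name `build_timeline` are assumptions chosen for illustration, and the year 2009 is taken from the errpt timestamps because the syslog lines themselves carry no year. The sample lines are copied from the posts in this thread.

```python
import re
from datetime import datetime

# "Mon DD HH:MM:SS host facility:level message"
LINE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+) (\S+) (.*)$")
# Hypothetical filter: cluster events plus the cluster manager exit messages.
KEYWORDS = ("EVENT START", "EVENT COMPLETED", "announcementCb", "clexit.rc")


def build_timeline(lines, year=2009):
    """Return (timestamp, message) pairs for the interesting lines, oldest first."""
    events = []
    for line in lines:
        m = LINE.match(line)
        if not m or not any(k in m.group(4) for k in KEYWORDS):
            continue
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        events.append((stamp, m.group(4)))
    return sorted(events, key=lambda e: e[0])


log_lines = [
    "Nov 22 16:36:35 node83f884d user:notice HACMP for AIX: clexit.rc : Halting system immediately!!!",
    "Nov 22 16:24:08 node83f884d user:notice HACMP for AIX: EVENT COMPLETED: node_down_complete node8371e2d 0",
    "Nov 22 16:36:34 node83f884d local0:crit clstrmgrES[909374]: Sun Nov 22 16:36:34 announcementCb: GRPSVCS announcment code=512; exiting",
    "Nov 22 16:36:33 node83f884d user:notice HACMP for AIX: EVENT START: join_standby node83f884d 192.168.100.1",
]

events = build_timeline(log_lines)
start = events[0][0]
for stamp, message in events:
    # Offset in seconds from the first event makes the gaps easy to spot.
    print(f"+{(stamp - start).total_seconds():>5.0f}s  {message}")
```

On these four lines the offsets show a gap of roughly twelve minutes between node_down_complete for node8371e2d and the join_standby event, followed within two seconds by the clstrmgrES exit and the halt.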

Asked by

zbc2602
Technical Director, Beijing Nantian Software Co., Ltd.

Question status

  • Posted: 2009-11-24
  • Last answer: 2009-12-09