Rail transit · Fault diagnosis

P550 crash problem

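Two 550 machines are configured as an HA pair; the system level is 5303-03-00-0000.
The HA version is 5.2.
Last Friday node 2 crashed, and the resources on it were switched over to node 1. The error report is below; could the experts please analyze it?
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION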
C86ACB7E   0524193713 I H dac0           ARRAY CONFIGURATION CHANGED
C86ACB7E   0524193713 I H dac0           ARRAY CONFIGURATION CHANGED
C86ACB7E   0524193713 I H dac0           ARRAY CONFIGURATION CHANGED
C86ACB7E   0524193713 I H dac0           ARRAY CONFIGURATION CHANGED
C86ACB7E   0524193713 I H dac0           ARRAY CONFIGURATION CHANGED
A6DF45AA   0524193613 I O RMCdaemon      The daemon is started.
EC0BCCD4   0524193613 T H ent2           ETHERNET DOWN
C86ACB7E   0524193613 I H dac0           ARRAY CONFIGURATION CHANGED
2BFA76F6   0524161613 T S SYSPROC        SYSTEM SHUTDOWN BY USER
9DBCFDEE   0524193513 T O errdemon       ERROR LOGGING TURNED ON
BA431EB7   0524161613 P S SRC            SOFTWARE PROGRAM ERROR
AA8AB241   0524161613 T O OPERATOR       OPERATOR NOTIFICATION
BC3BE5A3   0524161613 P S SRC            SOFTWARE PROGRAM ERROR
BC3BE5A3   0524161613 P S SRC            SOFTWARE PROGRAM ERROR
12081DC6   0524161613 P S haemd          SOFTWARE PROGRAM ERROR
AA8AB241   0524161613 T O clstrmgrES     OPERATOR NOTIFICATION
BC3BE5A3   0524161613 P S SRC            SOFTWARE PROGRAM ERROR
64368504   0524161613 P O grpsvcs        Connection failure between Group Service
A63BEB70   0524161613 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
4BD1A134   0524161613 P S topsvcs        Using too much CPU: exiting
813FE820   0524151913 U S MQSeries       SOFTWARE PROGRAM ERROR
813FE820   0524081613 U S MQSeries       SOFTWARE PROGRAM ERROR
813FE820   0524080213 U S MQSeries       SOFTWARE PROGRAM ERROR
[:root:/]errpt -aj 64368504
---------------------------------------------------------------------------
LABEL:          GS_TS_RETCODE_ER
IDENTIFIER:     64368504
Date/Time:       Fri May 24 16:16:31 BEIDT 2013
Sequence Number: 215426
Machine Id:      00CE9EDE4C00
Node Id:         huhetd2
Class:           O
Type:            PERM
Resource Name:   grpsvcs         
Description
Connection failure between Group Services and Topology Services
Probable Causes
Topology Services daemon is not running
Topology Services daemon has died
Topology Services library has detected an error
Failure Causes
Group Services detects an error condition of Topology Services
        Recommended Actions
        Check the Topology Services daemon
Verify that Group Services daemon has been restarted
Call IBM Service if problem persists
Detail Data
DETECTING MODULE
RSCT,PMClient.C,1.72,1049                     
ERROR ID
62IcBY/DDlbF/egI.q7UL8....................
REFERENCE CODE
                                          
DIAGNOSTIC EXPLANATION
topsvcs subsystem died with hb_errno = 16, grpsvcs will also exit.
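
When reading the summary listing above, it helps to decode the TIMESTAMP column, which `errpt` prints as MMDDhhmmYY. Under that reading, 0524161613 is 24 May 2013 16:16, which agrees with the Date/Time line of the detailed report. The following is only a minimal sketch for ordering pasted summary lines by time; the function name `parse_errpt_summary` is hypothetical, and it assumes every two-digit year belongs to the 2000s.

```python
from datetime import datetime

def parse_errpt_summary(line):
    """Split one errpt summary line into named columns (hypothetical helper)."""
    ident, stamp, etype, eclass, resource, description = line.split(None, 5)
    # TIMESTAMP is MMDDhhmmYY; assumption: the two-digit year means 20YY.
    month, day, hour, minute, year = (int(stamp[i:i + 2]) for i in range(0, 10, 2))
    return {
        "id": ident,
        "when": datetime(2000 + year, month, day, hour, minute),
        "type": etype,
        "class": eclass,
        "resource": resource,
        "description": description.strip(),
    }

# Four lines copied from the listing in the question.
SAMPLE = """\
9DBCFDEE   0524193513 T O errdemon       ERROR LOGGING TURNED ON
2BFA76F6   0524161613 T S SYSPROC        SYSTEM SHUTDOWN BY USER
4BD1A134   0524161613 P S topsvcs        Using too much CPU: exiting
813FE820   0524151913 U S MQSeries       SOFTWARE PROGRAM ERROR
"""

entries = [parse_errpt_summary(l) for l in SAMPLE.splitlines() if l.strip()]

# sorted() is stable, so entries logged in the same minute keep their listed order.
for e in sorted(entries, key=lambda e: e["when"]):
    print(e["when"].strftime("%Y-%m-%d %H:%M"), e["resource"], "-", e["description"])
```

Sorted this way, the MQSeries error at 15:19 comes first, the cluster-related entries at 16:16 follow, and the entries stamped 19:35 and later come last.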

10 peer answers

shenadam, system engineer, sjo
This log shows takeover activity. Is system resource utilization high at the moment?
Internet services · 2013-05-29
xiaohei80, operations engineer, Information Technology Institute
When node 2 went down, HA on that node logged almost nothing. I checked node 1, which does have relevant information:
more hacmp.out.1
+ [[ high = high ]]
+ version=1.2
+ + cl_get_path
HA_DIR=es
+ STATUS=0
+ set +u
+ [ ]
+ exit 0
                        HACMP Event Summary
Event: /usr/es/sbin/cluster/events/check_for_site_down huhetd2
Start time: Fri May 24 16:15:54 2013

End time: Fri May 24 16:15:55 2013

Action:         Resource:                       Script Name:
----------------------------------------------------------------------------
No resources changed as a result of this event
----------------------------------------------------------------------------

May 24 16:15:55 EVENT START: node_down huhetd2

:node_down[79] [[ high = high ]]
:node_down[79] version=1.47
:node_down[80] :node_down[80] cl_get_path
HA_DIR=es
:node_down[82] export NODENAME=huhetd2
:node_down[83] export PARAM=
:node_down[85] UPDATESTATDFILE=/usr/es/sbin/cluster/etc/updatestatd
:node_down[94] STATUS=0
:node_down[96] [[ -z  ]]
:node_down[97] EMULATE=REAL
:node_down[100] set -u
:node_down[102] ((  1 < 1  ))
:node_down[107] rm -f /tmp/.RPCLOCKDSTOPPED
:node_down[108] rm -f /usr/es/sbin/cluster/etc/updatestatd
:node_down[110] [[  = forced ]]
:node_down[128] UPDATESTATD=0
:node_down[129] export UPDATESTATD
:node_down[134] [[ FALSE = FALSE ]]
:node_down[142] set -a
:node_down[143] clsetenvgrp huhetd2 node_down
:clsetenvgrp[50] [[ high = high ]]
:clsetenvgrp[50] version=1.16
:clsetenvgrp[52] usingVer=clSetenvgrp
:clsetenvgrp[57] clSetenvgrp huhetd2 node_down
executing clSetenvgrp
clSetenvgrp: argc = 3
clSetenvgrp completed successfully
:clsetenvgrp[58] exit 0
:node_down[143] eval FORCEDOWN_GROUPS="" RESOURCE_GROUPS="" HOMELESS_GROUPS="" ERRSTATE_GROUPS="" PRINCIPAL_ACTIONS="" ASSOCIATE_ACTIONS="" AUXILLIARY_ACTIONS="" SIBLING_GROUPS="" SIBLING_NODES_BY_GROUP="" SIBLING_ACQUIRING_GROUPS="" SIBLING_ACQUIRING_NODES_BY_GROUP="" SIBLING_RELEASING_GROUPS="" SIBLING_RELEASING_NODES_BY_GROUP="" SIBLING_PRE_EVENT_LOC=""
:node_down[143] FORCEDOWN_GROUPS= RESOURCE_GROUPS= HOMELESS_GROUPS= ERRSTATE_GROUPS= PRINCIPAL_ACTIONS= ASSOCIATE_ACTIONS= AUXILLIARY_ACTIONS= SIBLING_GROUPS= SIBLING_NODES_BY_GROUP= SIBLING_ACQUIRING_GROUPS= SIBLING_ACQUIRING_NODES_BY_GROUP= SIBLING_RELEASING_GROUPS= SIBLING_RELEASING_NODES_BY_GROUP= SIBLING_PRE_EVENT_LOC=
:node_down[144] RC=0
:node_down[145] set +a
:node_down[146] ((  0 != 0  ))
:node_down[157] process_resources
:process_resources[1608] [[ high = high ]]
:process_resources[1608] version=1.57.1.3
:process_resources[1609] :process_resources[1609] cl_get_path
HA_DIR=es
:process_resources[1611] STATUS=0
:process_resources[1612] sddsrv_off=FALSE
:process_resources[1615] [ ! -n  ]
:process_resources[1617] EMULATE=REAL
:process_resources[1620] true
:process_resources[1622] set -a
:process_resources[1625] clRGPA
:clRGPA[49] [[ high = high ]]
:clRGPA[49] version=1.16
:clRGPA[51] usingVer=clrgpa
:clRGPA[56] clrgpa
:clRGPA[57] exit 0
:process_resources[1625] eval JOB_TYPE=ACQUIRE RESOURCE_GROUPS="huheres2"
:process_resources[1625] JOB_TYPE=ACQUIRE RESOURCE_GROUPS=huheres2
:process_resources[1627] RC=0
:process_resources[1628] set +a
:process_resources[1630] [ 0 -ne 0 ]
:process_resources[1835] set_resource_group_state ACQUIRING
:process_resources[3] STAT=0
huheres2:process_resources[6] export GROUPNAME
huheres2:process_resources[7] [ ACQUIRING != DOWN ]
huheres2:process_resources[9] [ REAL = EMUL ]
huheres2:process_resources[14] clchdaemons -d clstrmgr_scripts -t resource_locator -n huhetd1 -o huheres2 -v ACQUIRING
huheres2:process_resources[15] [ 0 -ne 0 ]
huheres2:process_resources[26] [ ACQUIRING = ACQUIRING ]
huheres2:process_resources[28] cl_RMupdate acquiring huheres2 process_resources
Reference string: Fri.May.24.16:15:55.BEIDT.2013.process_resources.huheres2.ref
huheres2:process_resources[29] continue
huheres2:process_resources[65] return 0
huheres2:process_resources[1620] true
huheres2:process_resources[1622] set -a
huheres2:process_resources[1625] clRGPA
huheres2:clRGPA[49] [[ high = high ]]
huheres2:clRGPA[49] version=1.16
huheres2:clRGPA[51] usingVer=clrgpa
huheres2:clRGPA[56] clrgpa
huheres2:clRGPA[57] exit 0
huheres2:process_resources[1625] eval JOB_TYPE=TAKEOVER_LABELS ACTION=ACQUIRE IP_LABELS="huhetd2_svc" RESOURCE_GROUPS="huheres2 " COMMUNICATION_LINKS=""
huheres2:process_resources[1625] JOB_TYPE=TAKEOVER_LABELS ACTION=ACQUIRE IP_LABELS=huhetd2_svc RESOURCE_GROUPS=huheres2  COMMUNICATION_LINKS=
huheres2:process_resources[1627] RC=0
huheres2:process_resources[1628] set +a
huheres2:process_resources[1630] [ 0 -ne 0 ]
huheres2:process_resources[1654] export GROUPNAME=huheres2
huheres2 :process_resources[1654] [[ ACQUIRE = ACQUIRE ]]
huheres2 :process_resources[1656] acquire_takeover_labels
huheres2 :process_resources[4] clcallev acquire_takeover_addr

May 24 16:15:55 EVENT START: acquire_takeover_addr

huheres2 :acquire_takeover_addr[538] [[ high = high ]]
huheres2 :acquire_takeover_addr[538] version=1.55
huheres2 :acquire_takeover_addr[539] huheres2 :acquire_takeover_addr[539] cl_get_path
HA_DIR=es
huheres2 :acquire_takeover_addr[542] TELINIT=false
huheres2 :acquire_takeover_addr[543] TELINIT_FILE=/usr/es/sbin/cluster/.telinit
huheres2 :acquire_takeover_addr[545] typeset -i telinit_wait_count=36
huheres2 :acquire_takeover_addr[547] DELAY=5
huheres2 :acquire_takeover_addr[550] STATUS=0
huheres2 :acquire_takeover_addr[552] [ ! -n  ]
huheres2 :acquire_takeover_addr[554] EMULATE=REAL
huheres2 :acquire_takeover_addr[558] PROC_RES=false
huheres2 :acquire_takeover_addr[562] [[ TAKEOVER_LABELS != 0 ]]
huheres2 :acquire_takeover_addr[562] [[ TAKEOVER_LABELS != GROUP ]]
huheres2 :acquire_takeover_addr[563] PROC_RES=true
huheres2 :acquire_takeover_addr[564] _SNA_CONNECTIONS=
huheres2 :acquire_takeover_addr[565] _IP_LABELS=huhetd2_svc
huheres2 :acquire_takeover_addr[580] saveNSORDER=UNDEFINED
huheres2 :acquire_takeover_addr[581] NSORDER=local
huheres2 :acquire_takeover_addr[581] export NSORDER
huheres2 :acquire_takeover_addr[585] BOOT_ADDR=
huheres2 :acquire_takeover_addr[586] SERVICE_ADDR=
huheres2:acquire_takeover_addr[592] export GROUPNAME
huheres2:acquire_takeover_addr[592] [[ true = true ]]
huheres2:acquire_takeover_addr[595] read SERVICELABELS
huheres2:acquire_takeover_addr[595] get_list_head huhetd2_svc
huheres2:acquire_takeover_addr[596] read IP_LABELS
huheres2:acquire_takeover_addr[596] get_list_tail huhetd2_svc
huheres2:acquire_takeover_addr[598] read SNA_CONNECTIONS
huheres2:acquire_takeover_addr[598] get_list_head
huheres2:acquire_takeover_addr[599] export SNA_CONNECTIONS
huheres2:acquire_takeover_addr[600] read _SNA_CONNECTIONS
huheres2:acquire_takeover_addr[600] get_list_tail
huheres2:acquire_takeover_addr[606] ALLSRVADDRS=All_service_addrs
huheres2:acquire_takeover_addr[607] [ REAL = EMUL ]
huheres2:acquire_takeover_addr[612] cl_RMupdate resource_acquiring All_service_addrs acquire_takeover_addr
Reference string: Fri.May.24.16:15:55.BEIDT.2013.acquire_takeover_addr.All_service_addrs.huheres2.ref
huheres2:acquire_takeover_addr[622] clgetif -a huhetd2_svc
huheres2:acquire_takeover_addr[622] 2> /dev/null
huheres2:acquire_takeover_addr[623] [ 3 -ne 0 ]
huheres2:acquire_takeover_addr[630] STATUS=1
huheres2:acquire_takeover_addr[631] huheres2:acquire_takeover_addr[631] name_to_addr huhetd2_svc
huheres2:acquire_takeover_addr[2] cllsif -cSn huhetd2_svc
huheres2:acquire_takeover_addr[2] uniq
huheres2:acquire_takeover_addr[2] cut -d: -f7
huheres2:acquire_takeover_addr[2] echo 10.94.2.17
huheres2:acquire_takeover_addr[3] exit 0
addr_dot_addr=10.94.2.17
huheres2:acquire_takeover_addr[633] cllsif -cSn 10.94.2.17
huheres2:acquire_takeover_addr[633] cut -d: -f3,4
huheres2:acquire_takeover_addr[633] tr :  
huheres2:acquire_takeover_addr[634] read NETWORK NET_TYPE
huheres2:acquire_takeover_addr[634] [[ -z net_ether_01 ]]
huheres2:acquire_takeover_addr[659] huheres2:acquire_takeover_addr[659] cut -f3 -d:
huheres2:acquire_takeover_addr[659] cllsnw -cSw -n net_ether_01
ALIAS=disable
huheres2:acquire_takeover_addr[660] [ ether = hps -o disable = true ]
huheres2:acquire_takeover_addr[676] huheres2:acquire_takeover_addr[676] cllsif -cSi huhetd1
huheres2:acquire_takeover_addr[676] grep :standby:
huheres2:acquire_takeover_addr[676] cut -d: -f1,3
huheres2:acquire_takeover_addr[676] grep -w net_ether_01
huheres2:acquire_takeover_addr[676] cut -d : -f1
STDBYS=huhetd1_stdby
huheres2:acquire_takeover_addr[680] huheres2:acquire_takeover_addr[680] best_boot_addr net_ether_01 huhetd1_stdby
huheres2:acquire_takeover_addr[2] NETWORK=net_ether_01
huheres2:acquire_takeover_addr[3] shift
huheres2:acquire_takeover_addr[4] candidate_boots=huhetd1_stdby
huheres2:acquire_takeover_addr[8] huheres2:acquire_takeover_addr[8] echo huhetd1_stdby
huheres2:acquire_takeover_addr[8] wc -l
huheres2:acquire_takeover_addr[8] tr   \n
num_candidates=       1
huheres2:acquire_takeover_addr[8] [[        1 -eq 1 ]]
huheres2:acquire_takeover_addr[10] echo huhetd1_stdby
Rail transit · 2013-05-29
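
A trace like the one above is long, and the `EVENT START` lines are the quickest way to see which cluster events node 1 ran and when. The following is a minimal sketch that pulls those lines out of pasted `hacmp.out` text. The function name `event_starts` and the regular expression are assumptions based only on the two `EVENT START` lines visible in this excerpt, not on a documented log format.

```python
import re

# Assumed line shape, taken from this excerpt: "May 24 16:15:55 EVENT START: <event> [args]"
EVENT_START = re.compile(r"^(\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) EVENT START: (.+)$")

def event_starts(text):
    """Return (timestamp, event) pairs for every EVENT START line (hypothetical helper)."""
    found = []
    for line in text.splitlines():
        match = EVENT_START.match(line.strip())
        if match:
            found.append((match.group(1), match.group(2).strip()))
    return found

# A few lines copied from the trace in this answer.
SAMPLE = """\
May 24 16:15:55 EVENT START: node_down huhetd2
:node_down[79] version=1.47
huheres2 :process_resources[4] clcallev acquire_takeover_addr
May 24 16:15:55 EVENT START: acquire_takeover_addr
"""

for when, event in event_starts(SAMPLE):
    print(when, event)
```

On this excerpt it reports `node_down huhetd2` followed by `acquire_takeover_addr`, both at 16:15:55, that is, node 1 reacting to the loss of huhetd2 and starting to take over the service address.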
王少一, system engineer, 通联支付网络服务股份有限公司
Reply to #7 yszw0817:

Take a look at the HA fix description list; it describes a similar problem.
Internet services · 2013-05-27
lock-on, system engineer, starsino
Did the heartbeat network go down?
Systems integration · 2013-05-27
yszw0817, storage architect, Beijing
Quoting 王少一 (posted 2013-5-27 11:26):
4BD1A134   0524161613 P S topsvcs        Using too much CPU: exiting
Apply the HA fixes and then observe again.

What makes you conclude that?
Hardware manufacturing · 2013-05-27
午夜幽魂, system operations engineer, Computer Co., Ltd.
As the others above said, post the detailed report for this entry:
4BD1A134   0524161613 P S topsvcs        Using too much CPU: exiting
Then post the HA logs as well.
Systems integration · 2013-05-27
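
The detailed report requested here would come from `errpt -aj 4BD1A134`, the same form of command the question used for 64368504. Once such a report is pasted, its header fields and the `hb_errno` value can be pulled out for side-by-side comparison. The following is a minimal sketch, tried on lines copied from the 64368504 report in the question. The function name `parse_errpt_detail` is hypothetical, and it assumes the header is the run of `Key: value` lines before `Description`.

```python
import re

def parse_errpt_detail(text):
    """Return (header_fields, hb_errno) from one pasted `errpt -a` report (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        if line.strip() == "Description":
            break  # assumption: the header ends where the Description section begins
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    match = re.search(r"hb_errno\s*=\s*(\d+)", text)
    hb_errno = int(match.group(1)) if match else None
    return fields, hb_errno

# Lines copied from the 64368504 report in the question.
SAMPLE = """\
LABEL:          GS_TS_RETCODE_ER
IDENTIFIER:     64368504
Date/Time:       Fri May 24 16:16:31 BEIDT 2013
Node Id:         huhetd2
Type:            PERM
Resource Name:   grpsvcs
Description
Connection failure between Group Services and Topology Services
DIAGNOSTIC EXPLANATION
topsvcs subsystem died with hb_errno = 16, grpsvcs will also exit.
"""

fields, hb_errno = parse_errpt_detail(SAMPLE)
print(fields["LABEL"], fields["Date/Time"], "hb_errno =", hb_errno)
```

The sketch only extracts what the report already states. It does not interpret `hb_errno`; the meaning of the value 16 still has to be confirmed against the RSCT documentation or with IBM Service.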
clvlbll, system engineer, IBM
4BD1A134   0524161613 P S topsvcs        Using too much CPU: exiting
Was the CPU maxed out?
Internet services · 2013-05-27
王少一, system engineer, 通联支付网络服务股份有限公司
4BD1A134   0524161613 P S topsvcs        Using too much CPU: exiting
Apply the HA fixes and then observe again.
Internet services · 2013-05-27
yszw0817, storage architect, Beijing
There is not enough information. Please post /tmp/hacmp.out. I did notice this entry:
4BD1A134   0524161613 P S topsvcs        Using too much CPU: exiting
Hardware manufacturing · 2013-05-27
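xiaohei80, operations engineer, Information Technology Institute
This is the first time I have seen this kind of problem. I would appreciate guidance from the experts.
Rail transit · 2013-05-27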

Asked by: xiaohei80, operations engineer, Information Technology Institute

Question status

  • Posted: 2013-05-27
  • Followers: 1
  • Views: 10881
  • Latest answer: 2013-05-29