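This is a customer production environment running on p720 servers. On the nights of November 4 and November 30, both servers shut down on their own. The error information is as follows: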
# uname -a
AIX mesrac2 1 6 00F756C54C00
# oslevel -r
6100-07
# lsattr -El mem0
ent_mem_cap I/O memory entitlement in Kbytes False
goodsize 15552 Amount of usable physical memory in Mbytes False
mem_exp_factor Memory expansion factor False
size 15552 Total amount of physical memory in Mbytes False
var_mem_weight Variable memory capacity weight False
# lsps -s
Total Paging Space Percent Used
16384MB 1%
# lslpp -l |grep cluster
bos.cluster.rte 6.1.7.0 COMMITTED Cluster Aware AIX
bos.cluster.solid 6.1.7.0 COMMITTED POWER HA Business Resiliency
cluster.adt.es.client.include
cluster.adt.es.client.samples.clinfo
cluster.adt.es.client.samples.clstat
cluster.adt.es.client.samples.libcl
cluster.adt.es.java.demo.monitor
cluster.doc.en_US.es.html 5.5.0.1 COMMITTED HAES Web-based HTML
cluster.doc.en_US.es.pdf 5.5.0.0 COMMITTED HAES PDF Documentation - U.S.
cluster.es.cfs.rte 5.5.0.3 COMMITTED ES Cluster File System Support
cluster.es.client.clcomd 5.5.0.3 COMMITTED ES Cluster Communication
cluster.es.client.lib 5.5.0.3 COMMITTED ES Client Libraries
cluster.es.client.rte 5.5.0.3 COMMITTED ES Client Runtime
cluster.es.client.utils 5.5.0.3 COMMITTED ES Client Utilities
cluster.es.client.wsm 5.5.0.3 COMMITTED Web based Smit
cluster.es.cspoc.cmds 5.5.0.4 COMMITTED ES CSPOC Commands
cluster.es.cspoc.dsh 5.5.0.0 COMMITTED ES CSPOC dsh
cluster.es.cspoc.rte 5.5.0.3 COMMITTED ES CSPOC Runtime Commands
cluster.es.nfs.rte 5.5.0.0 COMMITTED ES NFS Support
cluster.es.plugins.dhcp 5.5.0.1 COMMITTED ES Plugins - dhcp
cluster.es.plugins.dns 5.5.0.1 COMMITTED ES Plugins - Name Server
cluster.es.plugins.printserver
cluster.es.server.cfgast 5.5.0.0 COMMITTED ES Two-Node Configuration
cluster.es.server.diag 5.5.0.3 COMMITTED ES Server Diags
cluster.es.server.events 5.5.0.4 COMMITTED ES Server Events
cluster.es.server.rte 5.5.0.4 COMMITTED ES Base Server Runtime
cluster.es.server.simulator
cluster.es.server.testtool
cluster.es.server.utils 5.5.0.4 COMMITTED ES Server Utilities
cluster.license 5.5.0.0 COMMITTED HACMP Electronic License
cluster.msg.en_US.cspoc 5.5.0.0 COMMITTED HACMP CSPOC Messages - U.S.
cluster.msg.en_US.es.client
cluster.msg.en_US.es.server
mcr.rte 6.1.7.0 COMMITTED Metacluster Checkpoint and
bos.cluster.rte 6.1.7.0 COMMITTED Cluster Aware AIX
bos.cluster.solid 6.1.7.0 COMMITTED POWER HA Business Resiliency
cluster.es.client.clcomd 5.5.0.3 COMMITTED ES Cluster Communication
cluster.es.client.lib 5.5.0.3 COMMITTED ES Client Libraries
cluster.es.client.rte 5.5.0.3 COMMITTED ES Client Runtime
cluster.es.client.wsm 5.5.0.0 COMMITTED Web based Smit
cluster.es.cspoc.rte 5.5.0.0 COMMITTED ES CSPOC Runtime Commands
cluster.es.nfs.rte 5.5.0.0 COMMITTED ES NFS Support
cluster.es.server.diag 5.5.0.0 COMMITTED ES Server Diags
cluster.es.server.events 5.5.0.0 COMMITTED ES Server Events
cluster.es.server.rte 5.5.0.4 COMMITTED ES Base Server Runtime
cluster.es.server.simulator
cluster.es.server.utils 5.5.0.4 COMMITTED ES Server Utilities
mcr.rte 6.1.7.0 COMMITTED Metacluster Checkpoint and
# errpt |more
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
AFA89905 1201100612 I O grpsvcs Group Services daemon started
97419D60 1201100612 I O topsvcs Topology Services daemon started
A6DF45AA 1201085712 I O RMCdaemon The daemon is started.
F3931284 1201085612 I H ent0 ETHERNET NETWORK RECOVERY MODE
EC0BCCD4 1201085612 T H ent0 ETHERNET DOWN
2BFA76F6 1201085412 T S SYSPROC SYSTEM SHUTDOWN BY USER
9DBCFDEE 1201085712 T O errdemon ERROR LOGGING TURNED ON
FE2DEE00 1130223212 P S SYSXAIXIF DUPLICATE IP ADDRESS DETECTED IN THE NET
A924A5FC 1130223212 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED
6D19271E 1130223212 I O topsvcs Topology Services daemon stopped
AA8AB241 1130223212 T O OPERATOR OPERATOR NOTIFICATION
4B219AEA 1130223212 P U LIBLVM Concurrent LVM daemon forced Volume Grou
CAD234BE 1130223212 U H LVDD QUORUM LOST, VOLUME GROUP CLOSING
CAD234BE 1130223212 U H LVDD QUORUM LOST, VOLUME GROUP CLOSING
4B219AEA 1130223212 P U LIBLVM Concurrent LVM daemon forced Volume Grou
9BD08D55 1130223212 I U LIBLVM Unable to start gsclvmd
BC3BE5A3 1130223212 P S SRC SOFTWARE PROGRAM ERROR
CAD234BE 1130223212 U H LVDD QUORUM LOST, VOLUME GROUP CLOSING
4B219AEA 1130223212 P U LIBLVM Concurrent LVM daemon forced Volume Grou
9BD08D55 1130223212 I U LIBLVM Unable to start gsclvmd
BC3BE5A3 1130223212 P S SRC SOFTWARE PROGRAM ERROR
AEA055D0 1130223212 I S livedump Live dump complete
CAD234BE 1130223212 U H LVDD QUORUM LOST, VOLUME GROUP CLOSING
DB14100E 1130223212 P U LIBLVM Group Services detected a failure
CB4A951F 1130223212 I S SRC SOFTWARE PROGRAM ERROR
12081DC6 1130223212 P S haemd SOFTWARE PROGRAM ERROR
A924A5FC 1130223212 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED
463A893D 1130223012 P O grpsvcs Internal logic error in Group Services d
# errpt -aj 463A893D |more
LABEL: GS_ERROR_ER
IDENTIFIER: 463A893D
Date/Time: Fri Nov 30 22:30:47 GMT+08:00 2012
Sequence Number: 224
Machine Id: 00F756C54C00
Node Id: mesrac2
Class: O
Type: PERM
WPAR: Global
Resource Name: grpsvcs
Description
Internal logic error in Group Services daemon
Probable Causes
An internal logic failure occurs in daemon
Unexpected program failure
Failure Causes
Unrecoverable logic failure in daemon
Recommended Actions
Verify that Group Services daemon is still running
Verify that Group Services daemon has been restarted
Call IBM Service if problem persists
Detail Data
DETECTING MODULE
RSCT,pgsd.C,1.62.1.23,238
ERROR ID
6xYcC4/LAAiE/M3e.K5.e.1...................
REFERENCE CODE
DIAGNOSTIC EXPLANATION
Memory allocation failed. Please check the memory availability.
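The output above is from node 2. Node 1 shows the same entries, except that it does not have the 463A893D error. For the November 4 incident, the 463A893D error was logged on node 1 instead. Could anyone advise what the cause might be?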
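For anyone lining up the sequence of events: the TIMESTAMP column in the errpt summary listing above uses the MMDDhhmmyy format, so 1130223012 is November 30, 22:30, 2012. Below is a minimal sketch of a helper to decode it. The function name `decode_errpt_ts` is made up for illustration, and it assumes every year falls in 20xx.

```shell
# Hypothetical helper: decode an errpt TIMESTAMP (MMDDhhmmyy) into
# "YYYY-MM-DD hh:mm". Assumes the two-digit year belongs to 20xx.
decode_errpt_ts() {
    echo "$1" | awk '{
        mm = substr($1, 1, 2)   # month
        dd = substr($1, 3, 2)   # day
        hh = substr($1, 5, 2)   # hour
        mi = substr($1, 7, 2)   # minute
        yy = substr($1, 9, 2)   # two-digit year
        printf "20%s-%s-%s %s:%s\n", yy, mm, dd, hh, mi
    }'
}

decode_errpt_ts 1130223012   # grpsvcs internal logic error (463A893D)
decode_errpt_ts 1201085412   # SYSTEM SHUTDOWN BY USER (2BFA76F6)
```

Decoded this way, the 463A893D entry (22:30) comes about two minutes before the burst of QUORUM LOST and topsvcs entries (22:32), which may help when comparing the two nodes.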