Tags: IT distribution/reselling · PowerVM · AIX · PowerHA

AIX HA problem?

IBM Power server (minicomputer) running PowerVM, with HA configured between the partitions.
A manual HA switchover failed.

Error log:

LABEL: GS_START_ST
IDENTIFIER: AFA89905

Date/Time:       Tue Apr 28 13:17:07 2020
Sequence Number: 31157
Machine Id:      00C7EDA74C00
Node Id:         l79fis94
Class:           O
Type:            INFO
WPAR:            Global
Resource Name:   grpsvcs

Description
Group Services daemon started

Probable Causes
Daemon started during system startup
Daemon re-started automatically by SRC
Daemon started during installation
Daemon started manually by user

User Causes
Daemon started manually by user

Recommended Actions
Check that Group Services daemon is running

Detail Data
DETECTING MODULE
RSCT,pgsd.C,1.62.1.27,710                     
ERROR ID 
63Y7ej0HlvdS/uQp.0XAt8....................
REFERENCE CODE
                                          
DIAGNOSTIC EXPLANATION

HAGS daemon started by SRC. Log file is /var/ha/log/grpsvcs_trace.

LABEL: TS_START_ST
IDENTIFIER: 97419D60

Date/Time:       Tue Apr 28 13:17:05 2020
Sequence Number: 31156
Machine Id:      00C7EDA74C00
Node Id:         l79fis94
Class:           O
Type:            INFO
WPAR:            Global
Resource Name:   topsvcs

Description
Topology Services daemon started

Probable Causes
Daemon started during system start-up
Daemon re-started automatically by SRC
Daemon started during installation
Daemon started manually by user

User Causes
Daemon started manually by user

Recommended Actions
Confirm that this is desirable

Detail Data
DETECTING MODULE
rsct,bootstrp.C,1.215.1.13,4956               
ERROR ID 
6UpNEL0FlvdS/mdl/0XAt8....................
REFERENCE CODE
                                          
Topology Services daemon started by:
SRC
Topology Services daemon log file location
/var/ha/log/topsvcs.28.131705.l79fis84_94.en_US
Topology Services daemon run directory
/var/ha/run/topsvcs.l79fis84_94/

LABEL: RMCD_INFO_0_ST
IDENTIFIER: A6DF45AA

Date/Time:       Tue Apr 28 13:13:39 2020
Sequence Number: 31155
Machine Id:      00C7EDA74C00
Node Id:         l79fis94
Class:           O
Type:            INFO
WPAR:            Global
Resource Name:   RMCdaemon

Description
The daemon is started.

Probable Causes
The Resource Monitoring and Control daemon has been started.

User Causes
The startsrc -s ctrmc command has been executed or
the rmcctrl -s command has been executed.

Recommended Actions
Confirm that the daemon should be started.

Detail Data
DETECTING MODULE
RSCT,rmcd.c,1.95,239                          
ERROR ID 
6eKora01ivdS/ObE/0XAt8....................
REFERENCE CODE

                                          

LABEL: REBOOT_ID
IDENTIFIER: 2BFA76F6

Date/Time:       Tue Apr 28 13:13:05 2020
Sequence Number: 31152
Machine Id:      00C7EDA74C00
Node Id:         l79fis94
Class:           S
Type:            TEMP
WPAR:            Global
Resource Name:   SYSPROC

Description
SYSTEM SHUTDOWN BY USER

Probable Causes
SYSTEM SHUTDOWN

Detail Data
USER ID
           0
0=SOFT IPL 1=HALT 2=TIME REBOOT
           1
TIME TO REBOOT (FOR TIMED REBOOT ONLY)
           0

LABEL: ERRLOG_ON
IDENTIFIER: 9DBCFDEE

Date/Time:       Tue Apr 28 13:13:23 2020
Sequence Number: 31151
Machine Id:      00C7EDA74C00
Node Id:         l79fis94
Class:           O
Type:            TEMP
WPAR:            Global
Resource Name:   errdemon

Description
ERROR LOGGING TURNED ON

Probable Causes
ERRDEMON STARTED AUTOMATICALLY

User Causes
/USR/LIB/ERRDEMON COMMAND

Recommended Actions
NONE


LABEL: ERRLOG_OFF
IDENTIFIER: 192AC071

Date/Time:       Tue Apr 28 13:07:36 2020
Sequence Number: 31150
Machine Id:      00C7EDA74C00
Node Id:         l79fis94
Class:           O
Type:            TEMP
WPAR:            Global
Resource Name:   errdemon

Description
ERROR LOGGING TURNED OFF

Probable Causes
ERRSTOP COMMAND

User Causes
ERRSTOP COMMAND

Recommended Actions
RUN ERRDEAD COMMAND
TURN ERROR LOGGING ON


LABEL: SRC_RSTRT
IDENTIFIER: CB4A951F

Date/Time:       Tue Apr 28 13:06:40 2020
Sequence Number: 31149
Machine Id:      00C7EDA74C00
Node Id:         l79fis94
Class:           S
Type:            INFO
WPAR:            Global
Resource Name:   SRC

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

Recommended Actions
VERIFY SUBSYSTEM RESTARTED AUTOMATICALLY

Detail Data
SYMPTOM CODE
       16384
SOFTWARE ERROR CODE
       -9035
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'234'
FAILING MODULE
sendmail

LABEL: SRC_RSTRT
IDENTIFIER: CB4A951F

Date/Time:       Tue Apr 28 13:05:40 2020
Sequence Number: 31148
Machine Id:      00C7EDA74C00
Node Id:         l79fis94
Class:           S
Type:            INFO
WPAR:            Global
Resource Name:   SRC

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

Recommended Actions
VERIFY SUBSYSTEM RESTARTED AUTOMATICALLY

Detail Data
SYMPTOM CODE
       16384
SOFTWARE ERROR CODE
       -9035
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'234'
FAILING MODULE
sendmail
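The errpt entries above are listed newest first. Sorting them by sequence number gives the events in the order they were logged, which makes it easier to line them up with hacmp.out. A minimal Python sketch, assuming the `errpt -a` output has been saved as plain text in exactly the format shown (each entry starting with a `LABEL:` line); `errpt_timeline` and `sample` are illustrative names, not part of any AIX tool.

```python
import re
from typing import Dict, List

# errpt -a header fields we keep, mapped to short dictionary keys.
FIELDS = {
    "LABEL": "label",
    "IDENTIFIER": "identifier",
    "Date/Time": "time",
    "Sequence Number": "seq",
    "Resource Name": "resource",
}
FIELD_RE = re.compile(
    r"^(LABEL|IDENTIFIER|Date/Time|Sequence Number|Resource Name):\s*(.*?)\s*$"
)


def errpt_timeline(text: str) -> List[Dict[str, str]]:
    """Split saved errpt -a text into entries, oldest (lowest sequence number) first."""
    entries: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if not m:
            continue  # skip Description, Detail Data and other body lines
        key, value = m.group(1), m.group(2)
        if key == "LABEL" and current:
            entries.append(current)  # a LABEL: line starts a new entry
            current = {}
        current[FIELDS[key]] = value
    if current:
        entries.append(current)
    return sorted(entries, key=lambda e: int(e.get("seq", "0")))


# Two entries copied from the log above (fields trimmed).
sample = """LABEL: GS_START_ST
IDENTIFIER: AFA89905
Date/Time:       Tue Apr 28 13:17:07 2020
Sequence Number: 31157
Resource Name:   grpsvcs

LABEL: ERRLOG_OFF
IDENTIFIER: 192AC071
Date/Time:       Tue Apr 28 13:07:36 2020
Sequence Number: 31150
Resource Name:   errdemon
"""

# Prints the ERRLOG_OFF entry (31150) before GS_START_ST (31157).
for e in errpt_timeline(sample):
    print(e["seq"], e["time"], e["label"], e["resource"])
```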


3 answers from peers

myciciy · IT consultant · a fintech company
If you have confirmed that the configuration is correct, consider whether it is a patch issue.

Banking · 2020-04-30

hufeng719 · Alliance member · Systems engineer · a steel company
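It looks like the cluster configuration has a problem. The cluster state was already wrong before the manual HA switchover, so the switchover was bound to run into problems.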

Energy & mining · 2020-04-29
Answer invited by laq00098

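One way to act on the point that the cluster state was already wrong before the switchover is to check the cluster manager state on every node first. On PowerHA, `lssrc -ls clstrmgrES` prints a `Current state:` line, and `ST_STABLE` is the state normally expected on all nodes before a resource group is moved by hand. A minimal Python sketch that checks output captured from each node; it assumes the `Current state: <STATE>` line format (verify it on your PowerHA level), and the helper names, node names and sample text are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed line format in `lssrc -ls clstrmgrES` output: "Current state: ST_STABLE"
STATE_RE = re.compile(r"^\s*Current state:\s*(\S+)", re.MULTILINE)


def cluster_state(lssrc_output: str) -> Optional[str]:
    """Return the cluster manager state found in the text, or None if absent."""
    m = STATE_RE.search(lssrc_output)
    return m.group(1) if m else None


def safe_to_move(outputs_by_node: Dict[str, str]) -> bool:
    """True only if every node reports ST_STABLE; prints the nodes that do not."""
    ok = True
    for node, text in outputs_by_node.items():
        state = cluster_state(text)
        if state != "ST_STABLE":
            print(f"{node}: state {state!r} is not ST_STABLE, do not move resource groups yet")
            ok = False
    return ok


# Hypothetical captured output from two nodes (node names are placeholders).
captured = {
    "node1": "Current state: ST_STABLE\n",
    "node2": "Current state: ST_RP_RUNNING\n",
}
print(safe_to_move(captured))
```

Taking the command output as text, rather than running `lssrc` from the script, keeps the check usable on output collected from each node by whatever means you already use.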
zwz99999 · Systems engineer · dcits
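So the HA was built directly on the VIO clients (VIOC)? Which HA version, and which AIX level? It looks like your HA is not configured correctly. How did you configure it? Please post your planning so we can take a look.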

Systems integration · 2020-04-28
Answer invited by laq00098
  • AIX is 7.1 and HA is 7.1. I did not configure it myself. From the hacmp.out log:
      :cl_sel[131] 1> /dev/null 2>& 1
      :cl_sel[132] [ 0 -ne 0 ]
      :cl_sel[139] compress /tmp/ibmsupt/hacmp/eventlogs.2020.04.28.12.48
      :cl_sel[139] 1> /dev/null 2>& 1
      :cl_sel[144] ls -1 /tmp/ibmsupt/hacmp/eventlogs.2020.03.24.09.34.Z /tmp/ibmsupt/hacmp/eventlogs.2020.03.30.06.36.Z /tmp/ibmsupt/hacmp/eventlogs.2020.04.28.12.48.Z
      :cl_sel[144] wc -l
      :cl_sel[144] 2> /dev/null
      :cl_sel[144] FFDC_COUNT=' 3'
      :cl_sel[145] [ ' 3' -gt 5 ]
      :cl_sel[155] dspmsg scripts.cat 10059 'FFDC event log collection saved to /tmp/ibmsupt/hacmp/eventlogs.2020.04.28.12.48\n' /tmp/ibmsupt/hacmp/eventlogs.2020.04.28.12.48
      FFDC event log collection saved to /tmp/ibmsupt/hacmp/eventlogs.2020.04.28.12.48
      :cl_sel[157] exit 0
      WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 360 seconds. Please check cluster status.
      WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 390 seconds. Please check cluster status.
      WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 420 seconds. Please check cluster status.
      WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 450 seconds. Please check cluster status.
      WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 480 seconds. Please check cluster status.
      WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 540 seconds. Please check cluster status.
      WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 600 seconds. Please check cluster status.
      WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 660 seconds. Please check cluster status.
      WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 720 seconds. Please check cluster status.
      WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 780 seconds. Please check cluster status.
      WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 900 seconds. Please check cluster status.
      WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 1020 seconds. Please check cluster status.
      WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 1140 seconds. Please check cluster status.
      WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 1260 seconds. Please check cluster status.
      WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 1380 seconds. Please check cluster status.
      /ap/os/script/cluster/fhdba25_stop.sh[60]: 7471414 Terminated allinstances_stop: Info: process_vgs: all background jobs are now complete for a20_fhdba25.
      +a20_fhdba25:stop_server[+133] [ 0 -ne 0 ]
      +a20_fhdba25:stop_server[+161] ALLNOERRSERV=All_nonerror_servers
      +a20_fhdba25:stop_server[+162] [ REAL = EMUL ]
      +a20_fhdba25:stop_server[+167] cl_RMupdate resource_down All_nonerror_servers stop_server
    2020-04-28
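  • This is probably a problem with the scripts. First remove the application scripts: do HA startup and switchover then work normally? If they do, add the application scripts back and try again; if it then fails, the scripts need to be fixed. You can also start HA without the scripts and then run the start and stop scripts manually to test them.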
    2020-04-29
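The repeating WARNING lines in the hacmp.out excerpt above show the TE_FAIL_NODE recovery program still running after 1380 seconds, and the excerpt ends inside the application stop processing (`fhdba25_stop.sh`, `stop_server`). A minimal Python sketch to pull the longest such warning per recovery program out of saved hacmp.out text; the regex is written against the warning text exactly as quoted above and may need adjusting on other PowerHA levels, and `longest_recovery_wait` is a hypothetical name.

```python
import re
from typing import Dict

# Matches the warning text exactly as it appears in the hacmp.out excerpt above.
WARN_RE = re.compile(
    r"WARNING: Cluster (\S+) has been running recovery program "
    r"'([^']+)' for (\d+) seconds"
)


def longest_recovery_wait(log_text: str) -> Dict[str, int]:
    """Map each recovery program name to the longest running time (s) warned about."""
    longest: Dict[str, int] = {}
    # findall also works on a log pasted as one long line, as in the question.
    for _cluster, program, secs in WARN_RE.findall(log_text):
        longest[program] = max(longest.get(program, 0), int(secs))
    return longest


sample = (
    "WARNING: Cluster l79fis84_94 has been running recovery program "
    "'TE_FAIL_NODE' for 360 seconds. Please check cluster status. "
    "WARNING: Cluster l79fis84_94 has been running recovery program "
    "'TE_FAIL_NODE' for 1380 seconds. Please check cluster status."
)
print(longest_recovery_wait(sample))  # {'TE_FAIL_NODE': 1380}
```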
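For the suggestion to start HA without the application scripts and then run the start and stop scripts by hand, it helps to time the stop script: a stop script that does not return could keep the cluster event running, as the hacmp.out warnings above suggest. A minimal Python sketch using `subprocess.run` with a timeout, assuming Python 3 is available on the node; `run_with_timeout` is a hypothetical helper, and the 300-second limit in the example below is an arbitrary assumption, not a PowerHA setting.

```python
import subprocess
import time
from typing import List, Optional, Tuple


def run_with_timeout(cmd: List[str], timeout_s: float) -> Tuple[Optional[int], float]:
    """Run cmd and return (exit code, elapsed seconds); exit code is None on timeout."""
    start = time.monotonic()
    try:
        rc: Optional[int] = subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child on timeout; processes the
        # script started in the background are NOT killed and may keep running.
        rc = None
    return rc, time.monotonic() - start
```

On the node, `run_with_timeout(["/ap/os/script/cluster/fhdba25_stop.sh"], 300)` would return `None` as the exit code if the script is still running after 300 seconds. Because only the script process itself is killed on timeout, check with `ps` for any background jobs it left behind.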

Asked by

laq00098
Technical support · Digital China (神州数码有限公司)
Areas of expertise: servers, AIX, Unix

Question status

  • Posted: 2020-04-28
  • Followers: 4 members
  • Views: 3862
  • Latest answer: 2020-04-30