AIX HA problem?

An IBM Power server runs PowerVM, and HA is configured across its partitions. A manual HA failover failed. Error log:

LABEL: GS_START_ST
IDENTIFIER: AFA89905

Date/Time:       Tue Apr 28 13:17:07 2020
Sequence Number: 31157
Machine Id:      00C7EDA74C00
Node Id:         l79fis94
Class:           O
Type:            INFO
WPAR:            Global
Resource Name:   grpsvcs

Description
Group Services daemon started

Probable Causes
Daemon started during system startup
Daemon re-started automatically by SRC
Daemon started during installation
Daemon started manually by user

User Causes
Daemon started manually by user

Recommended Actions
Check that Group Services daemon is running

Detail Data
DETECTING MODULE
RSCT,pgsd.C,1.62.1.27,710
ERROR ID
63Y7ej0HlvdS/uQp.0XAt8....................
REFERENCE CODE

DIAGNOSTIC EXPLANATION
HAGS daemon started by SRC. Log file is /var/ha/log/grpsvcs_trace.

LABEL: TS_START_ST
IDENTIFIER: 97419D60

Date/Time:       Tue Apr 28 13:17:05 2020
Sequence Number: 31156
Machine Id:      00C7EDA74C00
Node Id:         l79fis94
Class:           O
Type:            INFO
WPAR:            Global
Resource Name:   topsvcs

Description
Topology Services daemon started

Probable Causes
Daemon started during system start-up
Daemon re-started automatically by SRC
Daemon started during installation
Daemon started manually by user

User Causes
Daemon started manually by user

Recommended Actions
Confirm that this is desirable

Detail Data
DETECTING MODULE
rsct,bootstrp.C,1.215.1.13,4956
ERROR ID
6UpNEL0FlvdS/mdl/0XAt8....................
REFERENCE CODE

Topology Services daemon started by: SRC
Topology Services daemon log file location /var/ha/log/topsvcs.28.131705.l79fis84_94.en_US
Topology Services daemon run directory /var/ha/run/topsvcs.l79fis84_94/

LABEL: RMCD_INFO_0_ST
IDENTIFIER: A6DF45AA

Date/Time:       Tue Apr 28 13:13:39 2020
Sequence Number: 31155
Machine Id:      00C7EDA74C00
Node Id:         l79fis94
Class:           O
Type:            INFO
WPAR:            Global
Resource Name:   RMCdaemon

Description
The daemon is started.

Probable Causes
The Resource Monitoring and Control daemon has been started.

User Causes
The startsrc -s ctrmc command has been executed or
the rmcctrl -s command has been executed.

Recommended Actions
Confirm that the daemon should be started.

Detail Data
DETECTING MODULE
RSCT,rmcd.c,1.95,239
ERROR ID
6eKora01ivdS/ObE/0XAt8....................
REFERENCE CODE

LABEL: REBOOT_ID
IDENTIFIER: 2BFA76F6

Date/Time:       Tue Apr 28 13:13:05 2020
Sequence Number: 31152
Machine Id:      00C7EDA74C00
Node Id:         l79fis94
Class:           S
Type:            TEMP
WPAR:            Global
Resource Name:   SYSPROC

Description
SYSTEM SHUTDOWN BY USER

Probable Causes
SYSTEM SHUTDOWN

Detail Data
USER ID
           0
0=SOFT IPL 1=HALT 2=TIME REBOOT
           1
TIME TO REBOOT (FOR TIMED REBOOT ONLY)
           0

LABEL: ERRLOG_ON
IDENTIFIER: 9DBCFDEE

Date/Time:       Tue Apr 28 13:13:23 2020
Sequence Number: 31151
Machine Id:      00C7EDA74C00
Node Id:         l79fis94
Class:           O
Type:            TEMP
WPAR:            Global
Resource Name:   errdemon

Description
ERROR LOGGING TURNED ON

Probable Causes
ERRDEMON STARTED AUTOMATICALLY

User Causes
/USR/LIB/ERRDEMON COMMAND

Recommended Actions
NONE

LABEL: ERRLOG_OFF
IDENTIFIER: 192AC071

Date/Time:       Tue Apr 28 13:07:36 2020
Sequence Number: 31150
Machine Id:      00C7EDA74C00
Node Id:         l79fis94
Class:           O
Type:            TEMP
WPAR:            Global
Resource Name:   errdemon

Description
ERROR LOGGING TURNED OFF

Probable Causes
ERRSTOP COMMAND

User Causes
ERRSTOP COMMAND

Recommended Actions
RUN ERRDEAD COMMAND
TURN ERROR LOGGING ON

LABEL: SRC_RSTRT
IDENTIFIER: CB4A951F

Date/Time:       Tue Apr 28 13:06:40 2020
Sequence Number: 31149
Machine Id:      00C7EDA74C00
Node Id:         l79fis94
Class:           S
Type:            INFO
WPAR:            Global
Resource Name:   SRC

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

Recommended Actions
VERIFY SUBSYSTEM RESTARTED AUTOMATICALLY

Detail Data
SYMPTOM CODE
       16384
SOFTWARE ERROR CODE
       -9035
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'234'
FAILING MODULE
sendmail

LABEL: SRC_RSTRT
IDENTIFIER: CB4A951F

Date/Time:       Tue Apr 28 13:05:40 2020
Sequence Number: 31148
Machine Id:      00C7EDA74C00
Node Id:         l79fis94
Class:           S
Type:            INFO
WPAR:            Global
Resource Name:   SRC

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

Recommended Actions
VERIFY SUBSYSTEM RESTARTED AUTOMATICALLY

Detail Data
SYMPTOM CODE
       16384
SOFTWARE ERROR CODE
       -9035
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'234'
FAILING MODULE
sendmail
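
The entries above are listed newest first, which makes the order of events hard to follow. Below is a minimal Python sketch that pulls a few header fields out of each entry and sorts the entries by sequence number. It assumes the log was saved as plain text in the same layout as above, with no time zone in the Date/Time field. The function names and the two-entry sample are illustrative and not part of any IBM tool.

```python
import re
from datetime import datetime

# Header fields of interest; other lines (Description, Detail Data, ...) are skipped.
FIELD_RE = re.compile(
    r"^(LABEL|IDENTIFIER|Date/Time|Sequence Number|Resource Name|Type):\s*(.*)$"
)

# Two entries copied from the log above, trimmed to the header fields.
SAMPLE = """\
LABEL: TS_START_ST
IDENTIFIER: 97419D60

Date/Time:       Tue Apr 28 13:17:05 2020
Sequence Number: 31156
Resource Name:   topsvcs

LABEL: SRC_RSTRT
IDENTIFIER: CB4A951F

Date/Time:       Tue Apr 28 13:06:40 2020
Sequence Number: 31149
Resource Name:   SRC
"""


def parse_errpt(text):
    """Split error-report text into one dict of header fields per entry."""
    entries = []
    current = None
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if not match:
            continue
        key, value = match.group(1), match.group(2).strip()
        if key == "LABEL":  # every entry starts with a LABEL line
            current = {}
            entries.append(current)
        if current is not None:
            current[key] = value
    return entries


def timeline(entries):
    """Return (time, sequence, label, resource) tuples, lowest sequence first."""
    rows = []
    for entry in entries:
        # Assumed format: "Tue Apr 28 13:17:05 2020" (English day and month names).
        when = datetime.strptime(entry["Date/Time"], "%a %b %d %H:%M:%S %Y")
        rows.append(
            (when, int(entry["Sequence Number"]), entry["LABEL"],
             entry.get("Resource Name", ""))
        )
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for when, seq, label, resource in timeline(parse_errpt(SAMPLE)):
        print(f"{when:%H:%M:%S}  {seq}  {label:<14} {resource}")
```

Read in ascending order, the full log shows the sendmail restarts first, then the shutdown and reboot, and only afterwards the RSCT daemons (RMC, Topology Services, Group Services) starting again.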

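3 answers

张文正, Systems Engineer, dcits
maguang endorsed this answer

You set up HA directly on the VIOC (VIO client) partitions, right? Which HA version is it, and what AIX level? It looks like HA was not configured properly. How did you configure it? Please post your planning details so we can take a look.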

 2020-04-28
  • AIX is 7.1 and HA is 7.1. I did not configure it myself. Here is the hacmp.out log:
    :cl_sel[131] 1> /dev/null 2>& 1
    :cl_sel[132] [ 0 -ne 0 ]
    :cl_sel[139] compress /tmp/ibmsupt/hacmp/eventlogs.2020.04.28.12.48
    :cl_sel[139] 1> /dev/null 2>& 1
    :cl_sel[144] ls -1 /tmp/ibmsupt/hacmp/eventlogs.2020.03.24.09.34.Z /tmp/ibmsupt/hacmp/eventlogs.2020.03.30.06.36.Z /tmp/ibmsupt/hacmp/eventlogs.2020.04.28.12.48.Z
    :cl_sel[144] wc -l
    :cl_sel[144] 2> /dev/null
    :cl_sel[144] FFDC_COUNT=' 3'
    :cl_sel[145] [ ' 3' -gt 5 ]
    :cl_sel[155] dspmsg scripts.cat 10059 'FFDC event log collection saved to /tmp/ibmsupt/hacmp/eventlogs.2020.04.28.12.48\n' /tmp/ibmsupt/hacmp/eventlogs.2020.04.28.12.48
    FFDC event log collection saved to /tmp/ibmsupt/hacmp/eventlogs.2020.04.28.12.48
    :cl_sel[157] exit 0
    WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 360 seconds. Please check cluster status.
    WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 390 seconds. Please check cluster status.
    WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 420 seconds. Please check cluster status.
    WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 450 seconds. Please check cluster status.
    WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 480 seconds. Please check cluster status.
    WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 540 seconds. Please check cluster status.
    WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 600 seconds. Please check cluster status.
    WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 660 seconds. Please check cluster status.
    WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 720 seconds. Please check cluster status.
    WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 780 seconds. Please check cluster status.
    WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 900 seconds. Please check cluster status.
    WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 1020 seconds. Please check cluster status.
    WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 1140 seconds. Please check cluster status.
    WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 1260 seconds. Please check cluster status.
    WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 1380 seconds. Please check cluster status.
    /ap/os/script/cluster/fhdba25_stop.sh[60]: 7471414 Terminated allinstances_stop: Info: process_vgs: all background jobs are now complete for a20_fhdba25.
    +a20_fhdba25:stop_server[+133] [ 0 -ne 0 ]
    +a20_fhdba25:stop_server[+161] ALLNOERRSERV=All_nonerror_servers
    +a20_fhdba25:stop_server[+162] [ REAL = EMUL ]
    +a20_fhdba25:stop_server[+167] cl_RMupdate resource_down All_nonerror_servers stop_server
    2020-04-28
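  • This is probably a problem with the application script. Remove the script first and check whether HA starts and fails over normally. If it does, add the application script back and test again; if it then fails, the script needs to be fixed. You can also start HA without the script and then run the start and stop scripts by hand to see how they behave.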
    2020-04-29
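
Running the stop script by hand, as suggested above, is easier to judge with a time limit, since the hacmp.out excerpt shows the recovery program running for over 1380 seconds before the stop script was terminated. Below is a minimal Python sketch of such a manual test. The helper name `run_with_timeout` and the 300-second limit are assumptions chosen for illustration, and the script path must be passed on the command line.

```python
import subprocess
import sys
import time


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (status, seconds).

    status is the exit code, or the string "timeout" if the command
    did not finish within timeout_s seconds.
    """
    start = time.monotonic()
    try:
        # subprocess.run kills the direct child process when the timeout expires.
        proc = subprocess.run(cmd, timeout=timeout_s)
        status = proc.returncode
    except subprocess.TimeoutExpired:
        status = "timeout"
    return status, time.monotonic() - start


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Example argument: the stop script named in the hacmp.out excerpt.
        script = sys.argv[1]
        # 300 seconds is an arbitrary limit for this sketch, not a PowerHA default.
        status, elapsed = run_with_timeout([script], timeout_s=300)
        print(f"{script}: status={status} after {elapsed:.1f}s")
    else:
        print("usage: pass the path of the start or stop script to test")
```

A result of `timeout`, or a non-zero exit code, when the script runs outside the cluster would point at the script rather than at the HA configuration. Only the direct child process is killed on timeout, so background jobs started by the script may need to be cleaned up by hand.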
孙伟光, IT Consultant, 中国金融电子化公司

If you have confirmed that the configuration is correct, consider whether this is a patch-level issue.

 2020-04-30
hufeng719, Systems Engineer, a steel company

The cluster configuration looks wrong. The cluster state was already abnormal before the manual HA failover, so the failover was bound to run into problems.

 2020-04-29
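
One way to act on this advice is to check the cluster manager state on each node before attempting a manual failover. The Python sketch below is hedged: it assumes that `lssrc -ls clstrmgrES` prints a line of the form `Current state: ST_STABLE` on a healthy PowerHA node, and that any other state means the cluster is not ready. Verify both assumptions against your PowerHA level. The function names are illustrative.

```python
import re
import shutil
import subprocess

# Assumed output line, e.g. "Current state: ST_STABLE".
STATE_RE = re.compile(r"^Current state:\s*(\S+)", re.MULTILINE)


def cluster_manager_state(lssrc_output):
    """Return the state token from the lssrc output, or None if absent."""
    match = STATE_RE.search(lssrc_output)
    return match.group(1) if match else None


def is_stable(lssrc_output):
    """True only if the reported state is ST_STABLE."""
    return cluster_manager_state(lssrc_output) == "ST_STABLE"


def read_state_from_node():
    """Query the local cluster manager through SRC and return its state."""
    result = subprocess.run(
        ["lssrc", "-ls", "clstrmgrES"],
        capture_output=True, text=True, check=False,
    )
    return cluster_manager_state(result.stdout)


if __name__ == "__main__":
    if shutil.which("lssrc"):
        state = read_state_from_node()
        print(f"cluster manager state: {state}")
        if state != "ST_STABLE":
            print("cluster is not stable; resolve this before a manual failover")
    else:
        print("lssrc not found; run this on the AIX cluster node")
```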

Asked by

laq00098, Technical Support, 神州数码有限公司

Question status

  • Posted: 2020-04-28
  • Followers: 4
  • Views: 1666
  • Latest answer: 2020-04-30