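AIX HA problem: manual switchover fails
HA is set up between PowerVM partitions on an IBM Power (midrange) server, and a manual HA switchover failed. Error log: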
LABEL: GS_START_ST
IDENTIFIER: AFA89905
Date/Time: Tue Apr 28 13:17:07 2020
Sequence Number: 31157
Machine Id: 00C7EDA74C00
Node Id: l79fis94
Class: O
Type: INFO
WPAR: Global
Resource Name: grpsvcs
Description
Group Services daemon started
Probable Causes
Daemon started during system startup
Daemon re-started automatically by SRC
Daemon started during installation
Daemon started manually by user
User Causes
Daemon started manually by user
Recommended Actions
Check that Group Services daemon is running
Detail Data
DETECTING MODULE
RSCT,pgsd.C,1.62.1.27,710
ERROR ID
63Y7ej0HlvdS/uQp.0XAt8....................
REFERENCE CODE
DIAGNOSTIC EXPLANATION
HAGS daemon started by SRC. Log file is /var/ha/log/grpsvcs_trace.
LABEL: TS_START_ST
IDENTIFIER: 97419D60
Date/Time: Tue Apr 28 13:17:05 2020
Sequence Number: 31156
Machine Id: 00C7EDA74C00
Node Id: l79fis94
Class: O
Type: INFO
WPAR: Global
Resource Name: topsvcs
Description
Topology Services daemon started
Probable Causes
Daemon started during system start-up
Daemon re-started automatically by SRC
Daemon started during installation
Daemon started manually by user
User Causes
Daemon started manually by user
Recommended Actions
Confirm that this is desirable
Detail Data
DETECTING MODULE
rsct,bootstrp.C,1.215.1.13,4956
ERROR ID
6UpNEL0FlvdS/mdl/0XAt8....................
REFERENCE CODE
Topology Services daemon started by:
SRC
Topology Services daemon log file location
/var/ha/log/topsvcs.28.131705.l79fis84_94.en_US
Topology Services daemon run directory
/var/ha/run/topsvcs.l79fis84_94/
LABEL: RMCD_INFO_0_ST
IDENTIFIER: A6DF45AA
Date/Time: Tue Apr 28 13:13:39 2020
Sequence Number: 31155
Machine Id: 00C7EDA74C00
Node Id: l79fis94
Class: O
Type: INFO
WPAR: Global
Resource Name: RMCdaemon
Description
The daemon is started.
Probable Causes
The Resource Monitoring and Control daemon has been started.
User Causes
The startsrc -s ctrmc command has been executed or
the rmcctrl -s command has been executed.
Recommended Actions
Confirm that the daemon should be started.
Detail Data
DETECTING MODULE
RSCT,rmcd.c,1.95,239
ERROR ID
6eKora01ivdS/ObE/0XAt8....................
REFERENCE CODE
LABEL: REBOOT_ID
IDENTIFIER: 2BFA76F6
Date/Time: Tue Apr 28 13:13:05 2020
Sequence Number: 31152
Machine Id: 00C7EDA74C00
Node Id: l79fis94
Class: S
Type: TEMP
WPAR: Global
Resource Name: SYSPROC
Description
SYSTEM SHUTDOWN BY USER
Probable Causes
SYSTEM SHUTDOWN
Detail Data
USER ID
0
0=SOFT IPL 1=HALT 2=TIME REBOOT
1
TIME TO REBOOT (FOR TIMED REBOOT ONLY)
0
LABEL: ERRLOG_ON
IDENTIFIER: 9DBCFDEE
Date/Time: Tue Apr 28 13:13:23 2020
Sequence Number: 31151
Machine Id: 00C7EDA74C00
Node Id: l79fis94
Class: O
Type: TEMP
WPAR: Global
Resource Name: errdemon
Description
ERROR LOGGING TURNED ON
Probable Causes
ERRDEMON STARTED AUTOMATICALLY
User Causes
/USR/LIB/ERRDEMON COMMAND
Recommended Actions
NONE
LABEL: ERRLOG_OFF
IDENTIFIER: 192AC071
Date/Time: Tue Apr 28 13:07:36 2020
Sequence Number: 31150
Machine Id: 00C7EDA74C00
Node Id: l79fis94
Class: O
Type: TEMP
WPAR: Global
Resource Name: errdemon
Description
ERROR LOGGING TURNED OFF
Probable Causes
ERRSTOP COMMAND
User Causes
ERRSTOP COMMAND
Recommended Actions
RUN ERRDEAD COMMAND
TURN ERROR LOGGING ON
LABEL: SRC_RSTRT
IDENTIFIER: CB4A951F
Date/Time: Tue Apr 28 13:06:40 2020
Sequence Number: 31149
Machine Id: 00C7EDA74C00
Node Id: l79fis94
Class: S
Type: INFO
WPAR: Global
Resource Name: SRC
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
VERIFY SUBSYSTEM RESTARTED AUTOMATICALLY
Detail Data
SYMPTOM CODE
16384
SOFTWARE ERROR CODE
-9035
ERROR CODE
0
DETECTING MODULE
'srchevn.c'@line:'234'
FAILING MODULE
sendmail
LABEL: SRC_RSTRT
IDENTIFIER: CB4A951F
Date/Time: Tue Apr 28 13:05:40 2020
Sequence Number: 31148
Machine Id: 00C7EDA74C00
Node Id: l79fis94
Class: S
Type: INFO
WPAR: Global
Resource Name: SRC
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
VERIFY SUBSYSTEM RESTARTED AUTOMATICALLY
Detail Data
SYMPTOM CODE
16384
SOFTWARE ERROR CODE
-9035
ERROR CODE
0
DETECTING MODULE
'srchevn.c'@line:'234'
FAILING MODULE
sendmail
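A long `errpt -a` dump like the one above is easier to read as a timeline. The helper below is a hypothetical sketch, not an IBM tool: it assumes the text was saved from `errpt -a` and that each entry begins with a `LABEL:` line followed by `Date/Time:`, `Sequence Number:` and `Resource Name:` header lines, as in this excerpt. It keeps only those header fields and sorts the entries by sequence number.

```python
from __future__ import annotations

import re
import sys
from dataclasses import dataclass


@dataclass
class ErrptEntry:
    label: str
    timestamp: str
    sequence: int
    resource: str


# Header fields pulled out of each "LABEL:" block of errpt -a text.
_FIELD = re.compile(r"^(LABEL|Date/Time|Sequence Number|Resource Name):\s*(.*)$")


def parse_errpt(text: str) -> list[ErrptEntry]:
    """Split errpt -a style text into entries, ordered by sequence number."""
    entries: list[ErrptEntry] = []
    current: dict[str, str] = {}

    def flush() -> None:
        # Turn the fields collected so far into an entry, if one was started.
        if "LABEL" in current:
            entries.append(ErrptEntry(
                label=current["LABEL"],
                timestamp=current.get("Date/Time", ""),
                sequence=int(current.get("Sequence Number", "0")),
                resource=current.get("Resource Name", ""),
            ))

    for line in text.splitlines():
        match = _FIELD.match(line.strip())
        if not match:
            continue  # Description, Detail Data, etc. are ignored
        key, value = match.groups()
        if key == "LABEL":
            # A new LABEL line closes the previous entry.
            flush()
            current = {}
        current[key] = value.strip()
    flush()
    return sorted(entries, key=lambda e: e.sequence)


if __name__ == "__main__":
    # Usage: errpt -a > errpt.txt; python errpt_timeline.py errpt.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for e in parse_errpt(f.read()):
            print(f"{e.sequence:>7}  {e.timestamp:<26} {e.label:<16} {e.resource}")
```

Sorted this way, the excerpt reads from the SRC_RSTRT entries through ERRLOG_OFF, REBOOT_ID and ERRLOG_ON to the RSCT daemons (ctrmc, topsvcs, grpsvcs) starting again.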
Answers (3)
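So the HA is built directly on the VIO clients (VIOC), right? Which HA version is it, and which AIX level? It sounds like your HA is not configured correctly. How was it configured? Post your planning details so we can take a look.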
- AIX is 7.1 and HA is 7.1. I didn't configure it myself. Here is the hacmp.out log:
:cl_sel[131] 1> /dev/null 2>& 1
:cl_sel[132] [ 0 -ne 0 ]
:cl_sel[139] compress /tmp/ibmsupt/hacmp/eventlogs.2020.04.28.12.48
:cl_sel[139] 1> /dev/null 2>& 1
:cl_sel[144] ls -1 /tmp/ibmsupt/hacmp/eventlogs.2020.03.24.09.34.Z /tmp/ibmsupt/hacmp/eventlogs.2020.03.30.06.36.Z /tmp/ibmsupt/hacmp/eventlogs.2020.04.28.12.48.Z
:cl_sel[144] wc -l
:cl_sel[144] 2> /dev/null
:cl_sel[144] FFDC_COUNT=' 3'
:cl_sel[145] [ ' 3' -gt 5 ]
:cl_sel[155] dspmsg scripts.cat 10059 'FFDC event log collection saved to /tmp/ibmsupt/hacmp/eventlogs.2020.04.28.12.48\n' /tmp/ibmsupt/hacmp/eventlogs.2020.04.28.12.48
FFDC event log collection saved to /tmp/ibmsupt/hacmp/eventlogs.2020.04.28.12.48
:cl_sel[157] exit 0
WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 360 seconds. Please check cluster status.
WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 390 seconds. Please check cluster status.
WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 420 seconds. Please check cluster status.
WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 450 seconds. Please check cluster status.
WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 480 seconds. Please check cluster status.
WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 540 seconds. Please check cluster status.
WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 600 seconds. Please check cluster status.
WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 660 seconds. Please check cluster status.
WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 720 seconds. Please check cluster status.
WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 780 seconds. Please check cluster status.
WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 900 seconds. Please check cluster status.
WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 1020 seconds. Please check cluster status.
WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 1140 seconds. Please check cluster status.
WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 1260 seconds. Please check cluster status.
WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 1380 seconds. Please check cluster status.
/ap/os/script/cluster/fhdba25_stop.sh[60]: 7471414 Terminated allinstances_stop: Info: process_vgs: all background jobs are now complete for a20_fhdba25.
+a20_fhdba25:stop_server[+133] [ 0 -ne 0 ]
+a20_fhdba25:stop_server[+161] ALLNOERRSERV=All_nonerror_servers
+a20_fhdba25:stop_server[+162] [ REAL = EMUL ]
+a20_fhdba25:stop_server[+167] cl_RMupdate resource_down All_nonerror_servers stop_server
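The repeated WARNING lines appear to be PowerHA's long-running-event (config_too_long) warnings; the largest value shows how long the `TE_FAIL_NODE` recovery program had been stuck. The hypothetical helper below extracts the longest reported runtime per cluster and recovery program. The regex is written against the exact wording in this log, and the default path `/var/hacmp/log/hacmp.out` is an assumption for PowerHA 7.1 (pass the real path as an argument if it differs).

```python
from __future__ import annotations

import re
import sys

# Matches warnings such as:
# WARNING: Cluster l79fis84_94 has been running recovery program 'TE_FAIL_NODE' for 360 seconds.
_WARNING = re.compile(
    r"WARNING: Cluster (\S+) has been running recovery program "
    r"'([^']+)' for (\d+) seconds"
)


def longest_recovery_runs(log_text: str) -> dict[tuple[str, str], int]:
    """Return the largest reported runtime (seconds) per (cluster, program)."""
    longest: dict[tuple[str, str], int] = {}
    # finditer also works when several warnings were flattened onto one line.
    for match in _WARNING.finditer(log_text):
        key = (match.group(1), match.group(2))
        seconds = int(match.group(3))
        longest[key] = max(seconds, longest.get(key, 0))
    return longest


if __name__ == "__main__":
    # Assumed default location of hacmp.out on PowerHA 7.1.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/hacmp/log/hacmp.out"
    with open(path, encoding="utf-8", errors="replace") as f:
        for (cluster, program), seconds in longest_recovery_runs(f.read()).items():
            print(f"{cluster}: {program} still running after {seconds} s")
```

For the excerpt above this would report `TE_FAIL_NODE` running for 1380 seconds, i.e. the node-down processing hung for more than 20 minutes before the stop script's job was terminated.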
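- This is probably a problem with your application scripts (the log shows TE_FAIL_NODE running for over 1380 seconds and a process in fhdba25_stop.sh being terminated). First remove the scripts from the HA configuration and check whether HA starts and switches over normally. If it does, add the application scripts back and try again; if it then fails, the scripts need to be fixed. Alternatively, start HA without the scripts and then run the start and stop scripts manually to test them.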
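To test a stop script by hand as suggested, it helps to run it with a time limit and record the exit code and elapsed time, since a stop script that does not finish promptly would be consistent with the stuck TE_FAIL_NODE event. This is a sketch under stated assumptions: the script is executable with a proper shebang, and the 300-second default limit is arbitrary. Note that only the direct child is killed on timeout; background jobs it started may keep running.

```python
from __future__ import annotations

import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class RunResult:
    exit_code: int | None  # None means the script was killed on timeout
    elapsed: float
    timed_out: bool


def run_with_timeout(cmd: list[str], timeout_s: float) -> RunResult:
    """Run cmd, killing it if it takes longer than timeout_s seconds."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
        return RunResult(proc.returncode, time.monotonic() - start, False)
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the direct child before re-raising.
        return RunResult(None, time.monotonic() - start, True)


if __name__ == "__main__":
    # Usage: python run_stop.py /ap/os/script/cluster/fhdba25_stop.sh [timeout_seconds]
    script = sys.argv[1]
    limit = float(sys.argv[2]) if len(sys.argv) > 2 else 300.0  # arbitrary default
    result = run_with_timeout([script], limit)
    if result.timed_out:
        print(f"{script}: still running after {limit:.0f} s, killed")
    else:
        print(f"{script}: exit code {result.exit_code} after {result.elapsed:.1f} s")
    sys.exit(0 if result.exit_code == 0 else 1)
```

Running the stop script this way while HA is up without application scripts shows whether it hangs or exits non-zero, before it is put back into the cluster configuration.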