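System environment:
Oracle RAC 11.2.0.4
AIX 6100-07-04-1216
HACMP 5.5
Hardware environment:
Two storage arrays and two POWER7 servers
System architecture:
1. The two POWER7 servers form an Oracle RAC cluster. The database data is stored on both storage arrays, and ASM mirrors it across two failgroups.
2. The HACMP cluster provides the shared volume group for Oracle OCRVOTE (one disk taken from each storage array, for storage redundancy). Three shared volumes are created in it for ocrvote, and the voting disks are then created on them.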
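3. The HACMP configuration includes two Ethernet heartbeat networks, ether0 and ether1: ether0 carries the HACMP heartbeat address and ether1 carries the Oracle OCRVOTE heartbeat address. There are also two disk heartbeats, diskdb1 and diskhb2, and three shared volume groups: two for disk heartbeat and one for OCRVOTE.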
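The current problem:
One HACMP node goes down at irregular intervals, and every occurrence appears to be caused by an HACMP heartbeat failure.
The HACMP status is shown in the figure below:
The shared volume groups are in a normal state: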
hdisk22 00f83b6be2ee8e92 votevg concurrent
hdisk23 00f83b6be2ee9137 votevg concurrent
hdisk24 00f83b6b115df95e diskhbvg concurrent
hdisk25 00f83b6b116fe412 diskhb2vg concurrent
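For anyone who wants to check a listing like this one mechanically, here is a minimal sketch. It assumes the four columns are disk name, PVID, volume group and state (the layout looks like `lspv` output, but that is my reading, not something stated in the post). The function names are made up for illustration.

```python
from collections import defaultdict

# Lines copied from the listing above; assumed columns: disk, PVID, volume group, state.
LSPV_OUTPUT = """\
hdisk22 00f83b6be2ee8e92 votevg concurrent
hdisk23 00f83b6be2ee9137 votevg concurrent
hdisk24 00f83b6b115df95e diskhbvg concurrent
hdisk25 00f83b6b116fe412 diskhb2vg concurrent
"""

def group_by_vg(text):
    """Return {volume group: [(disk, state), ...]} from lspv-style lines."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        disk, _pvid, vg, state = fields[:4]
        groups[vg].append((disk, state))
    return dict(groups)

def non_concurrent(groups):
    """List the disks whose state is anything other than 'concurrent'."""
    return [disk
            for members in groups.values()
            for disk, state in members
            if state != "concurrent"]

groups = group_by_vg(LSPV_OUTPUT)
for vg, members in groups.items():
    print(vg, [disk for disk, _ in members])
print("not concurrent:", non_concurrent(groups))
```

With the four lines above, votevg holds two disks (matching the "one disk from each storage array" layout) and each heartbeat volume group holds one.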
1. The AIX system log shows the following:
2BFA76F6 1215043720 T S SYSPROC SYSTEM SHUTDOWN BY USER
9DBCFDEE 1215043920 T O errdemon ERROR LOGGING TURNED ON
AC9144F4 1215043020 T H ent17 HEA PORT DOWN
AC9144F4 1215042620 T H ent19 HEA PORT DOWN
EC0BCCD4 1215042420 T H ent9 ETHERNET DOWN
EC0BCCD4 1215042420 T H ent6 ETHERNET DOWN
AC9144F4 1215042320 T H ent18 HEA PORT DOWN
AC9144F4 1215042320 T H ent17 HEA PORT DOWN
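The entries are listed newest first, which makes the sequence hard to follow. The sketch below puts them in chronological order. It assumes the second column is an `errpt` timestamp in MMDDhhmmyy form and that the columns are identifier, timestamp, type, class, resource and description; `parse_errpt` is a name made up for this example.

```python
from datetime import datetime

# Lines copied from the log excerpt above.
ERRPT_LINES = """\
2BFA76F6 1215043720 T S SYSPROC SYSTEM SHUTDOWN BY USER
9DBCFDEE 1215043920 T O errdemon ERROR LOGGING TURNED ON
AC9144F4 1215043020 T H ent17 HEA PORT DOWN
AC9144F4 1215042620 T H ent19 HEA PORT DOWN
EC0BCCD4 1215042420 T H ent9 ETHERNET DOWN
EC0BCCD4 1215042420 T H ent6 ETHERNET DOWN
AC9144F4 1215042320 T H ent18 HEA PORT DOWN
AC9144F4 1215042320 T H ent17 HEA PORT DOWN
"""

def parse_errpt(text):
    """Return (time, resource, description) tuples, oldest first."""
    events = []
    for line in text.splitlines():
        fields = line.split(None, 5)  # the description may contain spaces
        if len(fields) < 6:
            continue
        _ident, stamp, _etype, _eclass, resource, desc = fields
        # Assumed timestamp layout: month, day, hour, minute, two-digit year.
        when = datetime.strptime(stamp, "%m%d%H%M%y")
        events.append((when, resource, desc))
    # sorted() is stable, so entries sharing a minute keep their listed order.
    return sorted(events, key=lambda event: event[0])

events = parse_errpt(ERRPT_LINES)
for when, resource, desc in events:
    print(when.strftime("%Y-%m-%d %H:%M"), resource, desc)
```

Read that way, all entries fall on 15 December (2020, if the two-digit year is taken as 20xx): the HEA port and Ethernet adapter errors are logged between 04:23 and 04:30, the shutdown at 04:37, and error logging is turned back on at 04:39.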
2. HACMP logs:
The nim.topsvcs.rhdisk24.vote log keeps reporting the following message:
Heartbeat was NOT received. Missed HBs: 1. Limit: 4
Heartbeat was NOT received. Missed HBs: 1. Limit: 4
Heartbeat was NOT received. Missed HBs: 1. Limit: 4
Heartbeat was NOT received. Missed HBs: 1. Limit: 4
Heartbeat was NOT received. Missed HBs: 1. Limit: 4
Heartbeat was NOT received. Missed HBs: 1. Limit: 4
Heartbeat was NOT received. Missed HBs: 1. Limit: 4
Heartbeat was NOT received. Missed HBs: 1. Limit: 4
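To tell whether the missed count ever approaches the limit over a longer stretch of this log, the lines can be summarized with a small script. This is only a sketch: the regular expression is written against the exact wording shown above, and `summarize` is a hypothetical helper, not part of HACMP or RSCT.

```python
import re

# Matches the message format shown in the excerpt above.
PATTERN = re.compile(r"Missed HBs: (\d+)\. Limit: (\d+)")

def summarize(lines):
    """Count missed-heartbeat messages and report the worst value seen."""
    missed = []
    limit = None
    for line in lines:
        match = PATTERN.search(line)
        if match:
            missed.append(int(match.group(1)))
            limit = int(match.group(2))
    worst = max(missed, default=0)
    return {
        "count": len(missed),
        "max_missed": worst,
        "limit": limit,
        "reached_limit": limit is not None and worst >= limit,
    }

# Eight identical lines, as in the excerpt.
sample = ["Heartbeat was NOT received. Missed HBs: 1. Limit: 4"] * 8
summary = summarize(sample)
print(summary)
```

For the excerpt the worst value is 1 against a limit of 4, so these lines show isolated misses and the limit is never reached. I am assuming the limit is the threshold at which the disk heartbeat path is declared down; the post does not say so.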
The heartbeat link is timing out. CPU usage is probably too high, isn't it?
Judging from the output, Missed HBs is always 0, so the heartbeats look fine, don't they?
https://www.toolbox.com/tech/operating-systems/question/hacmp-disk-heart-beat-080509/
dhb_read is a good utility, but first make sure there is no error in errpt
and /tmp/hacmp.out.
On many busy systems it is a good idea to set the Failure Detection Rate of
the disk heartbeat network module to slow, especially if using HACMP 5.4 or above
with the “fast failure detection” feature.