1. Before the OS upgrade, HA started normally and failover worked normally.
2. The OS was upgraded from 5305 to the 5312 level. oslevel reports 5300-10, mainly because one fileset was not applied:
jtfmistest:/>#oslevel -rl 5300-12
Fileset                          Actual Level   Recommended ML
-----------------------------------------------------------------------------
ifor_ls.html.en_US.base.cli      5.3.7.0        5.3.8.0
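To repeat this level check without reading the table by eye, here is a minimal sketch that compares dotted fileset levels in `oslevel -rl`-style text. It assumes the output has been saved as text with one "fileset actual recommended" row per line; the helper names are hypothetical and not part of any AIX tool.

```python
def parse_level(level: str) -> tuple:
    """Turn a dotted AIX fileset level such as '5.3.7.0' into a comparable tuple."""
    return tuple(int(part) for part in level.split("."))


def below_recommended(report: str) -> list:
    """Return (fileset, actual, recommended) rows whose actual level is lower.

    Assumes each data row has exactly three whitespace-separated fields;
    header and separator lines have a different field count and are skipped.
    """
    rows = []
    for line in report.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        name, actual, recommended = fields
        try:
            if parse_level(actual) < parse_level(recommended):
                rows.append((name, actual, recommended))
        except ValueError:
            # A three-field line that is not a level row; ignore it.
            continue
    return rows


# Sample taken from the oslevel -rl 5300-12 output in this post.
SAMPLE = """\
Fileset                          Actual Level   Recommended ML
-----------------------------------------------------------------------------
ifor_ls.html.en_US.base.cli      5.3.7.0        5.3.8.0
"""

print(below_recommended(SAMPLE))
```

Comparing integer tuples rather than strings avoids the trap where "5.4.1.12" would sort before "5.4.1.7" as text.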
3. HACMP was upgraded from 5400 to the latest 5410 fixes.
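After the upgrade, HACMP cannot come up on whichever node starts HA second (node A or node B, depending only on start order). clstart reports that the processes started normally, but the resources are not brought online, and hacmp.out contains no output at all.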
4. Rolling back all HACMP fixes did not change the symptom.
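5. Removing the HACMP filesets failed: the EMC PowerPath software depends on four base HACMP 5.4 filesets. The only option was therefore to reinstall HACMP 5410 directly and then apply the latest fixes. As expected, the fault persisted.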
Note: rolling back the OS level has not been tested, because the main goal of this upgrade is the OS fixes. I hope to find the root cause.
6. PowerPath version: 5.0.0 build 161
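7. The EMC HA Custom Disk Methods and HA events were not configured (they were not configured before the upgrade either, and HA ran fine). Official guidance states that they are not required for HA 5.4 and later. I have since tested with them configured, and the fault persists.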
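Note: another system in our environment, also on EMC storage, shows similar behavior: the second node cannot start, but it does start after the clstrmgrES process is restarted. In this environment, restarting clstrmgrES does not help.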
clstat output on the second node:
Group Name    Group State    Application state    Node
----------------------------------------------------------------------------------------------------------------------
prod_res      OFFLINE (it is actually online; the state displays correctly once clstrmgrES is restarted)
prodapp       OFFLINE
test_res      OFFLINE
Command: failed        stdout: yes           stderr: no
Before command completion, additional instructions may appear below.
cldump: Waiting for the Cluster SMUX peer (clstrmgrES)
to stabilize.............
(On node 1 this displays correctly; it also displays correctly on node 2 if HA is started on node 2 first.)
Failed retrieving cluster information.
There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.
Refer to the HACMP Administration Guide for more information.
Basic cluster configuration (unchanged):
Cluster Name: jtfmiscl
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
There are 2 node(s) and 1 network(s) defined
NODE jtfmis:
        Network net_ether_01
                srv2      10.150.4.243
                srv1      10.150.4.242
                boot1     30.0.0.1
                stdby1    50.0.0.1
NODE jtfmistest:
        Network net_ether_01
                srv2      10.150.4.243
                srv1      10.150.4.242
                stdby2    50.0.0.2
                boot2     30.0.0.2
Resource Group prod_res
        Startup Policy          Online On Home Node Only
        Fallover Policy         Fallover To Next Priority Node In The List
        Fallback Policy         Never Fallback
        Participating Nodes     jtfmis jtfmistest
        Service IP Label        srv1
Resource Group test_res
        Startup Policy          Online On Home Node Only
        Fallover Policy         Fallover To Next Priority Node In The List
        Fallback Policy         Never Fallback
        Participating Nodes     jtfmistest jtfmis
        Service IP Label        srv2
Total Heartbeats Missed: 0
Cluster Topology Start Time: 03/26/2012 12:10:37
Cluster heartbeat status:
jtfmistest:/>#lssrc -ls topsvcs
Subsystem         Group            PID          Status
 topsvcs          topsvcs          1630276      active
Network Name   Indx Defd  Mbrs  St   Adapter ID      Group ID
net_ether_01_0 [ 0]    2     2  S    30.0.0.2        30.0.0.2
net_ether_01_0 [ 0] en0              0x476fec3f      0x476fec40
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent    : 99381 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 118263 ICMP 0 Dropped: 0
NIM's PID: 2424882
net_ether_01_1 [ 1]    2     2  S    50.0.0.2        50.0.0.2
net_ether_01_1 [ 1] en1              0x476fec41      0x476fec42
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent    : 99386 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 118265 ICMP 0 Dropped: 0
NIM's PID: 1753114
  2 locally connected Clients with PIDs:
haemd(1179800) hagsd(2031782)
  Fast Failure Detection available but off.
  Dead Man Switch Enabled:
     reset interval = 1 seconds
     trip  interval = 20 seconds
  Client Heartbeating Disabled.
  Configuration Instance = 44
  Daemon employs no security
  Segments pinned: Text Data.
  Text segment size: 900 KB. Static data segment size: 1493 KB.
  Dynamic data segment size: 5377. Number of outstanding malloc: 139
  User time 8 sec. System time 5 sec.
  Number of page faults: 1. Process swapped out 0 times.
  Number of nodes up: 2. Number of nodes down: 0.
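The heartbeat data above can also be checked mechanically. The sketch below pulls the missed-heartbeat totals and the node up/down counts out of saved `lssrc -ls topsvcs` text; the regular expressions are written against the lines in this post and are an assumption, so other RSCT levels may need adjusted patterns. The function name is hypothetical.

```python
import re


def heartbeat_summary(text: str) -> dict:
    """Summarize missed heartbeats and node counts from lssrc -ls topsvcs text."""
    # One "Missed HBs: Total: N" line is printed per heartbeat network.
    missed = [int(n) for n in re.findall(r"Missed HBs: Total: (\d+)", text)]
    nodes = re.search(
        r"Number of nodes up: (\d+)\. Number of nodes down: (\d+)\.", text
    )
    return {
        "networks": len(missed),
        "missed_total": sum(missed),
        "nodes_up": int(nodes.group(1)) if nodes else None,
        "nodes_down": int(nodes.group(2)) if nodes else None,
    }


# Relevant lines copied from the jtfmistest output in this post.
SAMPLE = """\
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
  Number of nodes up: 2. Number of nodes down: 0.
"""

print(heartbeat_summary(SAMPLE))
```

On this data both networks report zero missed heartbeats and both nodes are up, which suggests the topology layer itself is healthy; the sketch does not say anything about resource group processing.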
HA fileset list:
  Fileset                               Level     State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  cluster.adt.es.client.include         5.4.1.0   COMMITTED  ES Client Include Files
  cluster.adt.es.client.samples.clinfo  5.4.1.0   COMMITTED  ES Client CLINFO Samples
  cluster.adt.es.client.samples.clstat  5.4.1.0   COMMITTED  ES Client Clstat Samples
  cluster.adt.es.client.samples.libcl   5.4.1.0   COMMITTED  ES Client LIBCL Samples
  cluster.adt.es.java.demo.monitor      5.4.1.0   COMMITTED  ES Web Based Monitor Demo
  cluster.doc.en_US.es.html             5.4.1.0   COMMITTED  HAES Web-based HTML Documentation - U.S. English
  cluster.doc.en_US.es.pdf              5.4.1.0   COMMITTED  HAES PDF Documentation - U.S. English
  cluster.es.cfs.rte                    5.4.1.0   COMMITTED  ES Cluster File System Support
  cluster.es.client.lib                 5.4.1.7   APPLIED    ES Client Libraries
  cluster.es.client.rte                 5.4.1.11  APPLIED    ES Client Runtime
  cluster.es.client.utils               5.4.1.10  APPLIED    ES Client Utilities
  cluster.es.client.wsm                 5.4.1.0   COMMITTED  Web based Smit
  cluster.es.cspoc.cmds                 5.4.1.12  APPLIED    ES CSPOC Commands
  cluster.es.cspoc.dsh                  5.4.1.0   APPLIED    ES CSPOC dsh
  cluster.es.cspoc.rte                  5.4.1.7   APPLIED    ES CSPOC Runtime Commands
  cluster.es.server.cfgast              5.4.1.0   COMMITTED  ES Two-Node Configuration Assistant
  cluster.es.server.diag                5.4.1.12  APPLIED    ES Server Diags
  cluster.es.server.events              5.4.1.12  APPLIED    ES Server Events
  cluster.es.server.rte                 5.4.1.12  APPLIED    ES Base Server Runtime
  cluster.es.server.simulator           5.4.1.0   COMMITTED  ES Cluster Simulator
  cluster.es.server.testtool            5.4.1.0   COMMITTED  ES Cluster Test Tool
  cluster.es.server.utils               5.4.1.12  APPLIED    ES Server Utilities
  cluster.license                       5.4.1.0   COMMITTED  HACMP Electronic License
Path: /etc/objrepos
  cluster.es.client.lib                 5.4.1.7   APPLIED    ES Client Libraries
  cluster.es.client.rte                 5.4.1.11  APPLIED    ES Client Runtime
  cluster.es.cspoc.rte                  5.4.0.0   COMMITTED  ES CSPOC Runtime Commands
  cluster.es.server.diag                5.4.0.0   COMMITTED  ES Server Diags
  cluster.es.server.events              5.4.0.0   COMMITTED  ES Server Events
  cluster.es.server.rte                 5.4.1.12  APPLIED    ES Base Server Runtime
  cluster.es.server.simulator           5.4.1.0   COMMITTED  ES Cluster Simulator
  cluster.es.server.utils               5.4.1.12  APPLIED    ES Server Utilities
Path: /usr/share/lib/objrepos
  cluster.man.en_US.es.data             5.4.1.0   COMMITTED  ES Man Pages - U.S. English
Note: PowerPath depends on the four HA 5.4.0.0 filesets in the table above.
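The listing shows some filesets at different levels in their usr part (/usr/lib/objrepos) and root part (/etc/objrepos); for example, cluster.es.server.events is 5.4.1.12 in the usr part but 5.4.0.0 in the root part. Whether this is related to the startup failure is not established here. The minimal sketch below flags such differences from saved `lslpp -l` text. It assumes each fileset's name, level and state sit on one line (real lslpp output can wrap long names onto the next line, which this sketch does not handle), and the function names are hypothetical.

```python
def _looks_like_level(token: str) -> bool:
    """True for a four-part numeric level such as '5.4.1.12'."""
    parts = token.split(".")
    return len(parts) == 4 and all(p.isdigit() for p in parts)


def levels_by_path(listing: str) -> dict:
    """Map each 'Path:' section of an lslpp -l listing to {fileset: level}."""
    sections = {}
    current = None
    for line in listing.splitlines():
        stripped = line.strip()
        if stripped.startswith("Path:"):
            current = stripped.split(":", 1)[1].strip()
            sections[current] = {}
            continue
        fields = stripped.split()
        # Data rows look like: <fileset> <level> <state> <description...>
        if current is not None and len(fields) >= 3 and _looks_like_level(fields[1]):
            sections[current][fields[0]] = fields[1]
    return sections


def usr_root_mismatches(listing: str, usr="/usr/lib/objrepos", root="/etc/objrepos") -> dict:
    """Return {fileset: (usr_level, root_level)} where the two parts differ."""
    sections = levels_by_path(listing)
    usr_levels = sections.get(usr, {})
    root_levels = sections.get(root, {})
    return {
        name: (usr_levels[name], root_levels[name])
        for name in sorted(usr_levels.keys() & root_levels.keys())
        if usr_levels[name] != root_levels[name]
    }


# Subset of the lslpp listing from this post.
SAMPLE = """\
Path: /usr/lib/objrepos
  cluster.es.cspoc.rte       5.4.1.7   APPLIED    ES CSPOC Runtime Commands
  cluster.es.server.diag     5.4.1.12  APPLIED    ES Server Diags
  cluster.es.server.events   5.4.1.12  APPLIED    ES Server Events
  cluster.es.server.rte      5.4.1.12  APPLIED    ES Base Server Runtime
Path: /etc/objrepos
  cluster.es.cspoc.rte       5.4.0.0   COMMITTED  ES CSPOC Runtime Commands
  cluster.es.server.diag     5.4.0.0   COMMITTED  ES Server Diags
  cluster.es.server.events   5.4.0.0   COMMITTED  ES Server Events
  cluster.es.server.rte      5.4.1.12  APPLIED    ES Base Server Runtime
"""

for name, (usr_level, root_level) in usr_root_mismatches(SAMPLE).items():
    print(f"{name}: usr={usr_level} root={root_level}")
```

Keying sections by their Path: header keeps the usr and root parts of the same fileset apart, which is the distinction the comparison depends on.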