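Today I configured a PowerHA 6.1 cluster on two AIX 5.3 servers. Each machine uses one network adapter as the service adapter, and the service address is configured as an IP alias. After configuration, the cluster starts normally on both nodes, but once a resource group has been switched over, the host can no longer reach machines on other subnets. In other words, the default route stops working. This matches the description in post [1] exactly. The route information before the resource group switch was: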
# netstat -rn
Routing tables
Destination Gateway Flags Refs Use If Exp Groups
Route Tree for Protocol Family 2 (Internet):
default 10.209.3.62 UG 0 5 en2 - -
10.209.3.0 10.209.3.45 UHSb 0 0 en2 - - =>
10.209.3/26 10.209.3.45 U 1 0 en2 - -
10.209.3.45 127.0.0.1 UGHS 0 0 lo0 - -
10.209.3.63 10.209.3.45 UHSb 0 0 en2 - -
127/8 127.0.0.1 U 8 24709 lo0 - -
192.168.2.0 192.168.2.45 UHSb 0 0 en2 - - =>
192.168.2/26 192.168.2.45 U 2 3240 en2 - -
192.168.2.45 127.0.0.1 UGHS 0 5739 lo0 - -
192.168.2.63 192.168.2.45 UHSb 0 1395 en2 - -
192.168.3.0 192.168.3.45 UHSb 0 0 en0 - - =>
192.168.3/26 192.168.3.45 U 2 3869 en0 - -
192.168.3.45 127.0.0.1 UGHS 0 13164 lo0 - -
192.168.3.63 192.168.3.45 UHSb 0 768 en0 - -
192.168.100.0 192.168.100.36 UHSb 0 0 en3 - - =>
192.168.100/26 192.168.100.36 U 1 7078 en3 - -
192.168.100.36 127.0.0.1 UGHS 0 6343 lo0 - -
192.168.100.63 192.168.100.36 UHSb 0 4 en3 - -
Route Tree for Protocol Family 24 (Internet v6):
::1 ::1 UH 0 591 lo0 - -
After the resource group switch, the routing table became:
# netstat -rn
Routing tables
Destination Gateway Flags Refs Use If Exp Groups
Route Tree for Protocol Family 2 (Internet):
default 10.209.3.62 U 0 0 en0 - -
10.209.3.0 10.209.3.45 UHSb 0 0 en0 - - =>
10.209.3/26 10.209.3.45 U 0 1 en0 - -
10.209.3.45 127.0.0.1 UGHS 0 1 lo0 - -
10.209.3.63 10.209.3.45 UHSb 0 0 en0 - -
127/8 127.0.0.1 U 4 25581 lo0 - -
192.168.2.0 192.168.2.45 UHSb 0 0 en2 - - =>
192.168.2/26 192.168.2.45 U 0 3476 en2 - -
192.168.2.45 127.0.0.1 UGHS 0 5850 lo0 - -
The routing information above is taken from reference [1].
The main change is in the default route, which went from:
default 10.209.3.62 UG 0 5 en2 - -
to:
default 10.209.3.62 U 0 0 en0 - -
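In the Flags column of the AIX route output, U means the route is up and G means the route goes through a gateway. After the switch, the default route has lost its G flag and moved from en2 to en0, so 10.209.3.62 is no longer treated as a gateway, which is consistent with other subnets becoming unreachable.
The HACMP log contains the following: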
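The flag check described above can be scripted. The following is a minimal sketch, assuming the AIX `netstat -rn` column order shown in the listings (Destination, Gateway, Flags, Refs, Use, If); the function names `default_route_flags` and `check_default_route` are made up for illustration and are not part of PowerHA.

```shell
#!/bin/sh
# Hypothetical helpers; they only parse text in the `netstat -rn` layout
# shown above and do not modify any routes.

# Print "<gateway> <flags> <interface>" for each default route on stdin.
default_route_flags() {
    awk '$1 == "default" { print $2, $3, $6 }'
}

# Succeed only if at least one default route exists and every default
# route carries the G (gateway) flag.
check_default_route() {
    default_route_flags | awk '
        { seen = 1; if ($2 !~ /G/) bad = 1 }
        END { exit (seen && !bad) ? 0 : 1 }'
}

# Example use on a cluster node (not executed here):
#   netstat -rn | check_default_route || echo "default route has no G flag"
```

Run after a resource group move, a check like this would flag the `U`-only default route shown above without having to read the whole table.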
+filetrans_rg:clifconfig[207] ifconfig en10 delete 192.168.0.13
+filetrans_rg:cl_swap_IP_address[+1280] [[ -n ]]
+filetrans_rg:cl_swap_IP_address[+1303] /usr/es/sbin/cluster/.restore_routes
+filetrans_rg:.restore_routes[+9] date
+filetrans_rg:.restore_routes[+9] : Starting /usr/es/sbin/cluster/.restore_routes at Wed Mar 16 17:04:44 BEIST 2011
+filetrans_rg:.restore_routes[+11] cl_route_change default 127.0.0.1 192.168.0.254 inet
+filetrans_rg:cl_swap_IP_address[+1304] : Completed /usr/es/sbin/cluster/.restore_routes with return code 0.
+filetrans_rg:cl_swap_IP_address[+1304] [[ __AIX__ = __AIX__ ]]
+filetrans_rg:cl_swap_IP_address[+1305] enable_pmtu_gated
Setting tcp_pmtu_discover to 1
Setting udp_pmtu_discover to 1
+filetrans_rg:cl_swap_IP_address[+1308] cl_hats_adapter en10 -d 192.168.0.13 alias
+filetrans_rg:cl_hats_adapter[+50] [[ high = high ]]
+filetrans_rg:cl_hats_adapter[+50] version=1.40
+filetrans_rg:cl_hats_adapter[+51] +filetrans_rg:cl_hats_adapter[+51] cl_get_path
HA_DIR=es
+filetrans_rg:cl_hats_adapter[+52] +filetrans_rg:cl_hats_adapter[+52] cl_get_path -S
As the log shows, the script that restores routes in HACMP is /usr/es/sbin/cluster/.restore_routes. Its contents are:
#cat /usr/es/sbin/cluster/.restore_routes
#!/bin/ksh
#
# Script created by cl_swap_IP_address on Wed Mar 16 17:04:44 BEIST 2011
#
PATH=/usr/es/sbin/cluster:/usr/es/sbin/cluster/utilities:/usr/es/sbin/cluster/events:/usr/es/sbin/cluster/events/utils:/usr/es/sbin/cluster/events/cmd:/usr/es/sbin/cluster/diag:/usr/es/sbin/cluster/etc:/usr/es/sbin/cluster/sbin:/usr/es/sbin/cluster/cspoc:/usr/es/sbin/cluster/conversion:/usr/es/sbin/cluster/events/emulate:/usr/es/sbin/cluster/events/emulate/driver:/usr/es/sbin/cluster/events/emulate/utils:/usr/es/sbin/cluster/tguides/bin:/usr/es/sbin/cluster/tguides/classes:/usr/es/sbin/cluster/tguides/images:/usr/es/sbin/cluster/tguides/scripts:/usr/es/sbin/cluster/glvm/utils:/usr/es/sbin/cluster/wpar:/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin
PS4='${GROUPNAME:++$GROUPNAME}:${PROGNAME:-${0##*/}}${PS4_TIMER:+($SECONDS)}${PS4_LOOP:+:$PS4_LOOP}[${ERRNO:+${PS4_FUNC:-}+}$LINENO] '
export VERBOSE_LOGGING=${VERBOSE_LOGGING:-"high"}
[[ "$VERBOSE_LOGGING" = "high" ]] && set -x
: Starting $0 at $(date)
#
cl_route_change default 127.0.0.1 192.168.0.254 inet
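To see which route changes a failover actually attempted, the trace lines can be filtered out of the event log. The following is a rough sketch, assuming the `[+N] command` trace format shown in the log excerpt above; the function name `route_change_calls` is hypothetical, and the log path in the usage comment is an assumption that may differ on a given system.

```shell
#!/bin/sh
# Hypothetical helper: read event-log text on stdin and print each distinct
# cl_route_change invocation with the trace prefix stripped.
route_change_calls() {
    sed -n 's/^.*\] \(cl_route_change .*\)$/\1/p' | sort -u
}

# Example use (log location is an assumption, not executed here):
#   route_change_calls < /var/hacmp/log/hacmp.out
```

Only lines that carry the `] cl_route_change ...` trace pattern are printed; everything else in the log is ignored.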
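The route change itself is performed by the cl_route_change command, which is a binary file. Searching IBM's site and Google for cl_route_change turns up results [2][3][4][5], and these articles confirm that this is a HACMP bug. It can be fixed by installing efix IZ63775 or by upgrading to PowerHA 6.1 SP1.
I had never used an efix before, so I took the opportunity to try one out today. The session is recorded below: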
[/tmp/hacmp]#emgr -e IZ63775.epkg.Z
+-----------------------------------------------------------------------------+
Efix Manager Initialization
+-----------------------------------------------------------------------------+
Initializing log /var/adm/ras/emgr.log ...
Efix package file is: /tmp/hacmp/IZ63775.epkg.Z
MD5 generating command is /usr/bin/csum
MD5 checksum is 8ba66435963cf3318502e7953bfebf8a
Accessing efix metadata ...
Processing efix label "IZ63775" ...
Verifying efix control file ...
+-----------------------------------------------------------------------------+
Installp Prerequisite Verification
+-----------------------------------------------------------------------------+
Verifying prerequisite file ...
Checking prerequisites ...
Prerequisite Number: 1
Fileset: cluster.es.server.events
Minimal Level: 6.1.0.0
Maximum Level: 6.1.0.0
Actual Level: 6.1.0.0
Type: PREREQ
Requisite Met: yes
All prerequisites have been met.
+-----------------------------------------------------------------------------+
Processing APAR reference file
+-----------------------------------------------------------------------------+
APAR reference set to NONE. Interim fix is not enabled for automatic removal.
+-----------------------------------------------------------------------------+
Efix Attributes
+-----------------------------------------------------------------------------+
LABEL: IZ63775
PACKAGING DATE: Fri Oct 23 12:22:46 CDT 2009
ABSTRACT: Deflt route prblm in base HA 610
PACKAGER VERSION: 7
VUID: 00CCCC5B4C00102312104609
REBOOT REQUIRED: no
BUILD BOOT IMAGE: no
PRE-REQUISITES: yes
SUPERSEDE: no
PACKAGE LOCKS: no
E2E PREREQS: no
FIX TESTED: no
ALTERNATE PATH: None
EFIX FILES: 1
Install Scripts:
PRE_INSTALL: no
POST_INSTALL: no
PRE_REMOVE: no
POST_REMOVE: no
File Number: 1
LOCATION: /usr/es/sbin/cluster/events/utils/cl_route_change
FILE TYPE: Standard (file or executable)
INSTALLER: installp
SIZE: 76
ACL: DEFAULT
CKSUM: 44210
PACKAGE: cluster.es.server.events
MOUNT INST: no
+-----------------------------------------------------------------------------+
Efix Description
+-----------------------------------------------------------------------------+
This is a fix to cl_route_change for a problem introduced
in base PowerHA 610.
+-----------------------------------------------------------------------------+
Efix Lock Management
+-----------------------------------------------------------------------------+
Checking locks for file /usr/es/sbin/cluster/events/utils/cl_route_change ...
All files have passed lock checks.
+-----------------------------------------------------------------------------+
Space Requirements
+-----------------------------------------------------------------------------+
Checking space requirements ...
Space statistics (in 512 byte-blocks):
File system: /usr, Free: 16042168, Required: 1288, Deficit: 0.
File system: /tmp, Free: 7191192, Required: 2570, Deficit: 0.
+-----------------------------------------------------------------------------+
Efix Installation Setup
+-----------------------------------------------------------------------------+
Unpacking efix package file ...
Initializing efix installation ...
+-----------------------------------------------------------------------------+
Efix State
+-----------------------------------------------------------------------------+
Setting efix state to: INSTALLING
+-----------------------------------------------------------------------------+
File Archiving
+-----------------------------------------------------------------------------+
Saving all files that will be replaced ...
Save directory is: /usr/emgrdata/efixdata/IZ63775/save
File 1: Saving /usr/es/sbin/cluster/events/utils/cl_route_change as EFSAVE1 ...
+-----------------------------------------------------------------------------+
Efix File Installation
+-----------------------------------------------------------------------------+
Installing all efix files:
Installing efix file #1 (File: /usr/es/sbin/cluster/events/utils/cl_route_change) ...
/usr/sbin/emgr[160]: query: not found.
Total number of efix files installed is 1.
All efix files installed successfully.
+-----------------------------------------------------------------------------+
Package Locking
+-----------------------------------------------------------------------------+
Processing package locking for all files.
File 1: locking installp fileset cluster.es.server.events.
All package locks processed successfully.
+-----------------------------------------------------------------------------+
Reboot Processing
+-----------------------------------------------------------------------------+
Reboot is not required by this efix package.
+-----------------------------------------------------------------------------+
Efix State
+-----------------------------------------------------------------------------+
Setting efix state to: STABLE
+-----------------------------------------------------------------------------+
Operation Summary
+-----------------------------------------------------------------------------+
Log file is /var/adm/ras/emgr.log
EPKG NUMBER LABEL OPERATION RESULT
=========== ============== ================= ==============
1 IZ63775 INSTALL SUCCESS
Return Status = SUCCESS
[/tmp]#emgr -l
ID STATE LABEL INSTALL TIME ABSTRACT
=== ===== ========== ================== ======================================
1 S IZ63775 03/16/11 16:08:28 Deflt route prblm in base HA 610
STATE codes:
S = STABLE
M = MOUNTED
U = UNMOUNTED
Q = REBOOT REQUIRED
B = BROKEN
I = INSTALLING
R = REMOVING
T = TESTED
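When the fix has to be confirmed on several nodes, the listing above can be checked from a script. This is a small sketch that assumes the `emgr -l` column layout shown here (ID, STATE, LABEL, ...); `efix_state` is a made-up helper name, not an emgr option.

```shell
#!/bin/sh
# Hypothetical helper: read `emgr -l` output on stdin and print the STATE
# code for the given efix label. Fails if the label is not listed.
efix_state() {
    awk -v label="$1" '
        $1 ~ /^[0-9]+$/ && $3 == label { print $2; found = 1 }
        END { exit found ? 0 : 1 }'
}

# Example use on a cluster node (not executed here):
#   emgr -l | efix_state IZ63775    # prints the state code, e.g. S for STABLE
```

A non-zero exit status means the label does not appear in the listing, which is also what happens when emgr reports that there is no efix data on the system.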
[/tmp]#emgr -l
There is no efix data on this system.
[/tmp]#emgr -r -L IZ63775
+-----------------------------------------------------------------------------+
Efix Manager Initialization
+-----------------------------------------------------------------------------+
Initializing log /var/adm/ras/emgr.log ...
Accessing efix metadata ...
Processing efix label "IZ63775" ...
+-----------------------------------------------------------------------------+
Efix Attributes
+-----------------------------------------------------------------------------+
LABEL: IZ63775
INSTALL DATE: 03/16/11 16:08:28
STATE: STABLE
ABSTRACT: Deflt route prblm in base HA 610
PACKAGER VERSION: 7
VUID: 00CCCC5B4C00102312104609
REBOOT REQUIRED: no
BUILD BOOT IMAGE: no
PRE-REQUISITES: yes
SUPERSEDE: no
PACKAGE LOCKS: no
E2E PREREQS: no
FIX TESTED: no
ALTERNATE PATH: None
EFIX FILES: 1
Install Scripts:
PRE_INSTALL: no
POST_INSTALL: no
PRE_REMOVE: no
POST_REMOVE: no
File Number: 1
LOCATION: /usr/es/sbin/cluster/events/utils/cl_route_change
FILE TYPE: Standard (file or executable)
INSTALLER: installp
SIZE: 76
ACL: DEFAULT
CKSUM: 44210
PACKAGE: cluster.es.server.events
MOUNT INST: no
+-----------------------------------------------------------------------------+
Efix Description
+-----------------------------------------------------------------------------+
This is a fix to cl_route_change for a problem introduced
in base PowerHA 610.
+-----------------------------------------------------------------------------+
Space Requirements
+-----------------------------------------------------------------------------+
Checking space requirements ...
Space statistics (in 512 byte-blocks):
File system: /usr, Free: 16041936, Required: 1247, Deficit: 0.
+-----------------------------------------------------------------------------+
Efix State
+-----------------------------------------------------------------------------+
Setting efix state to: REMOVING
+-----------------------------------------------------------------------------+
Package Locking
+-----------------------------------------------------------------------------+
Processing package unlocking for all files.
File 1: unlocking installp fileset cluster.es.server.events.
All package locks processed successfully.
+-----------------------------------------------------------------------------+
Efix File Removal
+-----------------------------------------------------------------------------+
Setting up for removal of efix files ...
Removing all efix files (in reverse order of installation):
Removing efix file #1 (File: /usr/es/sbin/cluster/events/utils/cl_route_change) ...
Total number of efix files removed is 1.
+-----------------------------------------------------------------------------+
Reboot Processing
+-----------------------------------------------------------------------------+
Reboot is not required by this efix package.
+-----------------------------------------------------------------------------+
Operation Summary
+-----------------------------------------------------------------------------+
Log file is /var/adm/ras/emgr.log
EFIX NUMBER LABEL OPERATION RESULT
=========== ============== ================= ==============
1 IZ63775 REMOVE SUCCESS
Return Status = SUCCESS