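At work I installed the OS on three servers and configured HADR with multiple standbys, but none of the standbys would come up. That was DB2 AESE 10.1 on Windows XP.
It kept reporting SQL1224N, and the instance was stopped whenever I started a standby.
So I decided to reproduce the setup at home.
Four virtual machines, all using bridged networking, CentOS 6.3, DB2 ESE 10.1.
One primary, one principal standby, and two auxiliary standbys.
Primary configuration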
HADR database role = STANDARD
HADR local host name (HADR_LOCAL_HOST) = 192.168.1.105
HADR local service name (HADR_LOCAL_SVC) = 55001
HADR remote host name (HADR_REMOTE_HOST) = 192.168.1.106
HADR remote service name (HADR_REMOTE_SVC) = 55002
HADR instance name of remote server (HADR_REMOTE_INST) = db2hadr
HADR timeout value (HADR_TIMEOUT) = 120
HADR target list (HADR_TARGET_LIST) = 192.168.1.106:55002|192.168.1.104:55003|192.168.1.107:55004
HADR log write synchronization mode (HADR_SYNCMODE) = SYNC
HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 0
HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0
HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 0
Principal standby configuration
HADR database role = STANDBY
HADR local host name (HADR_LOCAL_HOST) = 192.168.1.106
HADR local service name (HADR_LOCAL_SVC) = 55002
HADR remote host name (HADR_REMOTE_HOST) = 192.168.1.105
HADR remote service name (HADR_REMOTE_SVC) = 55001
HADR instance name of remote server (HADR_REMOTE_INST) = db2hadr
HADR timeout value (HADR_TIMEOUT) = 120
HADR target list (HADR_TARGET_LIST) = 192.168.1.105:55001|192.168.1.104:55003|192.168.1.107:55004
HADR log write synchronization mode (HADR_SYNCMODE) = SYNC
HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 0
HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0
HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 0
Auxiliary standby 1
HADR database role = STANDBY
HADR local host name (HADR_LOCAL_HOST) = 192.168.1.104
HADR local service name (HADR_LOCAL_SVC) = 55003
HADR remote host name (HADR_REMOTE_HOST) = 192.168.1.105
HADR remote service name (HADR_REMOTE_SVC) = 55001
HADR instance name of remote server (HADR_REMOTE_INST) = db2hadr
HADR timeout value (HADR_TIMEOUT) = 120
HADR target list (HADR_TARGET_LIST) = 192.168.1.106:55002|192.168.1.105:55001|192.168.1.107:55004
HADR log write synchronization mode (HADR_SYNCMODE) = SUPERASYNC
HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 0
HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0
HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 0
Auxiliary standby 2
HADR database role = STANDBY
HADR local host name (HADR_LOCAL_HOST) = 192.168.1.107
HADR local service name (HADR_LOCAL_SVC) = 55004
HADR remote host name (HADR_REMOTE_HOST) = 192.168.1.105
HADR remote service name (HADR_REMOTE_SVC) = 55001
HADR instance name of remote server (HADR_REMOTE_INST) = db2hadr
HADR timeout value (HADR_TIMEOUT) = 120
HADR target list (HADR_TARGET_LIST) = 192.168.1.106:55002|192.168.1.105:55001|192.168.1.104:55003
HADR log write synchronization mode (HADR_SYNCMODE) = SUPERASYNC
HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 0
HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0
HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 0
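With four target lists to keep in sync, a quick consistency check can rule out typos. The sketch below is only an illustration: it assumes that every standby's HADR_LOCAL_HOST:HADR_LOCAL_SVC pair should appear in the primary's HADR_TARGET_LIST and that the first entry of that list is the principal standby. The names in_target_list, principal_standby and PRIMARY_TARGETS are hypothetical helpers, not DB2 commands.

```shell
# Return 0 if HOST:PORT is one of the '|'-separated entries in a
# HADR_TARGET_LIST value. $1 = target list, $2 = host, $3 = port.
in_target_list() {
    printf '%s\n' "$1" | tr '|' '\n' | grep -Fqx "$2:$3"
}

# Print the first entry of a target list (the principal standby).
principal_standby() {
    printf '%s\n' "$1" | cut -d'|' -f1
}

# Value copied from the primary's configuration above.
PRIMARY_TARGETS='192.168.1.106:55002|192.168.1.104:55003|192.168.1.107:55004'

# Local host:service pairs copied from the three standby configurations.
for endpoint in 192.168.1.106:55002 192.168.1.104:55003 192.168.1.107:55004; do
    host=${endpoint%:*}
    port=${endpoint#*:}
    if in_target_list "$PRIMARY_TARGETS" "$host" "$port"; then
        echo "$endpoint listed"
    else
        echo "$endpoint MISSING"
    fi
done
echo "principal standby: $(principal_standby "$PRIMARY_TARGETS")"
```

For the values posted here all three standbys are listed and the principal standby matches the primary's HADR_REMOTE_HOST and HADR_REMOTE_SVC, so a check like this only excludes a mistyped address or port; it does not by itself explain a start-up failure.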
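On the standbys, start hadr on db HADB as standby succeeded every time, and the databases are activated.
But on the primary, start hadr on db HADB as primary never succeeds.
The error is as follows: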
[db2hadr@ServerA ~]$ db2 start hadr on db HADB as primary
SQL1768N Unable to start HADR. Reason code = "7".
The db2diag log shows the following errors:
2012-12-07-23.06.16.340658+480 I1384805G673 LEVEL: Error
PID : 1297 TID : 3010456432 PROC : db2sysc 0
INSTANCE: db2hadr NODE : 000
HOSTNAME: ServerA
EDUID : 50 EDUNAME: db2hadrp.0.1 (HADB) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEdu::hdrEduP, probe:20390
MESSAGE : ZRC=0x8280001A=-2105540582=HDR_ZRC_NO_STANDBY
"Comm time-out in unforced HADR primary start, to avoid split-brain"
DATA #1 :
HADR primary did not establish connection with standby within timeout and will
shut down. BY FORCE option required to start primary without standby.
Timeout seconds = 120.
2012-12-07-23.06.16.341127+480 I1386252G480 LEVEL: Error
PID : 1297 TID : 3010456432 PROC : db2sysc 0
INSTANCE: db2hadr NODE : 000
HOSTNAME: ServerA
EDUID : 50 EDUNAME: db2hadrp.0.1 (HADB) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEdu::hdrEduEntry, probe:21100
MESSAGE : ZRC=0x8280001A=-2105540582=HDR_ZRC_NO_STANDBY
"Comm time-out in unforced HADR primary start, to avoid split-brain"
2012-12-07-23.06.16.396983+480 I1386733G595 LEVEL: Error
PID : 1297 TID : 3027233648 PROC : db2sysc 0
INSTANCE: db2hadr NODE : 000 DB : HADB
APPHDL : 0-16 APPID: *LOCAL.db2hadr.121207150415
AUTHID : DB2HADR HOSTNAME: ServerA
EDUID : 39 EDUNAME: db2agent (HADB) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSpawnPrimaryEdus, probe:23300
MESSAGE : ZRC=0x8280001A=-2105540582=HDR_ZRC_NO_STANDBY
"Comm time-out in unforced HADR primary start, to avoid split-brain"
2012-12-07-23.06.16.397239+480 I1387329G588 LEVEL: Error
PID : 1297 TID : 3027233648 PROC : db2sysc 0
INSTANCE: db2hadr NODE : 000 DB : HADB
APPHDL : 0-16 APPID: *LOCAL.db2hadr.121207150415
AUTHID : DB2HADR HOSTNAME: ServerA
EDUID : 39 EDUNAME: db2agent (HADB) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSpawnEdus, probe:25100
MESSAGE : ZRC=0x8280001A=-2105540582=HDR_ZRC_NO_STANDBY
"Comm time-out in unforced HADR primary start, to avoid split-brain"
2012-12-07-23.06.16.397373+480 I1387918G571 LEVEL: Error
PID : 1297 TID : 3027233648 PROC : db2sysc 0
INSTANCE: db2hadr NODE : 000 DB : HADB
APPHDL : 0-16 APPID: *LOCAL.db2hadr.121207150415
AUTHID : DB2HADR HOSTNAME: ServerA
EDUID : 39 EDUNAME: db2agent (HADB) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduShutdownOnSpawningError, probe:25300
DATA #1 :
HADR startup has been interrupted. Shutting down the spawned HADR EDU now.
2012-12-07-23.06.16.400830+480 I1393706G1004 LEVEL: Error
PID : 1297 TID : 3027233648 PROC : db2sysc 0
INSTANCE: db2hadr NODE : 000 DB : HADB
APPHDL : 0-16 APPID: *LOCAL.db2hadr.121207150415
AUTHID : DB2HADR HOSTNAME: ServerA
EDUID : 39 EDUNAME: db2agent (HADB) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduStartup, probe:21250
MESSAGE : ZRC=0x8280001A=-2105540582=HDR_ZRC_NO_STANDBY
"Comm time-out in unforced HADR primary start, to avoid split-brain"
DATA #1 : SQLCA, PD_DB2_TYPE_SQLCA, 136 bytes
sqlcaid : SQLCA sqlcabc: 136 sqlcode: 0 sqlerrml: 0
sqlerrmc:
sqlerrp : SQL10010
sqlerrd : (1) 0x00000000 (2) 0x00000000 (3) 0x00000000
(4) 0x00000000 (5) 0x00000000 (6) 0x00000000
sqlwarn : (1) (2) (3) (4) (5) (6)
(7) (8) (9) (10) (11)
sqlstate:
2012-12-07-23.06.16.401025+480 I1394711G562 LEVEL: Error
PID : 1297 TID : 3027233648 PROC : db2sysc 0
INSTANCE: db2hadr NODE : 000 DB : HADB
APPHDL : 0-16 APPID: *LOCAL.db2hadr.121207150415
AUTHID : DB2HADR HOSTNAME: ServerA
EDUID : 39 EDUNAME: db2agent (HADB) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduStartup, probe:21250
MESSAGE : SQL1768N Unable to start HADR. Reason code = "7".
DATA #1 : ZRC, PD_TYPE_ZRC, 4 bytes
0x8280001A
2012-12-07-23.06.16.424052+480 I1397914G771 LEVEL: Error
PID : 1297 TID : 3027233648 PROC : db2sysc 0
INSTANCE: db2hadr NODE : 000 DB : HADB
APPHDL : 0-16 APPID: *LOCAL.db2hadr.121207150415
AUTHID : DB2HADR HOSTNAME: ServerA
EDUID : 39 EDUNAME: db2agent (HADB) 0
FUNCTION: DB2 UDB, data protection services, sqlpCheckToStartHadr, probe:4000
MESSAGE : ZRC=0x8010006D=-2146434963=SQLP_RC_CA_BUILT
"SQLCA has been built and saved in component specific control block."
DATA #1 : String, 36 bytes
sqlcode / myAction / isFirstConnect:
DATA #2 : Sqlcode, PD_TYPE_SQLCODE, 4 bytes
-1768
DATA #3 : Hex integer, 4 bytes
0x00002000
DATA #4 : Boolean, 1 bytes
true
2012-12-07-23.06.16.424597+480 I1398686G616 LEVEL: Severe
PID : 1297 TID : 3027233648 PROC : db2sysc 0
INSTANCE: db2hadr NODE : 000 DB : HADB
APPHDL : 0-16 APPID: *LOCAL.db2hadr.121207150415
AUTHID : DB2HADR HOSTNAME: ServerA
EDUID : 39 EDUNAME: db2agent (HADB) 0
FUNCTION: DB2 UDB, base sys utilities, sqledint, probe:5570
MESSAGE : ZRC=0x8010006D=-2146434963=SQLP_RC_CA_BUILT
"SQLCA has been built and saved in component specific control block."
DATA #1 : Sqlcode, PD_TYPE_SQLCODE, 4 bytes
-1768
2012-12-07-23.06.16.424803+480 I1399303G874 LEVEL: Severe
PID : 1297 TID : 3027233648 PROC : db2sysc 0
INSTANCE: db2hadr NODE : 000 DB : HADB
APPHDL : 0-16 APPID: *LOCAL.db2hadr.121207150415
AUTHID : DB2HADR HOSTNAME: ServerA
EDUID : 39 EDUNAME: db2agent (HADB) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::FirstConnect, probe:8716
DATA #1 : SQLCA, PD_DB2_TYPE_SQLCA, 136 bytes
sqlcaid : SQLCA sqlcabc: 136 sqlcode: -1768 sqlerrml: 1
sqlerrmc: 7
sqlerrp : SQL10010
sqlerrd : (1) 0x00000000 (2) 0x00000007 (3) 0x00000000
(4) 0x00000000 (5) 0x00000000 (6) 0x00000000
sqlwarn : (1) (2) (3) (4) (5) (6)
(7) (8) (9) (10) (11)
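Most of the entries above repeat the same ZRC as the error propagates up the call stack, so it can help to collapse the log to its distinct MESSAGE lines. This is a rough sketch that assumes MESSAGE starts at the beginning of a line, as in the excerpt; summarize_messages is a hypothetical helper name.

```shell
# Count distinct MESSAGE lines in db2diag-style text read from stdin,
# most frequent first.
summarize_messages() {
    grep '^MESSAGE' | sed 's/^MESSAGE *: *//' | sort | uniq -c | sort -rn
}

# Three MESSAGE lines taken from the excerpt above.
summarize_messages <<'EOF'
MESSAGE : ZRC=0x8280001A=-2105540582=HDR_ZRC_NO_STANDBY
MESSAGE : ZRC=0x8280001A=-2105540582=HDR_ZRC_NO_STANDBY
MESSAGE : SQL1768N Unable to start HADR. Reason code = "7".
EOF
```

On a real system the input would be the instance's db2diag.log rather than a here-document. Reduced this way, the excerpt comes down to one underlying condition, HDR_ZRC_NO_STANDBY: the primary did not establish a connection with a standby within the 120-second timeout.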
Oddly, netstat -a on the primary and standby machines shows none of the ports configured above.
netstat -a
tcp 0 52 ServerB:ssh 192.168.1.10:onehome-remote ESTABLISHED
unix 2 [ ] DGRAM 11089 @00018
unix 7 [ ] DGRAM 10588 /dev/log
unix 2 [ ACC ] STREAM LISTENING 10675 /var/run/rpcbind.sock
unix 2 [ ] DGRAM 11100 /var/run/fcm/fcm_clif
unix 3 [ ] DGRAM 11019 @/com/intel/lldpad
unix 2 [ ] DGRAM 11020
unix 3 [ ] STREAM CONNECTED 10927
unix 3 [ ] STREAM CONNECTED 10926
unix 2 [ ] DGRAM 10768
[db2hadr@ServerB ~]$ netstat -a |grep 50
unix 2 [ ] DGRAM 15087
[db2hadr@ServerB ~]$ netstat -a |grep 55
[db2hadr@ServerB ~]$ netstat -a |grep 5500
[db2hadr@ServerB ~]$ netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 *:sunrpc *:* LISTEN
tcp 0 0 *:db2c_db2hadr *:* LISTEN
tcp 0 0 *:60243 *:* LISTEN
tcp 0 0 *:ssh *:* LISTEN
tcp 0 0 ServerB:DB2_HADR_B *:* LISTEN
tcp 0 52 ServerB:ssh 192.168.1.10:onehome-remote ESTABLISHED
tcp 0 0 *:59919 *:* LISTEN
tcp 0 0 *:sunrpc *:* LISTEN
tcp 0 0 *:ssh *:* LISTEN
udp 0 0 *:bootpc *:*
udp 0 0 *:47060 *:*
udp 0 0 *:sunrpc *:*
udp 0 0 *:acmaint_dbd *:*
udp 0 0 *:793 *:*
udp 0 0 *:36416 *:*
udp 0 0 *:sunrpc *:*
udp 0 0 *:acmaint_dbd *:*
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node Path
unix 2 [ ACC ] STREAM LISTENING 8543 @/com/ubuntu/upstart
unix 2 [ ] DGRAM 11089 @00018
unix 7 [ ] DGRAM 10588 /dev/log
unix 2 [ ] DGRAM 8715 @/org/kernel/udev/udevd
unix 2 [ ACC ] STREAM LISTENING 10675 /var/run/rpcbind.sock
unix 2 [ ] DGRAM 11100 /var/run/fcm/fcm_clif
unix 3 [ ] DGRAM 11019 @/com/intel/lldpad
unix 2 [ ] DGRAM 22270
unix 3 [ ] STREAM CONNECTED 18292
unix 3 [ ] STREAM CONNECTED 18291
unix 2 [ ] DGRAM 18288
unix 2 [ ] DGRAM 15087
unix 2 [ ] DGRAM 11020
unix 3 [ ] STREAM CONNECTED 10927
unix 3 [ ] STREAM CONNECTED 10926
unix 2 [ ] DGRAM 10768
unix 3 [ ] DGRAM 8732
unix 3 [ ] DGRAM 8731
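One thing worth checking: netstat -a prints a port by its service name whenever /etc/services has an entry for it, so grepping for 55 or 5500 will not match such a line. The ServerB:DB2_HADR_B listener in the output above may well be the HADR port, assuming /etc/services on ServerB maps DB2_HADR_B to 55002 (the post does not show that file). A minimal sketch for filtering numeric output; listening_on_port is a hypothetical helper name.

```shell
# Print the LISTEN lines from `netstat -an`-style text on stdin whose
# local port equals $1. The -n flag keeps ports numeric instead of
# translating them through /etc/services.
listening_on_port() {
    awk -v port="$1" '$1 ~ /^tcp/ && $NF == "LISTEN" {
        n = split($4, parts, ":")   # local address is the 4th column
        if (parts[n] == port) print
    }'
}

# Intended usage on the standby (not executed here):
#   netstat -an | listening_on_port 55002
#   grep -i hadr /etc/services
```

If the numeric listing does show a listener on 55002, the next thing to test would be whether the primary can actually reach that port across the network.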