RAC on OCFS2: how we handled both nodes rebooting after a public network cable was pulled

Environment:

Oracle 11.2 on Red Hat 5.3, two-node RAC.
The OCR, the voting disks, and the database all live on an OCFS2 file system mounted at /oradata.

Problem description:
After the public NIC of one node was disconnected, both nodes rebooted automatically.

Troubleshooting:
1) When I first took the call, my initial guess based on the environment above was:
the OCFS2 heartbeat runs over the public NIC, so when that NIC goes down the /oradata file system gets into trouble, Clusterware can no longer reach the voting disks,
and as a result both nodes reboot at the same time.
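
A quick way to check this guess is to see which addresses the o2cb cluster stack is actually configured with. A minimal check, assuming the default config location /etc/ocfs2/cluster.conf:

# which IPs does o2cb use for its network heartbeat / DLM traffic?
cat /etc/ocfs2/cluster.conf
# is the cluster stack loaded and the heartbeat active?
service o2cb status

If the ip_address entries there belong to the public subnet, the hypothesis fits.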

2) A few days later, on site, I moved the OCFS2 heartbeat onto the private network (the cluster.conf change is sketched after the test results below). After that:
pulling one public cable: the VIP fails over to the other node (normal behavior restored);
pulling one private cable: node A reboots, and a short while later node B reboots as well!   --- still not right.
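
For reference, moving the o2cb heartbeat/DLM traffic onto the private network comes down to pointing the node addresses in /etc/ocfs2/cluster.conf at the private interconnect. A sketch only: the 192.168.201.x addresses are the interconnect addresses seen in the CSSD logs below, and the port number is the o2cb default; treat the exact values as illustrative.

cluster:
        node_count = 2
        name = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.201.1
        number = 0
        name = circxf1
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.201.2
        number = 1
        name = circxf2
        cluster = ocfs2

The edit only takes effect after /oradata is unmounted and the o2cb stack is restarted (service o2cb offline / online) on both nodes.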

Looking at the cssd logs (they were not fully preserved, so these excerpts will have to do):
On node A:
[evmd(16859)]CRS-1401:EVMD started on node circxf1.
2010-06-25 11:20:26.292
[crsd(16861)]CRS-1201:CRSD started on node circxf1.
2010-06-25 11:30:55.954
[cssd(16778)]CRS-1612:Network communication with node circxf2 (1) missing for 50% of timeout interval.   Removal of this node from cluster in 14.220 seconds
2010-06-25 11:31:02.967
[cssd(16778)]CRS-1611:Network communication with node circxf2 (1) missing for 75% of timeout interval.   Removal of this node from cluster in 7.210 seconds
2010-06-25 11:31:07.976
[cssd(16778)]CRS-1610:Network communication with node circxf2 (1) missing for 90% of timeout interval.   Removal of this node from cluster in 2.200 seconds
2010-06-25 11:31:09.860
[cssd(16778)]CRS-1608:This node was evicted by node 1, circxf2; details at (:CSSNM00005:) in /oracle/grid/11.2.0/log/circxf1/cssd/ocssd.log.
2010-06-25 11:35:26.071
[ohasd(5170)]CRS-2112:The OLR service started on node circxf1.
2010-06-25 11:35:26.412

On node B:
2010-06-25 11:31:09.776: [ CSSD][1251248448]clssnmHandleSync: initleader 1 newleader 1
2010-06-25 11:31:09.776: [ CSSD][1251248448]clssnmQueueClientEvent:   Sending Event(2), type 2, incarn 171478018

2010-06-25 11:30:55.187: [ CSSD][1230268736]clssnmSendingThread: sent 5 status msgs to all nodes
2010-06-25 11:30:55.487: [ CSSD][1219778880]clssnmPollingThread: node circxf1 (2) at 50% heartbeat fatal, removal in 14.280 seconds
2010-06-25 11:30:55.487: [ CSSD][1219778880]clssnmPollingThread: node circxf1 (2) is impending reconfig, flag 197644, misstime 15720
2010-06-25 11:30:55.487: [ CSSD][1219778880]clssnmPollingThread: local diskTimeout set to 27000 ms, remote disk timeout set to 27000, impending reconfig status(1)
2010-06-25 11:30:59.805: [ CSSD][1135860032]clssnmvSchedDiskThreads: DiskPingMonitorThread sched delay 830 > margin 750 cur_ms 587888504 lastalive 587887674
2010-06-25 11:31:00.177: [ CSSD][1230268736]clssnmSendingThread: sending status msg to all nodes
2010-06-25 11:31:00.177: [ CSSD][1230268736]clssnmSendingThread: sent 5 status msgs to all nodes
2010-06-25 11:31:00.796: [ CSSD][1135860032]clssnmvSchedDiskThreads: DiskPingMonitorThread sched delay 820 > margin 750 cur_ms 587889494 lastalive 587888674
2010-06-25 11:31:02.500: [ CSSD][1219778880]clssnmPollingThread: node circxf1 (2) at 75% heartbeat fatal, removal in 7.270 seconds
2010-06-25 11:31:02.886: [ CSSD][1135860032]clssnmvSchedDiskThreads: DiskPingMonitorThread sched delay 910 > margin 750 cur_ms 587891584 lastalive 587890674
2010-06-25 11:31:04.975: [ CSSD][1135860032]clssnmvSchedDiskThreads: DiskPingMonitorThread sched delay 1000 > margin 750 cur_ms 587893674 lastalive 587892674
2010-06-25 11:31:05.186: [ CSSD][1230268736]clssnmSendingThread: sending status msg to all nodes
2010-06-25 11:31:05.186: [ CSSD][1230268736]clssnmSendingThread: sent 5 status msgs to all nodes
2010-06-25 11:31:07.509: [ CSSD][1219778880]clssnmPollingThread: node circxf1 (2) at 90% heartbeat fatal, removal in 2.260 seconds, seedhbimpd 1
2010-06-25 11:31:09.775: [ CSSD][1219778880]clssnmPollingThread: Removal started for node circxf1 (2), flags 0x3040c, state 3, wt4c 0
2010-06-25 11:31:09.775: [ CSSD][1219778880]clssnmDiscHelper: circxf1, node(2) connection failed, endp (0x234), probe(0x100000000), ninf->endp 0x234
2010-06-25 11:31:09.775: [ CSSD][1219778880]clssnmDiscHelper: node 2 clean up, endp (0x234), init state 5, cur state 5
2010-06-25 11:31:09.776: [GIPCXCPT][1219778880]gipcInternalDissociate: obj 0x2aaaac1fd770 [0000000000000234] { gipcEndpoint : localAddr 'gipc://circxf2:6371-bc6b-ae62-5b2e#192.168.201.2#40198', remoteAddr 'gipc://192.168.201.1:c9e1-2a70-5a06-f048#192.168.201.1#34534', numPend 5, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x418a, pidPeer 0, flags 0x102616, usrFlags 0x0 } not associated with any container, ret gipcretFail (1)
2010-06-25 11:31:09.776: [GIPCXCPT][1219778880]gipcDissociateF [clssnmDiscHelper : clssnm.c : 3215]: EXCEPTION[ ret gipcretFail (1)]   failed to dissociate obj 0x2aaaac1fd770 [0000000000000234] { gipcEndpoint : localAddr 'gipc://circxf2:6371-bc6b-ae62-5b2e#192.168.201.2#40198', remoteAddr 'gipc://192.168.201.1:c9e1-2a70-5a06-f048#192.168.201.1#34534', numPend 5, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x418a, pidPeer 0, flags 0x102616, usrFlags 0x0 }, flags 0x0
2010-06-25 11:31:09.776: [ CSSD][1240758592]clssnmDoSyncUpdate: Initiating sync 171478019
2010-06-25 11:31:09.776: [ CSSD][1240758592]clssscUpdateEventValue: NMReconfigInProgress   val 1, changes 7
2010-06-25 11:31:09.776: [ CSSD][1240758592]clssnmDoSyncUpdate: local disk timeout set to 27000 ms, remote disk timeout set to 27000
2010-06-25 11:31:09.776: [ CSSD][1240758592]clssnmDoSyncUpdate: new values for local disk timeout and remote disk timeout will take effect when the sync is completed.
2010-06-25 11:31:09.776: [ CSSD][1240758592]clssnmDoSyncUpdate: Starting cluster reconfig with incarnation 171478019
2010-06-25 11:31:09.776: [ CSSD][1251248448]clssnmDiscEndp: gipcDestroy 0x234
2010-06-25 11:31:09.776: [ CSSD][1240758592]clssnmSetupAckWait: Ack message type (11)
2010-06-25 11:31:09.776: [ CSSD][1240758592]clssnmSetupAckWait: node(1) is ALIVE
2010-06-25 11:31:09.776: [ CSSD][1251248448]clssnmvSchedDiskThreads: DiskPingMonitorThread sched delay 800 > margin 750 cur_ms 587898474 lastalive 587897674
2010-06-25 11:31:09.776: [ CSSD][1240758592]clssnmSendSync: syncSeqNo(171478019), indicating EXADATA fence initialization complete
2010-06-25 11:31:09.776: [ CSSD][1240758592]List of nodes that have ACKed my sync: NULL
2010-06-25 11:31:09.776: [ CSSD][1240758592]clssnmSendSync: syncSeqNo(171478019)
2010-06-25 11:31:09.776: [ CSSD][1240758592]clssnmWaitForAcks: Ack message type(11), ackCount(1)
2010-06-25 11:31:09.776: [ CSSD][1251248448]clssnmHandleSync: Node circxf2, number 1, is EXADATA fence capable
2010-06-25 11:31:09.776: [ CSSD][1251248448]clssscUpdateEventValue: NMReconfigInProgress   val 1, changes 8
2010-06-25 11:31:09.776: [ CSSD][1251248448]clssnmHandleSync: local disk timeout set to 27000 ms, remote disk timeout set to 27000

What the logs tell us:

Node A rebooted because it was "evicted" by node B.

Node B, before it rebooted, kept reporting errors like "DiskPingMonitorThread sched delay 800 > margin 750" and "disk timeout", so its reboot still looks like a voting disk timeout.
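
To double-check where the voting disks live and what the CSS timeouts currently are, the standard 11.2 crsctl queries can be run on either node:

crsctl query css votedisk
crsctl get css misscount
crsctl get css disktimeout

With everything on /oradata, any hang of the OCFS2 file system hits the voting disks directly.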


3) A quick test of OCFS2 itself
After crsctl stop crs, I pulled one private cable and ran ls /oradata on both nodes: both hung. Only after quite a while (about a minute), when the o2cb stack on one node went down,
did the ls /oradata on the other node finally return. It is not hard to see that this hang window is what pushed node B's voting disk heartbeat over its timeout.
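
The test itself is just the following sequence, run as root on both nodes (a sketch; the cable is of course pulled by hand):

# take Clusterware out of the picture so only ocfs2/o2cb is involved
crsctl stop crs
# ... pull one private network cable ...
# time how long the ocfs2 mount stays hung on each node
time ls /oradata
# watch the o2net/dlm messages while waiting
tail -f /var/log/messages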

4) Fixing the problem
With both the Clusterware heartbeat and the OCFS2 heartbeat on the private network, the idea is to shorten the Clusterware network heartbeat timeout so that node A reboots as early as possible once the heartbeat is lost. That leaves time for the OCFS2 heartbeat to expire and
for OCFS2 to finish its post-timeout work before node B's voting disk timeout runs out. (A bit of a mouthful.)
Concretely:
crsctl set css misscount 4    -- set the network heartbeat timeout to 4 seconds (the default is 30 seconds)
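
The change can be verified with the matching get command:

crsctl get css misscount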

Testing again now matches what a healthy RAC should do: when the interconnect heartbeat is lost, only one node reboots.

In the operating system log on the surviving node you can see:
Jun 25 12:53:51 circxf2 nm-system-settings: ifcfg-rh:     error: Ignoring loopback device config.
Jun 25 12:53:51 circxf2 nm-system-settings: ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-eth1 ...
Jun 25 12:53:51 circxf2 nm-system-settings: ifcfg-rh:     read connection 'System eth1'
Jun 25 12:53:52 circxf2 : error getting update info: Cannot retrieve repository metadata (repomd.xml) for repository: rhel-debuginfo. Please verify its path and try again
Jun 25 12:54:15 circxf2 kernel: (o2net,4411,8):o2net_connect_expired:1664 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
Jun 25 12:54:15 circxf2 kernel: (oracle,6233,12):dlm_send_remote_convert_request:395 ERROR: status = -107
Jun 25 12:54:15 circxf2 kernel: (ls,7065,5):dlm_send_remote_convert_request:395 ERROR: status = -107
Jun 25 12:54:15 circxf2 kernel: (gnome-vfs-daemo,6963,6):dlm_do_master_request:1334 ERROR: link to 0 went down!
Jun 25 12:54:15 circxf2 kernel: (ls,7065,5):dlm_wait_for_node_death:370 7F6130E55B9042DAB83C4B89403AEF62: waiting 5000ms for notification of death of node 0
Jun 25 12:54:15 circxf2 kernel: (gnome-vfs-daemo,6963,6):dlm_get_lock_resource:917 ERROR: status = -107
Jun 25 12:54:15 circxf2 kernel: (ocssd.bin,5677,0):dlm_send_remote_convert_request:395 ERROR: status = -107
Jun 25 12:54:15 circxf2 kernel: (ocssd.bin,5677,0):dlm_wait_for_node_death:370 7F6130E55B9042DAB83C4B89403AEF62: waiting 5000ms for notification of death of node 0
Jun 25 12:54:15 circxf2 kernel: (crsd.bin,5754,5):dlm_send_remote_convert_request:395 ERROR: status = -107
Jun 25 12:54:15 circxf2 kernel: (crsd.bin,5754,5):dlm_wait_for_node_death:370 7F6130E55B9042DAB83C4B89403AEF62: waiting 5000ms for notification of death of node 0
Jun 25 12:54:15 circxf2 kernel: (oracle,6233,12):dlm_wait_for_node_death:370 7F6130E55B9042DAB83C4B89403AEF62: waiting 5000ms for notification of death of node 0
Jun 25 12:54:45 circxf2 kernel: (o2net,4411,8):o2net_connect_expired:1664 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
Jun 25 12:54:45 circxf2 kernel: (ls,7065,5):dlm_send_remote_convert_request:395 ERROR: status = -107
Jun 25 12:54:45 circxf2 kernel: (ocssd.bin,5677,0):dlm_send_remote_convert_request:395 ERROR: status = -107
Jun 25 12:54:45 circxf2 kernel: (ocssd.bin,5677,0):dlm_wait_for_node_death:370 7F6130E55B9042DAB83C4B89403AEF62: waiting 5000ms for notification of death of node 0
Jun 25 12:54:45 circxf2 kernel: (oracle,6233,12):dlm_send_remote_convert_request:395 ERROR: status = -107
Jun 25 12:54:45 circxf2 kernel: (oracle,6233,12):dlm_wait_for_node_death:370 7F6130E55B9042DAB83C4B89403AEF62: waiting 5000ms for notification of death of node 0
Jun 25 12:54:45 circxf2 kernel: (ls,7065,5):dlm_wait_for_node_death:370 7F6130E55B9042DAB83C4B89403AEF62: waiting 5000ms for notification of death of node 0
Jun 25 12:54:45 circxf2 kernel: (crsd.bin,5754,3):dlm_send_remote_convert_request:395 ERROR: status = -107
Jun 25 12:54:45 circxf2 kernel: (crsd.bin,5754,3):dlm_wait_for_node_death:370 7F6130E55B9042DAB83C4B89403AEF62: waiting 5000ms for notification of death of node 0
Jun 25 12:55:15 circxf2 kernel: (o2net,4411,8):o2net_connect_expired:1664 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
Jun 25 12:55:15 circxf2 kernel: (crsd.bin,5754,3):dlm_send_remote_convert_request:395 ERROR: status = -107
Jun 25 12:55:15 circxf2 kernel: (ocssd.bin,5677,0):dlm_send_remote_convert_request:395 ERROR: status = -107
Jun 25 12:55:15 circxf2 kernel: (oracle,6233,12):dlm_send_remote_convert_request:395 ERROR: status = -107
Jun 25 12:55:15 circxf2 kernel: (ls,7065,5):dlm_send_remote_convert_request:395 ERROR: status = -107
Jun 25 12:55:15 circxf2 kernel: (ocssd.bin,5677,0):dlm_wait_for_node_death:370 7F6130E55B9042DAB83C4B89403AEF62: waiting 5000ms for notification of death of node 0
Jun 25 12:55:15 circxf2 kernel: (ls,7065,5):dlm_wait_for_node_death:370 7F6130E55B9042DAB83C4B89403AEF62: waiting 5000ms for notification of death of node 0
Jun 25 12:55:15 circxf2 kernel: (oracle,6233,12):dlm_wait_for_node_death:370 7F6130E55B9042DAB83C4B89403AEF62: waiting 5000ms for notification of death of node 0
Jun 25 12:55:15 circxf2 kernel: (crsd.bin,5754,3):dlm_wait_for_node_death:370 7F6130E55B9042DAB83C4B89403AEF62: waiting 5000ms for notification of death of node 0
Jun 25 12:55:22 circxf2 kernel: (ls,7065,5):dlm_send_remote_convert_request:395 ERROR: status = -107
Jun 25 12:55:22 circxf2 kernel: (ocssd.bin,5677,0):dlm_send_remote_convert_request:395 ERROR: status = -107
Jun 25 12:55:22 circxf2 kernel: (oracle,6233,12):dlm_send_remote_convert_request:395 ERROR: status = -107
Jun 25 12:55:22 circxf2 kernel: (crsd.bin,5754,3):dlm_send_remote_convert_request:395 ERROR: status = -107
Jun 25 12:55:22 circxf2 kernel: (ocssd.bin,5677,0):dlm_wait_for_node_death:370 7F6130E55B9042DAB83C4B89403AEF62: waiting 5000ms for notification of death of node 0
Jun 25 12:55:22 circxf2 kernel: (crsd.bin,5754,3):dlm_wait_for_node_death:370 7F6130E55B9042DAB83C4B89403AEF62: waiting 5000ms for notification of death of node 0
Jun 25 12:55:22 circxf2 kernel: (oracle,6233,12):dlm_wait_for_node_death:370 7F6130E55B9042DAB83C4B89403AEF62: waiting 5000ms for notification of death of node 0
Jun 25 12:55:22 circxf2 kernel: (o2hb-C0D944067F,4560,12):ocfs2_dlm_eviction_cb:98 device (8,17): dlm has evicted node 0
Jun 25 12:55:22 circxf2 kernel: (ls,7065,5):dlm_wait_for_node_death:370 7F6130E55B9042DAB83C4B89403AEF62: waiting 5000ms for notification of death of node 0
Jun 25 12:55:22 circxf2 kernel: (o2hb-C0D944067F,4560,12):ocfs2_dlm_eviction_cb:98 device (8,33): dlm has evicted node 0
Jun 25 12:55:22 circxf2 kernel: (ocfs2rec,7118,0):dlm_get_lock_resource:844 7F6130E55B9042DAB83C4B89403AEF62:M0000000000000000000018dadde798: at least one node (0) to recover before lock mastery can begin
Jun 25 12:55:22 circxf2 kernel: (ocfs2rec,7119,12):dlm_get_lock_resource:844 C0D944067FF742A08C124432DE6B9C6F:M0000000000000000000018f871635a: at least one node (0) to recover before lock mastery can begin
Jun 25 12:55:23 circxf2 kernel: (ocfs2rec,7118,0):dlm_get_lock_resource:898 7F6130E55B9042DAB83C4B89403AEF62:M0000000000000000000018dadde798: at least one node (0) to recover before lock mastery can begin
Jun 25 12:55:23 circxf2 kernel: (ocfs2rec,7119,12):dlm_get_lock_resource:898 C0D944067FF742A08C124432DE6B9C6F:M0000000000000000000018f871635a: at least one node (0) to recover before lock mastery can begin
Jun 25 12:55:25 circxf2 kernel: (gnome-vfs-daemo,6963,6):dlm_restart_lock_mastery:1223 ERROR: node down! 0
Jun 25 12:55:25 circxf2 kernel: (gnome-vfs-daemo,6963,6):dlm_wait_for_lock_mastery:1040 ERROR: status = -11
Jun 25 12:55:25 circxf2 kernel: (dlm_reco_thread,4563,12):dlm_get_lock_resource:844 C0D944067FF742A08C124432DE6B9C6F:$RECOVERY: at least one node (0) to recover before lock mastery can begin
Jun 25 12:55:25 circxf2 kernel: (dlm_reco_thread,4563,12):dlm_get_lock_resource:878 C0D944067FF742A08C124432DE6B9C6F: recovery map is not empty, but must master $RECOVERY lock now
Jun 25 12:55:25 circxf2 kernel: (dlm_reco_thread,4563,12):dlm_do_recovery:524 (4563) Node 1 is the Recovery Master for the Dead Node 0 for Domain C0D944067FF742A08C124432DE6B9C6F
Jun 25 12:55:26 circxf2 kernel: (dlm_reco_thread,4552,3):dlm_get_lock_resource:844 7F6130E55B9042DAB83C4B89403AEF62:$RECOVERY: at least one node (0) to recover before lock mastery can begin
Jun 25 12:55:26 circxf2 kernel: (dlm_reco_thread,4552,3):dlm_get_lock_resource:878 7F6130E55B9042DAB83C4B89403AEF62: recovery map is not empty, but must master $RECOVERY lock now
Jun 25 12:55:26 circxf2 kernel: (dlm_reco_thread,4552,3):dlm_do_recovery:524 (4552) Node 1 is the Recovery Master for the Dead Node 0 for Domain 7F6130E55B9042DAB83C4B89403AEF62
Jun 25 12:55:34 circxf2 kernel: (ocfs2rec,7118,0):ocfs2_replay_journal:1183 Recovering node 0 from slot 0 on device (8,17)
Jun 25 12:55:34 circxf2 kernel: (ocfs2rec,7119,12):ocfs2_replay_journal:1183 Recovering node 0 from slot 0 on device (8,33)
Jun 25 12:55:36 circxf2 kernel: kjournald starting.   Commit interval 5 seconds
Jun 25 12:55:36 circxf2 kernel: kjournald starting.   Commit interval 5 seconds
Jun 25 12:55:39 circxf2 avahi-daemon[4943]: Registering new address record for 10.1.2.58 on eth0.
Jun 25 12:55:39 circxf2 avahi-daemon[4943]: Withdrawing address record for 10.1.2.58 on eth0.
Jun 25 12:55:39 circxf2 avahi-daemon[4943]: Registering new address record for 10.1.2.58 on eth0.

OCFS2 first sits through two of those 30-second "no connection established" waits plus the 5000 ms dlm_wait_for_node_death waits,
and only then does Node 1 (node B) start recovering node A's file system journal; /oradata becomes usable again only after that recovery finishes.
With the default settings this window is far longer than the 27000 ms disk timeout that CSS switches to during a reconfiguration (visible in the cssd log above), so if node B's voting disk I/O times out inside this window, node B reboots too.

To be extra safe, and to make sure the OCFS2 recovery can finish before the voting disk times out, the OCFS2 timeouts themselves can also be shortened (a note on what the new values mean follows the session below):
# service o2cb configure
Configuring the O2CB driver.
This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot.   The current values will be shown in brackets ('[]').   Hitting
<ENTER> without typing an answer will keep that current value.   Ctrl-C
will abort.
Load O2CB driver on boot (y/n) [y]:
Cluster stack backing O2CB [o2cb]:
Cluster to start on boot (Enter "none" to clear) [ocfs2]:
Specify heartbeat dead threshold (>=7) [31]: 16
Specify network idle timeout in ms (>=5000) [30000]: 15000
Specify network keepalive delay in ms (>=1000) [2000]:
Specify network reconnect delay in ms (>=2000) [2000]:
Writing O2CB configuration: OK
Cluster ocfs2 already online
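
These answers are persisted by the o2cb init script (on a stock install they end up as O2CB_* variables in /etc/sysconfig/o2cb). The heartbeat dead threshold is counted in 2-second iterations, so the effective disk heartbeat timeout is roughly (threshold - 1) * 2 seconds: about 60 s with the default of 31, about 30 s with 16. A sketch of what the file should now contain (variable names assumed from a stock o2cb package):

# /etc/sysconfig/o2cb (written by "service o2cb configure")
O2CB_ENABLED=true
O2CB_BOOTCLUSTER=ocfs2
O2CB_HEARTBEAT_THRESHOLD=16      # (16 - 1) * 2 = ~30 s disk heartbeat timeout
O2CB_IDLE_TIMEOUT_MS=15000       # network idle timeout, down from 30000
O2CB_KEEPALIVE_DELAY_MS=2000
O2CB_RECONNECT_DELAY_MS=2000

Since the cluster was already online when the values were written, the new timeouts normally only take effect after the o2cb stack is restarted (offline/online, with /oradata unmounted) on all nodes.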

That completes the work:
1) moved the OCFS2 heartbeat onto the private network;
2) shortened the CRS network heartbeat timeout;
3) shortened the OCFS2 heartbeat timeouts.
The same goal could of course be reached by lengthening the voting disk timeout instead, but that has a problem:
until OCFS2 times out, it is not just the voting disks that are unusable; the data files and everything else on /oradata are unusable too, which means the database simply hangs.
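
For completeness, that alternative would look something like the following (not what was done here; the value is a placeholder, and the database would still hang on /oradata for the whole OCFS2 timeout window):

crsctl get css disktimeout
crsctl set css disktimeout <seconds>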

To sum up:
Think twice before putting OCFS2 into production.

No matter how the timeouts are tuned, while one node is hung the OCFS2 file system on the surviving node is also frozen until OCFS2 times out, and of course the database is unusable during that window.