HACMP two-node switchover fault: contention for the floating IP resource

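1. How can the /var/adm/ras/errlog file be read? errpt -a -i errlog shows nothing, yet ls reports the file is over 1 MB.
2. At around 23:30:43 on August 20, Oracle on server 1 was writing to a data file and hit a block error; at around 23:47 this led to an HACMP switchover, but the floating IP resource was not released. The open question is whether Oracle lost access to the disk array, failed to write its data files and thereby triggered HACMP, or whether something else triggered it. On both database servers the hacmp.out errors and the alert.log fault timestamps are all after 23:30. The only thing left to check is the errlog, but it cannot be opened.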

-----------------------
hacmp.out on the primary server
rg_zxin:clstop[206] cl_echo 208 /usr/sbin/cluster/utilities/clstop: called with flags -y -N -grnn /usr/sbin/cluster/utilities/clstop -y -N -gr
rg_zxin:cl_echo[49] version=1.13
rg_zxin:cl_echo[98] HACMP_OUT_FILE=/tmp/hacmp.out
Aug 20 2010 23:30:47 /usr/sbin/cluster/utilities/clstop: called with flags -y -N -gr
rg_zxin:clstop[208] getopt fgrsyNRBS -y -N -gr
rg_zxin:clstop[208] set -- -y -N -g -r --
rg_zxin:clstop[214] [[ -y != -- ]]
rg_zxin:clstop[236] no_prompt=yes
rg_zxin:clstop[251] shift
——————————————
alert.log on the primary server (storage access)
Fri Aug 20 08:48:28 2010
Thread 1 advanced to log sequence 6409
  Current log# 1 seq# 6409 mem# 0: /zxindata/oracle/redolog/redo01.log
Fri Aug 20 23:30:43 2010
KCF: write/open error block=0xba43 online=1
     file=3 /zxindata/oracle/system/rbs.dbf
     error=27063 txt: 'IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: -1
Additional information: 4096'
Fri Aug 20 23:30:45 2010
Errors in file /home/oracle/zxindbf/admin/zxin/bdump/lgwr_42908_zxin.trc:
ORA-00345: redo log write error block 499397 count 2
ORA-00312: online log 1 thread 1: '/zxindata/oracle/redolog/redo01.log'
ORA-27063: skgfospo: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 16: Device busy
Additional information: -1
Additional information: 1024
Fri Aug 20 23:30:45 2010
Errors in file /home/oracle/zxindbf/admin/zxin/bdump/lgwr_42908_zxin.trc:
ORA-00340: IO error processing online log 1 of thread 1
ORA-00345: redo log write error block 499397 count 2
ORA-00312: online log 1 thread 1: '/zxindata/oracle/redolog/redo01.log'
ORA-27063: skgfospo: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 16: Device busy
————————————————————————————————
alert.log on the standby server
Sat Dec 23 11:21:15 2006
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
LICENSE_MAX_USERS = 0
Starting up ORACLE RDBMS Version: 8.1.7.4.0.
System parameters with non-default values:
————————————————————————————
hacmp.out on the standby server
+ [[ high = high ]]
+ version=1.2
+ + cl_get_path
HA_DIR=es
+ STATUS=0
+ set +u
+ [ ]
+ exit 0
   HACMP Event Summary
Event: /usr/es/sbin/cluster/events/check_for_site_down scp2
Start time: Fri Aug 20 23:31:22 2010
End time: Fri Aug 20 23:31:22 2010
Action:  Resource:   Script Name:
----------------------------------------------------------------------------
No resources changed as a result of this event
----------------------------------------------------------------------------
Aug 20 23:31:22 EVENT START: node_down scp2
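
To compare the order of events across the excerpts above, the timestamps can be merged into one timeline. What follows is a minimal Python sketch, not something that was run on these servers. The source labels, the function names, and the default year used for EVENT lines (which carry no year) are my own assumptions; the sample lines are copied from the excerpts.

```python
import re
from datetime import datetime

# Sample lines copied from the excerpts above; the source labels are assumptions.
SAMPLES = [
    ("standby hacmp.out", "Aug 20 23:31:22 EVENT START: node_down scp2"),
    ("primary alert.log", "Fri Aug 20 23:30:43 2010"),
    ("primary hacmp.out", "Aug 20 2010 23:30:47 /usr/sbin/cluster/utilities/clstop: called with flags -y -N -gr"),
    ("primary alert.log", "Fri Aug 20 23:30:45 2010"),
]

MONTHS = {m: i for i, m in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), 1)}

# Three timestamp layouts appear in the excerpts.
ALERT_RE = re.compile(r"^\w{3} (\w{3}) +(\d{1,2}) (\d\d:\d\d:\d\d) (\d{4})$")
CLSTOP_RE = re.compile(r"^(\w{3}) +(\d{1,2}) (\d{4}) (\d\d:\d\d:\d\d) ")
EVENT_RE = re.compile(r"^(\w{3}) +(\d{1,2}) (\d\d:\d\d:\d\d) EVENT ")


def parse_stamp(line, default_year):
    """Return a datetime for a line in one of the three layouts, else None."""
    m = ALERT_RE.match(line)
    if m:
        mon, day, clock, year = m.groups()
    else:
        m = CLSTOP_RE.match(line)
        if m:
            mon, day, year, clock = m.groups()
        else:
            m = EVENT_RE.match(line)
            if not m:
                return None
            mon, day, clock = m.groups()
            year = default_year  # EVENT lines carry no year
    hour, minute, second = map(int, clock.split(":"))
    return datetime(int(year), MONTHS[mon], int(day), hour, minute, second)


def timeline(samples, default_year=2010):
    """Sort (source, line) pairs by their parsed timestamp."""
    rows = []
    for source, line in samples:
        stamp = parse_stamp(line.strip(), default_year)
        if stamp is not None:
            rows.append((stamp, source, line.strip()))
    return sorted(rows)


for stamp, source, line in timeline(SAMPLES):
    print(stamp.isoformat(sep=" "), f"[{source}]", line)
```

On these four lines the Oracle write error (23:30:43) sorts ahead of the clstop call (23:30:47) and the standby's node_down event (23:31:22). That ordering only means something if the clocks on both nodes agree, so it is a hint rather than proof of what triggered the switchover.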

zhuangrf, System Architect, 福建新大陆
How do you read the errlog file? Does anyone have a trick?
Telecom operator · 2010-09-14
zhuangrf, System Architect, 福建新大陆
scp1@:[/home/zxin10]$lsattr -El aio0
minservers 1         MINIMUM number of servers                True
maxservers 10        MAXIMUM number of servers                True
maxreqs    4096      Maximum number of REQUESTS               True
kprocprio  39        Server PRIORITY                          True
autoconfig available STATE to be configured at system restart True
fastpath   enable    State of fast path                       True
Telecom operator · 2010-09-14
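zhuangrf, System Architect, 福建新大陆
For example: during a switchover, the application has to be stopped first and the resources released, and only at the very end is the IP released.
On the takeover node: acquire the IP, acquire the resources, then start the application.

The question is whether those applications really stopped completely. If they did not, the resources of course cannot be released.
And if they did not stop? Kill them outright and clean up thoroughly; after that the resources release easily, because ...
hubb-1, posted 2010-9-14 11:23

Good analysis, thanks!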
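
The ordering in the quoted reply can be written down as a toy model. This is only a Python sketch of that description, not how HACMP implements its event scripts; the Node class, its fields and all function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy model of a cluster node; every name here is hypothetical."""
    name: str
    app_pids: set = field(default_factory=set)  # application processes still running
    holds_storage: bool = False
    holds_ip: bool = False
    log: list = field(default_factory=list)

    def stop_app(self, stubborn=()):
        # A graceful stop may leave some processes behind (the "stubborn" ones).
        self.app_pids = set(stubborn) & self.app_pids
        self.log.append("stop_app")

    def kill_leftovers(self):
        # Forced kill of anything that survived the graceful stop.
        if self.app_pids:
            self.app_pids.clear()
            self.log.append("kill_leftovers")

    def release_storage(self):
        # Storage cannot be released while processes still hold it open.
        if self.app_pids:
            raise RuntimeError("storage busy: application still running")
        self.holds_storage = False
        self.log.append("release_storage")

    def release_ip(self):
        self.holds_ip = False
        self.log.append("release_ip")


def release(node, stubborn=(), force=True):
    """Release order from the quoted reply: application, resources, IP last."""
    node.stop_app(stubborn)
    if force:
        node.kill_leftovers()
    node.release_storage()
    node.release_ip()


def acquire(node):
    """Takeover order from the quoted reply: IP, resources, application."""
    node.holds_ip = True
    node.log.append("acquire_ip")
    node.holds_storage = True
    node.log.append("acquire_storage")
    node.app_pids = {1}
    node.log.append("start_app")


# Without a forced kill, a leftover process blocks the release before the IP step.
failed = Node("node_a", app_pids={101, 102}, holds_storage=True, holds_ip=True)
try:
    release(failed, stubborn={102}, force=False)
except RuntimeError as exc:
    print(exc, "| still holds IP:", failed.holds_ip)
```

In this model a process that survives the stop step leaves the node holding its IP, which is one way the symptom in this thread could arise. It illustrates the quoted reasoning and is not a diagnosis of this cluster.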
Telecom operator · 2010-09-14
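zhuangrf, System Architect, 福建新大陆
Not surprising! This happens a lot with two-node switchovers, heh.
Network timing differences, switchover scripts, and so on.
hubb-1, posted 2010-9-14 11:20

The thing is, this IP conflict lasted a full day and night, yet the service kept working normally.
It was only discovered when a user tried to log in through the floating IP and could not.
That raises another question: with the floating IP in conflict, why were the service connections still normal?
Telecom operator · 2010-09-14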
zhuangrf, System Architect, 福建新大陆
/home/oracle/zxindbf/admin/zxin/bdump/lgwr_42908_zxin.trc
Oracle8i Enterprise Edition Release 8.1.7.4.0 - Production
With the Partitioning option
JServer Release 8.1.7.4.0 - Production
ORACLE_HOME = /home/oracle/oracle81
System name:        AIX
Node name:        scp2
Release:        1
Version:        5
Machine:        000F17AC4C00
Instance name: zxin
Redo thread mounted by this instance: 1
Oracle process number: 6
Unix process pid: 42908, image: oracle@scp2 (LGWR)

*** SESSION ID:(5.1) 2010-08-20 23:30:45.915
*** 2010-08-20 23:30:45.915
ksedmp: internal or fatal error
ORA-00345: redo log write error block 499397 count 2
ORA-00312: online log 1 thread 1: '/zxindata/oracle/redolog/redo01.log'
ORA-27063: skgfospo: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 16: Device busy
Additional information: -1
Additional information: 1024
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedmp+00cc          bl       ksedst               1 ?
kcrfwcint+02a4       bl       ksedmp               159 ?
kcrfwint+014c        bl       kcrfwcint            1 ?
ksbcti+00ac          bl       _ptrgl               
ksbabs+0260          bl       ksbcti               0 ? 0 ?
ksbrdp+022c          bl       _ptrgl               
opirip+0260          bl       ksbrdp               
opidrv+0424          bl       opirip               F0F1E188 ? 0 ? 0 ?
sou2o+0028           bl       opidrv               32 ? 0 ? 0 ?
main+00d4            bl       sou2o                A0 ? FFFFFFF7 ? 2FF227A0 ?
                                                   0 ?
__start+0088         bl       main                 1 ? 2FF22948 ?
----- Argument/Register Address Dump -----
Argument/Register addr=1.  
Dump of memory from 0x0 to 0x101
000 00000000 00000000 00000000 00000000  [................]
        Repeat 15 times
100 60000000                             [`...]            
Argument/Register addr=159.  
Dump of memory from 0x119 to 0x259
110                   78214002 7C0004AC          [x!@.|...]
120 7C34FBA6 4C00012C 7C3FFAA6 78211F24  [|4..L..,|?..x!.$]
130 642107D1 60215000 E8210000 F8010000  [d!..`!P..!......]
140 7C0802A6 F8010100 80000190 7C0803A6  [|...........|...]
150 4E800021 38610000 7C3FFAA6 78211F24  [N..!8a..|?..x!.$]
160 642107D1 60215800 E8210000 F861FFF8  [d!..`!X..!...a..]
170 3821FF00 80600184 7C6803A6 8040018C  [8!...`..|h...@..]
180 4E800020 07D1492C 07D0F000 07D9E384  [N.. ..I,........]
190 07D14C54 07DB6188 00007FF8 31080004  [..LT..a.....1...]
1A0 4BFFFFE4 82270034 76250800 3CE05590  [K....'.4v%..<.U.]
1B0 4082003C 3CA08204 60A50080 7CAF01A4  [@..<<...`...|...]
1C0 3D00F0A0 7CA000A6 60A60010 7CC00124  [=...|...`...|..$]
1D0 4C00012C 90E80300 7CA00124 7CA52A78  [L..,....|..$|.*x]
1E0 7CAF01A4 4C00012C 7C0004AC 48000000  [|...L..,|...H...]
1F0 005A363C 802D00C0 804D00C4 7C42682E  [.Z6<.-...M..|Bh.]
200 60000000 7C3243A6 7C34FAA6 7821C000  [`...|2C.|4..x!..]
210 7821AAAC 782107A4 78214002 7C0004AC  [x!..x!..x!@.|...]
220 7C34FBA6 4C00012C 7C3FFAA6 78211F24  [|4..L..,|?..x!.$]
230 642107D1 60215000 E8210000 F8010000  [d!..`!P..!......]
240 7C0802A6 F8010100 80000290 7C0803A6  [|...........|...]
250 4E800021 38610000 7C3FFAA6           [N..!8a..|?..]   
Argument/Register addr=f0f1e188.  
Dump of memory from 0xF0F1E148 to 0xF0F1E288
F0F1E140                   7573722F 6363732F          [usr/ccs/]
F0F1E150 6C69622F 6C696263 2F61626F 72742E63  [lib/libc/abort.c]
F0F1E160 2C206C69 62637072 6F632C20 626F7335  [, libcproc, bos5]
F0F1E170 31302035 2F32322F 39372031 373A3138  [10 5/22/97 17:18]
F0F1E180 3A303900 00000000 61000000 0A000000  [:09.....a.......]
F0F1E190 0A000000 2C000000 3F3F0000 332E3100  [....,...??..3.1.]
F0F1E1A0 00000000 00000000 00000000 00000006  [................]
F0F1E1B0 00000000 00000000 00000000 00000000  [................]
F0F1E1C0 00000000 00000008 00000000 00000400  [................]
F0F1E1D0 00000200 00000020 00000010 00000000  [....... ........]
F0F1E1E0 00000000 00000000 00000000 00000000  [................]
F0F1E1F0 00000000 00000000 00000000 00000008  [................]
F0F1E200 00000064 00000000 74727565 00000000  [...d....true....]
F0F1E210 74727565 00000000 64656275 67000000  [true....debug...]
F0F1E220 64656275 67000000 75736572 3A000000  [debug...user:...]
F0F1E230 75736572 3A000000 75736572 3A000000  [user:...user:...]
F0F1E240 75736572 3A000000 6561726C 79000000  [user:...early...]
F0F1E250 2073697A 653D0000 68656170 3D000000  [ size=..heap=...]
F0F1E260 7374646F 75740000 73746465 72720000  [stdout..stderr..]
F0F1E270 73746465 72720000 0A457272 6E6F0000  [stderr...Errno..]
F0F1E280 73746465 72720000                    [stderr..]        
Argument/Register addr=a0.  
Dump of memory from 0x60 to 0x1A0
060 00000000 00000000 00000000 00000000  [................]
        Repeat 9 times
100 60000000 7C3243A6 7C34FAA6 7821C000  [`...|2C.|4..x!..]
110 7821AAAC 782107A4 78214002 7C0004AC  [x!..x!..x!@.|...]
120 7C34FBA6 4C00012C 7C3FFAA6 78211F24  [|4..L..,|?..x!.$]
130 642107D1 60215000 E8210000 F8010000  [d!..`!P..!......]
140 7C0802A6 F8010100 80000190 7C0803A6  [|...........|...]
150 4E800021 38610000 7C3FFAA6 78211F24  [N..!8a..|?..x!.$]
160 642107D1 60215800 E8210000 F861FFF8  [d!..`!X..!...a..]
170 3821FF00 80600184 7C6803A6 8040018C  [8!...`..|h...@..]
180 4E800020 07D1492C 07D0F000 07D9E384  [N.. ..I,........]
190 07D14C54 07DB6188 00007FF8 31080004  [..LT..a.....1...]
Argument/Register addr=2ff227a0.  
Dump of memory from 0x2FF22760 to 0x2FF228A0
2FF22760 2FF227A0 00000000 2000086C F1065440  [/.'..... ..l..T@]
2FF22770 2FF227B0 F1065E34 D015D594 00000000  [/.'...^4........]
2FF22780 00000000 00000000 20000878 00000001  [........ ..x....]
2FF22790 00000000 00000000 F0FFE3E4 F0FFE3D4  [................]
2FF227A0 2FF227F0 22228441 D01FFC00 F0F1B5F8  [/.'."".A........]
2FF227B0 2FF227F0 F0F58600 D0200200 00000000  [/.'...... ......]
2FF227C0 00000000 F0F58600 00000000 00000000  [................]
2FF227D0 00000000 00000000 00000000 00000000  [................]
2FF227E0 00000009 00000000 2000086C 2FF22890  [........ ..l/.(.]
2FF227F0 2FF22900 00000000 D01FFB5C 00000000  [/.)........\....]
2FF22800 00000000 F0F58600 00000000 00000000  [................]
2FF22810 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
2FF22830 00010000 00000010 00000009 00000000  [................]
2FF22840 10000000 200007B8 000F4618 00000002  [.... .....F.....]
2FF22850 D01D8780 F0F130E0 00091DE8 00010002  [......0.........]
2FF22860 00000000 00001000 00062318 00020002  [..........#.....]
2FF22870 D0000000 F0FFB000 00005020 00030003  [..........P ....]
2FF22880 D015A4C0 F1064FB0 00001370 00040003  [......O....p....]
2FF22890 D014C0F8 203030A8 20000878 00000000  [.... 00. ..x....]
Argument/Register addr=2ff22948.  
Dump of memory from 0x2FF22908 to 0x2FF22A48
2FF22900                   1000022C 00000000          [...,....]
2FF22910 00000000 200CCFA4 00000001 2FF22948  [.... ......./.)H]
2FF22920 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
2FF22940 00000000 00000000 2FF229F8 00000000  [......../.).....]
2FF22950 2FF22A06 2FF22A2A 2FF22A35 2FF22A42  [/.*./.**/.*5/.*B]
2FF22960 2FF22A56 2FF22A7B 2FF22B19 2FF22B44  [/.*V/.*{/.+./.+D]
2FF22970 2FF22B65 2FF22B76 2FF22B9B 2FF22BBC  [/.+e/.+v/.+./.+.]
2FF22980 2FF22C45 2FF22C54 2FF22C70 2FF22C80  [/.,E/.,T/.,p/.,.]
2FF22990 2FF22C99 2FF22CC0 2FF22CCC 2FF22CF2  [/.,./.,./.,./.,.]
2FF229A0 2FF22D02 2FF22D1B 116282E0 2FF22D43  [/.-./.-..b../.-C]
2FF229B0 2FF22D52 2FF22D6D 2FF22DA4 2FF22DB6  [/.-R/.-m/.-./.-.]
2FF229C0 2FF22DE9 2FF22DF3 2FF22E0F 2FF22E31  [/.-./.-./.../..1]
2FF229D0 2FF22E4D 2FF22E5E 2FF22E85 2FF22E90  [/..M/..^/.../...]
2FF229E0 2FF22EB3 2FF22ED2 2FF22EE1 2FF22EF4  [/.../.../.../...]
2FF229F0 2FF22F34 00000000 6F72615F 6C677772  [/./4....ora_lgwr]
2FF22A00 5F7A7869 6E005F3D 2F686F6D 652F6F72  [_zxin._=/home/or]
2FF22A10 61636C65 2F6F7261 636C6538 312F6269  [acle/oracle81/bi]
2FF22A20 6E2F7371 6C706C75 73004C41 4E473D7A  [n/sqlplus.LANG=z]
2FF22A30 685F434E 004C4F47 494E3D6F 7261636C  [h_CN.LOGIN=oracl]
2FF22A40 65004E4C 535F5445                    [e.NLS_TE]        
----- End of Call Stack Trace -----
===================================================
Files currently opened by this process:
===================================================
PROCESS STATE
-------------
Process global information:
     process: bc09bc5c, call: bc104238, xact: 0, curses: bc0b90f4, usrses: bc0b90f4
  ----------------------------------------
  SO: bc09bc5c, type: 1, owner: 0, pt: 0, flag: INIT/-/-/0x00
  (process) Oracle pid=6, calls cur/top: bc104238/bc104238, flag: (2) SYSTEM
            int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 0 0 13
              last post received-location: ksasnd
              last process to post me: bc09ceec 2 0
              last post sent: 85 0 4
              last post sent-location: kslpsr
              last process posted by me: bc09ceec 2 0
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: bc0b3620
    O/S info: user: oracle, term: UNKNOWN, ospid: 42908
    OSD pid info: 42908
    ----------------------------------------
    SO: bc1510d0, type: 4, owner: bc09bc5c, pt: 0, flag: INIT/-/-/0x00
    (enqueue) RT-00000001-00000000        DID: 0000-0006-00000003
    lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    res: bc151220, mode: X, prv: bc151228, sess: bc0b90f4, proc: bc09bc5c
    ----------------------------------------
    SO: bc16a97c, type: 9, owner: bc09bc5c, pt: 0, flag: INIT/-/-/0x00
    (broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: bc09bc5c,
                       event: 5, last message event: 10, messages read: 1
                       channel: (bc16b034) scumnt mount lock
                                scope: 101, event: 10, last mesage event: 10,
                                publishers/subscribers: 0/8,
                                messages published: 1
    ----------------------------------------
    SO: bc0b90f4, type: 3, owner: bc09bc5c, pt: 0, flag: INIT/-/-/0x00
    (session) trans: 0, creator: bc09bc5c, flag: (51) USR/- BSY/-/-/-/-/-
              DID: 0001-0006-00000004, short-term DID: 0000-0000-00000000
              txn branch: 0
              oct: 0, prv: 0, user: 0/SYS
    last wait for 'log file parallel write' blocking sess=0x0 seq=16030 wait_time=-2
                files=1, blocks=2, requests=1
    ----------------------------------------
    SO: bc16a9f4, type: 9, owner: bc09bc5c, pt: 0, flag: INIT/-/-/0x00
    (broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: bc09bc5c,
                       event: 6, last message event: 6, messages read: 0
                       channel: (bc16af54) system events broadcast channel
                                scope: 101, event: 4673, last mesage event: 0,
                                publishers/subscribers: 0/29,
                                messages published: 0
    ----------------------------------------
    SO: bc104238, type: 2, owner: bc09bc5c, pt: 0, flag: INIT/-/-/0x00
    (call) sess: cur bc0b90f4, rec 0, usr bc0b90f4; depth: 0
===================================================
CURRENT SESSION'S INSTANTIATION STATE
-------------------------------------
current session=bc0b90f4
********************   Cursor Dump   ************************
Current cursor: 0, pgadep: 0
Cursor Dump:
End of cursor dump
END OF PROCESS STATE
error 340 detected in background process
ORA-00340: IO error processing online log 1 of thread 1
ORA-00345: redo log write error block 499397 count 2
ORA-00312: online log 1 thread 1: '/zxindata/oracle/redolog/redo01.log'
ORA-27063: skgfospo: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 16: Device busy
Additional information: -1
Additional information: 1024
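
A trace this long is easier to scan once the error codes are pulled out. Below is a minimal Python sketch that counts the ORA- codes and the AIX error numbers in a piece of trace or alert.log text. The function name and the regular expressions are my own; the sample text is the tail of the trace above.

```python
import re
from collections import Counter

# Tail of the LGWR trace above, used as sample input.
SAMPLE = """\
error 340 detected in background process
ORA-00340: IO error processing online log 1 of thread 1
ORA-00345: redo log write error block 499397 count 2
ORA-00312: online log 1 thread 1: '/zxindata/oracle/redolog/redo01.log'
ORA-27063: skgfospo: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 16: Device busy
Additional information: -1
Additional information: 1024
"""

ORA_RE = re.compile(r"^(ORA-\d{5}): ")
# search() rather than match(): in alert.log the OS error can follow other text.
OS_RE = re.compile(r"IBM AIX RISC System/6000 Error: (\d+): (.*)$")


def summarize(text):
    """Count ORA- codes and (errno, message) pairs found in the text."""
    ora_codes, os_errors = Counter(), Counter()
    for raw in text.splitlines():
        line = raw.strip()
        m = ORA_RE.match(line)
        if m:
            ora_codes[m.group(1)] += 1
            continue
        m = OS_RE.search(line)
        if m:
            os_errors[(int(m.group(1)), m.group(2))] += 1
    return ora_codes, os_errors


ora_codes, os_errors = summarize(SAMPLE)
print(sorted(ora_codes))
print(dict(os_errors))
```

Run on the sample, it reports four ORA- codes and one OS error, 16 (Device busy). The same function should also pick up the error 5 (I/O error) line in the primary alert.log excerpt, since the pattern is not anchored to the start of the line.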
Telecom operator · 2010-09-14
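zhuangrf, System Architect, 福建新大陆
One more thing to add: because the floating IP was not released, a colleague tells me that errpt still showed entries when it was checked on August 21, and all of them were floating-IP address conflict errors. When I open the errlog file directly it is mostly unreadable binary, but the IP conflict error messages can still be seen in it.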
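
Picking readable messages out of a mostly binary file, as described above, can be done by extracting runs of printable characters, the same idea as the strings utility. Here is a minimal Python sketch. The function names are my own, and the sample bytes are made up for illustration; they are not real errlog content.

```python
import re


def printable_runs(data: bytes, min_len: int = 6):
    """Return runs of at least min_len printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def grep_runs(data: bytes, keyword: str, min_len: int = 6):
    """Keep only the printable runs that mention keyword (case-insensitive)."""
    needle = keyword.lower()
    return [run for run in printable_runs(data, min_len) if needle in run.lower()]


# Made-up bytes standing in for a binary log; NOT real errlog content.
sample = (b"\x00\x07\xff\x10hypothetical address conflict message\x00\x01"
          b"ab\x00\x9cother readable text\x00")

print(printable_runs(sample))
print(grep_runs(sample, "conflict"))
```

For a real file, the bytes would come from open(path, "rb").read(), where path is whichever copy of the errlog is being inspected. This only recovers text fragments and does not decode the log's record structure.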
Telecom operator · 2010-09-14
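zhuangrf, System Architect, 福建新大陆
Only ideas, no answer!

    In any switchover, the first step should be acquiring the IP; only then comes the takeover of the related applications.
    As for what caused the switchover, I don't know, so I can't say. It could be the network or the application. I'm not familiar with HACMP.

    As for not getting the IP after the switchover, the takeover steps that follow will more or less have problems ...
hubb-1, posted 2010-9-14 11:11

The issue is not that the IP could not be acquired. The original primary database server had a problem and HACMP switched over, yet the failed server still did not release the floating IP, which is the strange part.
What triggered this HACMP switchover is also an open question.
Telecom operator · 2010-09-14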
zhuangrf, System Architect, 福建新大陆
Tried that. It comes back empty.
Telecom operator · 2010-09-14

Asker: zhuangrf, System Architect, 福建新大陆
Posted: 2010-09-14
Latest answer: 2011-11-01