AIX 6.1, DB2 9.7
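Problem:
db2 connect could not connect to the database, and the instance could not be stopped either. After waiting too long I simply powered the machine off. After it booted again the database still could not be connected to. db2diag.log showed that crash recovery was running, but it still had not finished after a very long wait. To restore service as quickly as possible, I stopped the database's back-end application at around 16:29. (There are 4 databases in total and the other three are healthy. However, the front-end application was stuck on its connections to the broken database and could not establish connections to the three healthy ones. Once the back-end application for the broken database was stopped, the front-end application regained its connections to the other three.)
However, crash recovery seems to have stopped as well.
Could someone please take a look?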
db2diag.log:
2015-10-12-14.14.25.798061+480 I122014100A511 LEVEL: Event
PID : 6422530 TID : 11309 PROC : db2sysc 0
INSTANCE: rminst11 NODE : 000 DB : RMDB11
APPHDL : 0-7 APPID: 10.201.251.3.49995.151012061425
AUTHID : RMADMIN
EDUID : 11309 EDUNAME: db2agent (RMDB11) 0
FUNCTION: DB2 UDB, config/install, sqlfLogUpdateCfgParam, probe:20
CHANGE : CFG DB RMDB11 : "Database_memory" From: "2771232" To: "2776608"
2015-10-12-14.14.26.997106+480 I122014612A440 LEVEL: Warning
PID : 6422530 TID : 11309 PROC : db2sysc 0
INSTANCE: rminst11 NODE : 000 DB : RMDB11
APPHDL : 0-7 APPID: 10.201.251.3.49995.151012061425
AUTHID : RMADMIN
EDUID : 11309 EDUNAME: db2agent (RMDB11) 0
FUNCTION: DB2 UDB, base sys utilities, sqledint, probe:30
MESSAGE : Crash Recovery is needed.
2015-10-12-14.14.33.280989+480 E122015053A470 LEVEL: Event
PID : 6422530 TID : 11309 PROC : db2sysc 0
INSTANCE: rminst11 NODE : 000 DB : RMDB11
APPHDL : 0-7 APPID: 10.201.251.3.49995.151012061425
AUTHID : RMADMIN
EDUID : 11309 EDUNAME: db2agent (RMDB11) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::FirstConnect, probe:1000
START : DATABASE: RMDB11 : ACTIVATED: NO
2015-10-12-14.14.33.457553+480 I122015524A503 LEVEL: Warning
PID : 6422530 TID : 11309 PROC : db2sysc 0
INSTANCE: rminst11 NODE : 000 DB : RMDB11
APPHDL : 0-7 APPID: 10.201.251.3.49995.151012061425
AUTHID : RMADMIN
EDUID : 11309 EDUNAME: db2agent (RMDB11) 0
FUNCTION: DB2 UDB, recovery manager, sqlpresr, probe:410
MESSAGE : Crash recovery started. LowtranLSN 00000D25FF65A9D6 MinbuffLSN
00000D25FF65A9D6
2015-10-12-14.14.33.458044+480 E122016028A457 LEVEL: Warning
PID : 6422530 TID : 11309 PROC : db2sysc 0
INSTANCE: rminst11 NODE : 000 DB : RMDB11
APPHDL : 0-7 APPID: 10.201.251.3.49995.151012061425
AUTHID : RMADMIN
EDUID : 11309 EDUNAME: db2agent (RMDB11) 0
FUNCTION: DB2 UDB, recovery manager, sqlpresr, probe:410
MESSAGE : ADM1530E Crash recovery has been initiated.
2015-10-12-14.14.33.574659+480 I122016486A498 LEVEL: Warning
PID : 6422530 TID : 11309 PROC : db2sysc 0
INSTANCE: rminst11 NODE : 000 DB : RMDB11
APPHDL : 0-7 APPID: 10.201.251.3.49995.151012061425
AUTHID : RMADMIN
EDUID : 11309 EDUNAME: db2agent (RMDB11) 0
FUNCTION: DB2 UDB, recovery manager, sqlprecm, probe:2000
DATA #1 :
Using parallel recovery with 7 agents 12 QSets 72 queues and 8 chunks
2015-10-12-14.14.34.622121+480 I122016985A471 LEVEL: Warning
PID : 6422530 TID : 30328 PROC : db2sysc 0
INSTANCE: rminst11 NODE : 000
EDUID : 30328 EDUNAME: db2lfr (RMDB11) 0
FUNCTION: DB2 UDB, recovery manager, sqlplfrFMReadLog, probe:5150
MESSAGE : Found a log on a newer chain. Updating chain number. extNum /
chainId
DATA #1 : unsigned integer, 4 bytes
34493
DATA #2 : unsigned integer, 4 bytes
3
2015-10-12-16.29.04.081257+480 I122017457A539 LEVEL: Error
PID : 6422530 TID : 11309 PROC : db2sysc 0
INSTANCE: rminst11 NODE : 000 DB : RMDB11
APPHDL : 0-7 APPID: 10.201.251.3.49995.151012061425
AUTHID : RMADMIN
EDUID : 11309 EDUNAME: db2agent (RMDB11) 0
FUNCTION: DB2 UDB, common communication, sqlcctcptest, probe:11
MESSAGE : Detected client termination
DATA #1 : Hexdump, 2 bytes
0x07000000053F3E42 : 0036 .6
2015-10-12-16.29.04.177951+480 I122017997A521 LEVEL: Error
PID : 6422530 TID : 11309 PROC : db2sysc 0
INSTANCE: rminst11 NODE : 000 DB : RMDB11
APPHDL : 0-7 APPID: 10.201.251.3.49995.151012061425
AUTHID : RMADMIN
EDUID : 11309 EDUNAME: db2agent (RMDB11) 0
FUNCTION: DB2 UDB, common communication, sqlcctest, probe:50
MESSAGE : sqlcctest RC
DATA #1 : Hexdump, 2 bytes
0x07000000053F5F30 : 0036 .6
2015-10-12-16.29.04.178230+480 I122018519A513 LEVEL: Info
PID : 6422530 TID : 11309 PROC : db2sysc 0
INSTANCE: rminst11 NODE : 000 DB : RMDB11
APPHDL : 0-7 APPID: 10.201.251.3.49995.151012061425
AUTHID : RMADMIN
EDUID : 11309 EDUNAME: db2agent (RMDB11) 0
FUNCTION: DB2 UDB, base sys utilities, sqeAgent::AgentBreathingPoint, probe:5
DATA #1 : String, 65 bytes
Client Connection is gone. However, Crash Recovery will continue.
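To put a number on "waited a very long time": the log above gives the moment crash recovery started (sqlpresr, probe:410) and the moment the client connection was dropped (AgentBreathingPoint). A minimal sketch of the arithmetic, assuming the trailing +480 on a db2diag.log timestamp is the UTC offset in minutes; the helper name parse_db2diag_ts is made up for this example.

```python
import re
from datetime import datetime, timedelta, timezone

def parse_db2diag_ts(stamp):
    """Parse a db2diag.log timestamp such as '2015-10-12-14.14.33.457553+480'.

    Assumption: the signed suffix is the offset from UTC in minutes.
    """
    match = re.match(r"^(.*)([+-]\d+)$", stamp)
    if match is None:
        raise ValueError(f"unrecognised timestamp: {stamp!r}")
    local_part, offset_minutes = match.groups()
    tz = timezone(timedelta(minutes=int(offset_minutes)))
    parsed = datetime.strptime(local_part, "%Y-%m-%d-%H.%M.%S.%f")
    return parsed.replace(tzinfo=tz)

# Timestamps copied from the log entries above.
recovery_started = parse_db2diag_ts("2015-10-12-14.14.33.457553+480")
client_gone = parse_db2diag_ts("2015-10-12-16.29.04.178230+480")

elapsed = client_gone - recovery_started
print(elapsed)  # 2:14:30.720677
```

So recovery had been running for roughly two and a quarter hours when the application was stopped, and the last log line says it keeps going without the client.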
$ db2 list utilities show detail
ID = 1
Type = CRASH RECOVERY
Database Name = RMDB11
Partition Number = 0
Description = Crash Recovery
Start Time = 10/12/2015 14:14:33.663091
State = Executing
Invocation Type = User
Progress Monitoring:
Estimated Percentage Complete = 0
Phase Number [Current] = 1
Description = Forward
Total Work = 2374817338 bytes
Completed Work = 5810120 bytes
Start Time = 10/12/2015 14:14:33.818838
Phase Number = 2
Description = Backward
Total Work = 2374817338 bytes
Completed Work = 0 bytes
Start Time = Not Started
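The "Estimated Percentage Complete = 0" line hides how little of the forward phase is actually done. A rough sketch that pulls the per-phase byte counts out of the output above and computes the percentage; the function name phase_progress is only for illustration, and the regular expression assumes the line layout shown in this post.

```python
import re

# Phase lines copied from the "db2 list utilities show detail" output above.
SAMPLE = """\
Phase Number [Current] = 1
Description = Forward
Total Work = 2374817338 bytes
Completed Work = 5810120 bytes
Start Time = 10/12/2015 14:14:33.818838
Phase Number = 2
Description = Backward
Total Work = 2374817338 bytes
Completed Work = 0 bytes
Start Time = Not Started
"""

PHASE_RE = re.compile(
    r"Description\s*=\s*(\w+)\s*\n"
    r"\s*Total Work\s*=\s*(\d+) bytes\s*\n"
    r"\s*Completed Work\s*=\s*(\d+) bytes"
)

def phase_progress(text):
    """Return a list of (description, completed, total, percent) per phase."""
    phases = []
    for description, total, completed in PHASE_RE.findall(text):
        total, completed = int(total), int(completed)
        percent = 100.0 * completed / total if total else 0.0
        phases.append((description, completed, total, percent))
    return phases

for description, completed, total, percent in phase_progress(SAMPLE):
    print(f"{description}: {completed} / {total} bytes ({percent:.2f}%)")
```

For the numbers in this post the forward phase comes out at about 0.24% (5810120 of 2374817338 bytes), which is why the recovery looked frozen.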
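It is resolved now.
After crash recovery completed:
db2diag.log reported bad pages;
one tablespace in the database was in state 0x0040;
AIX reported I/O errors on multiple PVs;
for those PVs, the disk array reported "Unreadable sectors detected";
a manual rollforward hung;
after cancelling the rollforward, restore db hung as well.
So it is clear that a hardware fault had left the disks unreadable and unwritable. In the end I carved out new LUNs on the disk array, attached them, and restored the database onto the new disks.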
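For anyone wondering what tablespace state 0x0040 means: the state is a bit mask, and the db2tbst command (for example db2tbst 0x0040) is the authoritative way to decode it. The sketch below does the same thing offline with a partial table of bit values written down from memory of the DB2 documentation, so treat the table as an assumption and check it against db2tbst on your own system. The name decode_tablespace_state is made up for this example.

```python
# Partial table of DB2 tablespace state bits (assumption: recalled from the
# DB2 documentation, not exhaustive; verify with the db2tbst command).
TABLESPACE_STATE_BITS = {
    0x1: "Quiesced: SHARE",
    0x2: "Quiesced: UPDATE",
    0x4: "Quiesced: EXCLUSIVE",
    0x8: "Load pending",
    0x10: "Delete pending",
    0x20: "Backup pending",
    0x40: "Roll forward in progress",
    0x80: "Roll forward pending",
    0x100: "Restore pending",
    0x4000: "Offline and not accessible",
}

def decode_tablespace_state(state):
    """Return the names of all known bits set in a tablespace state value."""
    if state == 0:
        return ["Normal"]
    names = []
    for bit, name in sorted(TABLESPACE_STATE_BITS.items()):
        if state & bit:
            names.append(name)
    # Report any bits this partial table does not cover instead of hiding them.
    unknown = state & ~sum(TABLESPACE_STATE_BITS)
    if unknown:
        names.append(f"unknown bits 0x{unknown:X}")
    return names

print(decode_tablespace_state(0x0040))
```

Under that table, 0x0040 reads as "Roll forward in progress", which fits the picture of a rollforward that could not get past the unreadable sectors.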
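Crash recovery has finished, but now the restore stops making progress once it reaches 26.6 G. The same thing happened yesterday: I waited a whole day with no response. I'm about to cry...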
tsm: TSM>q sess

  Sess Comm.  Sess   Wait    Bytes   Bytes Sess  Platform Client Name
Number Method State  Time     Sent   Recvd Type
------ ------ ------ ------ ------- ------- ----- -------- --------------------
46,472 Tcp/Ip SendW  18.8 M  26.6 G     518 Node  DB2/AIX- ECMRM1
                                                  64
$ db2 list utilities show detail
ID = 1
Type = RESTORE
Database Name = RMDB11
Partition Number = 0
Description = db
Start Time = 10/14/2015 07:39:08.290599
State = Executing
Invocation Type = User
Progress Monitoring:
Completed Work = 28578578432 bytes
Start Time = 10/14/2015 07:39:08.309023
$
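A quick cross-check of the two outputs above: converting DB2's "Completed Work" byte count to gigabytes shows whether DB2 and TSM agree on where the restore stopped. This is only a sketch and assumes the "G" in the TSM session listing means binary gigabytes (1024^3 bytes).

```python
GIB = 1024 ** 3  # assumption: TSM's "G" is a binary gigabyte

# "Completed Work" from the db2 list utilities output above.
completed_bytes = 28578578432

completed_gib = completed_bytes / GIB
print(f"{completed_gib:.1f} GiB")  # 26.6 GiB
```

The result matches the 26.6 G shown for the TSM session, so both sides report the same stall point rather than one of them lagging behind the other.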