最近一套OLTP交易系统反映在4日00:30-00:32有大量交易超时,之后自动就恢复了
数据库版本 :DB2 9.7 FP8
OS版本:AIX 7102
查看了db2diag日志,看到00:30.03数据库发起了online backup,但奇怪的是实际备份从2017-06-04-00.32开始,而期间差不多2分钟时间正好是应用异常的时间。
求大家帮忙分析,谢谢!
==db2diag日志===
2017-06-04-00.30.03.618468+480 E16655092A480 LEVEL: Info
PID : 33751414 TID : 95608 PROC : db2sysc 0
INSTANCE: ecifinst NODE : 000 DB : ECIFDB
APPHDL : 0-20396 APPID: *LOCAL.ecifinst.170607185914
AUTHID : ECIFINST
EDUID : 95608 EDUNAME: db2agent (ECIFDB) 0
FUNCTION: DB2 UDB, database utilities, sqluxGetDegreeParallelism, probe:774
DATA #1 :
Autonomic BAR - using parallelism = 10.
2017-06-04-00.32.37.371608+480 E16655573A516 LEVEL: Info
PID : 33751414 TID : 95608 PROC : db2sysc 0
INSTANCE: ecifinst NODE : 000 DB : ECIFDB
APPHDL : 0-20396 APPID: *LOCAL.ecifinst.170607185914
AUTHID : ECIFINST
EDUID : 95608 EDUNAME: db2agent (ECIFDB) 0
FUNCTION: DB2 UDB, database utilities, sqluxGetAvailableHeapPages, probe:888
DATA #1 :
Autonomic BAR - heap consumption.
Targetting (50%) - 25560 of 51120 pages.
2017-06-04-00.32.37.372087+480 E16656090A506 LEVEL: Info
PID : 33751414 TID : 95608 PROC : db2sysc 0
INSTANCE: ecifinst NODE : 000 DB : ECIFDB
APPHDL : 0-20396 APPID: *LOCAL.ecifinst.170607185914
AUTHID : ECIFINST
EDUID : 95608 EDUNAME: db2agent (ECIFDB) 0
FUNCTION: DB2 UDB, database utilities, sqlubTuneBuffers, probe:1139
DATA #1 :
Autonomic backup - tuning enabled.
Using buffer size = 1024, number = 10.
2017-06-04-00.32.37.857664+480 E16656597A451 LEVEL: Info
PID : 33751414 TID : 95608 PROC : db2sysc 0
INSTANCE: ecifinst NODE : 000 DB : ECIFDB
APPHDL : 0-20396 APPID: *LOCAL.ecifinst.170607185914
AUTHID : ECIFINST
EDUID : 95608 EDUNAME: db2agent (ECIFDB) 0
FUNCTION: DB2 UDB, database utilities, sqlubSetupJobControl, probe:1604
MESSAGE : Starting an online db backup.
==db2pd -latch信息==
该数据库部了个脚本会定时收集db快照,app快照,db2pd -latch,app,tra,age,api等信息,其中比较可疑的是在00:30的数据中出现了大量的latch
Database Partition 0 -- Active -- Up 6 days 15:11:19 -- Date 2017-06-04-00.30.50.219305
Latches:
Address Holder Waiter Filename LOC LatchType HoldCount
0x0A00030C49764788 6684 0 sqlbpacc.C 2455 SQLO_LT_SQLB_POOL_CB__readLatch 256
0x0A00030000068A20 6684 0 sqlbilatch.C 1411 SQLO_LT_SQLB_POOL_MAP_CB__range_latch 256
0x0A00030C49764788 6941 0 sqlbpacc.C 2455 SQLO_LT_SQLB_POOL_CB__readLatch 256
0x0A00030000068A20 6941 0 sqlbilatch.C 1411 SQLO_LT_SQLB_POOL_MAP_CB__range_latch 256
0x0A00030C49764788 7198 0 sqlbpacc.C 2455 SQLO_LT_SQLB_POOL_CB__readLatch 256
0x0A00030000068A20 7198 0 sqlbilatch.C 1411 SQLO_LT_SQLB_POOL_MAP_CB__range_latch 256
0x0A00030C49764788 7455 0 sqlbpacc.C 2455 SQLO_LT_SQLB_POOL_CB__readLatch 256
0x0A00030000068A20 7455 0 sqlbilatch.C 1411 SQLO_LT_SQLB_POOL_MAP_CB__range_latch 256
。。。(省略大量latch等待,都是SQLO_LT_SQLB_POOL_CB__readLatch和SQLO_LT_SQLB_POOL_MAP_CB__range_latch )
0x0A00030C49764788 88926 0 sqlbpacc.C 1636 SQLO_LT_SQLB_POOL_CB__readLatch 1
0x0A00030000068A20 88926 0 sqlbilatch.C 1407 SQLO_LT_SQLB_POOL_MAP_CB__range_latch 1
0x0A00030C497671A8 90211 0 sqlbpacc.C 1636 SQLO_LT_SQLB_POOL_CB__readLatch 1
0x0A00030F534BD320 90211 0 sqlbilatch.C 1407 SQLO_LT_SQLB_POOL_MAP_CB__range_latch 1
0x0A00030C49764788 90468 0 sqlbpacc.C 1636 SQLO_LT_SQLB_POOL_CB__readLatch 1
0x0A00030000068A20 90468 0 sqlbilatch.C 1407 SQLO_LT_SQLB_POOL_MAP_CB__range_latch 1
0x0A00030C497671A8 91496 0 sqlbpacc.C 1636 SQLO_LT_SQLB_POOL_CB__readLatch 1
0x0A00030F534BD320 91496 0 sqlbilatch.C 1407 SQLO_LT_SQLB_POOL_MAP_CB__range_latch 1
0x0A000300000D94D0 95608 0 sqlbpool.C 13685 SQLO_LT_SQLB_CLNR_PAUSE_CB__preventSuspendLatch 1
0x0A00030001450930 95608 0 sqlbflush.C 119 SQLO_LT_SQLB_RuntimeBPDefs__rtBPLatch 1
0x0A0003000145D470 95608 0 sqlbbuffers.C 2639 SQLO_LT_SQLB_BufferPool__prevBPDCachingLatch 1
0x0A00030C49764798 95608 0 sqlbpacc.C 2476 SQLO_LT_SQLB_POOL_CB__writeLatch 1
0x0A00030000068A20 95608 0 sqlbilatch.C 1407 SQLO_LT_SQLB_POOL_MAP_CB__range_latch 1
0x0A00030C497671A8 96636 0 sqlbpacc.C 1636 SQLO_LT_SQLB_POOL_CB__readLatch 1
0x0A00030F534BD320 96636 0 sqlbilatch.C 1407 SQLO_LT_SQLB_POOL_MAP_CB__range_latch 1
。。。(省略一部分latch等待,都是SQLO_LT_SQLB_POOL_CB__readLatch和SQLO_LT_SQLB_POOL_MAP_CB__range_latch )
0x0A00030C497671A8 119521 0 sqlbpacc.C 1636 SQLO_LT_SQLB_POOL_CB__readLatch 1
0x0A00030F534BD320 119521 0 sqlbilatch.C 1407 SQLO_LT_SQLB_POOL_MAP_CB__range_latch 1
如上95608的latch等待比较特别,95608对应的app是online backup的会话。
收起