背景与环境:
现场信息:
Database Lock Snapshot
Database name = ZBBB
Database path = /db2data/db2inst1/NODE0000/SQL00001/MEMBER0000/
Input database alias = ZBBB
Locks held = 295
Applications currently connected = 161
Agents currently waiting on locks = 0
Snapshot timestamp = 05/05/2018 11:06:08.916275
Application handle = 10641
Application ID = 172.20.2.2.65000.180824194601
Sequence number = 00001
Application name = db2jcc_application
CONNECT Authorization ID = USERDBS
Application status = UOW Executing
Status change time = Not Collected
Application code page = 1208
Locks held = 292
Total wait time (ms) = Not Collected
List Of Locks
Lock Name = 0x0000000300000001000130A0D6
Lock Attributes = 0x00000000
Release Flags = 0x40000000
Lock Count = 1
Hold Count = 0
Lock Object Name = 0
Object Type = Internal Variation Lock
Mode = S
Lock Name = 0x000500060000000072CB000652
Lock Attributes = 0x00000008
Release Flags = 0x40000000
Lock Count = 1
Hold Count = 0
Lock Object Name = 1925906438
Object Type = Row
Tablespace Name = MHAPPLOB
Table Schema = MHAPP
Table Name = RPT01_CAC
Mode = X
Lock Name = 0x000500060000000072CB000552
Lock Attributes = 0x00000008
Release Flags = 0x40000000
Lock Count = 1
Hold Count = 0
Lock Object Name = 1925906437
Object Type = Row
Tablespace Name = MHAPPLOB
Table Schema = MHAPP
Table Name = RPT01_CAC
Mode = X
Lock Name = 0x00040005000000004601000452
Lock Attributes = 0x00000008
Release Flags = 0x40000000
Lock Count = 1
Hold Count = 0
Lock Object Name = 1174470660
Object Type = Row
Tablespace Name = MHAPPGEN
Table Schema = MHAPP
Table Name = ELE01_SASP
Mode = X
Lock Name = 0x000500060000000072CB000452
Lock Attributes = 0x00000008
Release Flags = 0x40000000
Lock Count = 1
Hold Count = 0
Lock Object Name = 1925906436
Object Type = Row
Tablespace Name = MHAPPLOB
Table Schema = MHAPP
Table Name = RPT01_CAC
Mode = X
Lock Name = 0x00040005000000004601000352
Lock Attributes = 0x00000008
Release Flags = 0x40000000
Lock Count = 1
Hold Count = 0
Lock Object Name = 1174470659
Object Type = Row
Tablespace Name = MHAPPGEN
Table Schema = MHAPP
Table Name = ELE01_SASP
Mode = X
Lock Name = 0x000500060000000072CB000352
Lock Attributes = 0x00000008
Release Flags = 0x40000000
Lock Count = 1
Hold Count = 0
Lock Object Name = 1925906435
Object Type = Row
Tablespace Name = MHAPPLOB
Table Schema = MHAPP
Table Name = RPT01_CAC
Mode = X
Lock Name = 0x00040005000000004601000252
Lock Attributes = 0x00000008
Release Flags = 0x40000000
Lock Count = 1
Hold Count = 0
Lock Object Name = 1174470658
Object Type = Row
Tablespace Name = MHAPPGEN
Table Schema = MHAPP
Table Name = ELE01_SASP
Mode = X
Lock Name = 0x000500060000000072CB000252
Lock Attributes = 0x00000008
Release Flags = 0x40000000
Lock Count = 1
Hold Count = 0
Lock Object Name = 1925906434
Object Type = Row
Tablespace Name = MHAPPLOB
Table Schema = MHAPP
Table Name = RPT01_CAC
Mode = X
......后面还有很多
db2diag.log中无报错信息;插入事务的程序工程师反映其在插入数据库后,确实做过了主动commit。
问题:
现阶段,我不知道该继续收集那些信息,如何判断瓶颈在何处。
请老师多多指教
db2pd -d dbname -wlocks 查看锁等待的现象。应用持有大量锁没关系,只要不发生锁升级和锁等待就没事。db2的锁是放在内存里的,获取锁的开销并不大。集中大量的数据写入慢更可能是出现在其他瓶颈上。例如latch,io等。
收起两种监控锁等待的方法,大量的锁,需要结合监控结果去分析,是哪些SQL hold了锁:
1 db2 event monitor for locking
2 db2pd -d dbname -wlocks
我们这里的数据库每天utc19:30开始将MHAPP表空间中7天以前的数据挪至MHHISTORY表空间中,将MHHISTORY表空间中90天之前的数据删除,同时每天把昨天的数据export到本地目录。
每天晚上19:30开始,数据库出现锁升级,数据库参数采用的都是默认参数,如下:
$ db2 get db cfg|grep 'LOCK'
Max storage for lock list (4KB) (LOCKLIST) = AUTOMATIC(777376)
Percent. of lock lists per application (MAXLOCKS) = AUTOMATIC(66)
Lock timeout (sec) (LOCKTIMEOUT) = -1
Block non logged operations (BLOCKNONLOGGED) = NO
Lock timeout events (MON_LOCKTIMEOUT) = NONE
Deadlock events (MON_DEADLOCK) = WITHOUT_HIST
Lock wait events (MON_LOCKWAIT) = NONE
$ db2 get db cfg|grep MAXLOCKS
Percent. of lock lists per application (MAXLOCKS) = AUTOMATIC(66)
$ db2 get db cfg|grep MAX
Percent. of lock lists per application (MAXLOCKS) = AUTOMATIC(66)
Max number of active applications (MAXAPPLS) = AUTOMATIC(323)
Max DB files open per application (MAXFILOP) = 61440
Percent max primary log space by transaction (MAX_LOG) = 0
Percent log file reclaimed before soft chckpt (SOFTMAX) = 100
所升级开始后,holded的锁和target number of locks 总维持在50%,然后ecaliscation 的锁数量又会比holed的锁数量少6个,请问这是什么原因呢?
2018-07-27-19.33.13.266796 Instance:db2inst1 Node:000
PID:6553838(db2agent (ZBBB) 0) TID:20887 Appid:172.20.2.2.34731.180815082849
data management sqldEscalateLocks Probe:1 Database:ZBBB
ADM5501I DB2 is performing lock escalation. The affected application is named
"db2jcc_application", and is associated with the workload name
"SYSDEFAULTUSERWORKLOAD" and application ID "172.20.2.2.34731.180815082849" at
member "0". The total number of locks currently held is "10098408", and the
target number of locks to hold is "5049204". The current statement being
executed is "delete from (select reftime,row_number() over(order by reftime) rn
from MHHISTORY.BULL01_ALL) where reftime<'2018-4-28 00:00:00 ' and
rn<=20000". Reason code: "1"
^^
2018-07-27-19.33.13.279160 Instance:db2inst1 Node:000
PID:6553838(db2agent (ZBBB) 0) TID:20887 Appid:172.20.2.2.34731.180815082849
data management sqldEscalateLocks Probe:3 Database:ZBBB
ADM5502W The escalation of "10098402" locks on table "MHHISTORY.BULL01_ALL" to
lock intent "X" was successful.
^^
2018-07-27-19.33.57.632442 Instance:db2inst1 Node:000
PID:6553838(db2agent (ZBBB) 0) TID:20887 Appid:172.20.2.2.34731.180815082849
data management sqldEscalateLocks Probe:1 Database:ZBBB
ADM5501I DB2 is performing lock escalation. The affected application is named
"db2jcc_application", and is associated with the workload name
"SYSDEFAULTUSERWORKLOAD" and application ID "172.20.2.2.34731.180815082849" at
member "0". The total number of locks currently held is "9893767", and the
target number of locks to hold is "4946883". The current statement being
executed is "delete from (select reftime,row_number() over(order by reftime) rn
from MHHISTORY.BULL01_ALL) where reftime<'2018-4-28 00:00:00 ' and
rn<=20000". Reason code: "1"
^^
2018-07-27-19.33.57.632997 Instance:db2inst1 Node:000
PID:6553838(db2agent (ZBBB) 0) TID:20887 Appid:172.20.2.2.34731.180815082849
data management sqldEscalateLocks Probe:3 Database:ZBBB
ADM5502W The escalation of "9893761" locks on table "MHHISTORY.BULL01_ALL" to
lock intent "X" was successful.
^^
2018-07-27-19.37.46.083092 Instance:db2inst1 Node:000
PID:6553838(db2agent (ZBBB) 0) TID:30034 Appid:172.20.2.2.35793.180815084544
data management sqldEscalateLocks Probe:1 Database:ZBBB
ADM5501I DB2 is performing lock escalation. The affected application is named
"db2jcc_application", and is associated with the workload name
"SYSDEFAULTUSERWORKLOAD" and application ID "172.20.2.2.35793.180815084544" at
member "0". The total number of locks currently held is "10282374", and the
target number of locks to hold is "5141187". The current statement being
executed is "delete from (select reftime,row_number() over(order by reftime) rn
from MHHISTORY.RPT01_CAC) where reftime<'2018-4-28 00:00:00 ' and rn<=20000".
Reason code: "1"
^^
2018-07-27-19.37.46.083651 Instance:db2inst1 Node:000
PID:6553838(db2agent (ZBBB) 0) TID:30034 Appid:172.20.2.2.35793.180815084544
data management sqldEscalateLocks Probe:3 Database:ZBBB
ADM5502W The escalation of "10282368" locks on table "MHHISTORY.RPT01_CAC" to
lock intent "X" was successful.
^^
2018-07-27-19.38.40.179714 Instance:db2inst1 Node:000
PID:6553838(db2agent (ZBBB) 0) TID:30034 Appid:172.20.2.2.35793.180815084544
data management sqldEscalateLocks Probe:1 Database:ZBBB
ADM5501I DB2 is performing lock escalation. The affected application is named
"db2jcc_application", and is associated with the workload name
"SYSDEFAULTUSERWORKLOAD" and application ID "172.20.2.2.35793.180815084544" at
member "0". The total number of locks currently held is "11113510", and the