发现数据库的连接进程逐渐增多,检查LOCKWAIT信息如下:
posdbhis /posdbhis >db2pd -d posdbhis -locks showlocks wait
Database Partition 0 -- Database POSDBHIS -- Active -- Up 83 days 14:33:37
Locks:
Address TranHdl Lockname Type Mode Sts Owner Dur HoldCount Att ReleaseFlg
0x078000003A8F7680 42 0007000500000000006C000752 Row ..U W 33 1 0 0x00 0x00000040 TbspaceID 7 TableID 5 PartitionID 0 Page 108 Slot 7
0x078000003A8E1940 18 0007000500000000006D001352 Row .NS W 33 1 0 0x00 0x00000001 TbspaceID 7 TableID 5 PartitionID 0 Page 109 Slot 19
... ...
0x078000003A8E7680 24 0007000500000000006D001352 Row .NS W 33 1 0 0x00 0x00000200 TbspaceID 7 TableID 5 PartitionID 0 Page 109 Slot 19
0x078000003A8F98C0 33 0007000500000000006D001352 Row ..X G 33 1 0 0x00 0x40000003 TbspaceID 7 TableID 5 PartitionID 0 Page 109 Slot 19
所有其它进程都在等待TRANHDL 33解锁(X LOCK),等待对象可以通过查询syscat.tables where tbspaceid=7 and tableid=5得到,实际为"systools.hmon_atm_info".
经确认,此表为HMON进程(DB2ACD)进行RUNSTATS统计时进行查询,更新计数的统计信息表。确认进程信息:
posdbhis /posdbhis >db2pd -d posdbhis -transactions
Database Partition 0 -- Database POSDBHIS -- Active -- Up 83 days 14:34:33
Transactions:
Address AppHandl [nod-index] TranHdl Locks State Tflag Tflag2 Firstlsn Lastlsn LogSpace SpaceReserved TID AxRegCnt GXID ClientUserID ClientWrkstnName ClientApplName ClientAccntng
0x0780000020331C80 1425 [000-01425] 2 11 READ 0x00000000 0x00000000 0x000000000000 0x000000000000 0 0 0x000000A92F6F 1 0 n/a n/a n/a n/a
0x0780000020332A80 44 [000-00044] 3 0 READ 0x00000000 0x00000000 0x000000000000 0x000000000000 0 0 0x00000099CBD2 1 0 n/a n/a n/a n/a
... ...
0x078000002034CE80 774 [000-00774] 33 73 READ 0x00000000 0x00000000 0x000000000000 0x000000000000 0 0 0x000000A6BB5C 1 0 n/a n/a n/a n/a
... ...
0x078000002036B880 221 [000-00221] 68 0 READ 0x00000000 0x00000000 0x000000000000 0x000000000000 0 0 0x000000AB6545 1 0 n/a n/a n/a n/a
这里得到TRANSID=33的APPLID为774,并且看到这个进程HOLD住了73个锁!
posdbhis /posdbhis >db2pd -d posdbhis -apinfo 774
Database Partition 0 -- Database POSDBHIS -- Active -- Up 82 days 06:32:35
Application :
Address : 0x078000000020E300
AppHandl [nod-index] : 774 [000-00774]
TranHdl : 33
Application ID : *LOCAL.posdbhis.120117182513
Application Name : DB2HMON
Application PID : 1155578
Application Node Name : APSBATCH
Connection Start Time : (1326824707)Wed Jan 18 02:25:07 2012
System Auth ID : POSDBHIS
Coordinator EDU ID : 921608
Number of Agents : 1
Locks timeout value : 5 seconds
Application Status : UOW-Waiting
ClientUserID : n/a
ClientWrkstnName : n/a
ClientApplName : n/a
ClientAccntng : n/a
时间确认为1月18日2点连接的APP进程。查看APPL信息:
posdbhis /posdbhis >db2pd -d posdbhis -appl
Database Partition 0 -- Database POSDBHIS -- Active -- Up 83 days 14:34:04
Applications:
Address AppHandl [nod-index] NumAgents CoorPid Status C-AnchID C-StmtUID L-AnchID L-StmtUID Appid
0x078000000032C9A0 221 [000-00221] 1 790912 ConnectCompleted 0 0 0 0 *LOCAL.posdbhis.120215140916
0x07800000004DB320 582 [000-00582] 1 1655012 ConnectCompleted 0 0 0 0 BC003F0F.B4C0.120215133121
0x07800000004D7CA0 15 [000-00015] 1 332214 Lock-wait 13 59 0 0 188.0.112.25.1332.120214230844
... ...
0x078000000020E300 774 [000-00774] 1 921608 UOW-Waiting 0 0 200 585 *LOCAL.posdbhis.120117182513
0x078000000032F8C0 1265 [000-01265] 1 1036362 ConnectCompleted 0 0 0 0 *LOCAL.DB2.111123233608
0x078000000032F760 129 [000-00129] 1 422286 ConnectCompleted 0 0 0 0 *LOCAL.DB2.111123233607
0x078000000040F660 44 [000-00044] 1 1212702 ConnectCompleted 0 0 0 0 *LOCAL.DB2.111123233606
注意,这里看到774进程状态为UOW-Waiting,当前没有执行任何语句,上一条SQL语句的ARCHID与STMT UID为200,585。
可以在DB快照中发现此SQL语句:
posdbhis /posdbhis >db2 update monitor switches using statement on lock on table on uow on
posdbhis /posdbhis >db2 get snapshot for all on posdbhis
posdbhis /posdbhis >db2pd -d posdbhis -transactions -appl -dyn
... ...
Dynamic SQL Statements:
Address AnchID StmtUID NumEnv NumVar NumRef NumExe Text
... ...
0x078000004486ACE0 200 585 0 0 28 28 SELECT COUNT( * ) FROM SYSTOOLS.HMON_ATM_INFO WHERE STATS_FLAG = 'Y' AND STATS_NOTIFY = 'Y'
当前语句未能捕获到,应该是一条UPDATE。查看所有相关SQL:
posdbhis /posdbhis >grep -n "Statement text" posdbhis_snapall.info | grep -i HMON_ATM_INFO
892: Statement text = SELECT * FROM "SYSTOOLS"."HMON_ATM_INFO"
952: Statement text = SELECT CREATE_TIME FROM SYSTOOLS.HMON_ATM_INFO WHERE SCHEMA = ? AND NAME = ? FOR UPDATE
1012: Statement text = SELECT NULLS, COLNO, DEFAULT, KEYSEQ,LONGLENGTH, NAME FROM SYSIBM.SYSCOLUMNS A, TABLE(SYSPROC.BASE_TABLE('SYSTOOLS','HMON_ATM_INFO')) B WHERE A.TBNAME=B.BASENAME AND A.TBCREATOR=B.BASESCHEMA ORDER BY COLNO
1192: Statement text = select * from systools.hmon_atm_info
1613: Statement text = select count(*) from systools.hmon_atm_info
1673: Statement text = select INDNAME,INDSCHEMA from syscat.indexes where TABSCHEMA='SYSTOOLS' AND TABNAME='HMON_ATM_INFO'
2333: Statement text = SELECT SCHEMA, NAME, STATS_DETAIL, STATS_STATE, STATS_TIME FROM SYSTOOLS.HMON_ATM_INFO WHERE (STATS_STATE = 2 OR STATS_STATE = 6) AND STATS_FLAG = 'Y'
2603: Statement text = SELECT COLCOUNT FROM SYSIBM.SYSTABLES A, TABLE(SYSPROC.BASE_TABLE('SYSTOOLS','HMON_ATM_INFO')) B WHERE A.CREATOR = B.BASESCHEMA AND A.NAME = B.BASENAME
2694: Statement text = select * from systools.hmon_atm_info with ur
2754: Statement text = SELECT CAST(s.INCREMENT as CHAR(33)), CAST(s.START as CHAR(33)), CAST(s.CACHE as INT), c.GENERATED, c.COLNO, CAST(s.MINVALUE as CHAR(33)), CAST(s.MAXVALUE as CHAR(33)), s.CYCLE, s.ORDER, s.REMARKS FROM SYSCAT.COLIDENTATTRIBUTES i, SYSCAT.SEQUENCES s, SYSIBM.SYSCOLUMNS c, TABLE(SYSPROC.BASE_TABLE('SYSTOOLS','HMON_ATM_INFO')) D WHERE i.TABSCHEMA = D.BASESCHEMA AND i.TABNAME = D.BASENAME AND s.SEQID = i.SEQID AND c.TBCREATOR = D.BASESCHEMA AND c.TBNAME = D.BASENAME AND c.IDENTITY = 'Y'
2815: Statement text = SELECT CLUSTERED, TBSPACEID, PROTECTIONGRANULARITY FROM syscat.tables WHERE TABSCHEMA='SYSTOOLS' AND TABNAME='HMON_ATM_INFO'
2875: Statement text = SELECT COUNT( * ) FROM SYSTOOLS.HMON_ATM_INFO WHERE STATS_FLAG = 'Y' AND STATS_NOTIFY = 'Y'
2995: Statement text = select count(*) from systools.hmon_atm_info with ur
3025: Statement text = SELECT COUNT( * ) FROM SYSTOOLS.HMON_ATM_INFO WHERE STATS_FLAG = 'Y' AND STATS_STATE > 1
3145: Statement text = UPDATE SYSTOOLS.HMON_ATM_INFO SET STATS_LOCK = 'Y' WHERE CURRENT OF SQL_CURSH200C4
经检查此UPDATE语句被执行次数较多:
Dynamic SQL Statements:
Address AnchID StmtUID NumEnv NumVar NumRef NumExe Text
0x0780000044EAE720 235 574 0 0 1316 1316 UPDATE SYSTOOLS.HMON_ATM_INFO SET STATS_LOCK = 'Y' WHERE CURRENT OF SQL_CURSH200C4
此进程在OS上的进程号为921608,系统上进一步确认进程:
posdbhis /posdbhis >ps -ef|grep 921608
posdbhis 532728 1442048 0 22:10:55 pts/2 0:00 grep 921608
posdbhis 921608 974962 0 Jan 13 - 4:15 db2agent (POSDBHIS) 0
查看详细APPL信息:
posdbhis /posdbhis >db2 list applications show detail
CONNECT Auth Id Application Name Appl. Application Id Seq# Number of Coordinating DB Coordinator Status Status Change Time DB Name DB Path
Handle Agents partition number pid/thread
------------------------------ -------------------- ---------- -------------------------------------------------------------- ----- ---------- ---------------- --------------- ------------------------------ -------------------------- -------- --------------------
POSDBHIS db2bp 221 *LOCAL.posdbhis.120215140916 00001 1 0 790912 Connect Completed Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
... ...
POSDBHIS db2jcc_application 582 BC003F0F.B4C0.120215133121 00001 1 0 1655012 Connect Completed Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
POSDBHIS db2move 15 188.0.112.25.1332.120214230844 00001 1 0 332214 Lock-wait Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
... ...
POSDBHIS db2move 137 188.0.112.25.33502.120117222634 00001 1 0 475336 Lock-wait Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
POSDBHIS DB2HMON 859 *LOCAL.posdbhis.120117194510 00001 1 0 598440 Lock-wait Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
POSDBHIS DB2HMON 925 *LOCAL.posdbhis.120117182739 00001 1 0 729222 Lock-wait Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
POSDBHIS DB2HMON 774 *LOCAL.posdbhis.120117182513 00090 1 0 921608 UOW Waiting Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
POSDBHIS db2evmg_DB2DETAILDEA 1265 *LOCAL.DB2.111123233608 00001 1 0 1036362 Connect Completed Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
POSDBHIS db2taskd 129 *LOCAL.DB2.111123233607 00001 1 0 422286 Connect Completed Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
POSDBHIS db2stmm 44 *LOCAL.DB2.111123233606 00001 1 0 1212702 Connect Completed Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
进一步TRACE查看:
posdbhis /posdbhis >db2trc on -p 921608 -t -f db2trc.921608.dmp
Trace is turned on
posdbhis /posdbhis >date
Wed Feb 15 22:13:43 BEIST 2012
posdbhis /posdbhis >ls -al *.dmp
-rw-r----- 1 posdbhis db2grp1 187008 Feb 15 22:13 db2trc.921608.dmp
...(这里发现文件很长时间没有变化)...
posdbhis /posdbhis >db2trc off
Trace is turned off
posdbhis /posdbhis >db2trc flw -t db2trc.921608.dmp db2trc.921608.flw
Total number of trace records : 0
posdbhis /posdbhis >db2trc fmt db2trc.921608.dmp db2trc.921608.fmt
Trace truncated : NO
Trace wrapped : NO
Total number of trace records : 0
posdbhis /posdbhis >ls db2trc* db2trc.921608.dmp
posdbhis /posdbhis >truss -f -a -p 921608
921608: psargs:
921608: __semop(0, 0x0000000000000000, 0) (sleeping...)
^CPstatus: process is not stopped
说明此进程已经挂起。通过db2 force application(774)杀除,命令正常执行,但进程不响应,无法正常杀除。
直接在OS级别杀除任何db2本身的进程包括db2acd都会导致实例宕机,通过force applications all,db2stop force来关闭实例的方式来释放较好。
添加新评论6 条评论
2013-04-18 20:55
2013-04-09 21:39
2013-04-09 11:14
2013-03-22 22:21
2013-03-04 09:21
2013-01-21 16:56