itsbegin
作者itsbegin·2013-01-21 15:01
软件开发工程师·上海通联支付

【原创】DB2 db2acd进程挂起现象与处理

字数 12781阅读 6009评论 6赞 2
发现数据库的连接进程逐渐增多,检查LOCKWAIT信息如下:
posdbhis /posdbhis >db2pd -d posdbhis -locks showlocks wait 

Database Partition 0 -- Database POSDBHIS -- Active -- Up 83 days 14:33:37

Locks:
Address TranHdl Lockname Type Mode Sts Owner Dur HoldCount Att ReleaseFlg
0x078000003A8F7680 42 0007000500000000006C000752 Row ..U W 33 1 0 0x00 0x00000040 TbspaceID 7 TableID 5 PartitionID 0 Page 108 Slot 7
0x078000003A8E1940 18 0007000500000000006D001352 Row .NS W 33 1 0 0x00 0x00000001 TbspaceID 7 TableID 5 PartitionID 0 Page 109 Slot 19
... ...
0x078000003A8E7680 24 0007000500000000006D001352 Row .NS W 33 1 0 0x00 0x00000200 TbspaceID 7 TableID 5 PartitionID 0 Page 109 Slot 19
0x078000003A8F98C0 33 0007000500000000006D001352 Row ..X G 33 1 0 0x00 0x40000003 TbspaceID 7 TableID 5 PartitionID 0 Page 109 Slot 19
 
所有其它进程都在等待TRANHDL 33解锁(X LOCK),等待对象可以通过查询syscat.tables where tbspaceid=7 and tableid=5得到,实际为"systools.hmon_atm_info".
经确认,此表为HMON进程(DB2ACD)进行RUNSTATS统计时进行查询,更新计数的统计信息表。确认进程信息:
posdbhis /posdbhis >db2pd -d posdbhis -transactions 

Database Partition 0 -- Database POSDBHIS -- Active -- Up 83 days 14:34:33

Transactions:
Address AppHandl [nod-index] TranHdl Locks State Tflag Tflag2 Firstlsn Lastlsn LogSpace SpaceReserved TID AxRegCnt GXID ClientUserID ClientWrkstnName ClientApplName ClientAccntng                  
0x0780000020331C80 1425 [000-01425] 2 11 READ 0x00000000 0x00000000 0x000000000000 0x000000000000 0 0 0x000000A92F6F 1 0 n/a n/a n/a n/a                            
0x0780000020332A80 44 [000-00044] 3 0 READ 0x00000000 0x00000000 0x000000000000 0x000000000000 0 0 0x00000099CBD2 1 0 n/a n/a n/a n/a       
... ...                     
0x078000002034CE80 774 [000-00774] 33 73 READ 0x00000000 0x00000000 0x000000000000 0x000000000000 0 0 0x000000A6BB5C 1 0 n/a n/a n/a n/a                            
... ...
0x078000002036B880 221 [000-00221] 68 0 READ 0x00000000 0x00000000 0x000000000000 0x000000000000 0 0 0x000000AB6545 1 0 n/a n/a n/a n/a                            
 
这里得到TRANSID=33的APPLID为774,并且看到这个进程HOLD住了73个锁!
posdbhis /posdbhis >db2pd -d posdbhis -apinfo 774                                       

Database Partition 0 -- Database POSDBHIS -- Active -- Up 82 days 06:32:35

Application :
  Address : 0x078000000020E300 
  AppHandl [nod-index] : 774 [000-00774]
  TranHdl : 33        
  Application ID : *LOCAL.posdbhis.120117182513    
  Application Name : DB2HMON             
  Application PID : 1155578   
  Application Node Name : APSBATCH                                
  Connection Start Time : (1326824707)Wed Jan 18 02:25:07 2012
  System Auth ID : POSDBHIS                      
  Coordinator EDU ID : 921608    
  Number of Agents : 1         
  Locks timeout value : 5 seconds
  Application Status : UOW-Waiting             
  ClientUserID : n/a
  ClientWrkstnName : n/a
  ClientApplName : n/a
  ClientAccntng : n/a
 
时间确认为1月18日2点连接的APP进程。查看APPL信息:
posdbhis /posdbhis >db2pd -d posdbhis -appl 

Database Partition 0 -- Database POSDBHIS -- Active -- Up 83 days 14:34:04

Applications:
Address AppHandl [nod-index] NumAgents CoorPid Status C-AnchID C-StmtUID L-AnchID L-StmtUID Appid                           
0x078000000032C9A0 221 [000-00221] 1 790912 ConnectCompleted 0 0 0 0 *LOCAL.posdbhis.120215140916    
0x07800000004DB320 582 [000-00582] 1 1655012 ConnectCompleted 0 0 0 0 BC003F0F.B4C0.120215133121      
0x07800000004D7CA0 15 [000-00015] 1 332214 Lock-wait 13 59 0 0 188.0.112.25.1332.120214230844  
... ...
0x078000000020E300 774 [000-00774] 1 921608 UOW-Waiting 0 0 200 585 *LOCAL.posdbhis.120117182513    
0x078000000032F8C0 1265 [000-01265] 1 1036362 ConnectCompleted 0 0 0 0 *LOCAL.DB2.111123233608         
0x078000000032F760 129 [000-00129] 1 422286 ConnectCompleted 0 0 0 0 *LOCAL.DB2.111123233607         
0x078000000040F660 44 [000-00044] 1 1212702 ConnectCompleted 0 0 0 0 *LOCAL.DB2.111123233606         
注意,这里看到774进程状态为UOW-Waiting,当前没有执行任何语句,上一条SQL语句的ARCHID与STMT UID为200,585。
可以在DB快照中发现此SQL语句:
posdbhis /posdbhis >db2 update monitor switches using statement on lock on table on uow on
posdbhis /posdbhis >db2 get snapshot for all on posdbhis

posdbhis /posdbhis >db2pd -d posdbhis -transactions -appl -dyn

... ...
Dynamic SQL Statements:
Address            AnchID StmtUID NumEnv NumVar NumRef NumExe Text 
... ...
0x078000004486ACE0 200    585     0      0      28     28     SELECT COUNT( * ) FROM SYSTOOLS.HMON_ATM_INFO WHERE STATS_FLAG = 'Y' AND STATS_NOTIFY = 'Y'
 
当前语句未能捕获到,应该是一条UPDATE。查看所有相关SQL:
posdbhis /posdbhis >grep -n "Statement text" posdbhis_snapall.info | grep -i HMON_ATM_INFO 
892: Statement text = SELECT * FROM "SYSTOOLS"."HMON_ATM_INFO"
952: Statement text = SELECT CREATE_TIME FROM SYSTOOLS.HMON_ATM_INFO WHERE SCHEMA = ? AND NAME = ? FOR UPDATE
1012: Statement text = SELECT NULLS, COLNO, DEFAULT, KEYSEQ,LONGLENGTH, NAME FROM SYSIBM.SYSCOLUMNS A, TABLE(SYSPROC.BASE_TABLE('SYSTOOLS','HMON_ATM_INFO')) B WHERE A.TBNAME=B.BASENAME AND A.TBCREATOR=B.BASESCHEMA ORDER BY COLNO
1192: Statement text = select * from systools.hmon_atm_info
1613: Statement text = select count(*) from systools.hmon_atm_info
1673: Statement text = select INDNAME,INDSCHEMA from syscat.indexes where TABSCHEMA='SYSTOOLS' AND TABNAME='HMON_ATM_INFO'
2333: Statement text = SELECT SCHEMA, NAME, STATS_DETAIL, STATS_STATE, STATS_TIME FROM SYSTOOLS.HMON_ATM_INFO WHERE (STATS_STATE = 2 OR STATS_STATE = 6) AND STATS_FLAG = 'Y'
2603: Statement text = SELECT COLCOUNT FROM SYSIBM.SYSTABLES A, TABLE(SYSPROC.BASE_TABLE('SYSTOOLS','HMON_ATM_INFO')) B WHERE A.CREATOR = B.BASESCHEMA AND A.NAME = B.BASENAME
2694: Statement text = select * from systools.hmon_atm_info with ur
2754: Statement text = SELECT CAST(s.INCREMENT as CHAR(33)), CAST(s.START as CHAR(33)), CAST(s.CACHE as INT), c.GENERATED, c.COLNO, CAST(s.MINVALUE as CHAR(33)), CAST(s.MAXVALUE as CHAR(33)), s.CYCLE, s.ORDER, s.REMARKS FROM SYSCAT.COLIDENTATTRIBUTES i, SYSCAT.SEQUENCES s, SYSIBM.SYSCOLUMNS c, TABLE(SYSPROC.BASE_TABLE('SYSTOOLS','HMON_ATM_INFO')) D WHERE i.TABSCHEMA = D.BASESCHEMA AND i.TABNAME = D.BASENAME AND s.SEQID = i.SEQID AND c.TBCREATOR = D.BASESCHEMA AND c.TBNAME = D.BASENAME AND c.IDENTITY = 'Y'
2815: Statement text = SELECT CLUSTERED, TBSPACEID, PROTECTIONGRANULARITY FROM syscat.tables WHERE TABSCHEMA='SYSTOOLS' AND TABNAME='HMON_ATM_INFO'
2875: Statement text = SELECT COUNT( * ) FROM SYSTOOLS.HMON_ATM_INFO WHERE STATS_FLAG = 'Y' AND STATS_NOTIFY = 'Y'
2995: Statement text = select count(*) from systools.hmon_atm_info with ur
3025: Statement text = SELECT COUNT( * ) FROM SYSTOOLS.HMON_ATM_INFO WHERE STATS_FLAG = 'Y' AND STATS_STATE > 1
3145: Statement text = UPDATE SYSTOOLS.HMON_ATM_INFO SET STATS_LOCK = 'Y' WHERE CURRENT OF SQL_CURSH200C4
 
经检查此UPDATE语句被执行次数较多:
Dynamic SQL Statements: 
Address            AnchID StmtUID NumEnv NumVar NumRef NumExe Text 
0x0780000044EAE720 235    574     0      0      1316   1316   UPDATE SYSTOOLS.HMON_ATM_INFO SET STATS_LOCK = 'Y' WHERE CURRENT OF SQL_CURSH200C4
 
此进程在OS上的进程号为921608,系统上进一步确认进程:
posdbhis /posdbhis >ps -ef|grep 921608 
posdbhis 532728 1442048 0 22:10:55 pts/2 0:00 grep 921608 
posdbhis 921608 974962 0 Jan 13 - 4:15 db2agent (POSDBHIS) 0
 
查看详细APPL信息:
posdbhis /posdbhis >db2 list applications show detail 

CONNECT Auth Id Application Name Appl. Application Id Seq# Number of Coordinating DB Coordinator Status Status Change Time DB Name DB Path
                                                    Handle Agents partition number pid/thread
------------------------------ -------------------- ---------- -------------------------------------------------------------- ----- ---------- ---------------- --------------- ------------------------------ -------------------------- -------- --------------------
POSDBHIS db2bp 221 *LOCAL.posdbhis.120215140916 00001 1 0 790912 Connect Completed Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
... ...
POSDBHIS db2jcc_application 582 BC003F0F.B4C0.120215133121 00001 1 0 1655012 Connect Completed Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
POSDBHIS db2move 15 188.0.112.25.1332.120214230844 00001 1 0 332214 Lock-wait Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
... ...
POSDBHIS db2move 137 188.0.112.25.33502.120117222634 00001 1 0 475336 Lock-wait Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
POSDBHIS DB2HMON 859 *LOCAL.posdbhis.120117194510 00001 1 0 598440 Lock-wait Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
POSDBHIS DB2HMON 925 *LOCAL.posdbhis.120117182739 00001 1 0 729222 Lock-wait Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
POSDBHIS DB2HMON 774 *LOCAL.posdbhis.120117182513 00090 1 0 921608 UOW Waiting Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
POSDBHIS db2evmg_DB2DETAILDEA 1265 *LOCAL.DB2.111123233608 00001 1 0 1036362 Connect Completed Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
POSDBHIS db2taskd 129 *LOCAL.DB2.111123233607 00001 1 0 422286 Connect Completed Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
POSDBHIS db2stmm 44 *LOCAL.DB2.111123233606 00001 1 0 1212702 Connect Completed Not Collected POSDBHIS /posdbhis/posdbhis/NODE0000/SQL00001/
 
进一步TRACE查看:
posdbhis /posdbhis >db2trc on -p 921608 -t -f db2trc.921608.dmp 
Trace is turned on
posdbhis /posdbhis >date
Wed Feb 15 22:13:43 BEIST 2012
posdbhis /posdbhis >ls -al *.dmp
-rw-r----- 1 posdbhis db2grp1 187008 Feb 15 22:13 db2trc.921608.dmp
...(这里发现文件很长时间没有变化)...
posdbhis /posdbhis >db2trc off 
Trace is turned off
posdbhis /posdbhis >db2trc flw -t db2trc.921608.dmp db2trc.921608.flw
Total number of trace records : 0
posdbhis /posdbhis >db2trc fmt db2trc.921608.dmp db2trc.921608.fmt 
Trace truncated : NO
Trace wrapped : NO
Total number of trace records : 0
posdbhis /posdbhis >ls db2trc* db2trc.921608.dmp

posdbhis /posdbhis >truss -f -a -p 921608  
921608: psargs: 
921608: __semop(0, 0x0000000000000000, 0) (sleeping...)
^CPstatus: process is not stopped
 
说明此进程已经挂起。通过db2 force application(774)杀除,命令正常执行,但进程不响应,无法正常杀除。
 
直接在OS级别杀除任何db2本身的进程包括db2acd都会导致实例宕机,通过force applications all,db2stop force来关闭实例的方式来释放较好。
 
 
 

如果觉得我的文章对您有用,请点赞。您的支持将鼓励我继续创作!

2

添加新评论6 条评论

thuanqinthuanqin其它ibm
2013-04-18 20:55
好文章,命令很强大,自己还得多用用才能熟练
cg07cg07系统工程师联通
2013-04-09 21:39
好推荐一下
richmondrichmond系统运维工程师浙商银行
2013-04-09 11:14
什么版本的?
flywiththewindflywiththewind其它easy world
2013-03-22 22:21
思想够深入!学习了!好文章!
zhmwangzhmwangPDOceanBase
2013-03-04 09:21
好文章
liliyliliy其它北京
2013-01-21 16:56
学习了
Ctrl+Enter 发表

作者其他文章

相关文章

相关问题

相关资料

X社区推广