近日遇到个生产问题,花些时间和精力整理出来,与大家一起分享,希望看过的童鞋能有所收获。
1. 问题描述
AIX 5.3 DB2V9.1生产环境INSERT插入慢。
通过informatic自动生成insert语句,每晚对历史表(当前133G)进行插入,每1万条提交一次。 项目组反馈之前速度还都可以,后来越来越慢,3百多万条(每条51个字段),需要2个多小时的时间,SQL语句简单,insert values形式,容器存储采用裸设备(该表对应的个表空间容器为220个)。非交易系统,夜晚执行跑批,没有故意留守人员,白天根据系统环境和数据库的一些历史信息进行分析。
2. 问题分析抓取sql snapshot的片段
Number of executions = 3030137 Number of compilations = 1 Worst preparation time (ms) = 1 Best preparation time (ms) = 1 Internal rows deleted = 0 Internal rows inserted = 0 Rows read = 0 Internal rows updated = 0 Rows written = 3030137 Statement sorts = 0 Statement sort overflows = 0 Total sort time = 0 Buffer pool data logical reads = 7343062 Buffer pool data physical reads = 636371 Buffer pool temporary data logical reads = 0 Buffer pool temporary data physical reads = 0 Buffer pool index logical reads = 38809086 Buffer pool index physical reads = 1840177 Buffer pool temporary index logical reads = 0 Buffer pool temporary index physical reads = 0 Buffer pool xda logical reads = 0 Buffer pool xda physical reads = 0 Buffer pool temporary xda logical reads = 0 Buffer pool temporary xda physical reads = 0 Total execution time (sec.ms) = 8763.789449 Total user cpu time (sec.ms) = 123.317766 Total system cpu time (sec.ms) = 69.937971 Statement text = INSERT INTO ** VALUES ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) |
物理IO的时候是占用线程CPU,通过计算用户CPU与系统CPU的总和,能够大约估算出每次执行使用的CPU的时间,也就是说排除物理IO的影响。通过SQL快照评估SQL占用的资源。193.26/3030137*1000=0.0638ms即平均每次执行CPU时间,平均每次执行的总时间为8763/3030137*1000=2.892ms。
计算该SQL命中率,100*(1-(636371+1840177)/(7343062+38809086))=94.63%
平均每秒次数:3030137/8764=345.74,计算得每次IO插入的行数,100*(1-1/x)=94.63% x=1/(1-0.9463)=18.62即执行18.62行执行一次物理IO。
理论最大值每秒执行数3030137/193.26=15679
以每秒执行5000行为例。
1/5000=0.2ms
1/((0.2-0.0638)/5)=36.7
说明一个物理IO至少36.7条才能满足5000行速率的插入。
100*(1-1/36.7)=97.27%,即计算缓冲池命中率至少在97.27%以上满足此速率。
分别查看当前环境的BUFFERPOOL利用率和内存使用情况。
(1)该表对应的BUFFERPOOL的 ID 为3,截取部分信息如下所示:
Address Id Name PageSz PA-NumPgs BA-NumPgs BlkSize NumTbsp PgsToRemov CurrentSz PostAlter SuspndTSCt 0x0780000032BAD300 1 IBMDEFAULTBP 4096 1000 0 0 5 0 1000 1000 0 0x0780000032BAD780 2 BP4K 4096 10000 0 0 5 0 10000 10000 0 0x0780000032BADC00 3 BP8K 8192 400000 0 0 11 0 400000 400000 0 0x0780000032BAE080 4 BP16K 16384 8000 0 0 2 0 8000 8000 0 0x0780000032BAE500 5 BP32K 32768 6400 0 0 4 0 6400 6400 0 0 0 16 16 0
Bufferpool Statistics for all bufferpools (when BUFFERPOOL monitor switch is ON):
BPID DatLRds DatPRds HitRatio TmpDatLRds TmpDatPRds HitRatio IdxLRds IdxPRds HitRatio TmpIdxLRds TmpIdxPRds HitRatio 1 786229833 63552058 91.92% 468843 6908 98.53% 171346331 8044680 95.31% 0 0 00.00% 2 3841173627 720456585 81.24% 347925 212 99.94% 928328735 135596527 85.39% 0 0 00.00% 3 5013994242*7521116065*85.00% 1058100530*398223022 96.24% 3412927642*4195393715 98.77% 824844323 542029 99.93% 4 8279 35 99.58% 0 0 00.00% 0 0 00.00% 0 0 00.00% 5 7683001530 92419448 98.80% 21254628 0 100.00% 967052023 1016254 99.89% 0 0 00.00% |
(2)查看物理内存总大小
System configuration: lcpu=8 mem=32768MB
kthr memory page faults cpu ----- ----------- ------------------------ ------------ ----------- r b avm fre re pi po fr sr cy in sy cs us sy id wa 2 0 5742960 2004583 0 0 0 0 0 0 17 7279 1877 23 0 76 0 1 0 5722887 2024655 0 0 0 0 0 0 1264 12463 4647 27 2 65 5 0 0 5722901 2024633 0 0 0 0 0 0 3925 9728 10441 12 6 58 24 0 0 5722901 2024629 0 0 0 0 0 0 4863 12344 12271 21 10 28 41 2 0 5722977 2024553 0 0 0 0 0 0 5527 16714 13832 25 12 24 39 2 0 5722900 2024630 0 0 0 0 0 0 4881 11182 12540 22 10 22 45 3 0 5722898 2024275 0 0 0 0 0 0 2545 7801 7152 18 5 54 24 1 1 5722894 2024279 0 0 0 0 0 0 17 2504 618 4 0 96 0 1 0 5722894 2024278 0 0 0 0 0 0 45 2850 825 7 0 93 0
|
(3)topas查看使用内存使用情况:
Topas Monitor for host: bbdcrpt03 EVENTS/QUEUES FILE/TTY Tue Feb 18 15:09:43 2014 Interval: 2 Cswitch 475 Readch 5219.3K Syscall 7044 Writech 49730 CPU User% Kern% Wait% Idle% Reads 344 Rawin 0 ALL 1.4 0.7 0.3 97.6 Writes 1467 Ttyout 1045 Forks 1 Igets 0 Network KBPS I-Pack O-Pack KB-In KB-Out Execs 3 Namei 2066 en2 20.2 46.0 41.5 8.6 11.6 Runqueue 0.0 Dirblk 0 lo0 2.0 11.0 11.0 1.0 1.0 Waitqueue 0.0 en4 0.0 1.0 0.0 0.0 0.0 en0 0.0 0.0 0.0 0.0 0.0 PAGING MEMORY Faults 867 Real,MB 32768 Disk Busy% KBPS TPS KB-Read KB-Writ Steals 0 % Comp 68 power4 1.0 1.7K 11.0 1.7K 0.0 PgspIn 0 % Noncomp 7 power6 1.0 1.3K 17.0 1.3K 0.0 PgspOut 0 % Client 7 power5 1.0 1.1K 10.0 1.1K 18.0 PageIn 0 hdisk7 0.0 864.0 16.0 864.0 0.0 PageOut 10 PAGING SPACE hdisk22 0.0 832.0 16.0 832.0 0.0 Sios 11 Size,MB 8192 hdisk9 1.0 724.5 18.0 724.5 0.0 % Used 1 hdisk8 0.0 646.5 13.0 640.5 6.0 NFS (calls/sec) % Free 99
|
3. 参数调整
ID为3的bufferpool增加1G(剩余系统9G),当前大小为400000*8K/1024=3125M, bufferpool命中率为85.39%。
4. 调整后结果
ID为3的bufferpool增加1G 的空间。
截取sql snapshot片段:
Number of executions = 3125586 Number of compilations = 1 Worst preparation time (ms) = 1 Best preparation time (ms) = 1 Internal rows deleted = 0 Internal rows inserted = 0 Rows read = 0 Internal rows updated = 0 Rows written = 3125586 Statement sorts = 0 Statement sort overflows = 0 Total sort time = 0 Buffer pool data logical reads = 3674724 Buffer pool data physical reads = 59334 Buffer pool temporary data logical reads = 0 Buffer pool temporary data physical reads = 0 Buffer pool index logical reads = 41608916 Buffer pool index physical reads = 1597339 Buffer pool temporary index logical reads = 0 Buffer pool temporary index physical reads = 0 Buffer pool xda logical reads = 0 Buffer pool xda physical reads = 0 Buffer pool temporary xda logical reads = 0 Buffer pool temporary xda physical reads = 0 Total execution time (sec.ms) = 2402.526391 Total user cpu time (sec.ms) = 121.498738 Total system cpu time (sec.ms) = 44.173302 Statement text = INSERT INTO **** VALUES ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) |
平均每次执行CPU时间:165/3125586*1000=0.053ms
平均每次执行时间:2402/3125586*1000=0.768ms
平均每秒次数:3125586/2402.6=1300
计算该SQL命中率:100*(1-(59334+1597339)/(3674724+41608916))=96.34%
5. 结论比较参数修改前后的数据值命中率从94.63%增加到96.34%,每秒插入行数从345.74增加到1300。
与项目组沟通,与之前相比时间快了很多。俺说性能还有提升空间,满足需求即可,若月末批量任务较多,时间较紧可再联系,阶段性胜利。
KPI指标bufferpool命中率作为DBA来说都不陌生,非DBA有时也会追问BUFF命中率如何。正向说明我们都不陌生,反向推理的思想在慢慢形成。
6. 联想(1) 插入有时慢有时快
之前看过论坛问题诊断里有个问题关于SQL 插入时快时慢。根据此现象举一反三,其中一个原因有可能与这条SQL的插入次数和插入时间间隔有关。插入的表近期经常被访问,单条SQL命此时中率相对较高,物理读少,插入时间较短。隔段时间后BUFFERPOOL换页机制把许久不用或很少使用的页换出内存,为新的request提供内存页,第一次插入时单条SQL命中率较低,物理多读,插入时间会较长。具体情况需具体分析。
(2) 插入越来越慢
新环境搭建完时,一定的Buff大小会满足当前数据的data、index逻辑读,即bufferpool命中率为100%,随着时间的增长,表数据不断地插入,表会越来越大,越来越多的页会被写入BUFF,脏页刷到磁盘。读取更大的表数据时,当时设置的buff大小不能满足现在数据量的逻辑读,便会有更多的物理读把数据库写入BUFF,此时BUFF命中率会逐步变小。插入时物理读会逐步增多,从一个时间段来看,插入越来越慢。所以BUFF的定期巡检也非常有必要。
附:
在分析SQL快照的同时也抓取该SQL对应的表结构,例如是否有大对象,是否有索引,表和索引是否有碎片。
(1) DB 参数配置
Max DB files open per application (MAXFILOP) = 64 |
(2) 表的大小及行数
$ db2 "select FPAGES,card from syscat.tables where tabname='R_ACCT_SV_HIS'"
FPAGES CARD -------------------- -------------------- 17521439 923851062
1 record(s) selected.
$ bc 17521439*8/1024/1024 133
|
(3) 表及相关索引页面碎片类信息
Table statistics:
F1: 100 * OVERFLOW / CARD < 5 F2: 100 * (Effective Space Utilization of Data Pages) > 70 F3: 100 * (Required Pages / Total Pages) > 80
SCHEMA.NAME CARD OV NP FP ACTBLK TSIZE F1 F2 F3 REORG ---------------------------------------------------------------------------------------- Table: DATACORE.R_ACCT_SV_HIS 9.2e+08 0 2e+07 2e+07 - 1.40e+11 0 98 99 --- ----------------------------------------------------------------------------------------
Index statistics:
F4: CLUSTERRATIO or normalized CLUSTERFACTOR > 80 F5: 100 * (Space used on leaf pages / Space available on non-empty leaf pages) > MIN(50, (100 - PCTFREE)) F6: (100 - PCTFREE) * (Amount of space available in an index with one less level / Amount of space required for all keys) < 100 F7: 100 * (Number of pseudo-deleted RIDs / Total number of RIDs) < 20 F8: 100 * (Number of pseudo-empty leaf pages / Total number of leaf pages) < 20
SCHEMA.NAME INDCARD LEAF ELEAF LVLS NDEL KEYS LEAF_RECSIZE NLEAF_RECSIZE LEAF_PAGE_OVERHEAD NLEAF_PAGE_OVERHEAD F4 F5 F6 F7 F8 REORG -------------------------------------------------------------------------------------------------------------------------------------------------------------- Table: DATACORE.R_ACCT_SV_HIS Index: DATACORE.I_RA_SV_HIS1 9.2e+08 1e+06 0 4 0 74314 23 23 948 948 79 92 3 0 0 *---- Index: DATACORE.I_RA_SV_HIS2 9.2e+08 1e+06 0 4 0 31405 14 14 1174 1174 93 92 5 0 0 ----- Index: DATACORE.R_ACCT_SV_HIS_IDX8 9.2e+08 1e+06 0 4 0 2006134 22 22 1568 1568 2 93 2 0 0 *---- --------------------------------------------------------------------------------------------------------------------------------------------------------------
|
如果觉得我的文章对您有用,请点赞。您的支持将鼓励我继续创作!
赞0
添加新评论0 条评论