DB2 V9.7 表压缩测试
一、压缩表占用空间测试本次以总记录数2亿7千万记录的表pmid.T_DW_AL02_005_D1305为源表数据,分别以压缩和非压缩方式创建测试表,进行数据入表测试。
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 "select count(*) from pmid.T_DW_AL02_005_D1305"
1
-----------
276234830
1 record(s) selected.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 "select tabschema,tabname,STATS_TIME,npages,fpages,compression from syscat.tables where tabname=upper('T_DW_AL02_005_D1305')"
TABSCHEMA TABNAME STATS_TIME NPAGES FPAGES COMPRESSION
-------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------- -------------------- -------------------- -----------
PMID T_DW_AL02_005_D1305 2013-09-12-14.39.45.303142 578770 578770 N
1 record(s) selected.
1、非压缩表页面占用测试:
首先,我们创建db2inst1. Testfengnocompress非压缩表:
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 connect to dssdb
Database Connection Information
Database server = DB2/AIX64 9.7.5
SQL authorization ID = DB2INST1
Local database alias = DSSDB
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 "create table testfengnocompress like pmid.T_DW_AL02_005_D1305"
DB20000I The SQL command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>time db2 "insert into testfengnocompress select * from pmid.T_DW_AL02_005_D1305"
DB20000I The SQL command completed successfully.
real 1m47.87s
user 0m0.01s
sys 0m0.00s
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 runstats on table db2inst1.TESTFENGNOCOMPRESS
DB20000I The RUNSTATS command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 "select tabschema,tabname,STATS_TIME,npages,fpages,compression from syscat.tables where tabname=upper('testfengnocompress')" TABSCHEMA TABNAME STATS_TIME NPAGES FPAGES COMPRESSION
-------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------- -------------------- -------------------- -----------
DB2INST1 TESTFENGNOCOMPRESS 2013-09-12-15.39.47.945832 585440 585440 N
1 record(s) selected.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>time db2 "select count(*) from db2inst1.TESTFENGNOCOMPRESS"
1
-----------
276234830
1 record(s) selected.
real 0m1.65s
user 0m0.00s
sys 0m0.00s
我们看到,和pmid.T_DW_AL02_005_D1305表相同记录数新建的非压缩表,占用585440个页面。
2、压缩表页面占用测试创建db2inst1.TESTFENGCOMPRESS压缩表,启用压缩属性:
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 create table testfengcompress like pmid.T_DW_AL02_005_D1305
DB20000I The SQL command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 alter table testfengcompress compress yes
DB20000I The SQL command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 runstats on table db2inst1.TESTFENGCOMPRESS
DB20000I The RUNSTATS command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 "select tabschema,tabname,STATS_TIME,npages,fpages,compression from syscat.tables where tabname=upper('testfengcompress')"
TABSCHEMA TABNAME STATS_TIME NPAGES FPAGES COMPRESSION
-------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------- -------------------- -------------------- -----------
DB2INST1 TESTFENGCOMPRESS 2013-09-12-15.11.52.463884 0 32 R
1 record(s) selected.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>time db2 "insert into db2inst1.TESTFENGCOMPRESS select * from pmid.T_DW_AL02_005_D1305"
DB20000I The SQL command completed successfully.
real 2m0.56s
user 0m0.01s
sys 0m0.01s
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 runstats on table db2inst1.TESTFENGCOMPRESS
DB20000I The RUNSTATS command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 "select tabschema,tabname,STATS_TIME,npages,fpages,compression from syscat.tables where tabname=upper('testfengcompress')"
TABSCHEMA TABNAME STATS_TIME NPAGES FPAGES COMPRESSION
-------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------- -------------------- -------------------- -----------
DB2INST1 TESTFENGCOMPRESS 2013-09-12-15.49.49.971975 318560 318592 R
1 record(s) selected.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>time db2 "select count(*) from db2inst1.TESTFENGCOMPRESS"
1
-----------
276234830
1 record(s) selected.
real 0m1.75s
user 0m0.00s
sys 0m0.00s
根据insert到压缩测试表的统计信息,我们看到,压缩测试表,总共占用318560个页面。
2)load数据入压缩测试表db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 "truncate table db2inst1.TESTFENGCOMPRESS immediate"
DB20000I The SQL command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 "declare c1 cursor for select * from pmid.T_DW_AL02_005_D1305"
DB20000I The SQL command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>time db2 "load from c1 of cursor insert into db2inst1.testfengcompress"
Agent Type Node SQL Code Result
______________________________________________________________________________
LOAD 000 +00000000 Success.
…………………………………………………………………………
…………………………………………………………………………
______________________________________________________________________________
PARTITION 001 +00000000 Success.
______________________________________________________________________________
RESULTS: 32 of 32 LOADs completed successfully.
______________________________________________________________________________
Summary of Partitioning Agents:
Rows Read = 276234830
Rows Rejected = 0
Rows Partitioned = 276234830
Summary of LOAD Agents:
Number of rows read = 276234830
Number of rows skipped = 0
Number of rows loaded = 276234830
Number of rows rejected = 0
Number of rows deleted = 0
Number of rows committed = 276234830
real 4m32.67s
user 0m0.01s
sys 0m0.01s
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 runstats on table db2inst1.TESTFENGCOMPRESS
DB20000I The RUNSTATS command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 "select tabschema,tabname,STATS_TIME,npages,fpages,compression from syscat.tables where tabname=upper('testfengcompress')"
TABSCHEMA TABNAME STATS_TIME NPAGES FPAGES COMPRESSION
-------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------- -------------------- -------------------- -----------
DB2INST1 TESTFENGCOMPRESS 2013-09-12-15.32.05.217527 318496 318592 R
1 record(s) selected.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>time db2 "select count(*) from db2inst1.TESTFENGCOMPRESS"
1
-----------
276234830
1 record(s) selected.
real 0m9.51s
user 0m0.00s
sys 0m0.01s
根据load数据到压缩测试表的统计信息,我们看到,压缩测试表,总共占用318496个页面。
不管使用load,还是insert方式,同样记录入启用压缩属性的表,占用页面数基本相差不大。
我们看到,和pmid.T_DW_AL02_005_D1305表相同记录数新建的非压缩表,占用585440个页面。
二、压缩、非压缩表全表扫描查询IO对比我们通过查询表总的记录数来对压缩表、非压缩表进行全表扫描查询的IO成本进行对比。
1、 非压缩表全扫COSTdb2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 set current explain mode explain
DB20000I The SQL command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh> db2 -v "select count(*) from db2inst1.TESTFENGNOCOMPRESS"
select count(*) from db2inst1.TESTFENGNOCOMPRESS
SQL0217W The statement was not executed as only Explain information requests
are being processed. SQLSTATE=01604
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 set current explain mode no
DB20000I The SQL command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2exfmt -d dssdb -g TIC -w -1 -n % -# 0 -s % -o no_compress.out
…………………………………………………………………………………………
Access Plan:
-----------
Total Cost: 19803.8
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
1
GRPBY
( 2)
19803.8
18295
|
32
DTQ
( 3)
19803.8
18295
|
1
GRPBY
( 4)
19803.7
18295
|
8.78076e+06
TBSCAN
( 5)
19224.8
18295
|
8.78076e+06
TABLE: DB2INST1
TESTFENGNOCOMPRESS
Q1
Plan Details:
-------------
1) RETURN: (Return Result)
Cumulative Total Cost: 19803.8
Cumulative CPU Cost: 5.89099e+09
Cumulative I/O Cost: 18295
Cumulative Re-Total Cost: 1523.81
Cumulative Re-CPU Cost: 5.77801e+09
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 19803.8
Cumulative Comm Cost: 32.3085
Cumulative First Comm Cost: 0
Estimated Bufferpool Buffers: 18295
非压缩表:
Total Cost:19803.8 ,IO Cost:18295 ,CPU Cost:5.89099e+09
2、 压缩表全扫COSTdb2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 set current explain mode explain
DB20000I The SQL command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh> db2 -v "select count(*) from db2inst1.TESTFENGCOMPRESS"
select count(*) from db2inst1.TESTFENGCOMPRESS
SQL0217W The statement was not executed as only Explain information requests
are being processed. SQLSTATE=01604
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 set current explain mode no
DB20000I The SQL command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2exfmt -d dssdb -g TIC -w -1 -n % -# 0 -s % -o compress.out
…………………………………………………………………………………………
Access Plan:
-----------
Total Cost: 12307.2
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
1
GRPBY
( 2)
12307.2
9956
|
32
DTQ
( 3)
12307.2
9956
|
1
GRPBY
( 4)
12307.2
9956
|
8.78076e+06
TBSCAN
( 5)
11728.2
9956
|
8.78076e+06
TABLE: DB2INST1
TESTFENGCOMPRESS
Q1
Plan Details:
-------------
1) RETURN: (Return Result)
Cumulative Total Cost: 12307.2
Cumulative CPU Cost: 8.99552e+09
Cumulative I/O Cost: 9956
Cumulative Re-Total Cost: 1522.48
Cumulative Re-CPU Cost: 5.77295e+09
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 12307.2
Cumulative Comm Cost: 32.3085
Cumulative First Comm Cost: 0
Estimated Bufferpool Buffers: 9956
压缩表:
Total Cost:12307.2 ,IO Cost:9956,CPU Cost:8.99552e+09
三、压缩测试总结
1、表压缩前后占用存储空间
|
NPAGES |
FPAGES |
|
非压缩表 |
585440 |
585440 |
|
压缩表(insert) |
318560 |
318592 |
|
压缩表(load) |
318496 |
318592 |
|
压缩后表占用页面大小降至非压缩时表占用页面大小的一半左右,其中通过insert、load方式对压缩表入数据均可实现自动压缩,占用空间基本差不多。(585440-318560/585440=0.46,也就是说,压缩后表节省大概46%的存储空间。
2、表压缩前后全表查询成本
|
Total Cost |
IO Cost |
CPU Cost |
非压缩表 |
19803.8 |
18295 |
5.89E+09 |
压缩表 |
12307.2 |
9956 |
9.00E+09 |
可以看出,对表全表记录统计查询语句的总成本下降38%左右,压缩表比非压缩表的IO成本减少46%左右,因为占用的页面少了,IO的成本随之降低。但语句的CPU成本比非压缩表要增加了34%左右(此语句CPUcost是DB2中语句执行的CPU使用度量单位,和实际主机系统上CPU的使用率不是同等概念)。
3、总结
DB2 的行压缩为用户数据的存储和访问提供了更加有效和便捷的方式。通过使用行压缩技术,一方面可以节省存储成本,另一方面对于 I/O 是瓶颈的系统来说,由于压缩能使相同的数据页上存放了更多的用户数据,进而在数据访问时可以减少 I/O 次数,从而提高系统性能。然而,由于压缩和解压需要而外的 CPU 资源,根据实际应用的经验来看,一般启用表压缩后CPU使用率会增加5-10%左右,对于某些 CPU 作为瓶颈的系统而言,压缩或许不是很好的选择。由于咱们日常系统访问的表,并不一定都是这些压缩的表,对我们日常系统的查询访问性能基本不影响。
四、DB2行压缩技术参考资料
DB2 中的行压缩技术是基于字典的压缩。简单来说,就是将数据中较长的重复的模式(pattern)在存放时用较短的符号(symbol)来表示,记录模式和符号之间对应关系的数据结构即为字典。在未被压缩的数据中寻找模式,并且生成符号,最终组装成字典的过程称为创建字典;用户将一条记录(record)插入一个打开了压缩选项、并且已经有字典的表中,数据中的模式会被符号取代后存放到数据库中,该替代过程称为压缩;相对的,在读取已被压缩的数据时,记录中的符号会被还原成原始的模式,并返回给用户,该还原过程称为解压。压缩和解压过程对于用户来说是完全透明的。
记录的解压和压缩比较简单,都是由数据库自动来完成的;但是,相对来说,字典的创建比较复杂。创建字典主要就是寻找重复模式的过程,由于用户表的数据量有可能非常大,不可能在所有的数据里去寻找重复的模式,所以,DB2 通常会在所给的数据上进行等距采样(sampling),在采样出来的数据上寻找模式,并生成符号,最终组装成字典。
在行压缩技术刚刚推出的 V9.1 版本中,字典必须通过用户运行带有重置字典(RESETDICTIONARY)选项的的离线表重组(不带有 INPLACE 选项的 REORG TABLE 命令)来创建,在 V9.5 之后的版本,除了离线表重组,还可以使用自动字典(Automatic Dictionary Creation,ADC)创建来建立字典。
1、使用离线表重组来创建字典(DB2 V9.1)
在用户使用 DB2 V9.5 之前的版本(即 V9.1)压缩表时,通常的流程是这样的:
1. 创建表:使用带有 COMPRESS YES 字句的 CREATE TABLE 语句创建表,打开压缩选项:
清单 1. 启用行压缩的建表语句
/home/xujing->db2 CREATE TABLE TAB1(C1 INT, C2 CHAR(100), C3 CHAR(50)) COMPRESS YES" DB20000I The SQL command completed successfully. |
2. 导入数据(该操作可能是 insert/export/load 等任何一种或几种,以下提到的“数据导入”,皆指此类操作)
3. 运行 REORG TABLE TABNAME RESETDICTIONARY 命令重组表,同时建立字典,并压缩数据:
清单 2. 带有 RESETDICTIONARY 字句的表重组命令
/home/xujing->db2 "REORG TABLE TAB1 RESETDICTIONARY" DB20000I The REORG command completed successfully. |
由于在数据被导入到表中的时候,字典并不存在,所以此时这些数据并未被压缩;而只有在数据完全存放在 DB2 中了之后,才通过离线表重组来创建字典并压缩这些已有数据。在运行 REORG TABLE RESETDICTIONARY 命令创建字典时,DB2 首先会对表做一次全表扫描(full scan),在扫描的同时,为创建字典进行采样;全表扫描结束后,采样也完成了,此时在所采样出来的样本数据上寻找重复模式,生成符号,最终组装成字典,并将字典以记录的形式存放在表中。有了字典之后,使用该字典从头开始一条一条压缩表中的数据,将压缩之后的记录逐条整齐地放回表中。
由于建立字典所需的样本数据是通过全表扫描得到的,该样本能够较好的体现全表的数据冗余特征,因此该字典是最优的,即通过这种方式能获得行压缩的最大压缩率。同时,这一方式的缺点也很明显,表重组本身是一个开销比较大的操作,需要的时间比较长,同时,因为需要创建字典,比单纯的表重组多了一次全表扫描,这使得该类表重组所需时间更长;而且在重组过程中,表不能进行写操作(这就是此类表重组被称为“离线表重组”(offline table reorganization)的原因,相对于在线表重组(online table reorganization,即带有 INPLACE 选项的 REORG 命令))。
要对已有表启用行压缩,需要使用带有 COMPRESS YES 字句的 ALTER TABLE 语句来修改表,启用压缩特性,随后,类似的,使用 REORG TABLE TABNAME RESETDICTIONARY 命令重组表,建立字典并压缩已有数据。
/home/xujing->db2 ALTER TABLE TAB1 COMPRESS YES DB20000I The SQL command completed successfully. |
由于离线表重组的不足,到了 DB2 V9.5 中,为了改进字典创建的效率,出现了一种折中的办法,即自动字典创建(Automatic Dictionary Creation,ADC)技术。
2、使用自动字典创建(ADC)来创建字典(DB2 V9.5及以上)
所谓的自动字典创建,就是在用户表数据不断增长的时候(数据可能来源于 insert/export/load 等操作),数据达到一定数量之后(默认是 2M),DB2 会在这些数据上自动创建字典,并且,在此之后再进入表的数据就能被这个字典压缩了,但在创建字典之前已有数据(即那 2M 数据)并不会被压缩。
相对于离线表重组创建字典,使用自动字典创建来压缩一个用户表,数据在进入表的同时就被压缩了,因此减少了一次离线表重组,效率得到了很大提升,即带有自动字典创建的数据导入,相对于数据导入 + 离线表重组的组合而言,所花费的时间减少很多。同时,数据导入过程也有可能因为压缩而获得效率的提升,即带有自动字典创建的数据导入,相对于未带有压缩功能的数据导入,花费时间有所减少,这是由于存储数据所需要的数据页的数量减少,写入数据所需要的时间就少了。
但是,由于离线表重组来说,由于创建字典所使用的样本只有该表开始的区区 2M,这些数据或许并不能很好地体现全表数据的冗余特征,因此,通过自动字典创建所得到的压缩率通常没有离线表重组高。
如果觉得我的文章对您有用,请点赞。您的支持将鼓励我继续创作!
赞1
添加新评论0 条评论