fengsh
作者fengsh2013-09-17 09:51
系统工程师, 电信行业

DB2 V9.7 表压缩测试

字数 41270阅读 3980评论 0赞 1

DB2 V9.7 表压缩测试

一、压缩表占用空间测试

       本次以总记录数2亿7千万记录的表pmid.T_DW_AL02_005_D1305为源表数据,分别以压缩和非压缩方式创建测试表,进行数据入表测试。

 

db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 "select count(*) from pmid.T_DW_AL02_005_D1305"

1          
-----------
  276234830

  1 record(s) selected.

db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 "select tabschema,tabname,STATS_TIME,npages,fpages,compression from syscat.tables  where tabname=upper('T_DW_AL02_005_D1305')"

TABSCHEMA                                                                                                                        TABNAME                                                                                                                          STATS_TIME                 NPAGES               FPAGES               COMPRESSION
-------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------- -------------------- -------------------- -----------
PMID                                                                                                                             T_DW_AL02_005_D1305                                                                                                              2013-09-12-14.39.45.303142               578770               578770 N          

  1 record(s) selected.

 

1、非压缩表页面占用测试:

首先,我们创建db2inst1. Testfengnocompress非压缩表:

 

db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 connect to dssdb
   Database Connection Information
Database server        = DB2/AIX64 9.7.5
SQL authorization ID   = DB2INST1
Local database alias   = DSSDB
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 "create table testfengnocompress like pmid.T_DW_AL02_005_D1305"
DB20000I  The SQL command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>time db2 "insert into testfengnocompress select * from  pmid.T_DW_AL02_005_D1305"
DB20000I  The SQL command completed successfully.
real    1m47.87s
user    0m0.01s
sys     0m0.00s
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 runstats on table db2inst1.TESTFENGNOCOMPRESS 
DB20000I  The RUNSTATS command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 "select tabschema,tabname,STATS_TIME,npages,fpages,compression from syscat.tables  where tabname=upper('testfengnocompress')" TABSCHEMA                                                                                                                        TABNAME                                                                                                                          STATS_TIME                 NPAGES               FPAGES               COMPRESSION
-------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------- -------------------- -------------------- -----------
DB2INST1                                                                                                                         TESTFENGNOCOMPRESS                                                                                                               2013-09-12-15.39.47.945832              
585440               585440 N         
  1 record(s) selected.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>time db2 "select count(*) from db2inst1.TESTFENGNOCOMPRESS"
1         
-----------
  276234830
  1 record(s) selected.
real    0m1.65s
user    0m0.00s
sys     0m0.00s

 

我们看到,和pmid.T_DW_AL02_005_D1305表相同记录数新建的非压缩表,占用585440个页面。

2、压缩表页面占用测试

创建db2inst1.TESTFENGCOMPRESS压缩表,启用压缩属性:

 

db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 create table testfengcompress like pmid.T_DW_AL02_005_D1305
DB20000I  The SQL command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 alter table testfengcompress compress yes
DB20000I  The SQL command completed successfully.

db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 runstats on table db2inst1.TESTFENGCOMPRESS
DB20000I  The RUNSTATS command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 "select tabschema,tabname,STATS_TIME,npages,fpages,compression from syscat.tables  where tabname=upper('testfengcompress')"
TABSCHEMA                                                                                                                        TABNAME                                                                                                                          STATS_TIME                 NPAGES               FPAGES               COMPRESSION
-------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------- -------------------- -------------------- -----------
DB2INST1                                                                                                                         TESTFENGCOMPRESS                                                                                                                 2013-09-12-15.11.52.463884                    0                   32 R         
  1 record(s) selected.

1insert数据入测试表

db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>time db2 "insert into db2inst1.TESTFENGCOMPRESS select * from pmid.T_DW_AL02_005_D1305"
DB20000I  The SQL command completed successfully.
real    2m0.56s
user    0m0.01s
sys     0m0.01s

db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 runstats on table db2inst1.TESTFENGCOMPRESS 
DB20000I  The RUNSTATS command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 "select tabschema,tabname,STATS_TIME,npages,fpages,compression from syscat.tables  where tabname=upper('testfengcompress')"
TABSCHEMA                                                                                                                        TABNAME                                                                                                                          STATS_TIME                 NPAGES               FPAGES               COMPRESSION
-------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------- -------------------- -------------------- -----------
DB2INST1                                                                                                                         TESTFENGCOMPRESS                                                                                                                 2013-09-12-15.49.49.971975              
318560               318592 R         
  1 record(s) selected.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>time db2 "select count(*) from db2inst1.TESTFENGCOMPRESS"                                                                  
1         
-----------
  276234830
  1 record(s) selected.
real    0m1.75s
user    0m0.00s
sys     0m0.00s

 

根据insert到压缩测试表的统计信息,我们看到,压缩测试表,总共占用318560个页面。

2load数据入压缩测试表

db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 "truncate table db2inst1.TESTFENGCOMPRESS immediate"
DB20000I  The SQL command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 "declare c1 cursor for select * from pmid.T_DW_AL02_005_D1305"
DB20000I  The SQL command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>                                                                 
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>time db2 "load from c1 of cursor insert into db2inst1.testfengcompress"
  Agent Type     Node     SQL Code     Result
______________________________________________________________________________
  LOAD           000      +00000000    Success.
…………………………………………………………………………

 

…………………………………………………………………………
______________________________________________________________________________
  PARTITION      001      +00000000    Success.
______________________________________________________________________________
  RESULTS:       32 of 32 LOADs completed successfully.
______________________________________________________________________________

Summary of Partitioning Agents:
Rows Read                   = 276234830
Rows Rejected               = 0
Rows Partitioned            = 276234830

Summary of LOAD Agents:
Number of rows read         = 276234830
Number of rows skipped      = 0
Number of rows loaded       = 276234830
Number of rows rejected     = 0
Number of rows deleted      = 0
Number of rows committed    = 276234830

real    4m32.67s
user    0m0.01s
sys     0m0.01s
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 runstats on table db2inst1.TESTFENGCOMPRESS                                                                                      
DB20000I  The RUNSTATS command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 "select tabschema,tabname,STATS_TIME,npages,fpages,compression from syscat.tables  where tabname=upper('testfengcompress')"      

TABSCHEMA                                                                                                                        TABNAME                                                                                                                          STATS_TIME                 NPAGES               FPAGES               COMPRESSION
-------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------- -------------------- -------------------- -----------
DB2INST1                                                                                                                         TESTFENGCOMPRESS                                                                                                                 2013-09-12-15.32.05.217527              
318496               318592 R         

  1 record(s) selected.

db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>time db2 "select count(*) from db2inst1.TESTFENGCOMPRESS"
1         
-----------
  276234830
  1 record(s) selected.

real    0m9.51s
user    0m0.00s
sys     0m0.01s

根据load数据到压缩测试表的统计信息,我们看到,压缩测试表,总共占用318496个页面。

不管使用load,还是insert方式,同样记录入启用压缩属性的表,占用页面数基本相差不大。

 

我们看到,和pmid.T_DW_AL02_005_D1305表相同记录数新建的非压缩表,占用585440个页面。

二、压缩、非压缩表全表扫描查询IO对比

我们通过查询表总的记录数来对压缩表、非压缩表进行全表扫描查询的IO成本进行对比。

1、     非压缩表全扫COST

db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 set current explain mode explain
DB20000I  The SQL command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh> db2 -v "select count(*) from db2inst1.TESTFENGNOCOMPRESS"                                                             
select count(*) from db2inst1.TESTFENGNOCOMPRESS
SQL0217W  The statement was not executed as only Explain information requests 
are being processed.  SQLSTATE=01604
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 set current explain mode no
DB20000I  The SQL command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2exfmt -d dssdb -g TIC -w -1 -n % -# 0 -s % -o no_compress.out
…………………………………………………………………………………………

Access Plan:
-----------
        Total Cost:             19803.8
        Query Degree:           1

        Rows 
       RETURN
       (   1)
        Cost 
         I/O 
         |
          1 
       GRPBY 
 
      (   2)
       19803.8 
        18295 

         |
         32 
       DTQ   
       (   3)
       19803.8 
        18295 
 
        |
          1 
       GRPBY 
       (   4)
       19803.7 
        18295 
         |
     8.78076e+06 
       TBSCAN
       (   5)
       19224.8 
        18295 
         |
     8.78076e+06 
   TABLE: DB2INST1
TESTFENGNOCOMPRESS
         Q1


Plan Details:
-------------
        1) RETURN: (Return Result)
                Cumulative Total Cost:          19803.8
                Cumulative CPU Cost:            5.89099e+09
                Cumulative I/O Cost:            18295

                Cumulative Re-Total Cost:       1523.81
                Cumulative Re-CPU Cost:         5.77801e+09
                Cumulative Re-I/O Cost:         0
                Cumulative First Row Cost:      19803.8
                Cumulative Comm Cost:           32.3085
                Cumulative First Comm Cost:     0
                Estimated Bufferpool Buffers:   18295

 

非压缩表:

Total Cost19803.8 IO Cost18295 CPU Cost5.89099e+09

2、     压缩表全扫COST

db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 set current explain mode explain
DB20000I  The SQL command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh> db2 -v "select count(*) from db2inst1.TESTFENGCOMPRESS"
select count(*) from db2inst1.TESTFENGCOMPRESS
SQL0217W  The statement was not executed as only Explain information requests 
are being processed.  SQLSTATE=01604

db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2 set current explain mode no 
DB20000I  The SQL command completed successfully.
db2inst1@bhq_dw_1:/db2home/db2inst1/fengsh>db2exfmt -d dssdb -g TIC -w -1 -n % -# 0 -s % -o compress.out   
…………………………………………………………………………………………

Access Plan:
-----------
        Total Cost:             12307.2
        Query Degree:           1

       Rows 
      RETURN
      (   1)
       Cost 
        I/O 
        |
         1 
      GRPBY 
      (   2)
      12307.2 
       9956 
        |

        32 
      DTQ   
      (   3)
      12307.2 
       9956 
        |
         1 
      GRPBY 
      (   4)
      12307.2 
       9956 
        |
    8.78076e+06 
      TBSCAN
      (   5)
      11728.2 
       9956 
        |
    8.78076e+06 
  TABLE: DB2INST1
TESTFENGCOMPRESS
        Q1

 

Plan Details:
-------------
        1) RETURN: (Return Result)
                Cumulative Total Cost:          12307.2
                Cumulative CPU Cost:            8.99552e+09
                Cumulative I/O Cost:            9956

                Cumulative Re-Total Cost:       1522.48
                Cumulative Re-CPU Cost:         5.77295e+09
                Cumulative Re-I/O Cost:         0
                Cumulative First Row Cost:      12307.2
                Cumulative Comm Cost:           32.3085
                Cumulative First Comm Cost:     0
                Estimated Bufferpool Buffers:   9956

 

 

压缩表:

Total Cost12307.2 IO Cost9956CPU Cost8.99552e+09

 

三、压缩测试总结

1、表压缩前后占用存储空间

 

NPAGES

FPAGES

 

非压缩表

585440

585440

 

压缩表(insert)

318560

318592

 

压缩表(load)

318496

318592

 

 

       压缩后表占用页面大小降至非压缩时表占用页面大小的一半左右,其中通过insertload方式对压缩表入数据均可实现自动压缩,占用空间基本差不多。(585440-318560/585440=0.46,也就是说,压缩后表节省大概46%的存储空间。

 

2、表压缩前后全表查询成本

 

Total Cost

IO Cost

CPU Cost

非压缩表

19803.8

18295

5.89E+09

压缩表

12307.2

9956

9.00E+09

 

       可以看出,对表全表记录统计查询语句的总成本下降38%左右,压缩表比非压缩表的IO成本减少46%左右,因为占用的页面少了,IO的成本随之降低。但语句的CPU成本比非压缩表要增加了34%左右(此语句CPUcostDB2中语句执行的CPU使用度量单位,和实际主机系统上CPU的使用率不是同等概念)。

 

3、总结

 

       DB2 的行压缩为用户数据的存储和访问提供了更加有效和便捷的方式。通过使用行压缩技术,一方面可以节省存储成本,另一方面对于 I/O 是瓶颈的系统来说,由于压缩能使相同的数据页上存放了更多的用户数据,进而在数据访问时可以减少 I/O 次数,从而提高系统性能。然而,由于压缩和解压需要而外的 CPU 资源,根据实际应用的经验来看,一般启用表压缩后CPU使用率会增加5-10%左右,对于某些 CPU 作为瓶颈的系统而言,压缩或许不是很好的选择。由于咱们日常系统访问的表,并不一定都是这些压缩的表,对我们日常系统的查询访问性能基本不影响。

 

 

四、DB2行压缩技术参考资料

       DB2 中的行压缩技术是基于字典的压缩。简单来说,就是将数据中较长的重复的模式(pattern)在存放时用较短的符号(symbol)来表示,记录模式和符号之间对应关系的数据结构即为字典。在未被压缩的数据中寻找模式,并且生成符号,最终组装成字典的过程称为创建字典;用户将一条记录(record)插入一个打开了压缩选项、并且已经有字典的表中,数据中的模式会被符号取代后存放到数据库中,该替代过程称为压缩;相对的,在读取已被压缩的数据时,记录中的符号会被还原成原始的模式,并返回给用户,该还原过程称为解压。压缩和解压过程对于用户来说是完全透明的。

       记录的解压和压缩比较简单,都是由数据库自动来完成的;但是,相对来说,字典的创建比较复杂。创建字典主要就是寻找重复模式的过程,由于用户表的数据量有可能非常大,不可能在所有的数据里去寻找重复的模式,所以,DB2 通常会在所给的数据上进行等距采样(sampling),在采样出来的数据上寻找模式,并生成符号,最终组装成字典。

       在行压缩技术刚刚推出的 V9.1 版本中,字典必须通过用户运行带有重置字典(RESETDICTIONARY)选项的的离线表重组(不带有 INPLACE 选项的 REORG TABLE 命令)来创建,在 V9.5 之后的版本,除了离线表重组,还可以使用自动字典(Automatic Dictionary CreationADC)创建来建立字典。

1、使用离线表重组来创建字典DB2 V9.1

在用户使用 DB2 V9.5 之前的版本(即 V9.1)压缩表时,通常的流程是这样的:

1.     创建表:使用带有 COMPRESS YES 字句的 CREATE TABLE 语句创建表,打开压缩选项:
清单 1. 启用行压缩的建表语句

 /home/xujing->db2  CREATE TABLE TAB1(C1 INT, C2 CHAR(100), C3 CHAR(50)) COMPRESS YES"

 DB20000I  The SQL command completed successfully.

2.     导入数据(该操作可能是 insert/export/load 等任何一种或几种,以下提到的“数据导入”,皆指此类操作)

3.     运行 REORG TABLE TABNAME RESETDICTIONARY 命令重组表,同时建立字典,并压缩数据:
清单 2. 带有 RESETDICTIONARY 字句的表重组命令

 /home/xujing->db2 "REORG TABLE TAB1 RESETDICTIONARY"   

 DB20000I  The REORG command completed successfully.

       由于在数据被导入到表中的时候,字典并不存在,所以此时这些数据并未被压缩;而只有在数据完全存放在 DB2 中了之后,才通过离线表重组来创建字典并压缩这些已有数据。在运行 REORG TABLE RESETDICTIONARY 命令创建字典时,DB2 首先会对表做一次全表扫描(full scan),在扫描的同时,为创建字典进行采样;全表扫描结束后,采样也完成了,此时在所采样出来的样本数据上寻找重复模式,生成符号,最终组装成字典,并将字典以记录的形式存放在表中。有了字典之后,使用该字典从头开始一条一条压缩表中的数据,将压缩之后的记录逐条整齐地放回表中。

       由于建立字典所需的样本数据是通过全表扫描得到的,该样本能够较好的体现全表的数据冗余特征,因此该字典是最优的,即通过这种方式能获得行压缩的最大压缩率。同时,这一方式的缺点也很明显,表重组本身是一个开销比较大的操作,需要的时间比较长,同时,因为需要创建字典,比单纯的表重组多了一次全表扫描,这使得该类表重组所需时间更长;而且在重组过程中,表不能进行写操作(这就是此类表重组被称为“离线表重组”(offline table reorganization)的原因,相对于在线表重组(online table reorganization,即带有 INPLACE 选项的 REORG 命令))。

       要对已有表启用行压缩,需要使用带有 COMPRESS YES 字句的 ALTER TABLE 语句来修改表,启用压缩特性,随后,类似的,使用 REORG TABLE TABNAME RESETDICTIONARY 命令重组表,建立字典并压缩已有数据。


清单 3. 启用行压缩的修改表语句

                                                            

 /home/xujing->db2 ALTER TABLE TAB1 COMPRESS YES

 DB20000I  The SQL command completed successfully.

 

       由于离线表重组的不足,到了 DB2 V9.5 中,为了改进字典创建的效率,出现了一种折中的办法,即自动字典创建(Automatic Dictionary CreationADC)技术。

2、使用自动字典创建(ADC)来创建字典DB2 V9.5及以上)

       所谓的自动字典创建,就是在用户表数据不断增长的时候(数据可能来源于 insert/export/load 等操作),数据达到一定数量之后(默认是 2M),DB2 会在这些数据上自动创建字典,并且,在此之后再进入表的数据就能被这个字典压缩了,但在创建字典之前已有数据(即那 2M 数据)并不会被压缩。

       相对于离线表重组创建字典,使用自动字典创建来压缩一个用户表,数据在进入表的同时就被压缩了,因此减少了一次离线表重组,效率得到了很大提升,即带有自动字典创建的数据导入,相对于数据导入 + 离线表重组的组合而言,所花费的时间减少很多。同时,数据导入过程也有可能因为压缩而获得效率的提升,即带有自动字典创建的数据导入,相对于未带有压缩功能的数据导入,花费时间有所减少,这是由于存储数据所需要的数据页的数量减少,写入数据所需要的时间就少了。

       但是,由于离线表重组来说,由于创建字典所使用的样本只有该表开始的区区 2M,这些数据或许并不能很好地体现全表数据的冗余特征,因此,通过自动字典创建所得到的压缩率通常没有离线表重组高

 

如果觉得我的文章对您有用,请点赞。您的支持将鼓励我继续创作!

1

添加新评论0 条评论

Ctrl+Enter 发表

作者其他文章

相关文章

相关问题

相关资料

X社区推广