shifo0409
作者shifo04092019-04-11 09:06
信息分析/架构师, 日本ITS会社

ORACLE 位图索引原理-适用于ORACLE 11 R2/12c

字数 14464阅读 1407评论 0赞 1

ORACLE的位图索引是最适用于数据仓库或决策支持系统(DSS)的索引模型。数据仓库中常见的一种模式是星型模式。其中包括中央事实表和数个相关的维度表。前者包含特定主题的全部信息。如客户的收支,后者包含特定维度比如说时间维度或者地理维度的相关引用类型数据。在星型模式中,维度表是父表,而中央事实表是子表。以下这就是个简单的星型模式示例:

3w0mwtmywu94

3w0mwtmywu94

在ORACLE数据中,专门针对星型模式进行了配置设计我们称之为星型转换。他是专门用来提高星型模式的查询性能。若是要使用星型转换,需要配置一些ORACLE的初始化参数。此外,为了使星型转换能够发生,最低限度使必须在星型模式的事实表的外键上设置位图索引。虽然,现在很多文章或者教程中已经特殊说明了不建议使用外键,但是对于一些强关系模型来说,设置外键还是事倍功半的。

那么哪些情况适用于位图索引呢?他的优缺点又是什么呢?我们看看下面的条目就一目了然了:

1 位图索引一般应用于低基数列(关于索引基数我们会在后面的文章中涉及)

2 位图索引最好用于数据仓库和DSS系统

3 位图索引最好用于星型模式 当然了雪花模型也是可以使用的 但是要考虑到雪花构型的花瓣复杂度

4 对于在索引列上有许多链接查询或者过滤查询 位图索引是十分高效的

5 对于启用了位图索引的表 一定一定要尽量减少或者避免DML操作尤其是UPDATE

6 如果你非要对表执行DML操作 ,应在对表进行更新之前删除位图索引,等DML操作完成之后再重新建立

7 如果要在分区表上执行DML操作,应该吧发生DML的分区的索引设置为不可用,并在DML活动完成之后重新创建分区索引。

总之就是折腾对不对?!对于OLTP(联机事务处理系统),使用位图索引是非常不合适的。因为在这里操作中,会有大量的锁产生。而使位图索引出现行锁定的问题。会极大的影响性能。

那么位图索引和我们上一篇文章说的B*传统索引究竟有多大的不同呢?请看图:

upzf8httsti6

upzf8httsti6

大家看到了吧!发没发现点小问题。

位图索引,顾名思义,肯定与“位”有关。在这里相比知道,计算机中的所有信息最终都是通过“位/bit”来运算, 而二进制位运算在计算机中是非常高效的。(你别告诉我你不知道原因?计算机就是一个典型的二进制模型)每一个二进制位都可以取值0或者1,而取值的确切含义是由具体的上下文环境决定的(参考计算机《计算机原理》一书)。在ORACLE位图索引中,每一个二进制位代表了某一行中索引列的取值情况。例如,上面标识颜色列的位图索引结构如下:

blue:1000100100010010100

green :0001010000100100000

red :0100000011000001001

yellow :0010001000001000010

在上面的位图结构中,存储了19条记录的分布情况,以blue为例,从左到右的第n个二进制位代表了第n条记录是否为blue,如果二进制位为1,代表true即是,0代表false即性别不是。以此类推。

大家都知道,在oracle中是根据rowid来定位记录的,因此,我们需要引入start rowid和end rowid,通过start rowid ,end rowid 和二进制位的偏移,我们就可以非常快速的计算出二进制位所代表的表记录rowid。位图索引的最终逻辑结构如下图:

r41a2eqtwgi8

r41a2eqtwgi8

位图索引的查询性能究竟怎么样?!

在很多资料中,都可以看到这样的论述:位图索引适合于 low distict cardinality(低基数)的列。实际上,对于high distinct cardinality (高基数)的列,位图索引的查询性能也是非常不错的。下面我们来验证这个结论。

以下实验环境使用的事ORCLE 11G R2 11.2.0.4版本:

首先我们创建两张表: emp_l和emp_h.
SQL> create table emp_l(empno number(10), ename varchar2(30), sal number(10));

表已创建。

Begin
For i in 1..1000000
Loop
Insert into emp_l

values(i, dbms_random.string('U',30), dbms_random.value(1000,7000));
 If mod(i, 10000) = 0 then
   Commit;
 End if;

End loop;
End;
/

PL/SQL 过程已成功完成。

SQL> create table emp_h as select / +append / * from emp_l order by dbms_random.random;
emp_h由于其记录是随机分布的,因此该表上索引的CLUSTERING_FACTOR要高一些。
我们首先看一下emp_l表等值查询情况下,索引的效率如何:
SQL> create bitmap index bm_normal on emp_l(empno);

索引已创建。

SQL> analyze table emp_l compute statistics for table for all indexes for all indexed columns;

表已分析。

SQL> select index_name,clustering_factor from user_indexes;

INDEX_NAME CLUSTERING_FACTOR


BM_NORMAL 1000000

SQL> set autot traceonly
SQL> select * from emp_l where empno=&empno;
输入 empno 的值: 1000
原值 1: select * from emp_l where empno=&empno
新值 1: select * from emp_l where empno=1000

执行计划

Plan hash value: 1526426521


| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

| 0 | SELECT STATEMENT | | 1 | 34 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | emp_l | 1 | 34 | 3 (0)| 00:00:01 |
| 2 | BITMAP CONVERSION TO ROWIDS| | | | | |

|* 3 | BITMAP INDEX SINGLE VALUE | BM_NORMAL | | | | |

Predicate Information (identified by operation id):

3 - access("EMPNO"=1000)

统计信息

1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
702 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> select * from emp_l where empno=&empno;
输入 empno 的值: 2398
原值 1: select * from emp_l where empno=&empno
新值 1: select * from emp_l where empno=2398

执行计划

Plan hash value: 1526426521


| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

| 0 | SELECT STATEMENT | | 1 | 34 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | emp_l | 1 | 34 | 3 (0)| 00:00:01 |
| 2 | BITMAP CONVERSION TO ROWIDS| | | | | |

|* 3 | BITMAP INDEX SINGLE VALUE | BM_NORMAL | | | | |

Predicate Information (identified by operation id):

3 - access("EMPNO"=2398)

统计信息

1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
703 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> select * from emp_l where empno=&empno;
输入 empno 的值: 8545
原值 1: select * from emp_l where empno=&empno
新值 1: select * from emp_l where empno=8545

执行计划

Plan hash value: 1526426521


| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

| 0 | SELECT STATEMENT | | 1 | 34 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | emp_l | 1 | 34 | 3 (0)| 00:00:01 |
| 2 | BITMAP CONVERSION TO ROWIDS| | | | | |

|* 3 | BITMAP INDEX SINGLE VALUE | BM_NORMAL | | | | |

Predicate Information (identified by operation id):

3 - access("EMPNO"=8545)

统计信息

1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
703 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> select * from emp_l where empno=&empno;
输入 empno 的值: 128444
原值 1: select * from emp_l where empno=&empno
新值 1: select * from emp_l where empno=128444

执行计划

Plan hash value: 1526426521


| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

| 0 | SELECT STATEMENT | | 1 | 34 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | emp_l | 1 | 34 | 3 (0)| 00:00:01 |
| 2 | BITMAP CONVERSION TO ROWIDS| | | | | |

|* 3 | BITMAP INDEX SINGLE VALUE | BM_NORMAL | | | | |

Predicate Information (identified by operation id):

3 - access("EMPNO"=128444)

统计信息

1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
704 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> drop index bm_normal;

索引已删除。

SQL> create index bt_normal on emp_l(empno);

索引已创建。

SQL> analyze table emp_l compute statistics for table for all indexes for all indexed columns;

表已分析。

SQL> select index_name,clustering_factor from user_indexes;

INDEX_NAME CLUSTERING_FACTOR


BT_NORMAL 6210
SYS_IL0000076897C00002$$
PK_EMP 1
PK_DEPT 1

SQL> set autot traceonly
SQL> select * from emp_l where empno=&empno;
输入 empno 的值: 1000
原值 1: select * from emp_l where empno=&empno
新值 1: select * from emp_l where empno=1000

执行计划

Plan hash value: 733975378


| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

| 0 | SELECT STATEMENT | | 1 | 34 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| emp_l | 1 | 34 | 4 (0)| 00:00:01 |

|* 2 | INDEX RANGE SCAN | BT_NORMAL | 1 | | 3 (0)| 00:00:01 |

Predicate Information (identified by operation id):

2 - access("EMPNO"=1000)

统计信息

1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
702 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> select * from emp_l where empno=&empno;
输入 empno 的值: 128444
原值 1: select * from emp_l where empno=&empno
新值 1: select * from emp_l where empno=128444

执行计划

Plan hash value: 733975378


| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

| 0 | SELECT STATEMENT | | 1 | 34 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| emp_l | 1 | 34 | 4 (0)| 00:00:01 |

|* 2 | INDEX RANGE SCAN | BT_NORMAL | 1 | | 3 (0)| 00:00:01 |

Predicate Information (identified by operation id):

2 - access("EMPNO"=128444)

统计信息

1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
704 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> select * from emp_l where empno=&empno;
输入 empno 的值: 2398
原值 1: select * from emp_l where empno=&empno
新值 1: select * from emp_l where empno=2398

执行计划

Plan hash value: 733975378


| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

| 0 | SELECT STATEMENT | | 1 | 34 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| emp_l | 1 | 34 | 4 (0)| 00:00:01 |

|* 2 | INDEX RANGE SCAN | BT_NORMAL | 1 | | 3 (0)| 00:00:01 |

Predicate Information (identified by operation id):

2 - access("EMPNO"=2398)

统计信息

1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
703 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
总结如下:
BITMAP EMPNO B-TREE
Consistent Reads Physical Reads Consistent Reads Physical Reads
5 0 1000 5 0
5 0 2398 5 0
5 0 8545 5 0
5 0 98008 5 0
5 0 85342 5 0
5 0 128444 5 0
5 0 858 5 0
对emp_h表进行实验,得出的结果与之类似,这里不再獒述。从这里可以看出,在唯一列上的等值查询,位图索引与btree索引的效率相当。

下面,我们在针对范围查询来进行测试。

SQL> create bitmap index bm_random on emp_h(empno);

索引已创建。

SQL> analyze table emp_h compute statistics for table for all indexes for all columns;

表已分析。

SQL> select index_name,clustering_factor from user_indexes;

INDEX_NAME CLUSTERING_FACTOR


BM_RANDOM 1000000

SQL> set autot traceonly
SQL> select * from emp_h where empno between &range1 and &range2;
输入 range1 的值: 1
输入 range2 的值: 2300
原值 1: select * from emp_h where empno between &range1 and &range2
新值 1: select * from emp_h where empno between 1 and 2300

已选择2300行。

执行计划

Plan hash value: 811843605


| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

| 0 | SELECT STATEMENT | | 2299 | 85063 | 418 (1)| 00:00:06 |
| 1 | TABLE ACCESS BY INDEX ROWID | emp_h | 2299 | 85063 | 418 (1)| 00:00:06 |
| 2 | BITMAP CONVERSION TO ROWIDS| | | | | |

|* 3 | BITMAP INDEX RANGE SCAN | BM_RANDOM | | | | |

Predicate Information (identified by operation id):

3 - access("EMPNO">=1 AND "EMPNO"<=2300)

统计信息

1 recursive calls
0 db block gets
2463 consistent gets
0 physical reads
0 redo size
130225 bytes sent via SQL*Net to client
2203 bytes received via SQL*Net from client
155 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2300 rows processed

SQL> select * from emp_h where empno between &range1 and &range2;
输入 range1 的值: 8
输入 range2 的值: 1980
原值 1: select * from emp_h where empno between &range1 and &range2
新值 1: select * from emp_h where empno between 8 and 1980

已选择1973行。

执行计划

Plan hash value: 811843605


| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

| 0 | SELECT STATEMENT | | 1972 | 72964 | 366 (0)| 00:00:05 |
| 1 | TABLE ACCESS BY INDEX ROWID | emp_h | 1972 | 72964 | 366 (0)| 00:00:05 |
| 2 | BITMAP CONVERSION TO ROWIDS| | | | | |

|* 3 | BITMAP INDEX RANGE SCAN | BM_RANDOM | | | | |

Predicate Information (identified by operation id):

3 - access("EMPNO">=8 AND "EMPNO"<=1980)

统计信息

1 recursive calls
0 db block gets
2114 consistent gets
0 physical reads
0 redo size
111758 bytes sent via SQL*Net to client
1961 bytes received via SQL*Net from client
133 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1973 rows processed

SQL> select * from emp_h where empno between &range1 and &range2;
输入 range1 的值: 28888
输入 range2 的值: 31850
原值 1: select * from emp_h where empno between &range1 and &range2
新值 1: select * from emp_h where empno between 28888 and 31850

已选择2963行。

执行计划

Plan hash value: 811843605


| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

| 0 | SELECT STATEMENT | | 2962 | 107K| 513 (0)| 00:00:07 |
| 1 | TABLE ACCESS BY INDEX ROWID | emp_h | 2962 | 107K| 513 (0)| 00:00:07 |
| 2 | BITMAP CONVERSION TO ROWIDS| | | | | |

|* 3 | BITMAP INDEX RANGE SCAN | BM_RANDOM | | | | |

Predicate Information (identified by operation id):

3 - access("EMPNO">=28888 AND "EMPNO"<=31850)

统计信息

1 recursive calls
0 db block gets
3172 consistent gets
0 physical reads
0 redo size
170625 bytes sent via SQL*Net to client
2687 bytes received via SQL*Net from client
199 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2963 rows processed

SQL> drop index bm_random;

索引已删除。

SQL> create index bt_random on emp_h(empno);

索引已创建。

SQL> analyze table emp_h compute statistics for table for all indexes for all columns;

表已分析。

SQL> set autot off
SQL> select index_name,clustering_factor from user_indexes;

INDEX_NAME CLUSTERING_FACTOR


BT_RANDOM 999834
SQL> set autot traceonly
SQL> select * from emp_h where empno between &range1 and &range2;
输入 range1 的值: 1
输入 range2 的值: 2300
原值 1: select * from emp_h where empno between &range1 and &range2
新值 1: select * from emp_h where empno between 1 and 2300

已选择2300行。

执行计划

Plan hash value: 731629521


| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

| 0 | SELECT STATEMENT | | 2299 | 85063 | 1735 (1)| 00:00:21 |

|* 1 | TABLE ACCESS FULL| emp_h | 2299 | 85063 | 1735 (1)| 00:00:21 |

Predicate Information (identified by operation id):

1 - filter("EMPNO"<=2300 AND "EMPNO">=1)

统计信息

1 recursive calls
0 db block gets
6410 consistent gets
0 physical reads
0 redo size
121081 bytes sent via SQL*Net to client
2203 bytes received via SQL*Net from client
155 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2300 rows processed

SQL> select * from emp_h where empno between &range1 and &range2;
输入 range1 的值: 8
输入 range2 的值: 1980
原值 1: select * from emp_h where empno between &range1 and &range2
新值 1: select * from emp_h where empno between 8 and 1980

已选择1973行。

执行计划

Plan hash value: 731629521


| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

| 0 | SELECT STATEMENT | | 1972 | 72964 | 1735 (1)| 00:00:21 |

|* 1 | TABLE ACCESS FULL| emp_h | 1972 | 72964 | 1735 (1)| 00:00:21 |

Predicate Information (identified by operation id):

1 - filter("EMPNO"<=1980 AND "EMPNO">=8)

统计信息

1 recursive calls
0 db block gets
6388 consistent gets
0 physical reads
0 redo size
103922 bytes sent via SQL*Net to client
1961 bytes received via SQL*Net from client
133 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1973 rows processed

归纳如下,
BITMAP EMPNO (Range) B-TREE
Consistent Reads Physical Reads Consistent Reads Physical Reads
2463 0 1-2300 6410 0
2114 0 8-1980 6388 0
2572 0 1850-4250 6418 0
3172 0 28888-31850 6456 0
2762 0 82900-85478 6431 0
7254 0 984888-1000000 7254 0

从这里可以看出,位图索引要优于btree索引,这是因为btree索引的cluster factor 较大,从而优化器选择了全表扫描。即便在emp_l 表下,即clustering factor较小时,位图索引btree索引相当的。因此在distinct cardinality 较大的情况下,范围扫描的效率位图索引也是不逊色与B*索引。

如果觉得我的文章对您有用,请点赞。您的支持将鼓励我继续创作!

1

添加新评论0 条评论

Ctrl+Enter 发表

作者其他文章

相关文章

相关问题

相关资料

X社区推广