db2china2
作者db2china2·2015-07-07 14:34
技术经理·DB2咨询服务

THREE-STAR INDEX—THE IDEAL INDEX FOR A SELECT最佳理论

字数 9845阅读 951评论 3赞 0

THREE-STAR INDEX—THE IDEAL INDEX FOR A SELECT最佳理论:

Deriving the Ideal Index for a SELECT

THREE-STAR INDEX—THE IDEAL INDEX FOR A SELECT

DECLARE CURSOR41 CURSOR FOR

SELECT CNO, FNAME

FROM CUST

WHERE LNAME = NAME

AND

CITY = :CITY

ORDER BY FNAME

An index deserves the first star if the index rows relevant to the SELECT are next

to each other—as in Figure 4.2—or at least as close to each other as possible.

This minimizes the thickness of the index slice that must be scanned.

The second star is given if the index rows are in the right order for the

SELECT—as in Figure 4.2. This eliminates sorting.

If the index rows contain all the columns referred to by the SELECT—as in

Figure 4.2—the index is given the third star. This eliminates table access: The

access path is index only.

Of these three stars, the third is often the most important one. Leaving a

column out of an index may lead to many slow random reads from the disk drive.

An index that has at least the third star is a fat index for the given SELECT.

A Fat Index

A fat index is an index that has at least the third star. It contains all the table

columns referred to by the SELECT and so enables index only access.

To Qualify for the First Star

Pick the columns from all equal predicates (WHERE COL = . . .). Make these

the first columns of the index—in any order. For CURSOR41, the three-star

index will begin with columns LNAME, CITY or CITY, LNAME. In both cases

the index slice that must be scanned will be as thin as possible.

To Qualify for the Second Star

Add the ORDER BY columns. Do not change the order of these columns, but

ignore columns that were already picked in step 1. For example, if CURSOR41

had redundant columns in the ORDER BY, say ORDER BY LNAME, FNAME

or ORDER BY FNAME, CITY, only FNAME would be added in this step.

When FNAME is the third index column, the result table will be in the right

order without sorting. The first FETCH call will return the row with the smallest

FNAME value.

To Qualify for the Third Star

Add all the remaining columns from the SELECT statement. The order of the

columns added in this step has no impact on the performance of the SELECT,

but the cost of updates should be reduced by placing volatile columns at the end.

Now the index contains all the columns required for an index-only access path.

The resulting three-star index is:

(LNAME, CITY, FNAME, CNO) or (CITY, LNAME, FNAME, CNO)

? The WHERE clause does not contain range predicates (BETWEEN, >,>=, etc.).

? The FROM clause refers to a single table only.

? None of the predicates seems too difficult for the optimizer.

Range Predicates and a Three-Star Index

This example, SQL 4.3, requires the same information as before, but the required

customers are now within a range of values.

DECLARE CURSOR43 CURSOR FOR

SELECT CNO, FNAME

FROM CUST

WHERE LNAME BETWEEN NAME1 AND NAME2

AND

CITY = :CITY

ORDER BY FNAME

Let’s try to design a three-star index for this CURSOR. Much of the reasoning

is the same as that for CURSOR41, but the substitution of the BETWEEN

predicate for the = predicate will have significant implications. We will consider

the three stars in reverse order, which arguably reflects the level of difficulty of

understanding.

First the easy (albeit very important) star, the third star. As before, the

index will qualify for the third star by ensuring that all the columns specified

in the SELECT are in the index. No access to the table will be required and so

synchronous reads will not cause a problem.

The index will qualify for the second star by adding the ORDER BY column,

but this will only be true if it is added before the BETWEEN predicate column

LNAME, for example, with index (CITY, FNAME, LNAME). There is only

a single CITY value (= predicate), and so the result set will be provided in

FNAME sequence simply by using this index without the need for a sort. But

if the ORDER BY column is added after the BETWEEN predicate column

LNAME, for example, with index (CITY, LNAME, FNAME), the index rows

will not be in FNAME sequence and so a sort will be required; this can be

seen in Figure 4.3. Therefore to qualify for the second star, the FNAME must be

before the BETWEEN predicate column LNAME as in index (FNAME, . . .) or

index (CITY, FNAME, . . .).

Regarding the first star, if CITY is the first column of the index, we will have

a relatively thin slice to scan (MC = 1), depending on the CITY filter factor. But

the index slice will be even thinner with the index (CITY, LNAME, . . .); now

with two matching columns we only touch the index rows we really need. But

to do this and so benefit from a very thin index slice, no other column (such as

FNAME) can come between them.

So how many stars will our ideal index have? It can certainly have the third

star, but, as we have just seen, we can have either the first star or the second

star, but not both! In other words, we can either:

? Avoid a sort—by having the second star.

or

? Have the thinnest possible index slice, which will minimize not only the

number of index entries to be processed but also any subsequent processing,

in particular synchronous reads for the table rows—by having the

first star.

The presence of the BETWEEN predicate in our example, or indeed any other

range predicate, means we cannot have both; we cannot have a three-star index.

This implies therefore, that we have to make a choice between the first

and second stars. Usually this isn’t a difficult choice since, as we will see in

Chapter 6, the first star tends to be the much more important star, although this

is not always the case as we will see at that time.

Let’s consider for a moment an index (LNAME, CITY, . . .) as shown in

Figure 4.4. LNAME is a range predicate, which means it is the last column that

can participate in the index matching process, as we saw in Chapter 3. The equal

predicate CITY would not be used in the matching process. The result of this

would be only one matching column—a much thicker index slice than we would

have with the index (CITY, LNAME, . . .).

ALGORITHM TO DERIVE THE BEST INDEX FOR A SELECT

The ideal index will be a three-star index for the reasons discussed above. We

have seen, however, that with a range predicate this may not be possible; we may

have to sacrifice (probably) the second star in order to achieve a thin index slice

(the first star). The best index will then only have two stars. This is why we are

careful to distinguish between ideal and best. The ideal in this case would just

not be possible. Taking this into account, we can formulate the rules for creating

the best (perhaps not ideal) index under all conditions. The result might have

three stars or two stars.

First a fat index (third star) is designed, where the scanned index slice is

as thin as possible (first star). If this index does not imply a sort (second star),

it is a three-star index. Otherwise it will only be a two-star index, having sacrificed

the second star. Another candidate should then be derived that eliminates

sorting, thereby having the second star but having sacrificed the first star. One

of the resulting two star indexes will then be the best possible index for the

given SELECT.

The algorithm to derive the best index for a SELECT follows.

Candidate A:

1. Pick the equal predicate columns that are not too difficult for the optimizer

(discussed in Chapter 6). These are the first index columns—in any order.

2. The next index column is the column in the most selective range predicate,

if any. Most selective means the lowest filter factor with the worst

input. Only consider the range predicates that are not too difficult for the

optimizer (discussed in Chapter 6).

3. Add the ORDER BY columns in the correct order (and with DESC if

an ORDER BY column has DESC). Ignore columns that were already

picked in step 1 or 2.

4. Add all remaining columns referred to by the SELECT statement, in any

order (but start with the nonvolatile columns).

Example: CURSOR43

Candidate A will be (CITY, LNAME, FNAME, CNO).

   Candidate A causes a sort for CURSOR43 because FNAME comes after the

range predicate column, LNAME.

Candidate B:

If candidate A leads to a sort with the given SELECT, candidate B is designed.

By definition, the second star is more important than the first star for candidate B.

1. Pick the equal predicate columns that are not too difficult for the optimizer.

These are the first index columns—in any order

2. Add the ORDER BY columns in the correct order (and with DESC if

an ORDER BY column has DESC). Ignore columns that were already

picked in step 1.

3. Add all remaining columns referred to by the SELECT statement, in any

order (but start with the nonvolatile columns).

If candidate A leads to a sort with the given SELECT, candidate B is designed.

By definition, the second star is more important than the first star for candidate B.

1. Pick the equal predicate columns that are not too difficult for the optimizer.

These are the first index columns—in any order

2. Add the ORDER BY columns in the correct order (and with DESC if

an ORDER BY column has DESC). Ignore columns that were already

picked in step 1.

3. Add all remaining columns referred to by the SELECT statement, in any

order (but start with the nonvolatile columns).

We now have two candidates for the best index, one with the first star,

one with the second star. To determine which will be the best index, we could

analyze the performance of each index as we did at the beginning of this chapter.

This would be quite time consuming, however, and Chapter 5 presents a simpler

method (the QUBE) for estimating which candidate will provide the faster access

path for the SELECT.

It is important to realize that all we have done so far is to design the ideal

or best index. Whether this would be practical or not, at this stage we are not

in a position to say.

如果觉得我的文章对您有用,请点赞。您的支持将鼓励我继续创作!

0

添加新评论3 条评论

db2china2db2china2技术经理DB2咨询服务
2015-07-10 12:30
哦,我Q:44973863.
db2china2db2china2技术经理DB2咨询服务
2015-07-10 12:24
有一本书,个人认为比较好的:Wiley_-_Relational_Database_Index_Design_and_the_Optimizers_-_DB2,_Oracle,_SQL_Server,.pdf,加QQ吧,发你。
atpeace331atpeace331数据库管理员银行
2015-07-10 09:48
能推荐一些相关书籍吗?
Ctrl+Enter 发表

作者其他文章

相关文章

相关问题

相关资料

X社区推广