IT咨询服务数据库 db2 9.5

THREE-STAR INDEX—THE IDEAL INDEX FOR A SELECT最佳理论

THREE-STAR INDEX—THE IDEAL INDEX FOR A SELECT最佳理论：
Deriving the Ideal Index for a SELECT

THREE-STAR INDEX—THE IDEAL INDEX FOR A SELECT

DECLARE CURSOR41 CURSOR FOR
SELECT CNO, FNAME
FROM CUST
WHERE LNAME = :LNAME
AND
CITY = :CITY
ORDER BY FNAME

An index deserves the first star if the index rows relevant to the SELECT are next
to each other—as in Figure 4.2—or at least as close to each other as possible.
This minimizes the thickness of the index slice that must be scanned.

The second star is given if the index rows are in the right order for the
SELECT—as in Figure 4.2. This eliminates sorting.

If the index rows contain all the columns referred to by the SELECT—as in
Figure 4.2—the index is given the third star. This eliminates table access: The
access path is index only.

Of these three stars, the third is often the most important one. Leaving a
column out of an index may lead to many slow random reads from the disk drive.
An index that has at least the third star is a fat index for the given SELECT.

A Fat Index
A fat index is an index that has at least the third star. It contains all the table
columns referred to by the SELECT and so enables index only access.

To Qualify for the First Star
Pick the columns from all equal predicates (WHERE COL = . . .). Make these
the first columns of the index—in any order. For CURSOR41, the three-star
index will begin with columns LNAME, CITY or CITY, LNAME. In both cases
the index slice that must be scanned will be as thin as possible.

To Qualify for the Second Star
Add the ORDER BY columns. Do not change the order of these columns, but
ignore columns that were already picked in step 1. For example, if CURSOR41
had redundant columns in the ORDER BY, say ORDER BY LNAME, FNAME
or ORDER BY FNAME, CITY, only FNAME would be added in this step.
When FNAME is the third index column, the result table will be in the right
order without sorting. The first FETCH call will return the row with the smallest
FNAME value.

To Qualify for the Third Star
Add all the remaining columns from the SELECT statement. The order of the
columns added in this step has no impact on the performance of the SELECT,
but the cost of updates should be reduced by placing volatile columns at the end.
Now the index contains all the columns required for an index-only access path.
The resulting three-star index is:
(LNAME, CITY, FNAME, CNO) or (CITY, LNAME, FNAME, CNO)

? The WHERE clause does not contain range predicates (BETWEEN, >,>=, etc.).
? The FROM clause refers to a single table only.
? None of the predicates seems too difficult for the optimizer.

Range Predicates and a Three-Star Index

This example, SQL 4.3, requires the same information as before, but the required
customers are now within a range of values.

DECLARE CURSOR43 CURSOR FOR
SELECT CNO, FNAME
FROM CUST
WHERE LNAME BETWEEN :LNAME1 AND :LNAME2
AND
CITY = :CITY
ORDER BY FNAME

Let’s try to design a three-star index for this CURSOR. Much of the reasoning
is the same as that for CURSOR41, but the substitution of the BETWEEN
predicate for the = predicate will have significant implications. We will consider
the three stars in reverse order, which arguably reflects the level of difficulty of
understanding.

First the easy (albeit very important) star, the third star. As before, the
index will qualify for the third star by ensuring that all the columns specified
in the SELECT are in the index. No access to the table will be required and so
synchronous reads will not cause a problem.

The index will qualify for the second star by adding the ORDER BY column,
but this will only be true if it is added before the BETWEEN predicate column
LNAME, for example, with index (CITY, FNAME, LNAME). There is only
a single CITY value (= predicate), and so the result set will be provided in
FNAME sequence simply by using this index without the need for a sort. But
if the ORDER BY column is added after the BETWEEN predicate column
LNAME, for example, with index (CITY, LNAME, FNAME), the index rows
will not be in FNAME sequence and so a sort will be required; this can be
seen in Figure 4.3. Therefore to qualify for the second star, the FNAME must be

before the BETWEEN predicate column LNAME as in index (FNAME, . . .) or
index (CITY, FNAME, . . .).
Regarding the first star, if CITY is the first column of the index, we will have
a relatively thin slice to scan (MC = 1), depending on the CITY filter factor. But
the index slice will be even thinner with the index (CITY, LNAME, . . .); now
with two matching columns we only touch the index rows we really need. But
to do this and so benefit from a very thin index slice, no other column (such as
FNAME) can come between them.
So how many stars will our ideal index have? It can certainly have the third
star, but, as we have just seen, we can have either the first star or the second
star, but not both! In other words, we can either:
? Avoid a sort—by having the second star.
or
? Have the thinnest possible index slice, which will minimize not only the
number of index entries to be processed but also any subsequent processing,
in particular synchronous reads for the table rows—by having the
first star.

The presence of the BETWEEN predicate in our example, or indeed any other
range predicate, means we cannot have both; we cannot have a three-star index.

This implies therefore, that we have to make a choice between the first
and second stars. Usually this isn’t a difficult choice since, as we will see in
Chapter 6, the first star tends to be the much more important star, although this
is not always the case as we will see at that time.

Let’s consider for a moment an index (LNAME, CITY, . . .) as shown in
Figure 4.4. LNAME is a range predicate, which means it is the last column that
can participate in the index matching process, as we saw in Chapter 3. The equal
predicate CITY would not be used in the matching process. The result of this
would be only one matching column—a much thicker index slice than we would
have with the index (CITY, LNAME, . . .).

ALGORITHM TO DERIVE THE BEST INDEX FOR A SELECT

The ideal index will be a three-star index for the reasons discussed above. We
have seen, however, that with a range predicate this may not be possible; we may
have to sacrifice (probably) the second star in order to achieve a thin index slice
(the first star). The best index will then only have two stars. This is why we are
careful to distinguish between ideal and best. The ideal in this case would just
not be possible. Taking this into account, we can formulate the rules for creating
the best (perhaps not ideal) index under all conditions. The result might have
three stars or two stars.

First a fat index (third star) is designed, where the scanned index slice is
as thin as possible (first star). If this index does not imply a sort (second star),
it is a three-star index. Otherwise it will only be a two-star index, having sacrificed
the second star. Another candidate should then be derived that eliminates
sorting, thereby having the second star but having sacrificed the first star. One
of the resulting two star indexes will then be the best possible index for the
given SELECT.

The algorithm to derive the best index for a SELECT follows.

Candidate A:
1. Pick the equal predicate columns that are not too difficult for the optimizer
(discussed in Chapter 6). These are the first index columns—in any order.
2. The next index column is the column in the most selective range predicate,
if any. Most selective means the lowest filter factor with the worst
input. Only consider the range predicates that are not too difficult for the
optimizer (discussed in Chapter 6).
3. Add the ORDER BY columns in the correct order (and with DESC if
an ORDER BY column has DESC). Ignore columns that were already
picked in step 1 or 2.
4. Add all remaining columns referred to by the SELECT statement, in any
order (but start with the nonvolatile columns).

Example: CURSOR43
Candidate A will be (CITY, LNAME, FNAME, CNO).
Candidate A causes a sort for CURSOR43 because FNAME comes after the
range predicate column, LNAME.

Candidate B:
If candidate A leads to a sort with the given SELECT, candidate B is designed.
By definition, the second star is more important than the first star for candidate B.
1. Pick the equal predicate columns that are not too difficult for the optimizer.
These are the first index columns—in any order
2. Add the ORDER BY columns in the correct order (and with DESC if
an ORDER BY column has DESC). Ignore columns that were already
picked in step 1.
3. Add all remaining columns referred to by the SELECT statement, in any
order (but start with the nonvolatile columns).

If candidate A leads to a sort with the given SELECT, candidate B is designed.
By definition, the second star is more important than the first star for candidate B.
1. Pick the equal predicate columns that are not too difficult for the optimizer.
These are the first index columns—in any order
2. Add the ORDER BY columns in the correct order (and with DESC if
an ORDER BY column has DESC). Ignore columns that were already
picked in step 1.
3. Add all remaining columns referred to by the SELECT statement, in any
order (but start with the nonvolatile columns).

We now have two candidates for the best index, one with the first star,
one with the second star. To determine which will be the best index, we could
analyze the performance of each index as we did at the beginning of this chapter.
This would be quite time consuming, however, and Chapter 5 presents a simpler
method (the QUBE) for estimating which candidate will provide the faster access
path for the SELECT.

It is important to realize that all we have done so far is to design the ideal
or best index. Whether this would be practical or not, at this stage we are not
in a position to say.

关注1

参与1

0同行回答

“答”则兼济天下，请您为题主分忧！

THREE-STAR INDEX—THE IDEAL INDEX FOR A SELECT最佳理论

0同行回答

提问者

相关问题

相关资料

相关文章

问题状态