Too much to cover it all.
First, indexes absolutely can be used by
queries containing inequalities in the predicate for the relevant columns.
Whether they will be used depends on your statistics (possibly including
histograms), what release of Oracle you’re on, and either an heuristic
clause-access path rating built into the RBO or a permutation limited
evaluation of the “COST” that will be related to (but not
exclusively determined by) what fraction of the blocks of the index and table
the CBO expects (based on the statistics) to have to visit to produce the
required result set.
Now there are various sophisticated
numeric methods for grabbing the set of points within a box defined by four
lat,lon pairs and you should probably employ one of those but let’s just
pretend someone is only interested in the data between zero and ten degrees for
each. Then an index on lat, or an index on lon, or an index on lat and lon
could each potentially help very much.
Likewise, if one of your columns is
household income and you only want to see rows related to household incomes
over 1000000 USD, then an index on that data value is likely to be very helpful
except perhaps in
Since you’re reading up on 10.2, I’ll
presume you’re at least on a version where lack of a predicate on the
left most set of columns of an index does not rule out use of the index if the
CBO determines skipping through the index is still less in expected cost than
other means of accessing the column values of the rows you need. The
calculation of whether various single column indexes or combinations of
concatenated column indexes is highly dependent on the query pattern and the
actual data. If a common leading edge set of columns gets you to a dramatic
subset of the rows, then you do want to try to always include those columns in
the predicate.
It is correct that if all the required
columns for some table are in an index, then you might avoid actually visiting
table blocks altogether.
Your description of possibly using a
function based index is a case that often works out well in the balance of cost
of maintenance versus cost of queries. Mileage varies.
To the extent you have knowledge of the
query predicate patterns that will actually be used, you have a huge
opportunity to match your indexes very well to that anticipated use.
Every index you add introduces extra
overhead in the insert, update, and deletion of rows in the requisite table. I
think Craig Shallahammer has a paper out there somewhere that addresses the
detailed math of whether a particular index is good, and despite being an old
paper it is certainly still relevant.
Finally (from me, this could be a long
book to treat your questions comprehensively), if your data is very old and the
relative rate of change is modest compared to the accumulated data, and if you
do in fact know the likely patterns of access then physically ordering your
data so that index row selectivity corresponds well to block selectivity has
huge potential benefits. (Which are of course potentially perishable as rows are
inserted, updated, and deleted). In your case you mention some columns that are
always relevant. Especially if old job_id,lab_id rows are unlikely to be added
to, then that might be an effective order in your case, and the rebuild could
be the chance to copy all the rows with nulls in those columns to table_old
while copying in job_id, lab_id order (possibly with additional ordering) to
table_current. Various cluster based storage methods might also turn out well
for you (I’d start with reading Steve Adams’ stuff on clusters and
cross reference that with what release of Oracle you’ll be using, and do
remember the tradeoff of using clusters versus long term maintenance in context
of whether or not you’ll be using partitioning).
Please notice I’ve used a lot of
words like “might” and “potential” and that
correspondence of the best indexing strategy to an actual case involves at
least the rate of change of the data and the query pattern.
Regards,
mwf
From:
oracle-l-bounce@freelists.org [mailto:oracle-l-bounce@freelists.org] On Behalf Of Bill Ferguson
Sent: Thursday, July 12, 2007 8:35
AM
To: oracle-l@freelists.org
Subject: (stupid) Question about
indexing
<snip>
3. I don't believe that indexes are used when > or < symbols are
used in sql, am I correct?
data_value
1. again, nobody has ever queried for a single value, only by a range of
values. i.e. why query for ICP Ti=.014 instead of .015
At this point, I do not see any advantage for indexing these fields. Am I
missing something?
So, starting with the numeric indexes, the application allows the user
to specify a "box" (N, S, E, W) of latitude/longitude coordinates.
I'm thinking that having these indexed will improve performance, as the where
clause does a between comparison. With the data_value, the query will do either
a ">=" or a "<=" depending on the user selection, and
again I think an index helps here as well. I also asked for a couple function
based indexes for a couple fields, where the data exists as upper, lower or
mixed, so I don't have to do a case conversion during the query and negate the
benefit of having the field indexed. The submitter field is one example. It can
either be all upper case, all lower case, or both.
<snip>