>>>The thing to remember
is that a hit ratio does not reliably correlate to performance, and therefore is
NOT a valid performance metric. If you approach a database that is having
performance problems, look at a hit ratio, and determine it's "too low", you're
likely to spin your wheels trying to "fix" the hit ratio by improving it, but
miss the root cause of the performance problem.<<<
I'll have to disagree with that
somewhat. Consider that case of index scans that are serviced mainly from the DB
Buffer Cache.
Your hit ratio will be good
because the query(ies) are continually hitting the cache and not reading from
disk.
Once you start getting cache
misses then your performance drops as well....
I know what most of
the response will be -- go tune the code to be more efficient...of course that
is the ideal solution but in reality that is not always a
possilbility...
So you are left with
things like trying to buffer as much of the data as possible (i.e. big DB cache
sizes, keep pools, etc..)
I'm not saying you
don't have a valid point (most of it is) but there are always
exceptions...
Bryan,
Never
be afraid to ask a question...and there are no stupid questions...only stupid
answers. I shall endeavor to ensure this is not a stupid
answer.
There
is nothing inherently wrong with hit ratios. The problem is when people
rely on hit ratios as a valid performance metric. If you want to monitor
hit ratios and you happen to notice a precipitous drop in a particular hit
ratio, well, perhaps there's something there....perhaps.
The
thing to remember is that a hit ratio does not reliably correlate to
performance, and therefore is NOT a valid performance metric. If you
approach a database that is having performance problems, look at a hit ratio,
and determine it's "too low", you're likely to spin your wheels trying to "fix"
the hit ratio by improving it, but miss the root cause of the performance
problem.
A few
words about 100% buffer cache hit ratio. This is actually often a sign of
trouble, perhaps big trouble. The theory goes like this "disk I/O is X
times slower than memory access (pick your favorite number for X, 100, 1000,
10000, whatever) so therefore, you want all your I/O to be logical (from the
buffer cache) thereby eliminating all that slow disk I/O." Sounds good,
right? There are some things you have to remember.....Oracle's buffer
cache is part of the SGA, which is a shared memory segment, which all the Oracle
server processes share access to. In order to ensure the integrity of that
data in shared memory, Oracle has some pretty elaborate locking
mechanisms. I won't go into details, but, very breifly (and incompletely)
a "latch" is a small chunk of memory which serializes access to a specific part
of the SGA. The buffer cache is protected by latches and buffer locks that
make sure that while one process is reading data from a particular block,
another process can't overwrite that buffer.
So,
when you consider the serialization overhead of Oracle's SGA, the difference in
cost between a logical I/O and a physical I/O is nowhere near 10000 or 1000 or
even 100. If I recall correctly, Cary Millsap published a number, based on
huge amounts of empirical data (trace files) that calculated it to be something
like a factor of 37. Ok, that's still 37x slower! That's worth
eliminating, right??
Ok,
next, a little change in focus here. There is no problem if a particular
well-tuned query has a hit ratio of 100%. In fact, most anyone would agree
that this is a good thing. So, what's the problem with a 100% (or nearly
100%) buffer cache hit ratio instance wide?? Well, maybe
nothing....maybe. But it's also very possible to have an instance with a
100% buffer cache hit ratio that is so bottlenecked that noone can get any use
out of it. The problem lies in the serialization I mentioned before.
Even if you have a 100% buffer cache hit ratio, every buffer cache access is
serialized through latches, buffer locks, and buffer pins. This is
necessary for Oracle to maintain data integrity. So, what happens if you
have a query that (for example) uses an index very inefficiently? You have
a query that hits the same blocks over and over again. Because they are
repeatedly hit, they remain in the buffer cache, so, no disk I/O!
Yay! But, the serialization is killing you. What happens when you
run several instances of these inefficient queries concurrently? They will
push your buffer cache hit ratio well past 99.99%, but, the system will be
suffering. The serialization invlolved in locking and latching will bring
the system to it's knees. So, in this case, you have the "ideal" BCHR, but
your system will run like a dog.
The
most effective way to determine the root cause of a performance problem is to
specifically analyze the process that's having the problem. Look at the
performance profile. Where is time actually being spent? Focus on
that! Tools like the 10046 trace at level 8, the Oracle wait interface,
and StatsPack, can be very useful in determining the root
cause.
-Mark
PS If you go to http://www.hotsos.com/ and click on Library,
Cary Millsap has a paper there called something like "Why a 99+% buffer cache
hit ratio is bad", and he makes a much more eloquent argument than I do
above. Lots of other good papers there too....take some time to read
through some of them. Registration is required, but
free.
--
Mark J. Bobak
Senior Oracle Architect
ProQuest Information & Learning
For a successful technology, reality must
take precedence over public relations, for Nature cannot be fooled.
--Richard P. Feynman, 1918-1988
.com