Well after reading Tom's comments I'd say there is not
enough room in here for his ego anyhow.
Good riddance Tom. Go masturbate your ego somewhere
else.
Ooo. Is it getting hot in here?
Let me state that this is the last challenge I will
answer.
The database was spending extra CPU time.
The particular database that went belly up (which I had cloned, fixed, and fed
the backlogged data to) was unusable. It would not open, so no steps could
have been taken. I could quote iTARs from Oracle Support on this but that is
Oracle and Cingular confidential.
The $30 million figure was my boss's.
He told me how much they were planning to budget to get the databases working at
a proper speed. Not to prevent loss of data, just to upgrade hardware.
I'm sorry to say that yes, my solution would have required
psychic abilities or perhaps a somewhat talented DBA, because Oracle Support had
the problem of the slowness and deadlocks elevated and elevated. They
couldn't go any higher and Oracle did not have a solution until I dreamed about
data dictionary corruption and came upon hcheck.sql and deduced what that
problem was. It took a while for Oracle Support to work with developers to
verify that truly this corruption was the cause of the increasing
slowness. Then it took a couple weeks of negotiations (threats from
Cingular to go to DB2) before they agreed to allow me to fix the data dictionary
and keep the database instead of their memorized method of fixing the data
dictionary so you could export the data and import it into another database.
How many times do I have to tell you? I ran Statspack reports at
the highest level of detail until I was blue in the face. I ran
traces. I set events. But I am also intuitive by nature and often
use intuition to solve a problem, with facts to back up my intuitive
conclusion. So after I provided all of this to Oracle Support and they
were at a loss, they were very eager to look at corruption as a cause,
because they didn't have another solution.
Yes, the problem was solved, over the duration of my stint with Cingular.
(I had one database for which Oracle and I had to work up DML to the data
dictionary for a couple of months, then apply it to a clone, which resulted in
the clone pegging the CPU with SMON running for 6 weeks straight.) And I had
many of these databases. The problem got cleared up when finally all 5 types
of data dictionary corruption were fixed with a total of 12 techniques, which
sped up the databases (saving $30 million in hardware upgrades and perhaps
having to go to RAC), and the databases were then converted to LMT.
So yes, I started on the problem during my first week at Cingular and converted
the last database to LMT during my last week at Cingular, working on this
problem (and the usual development/production DBA work) for the duration of my
tenure there. The databases now have 10X as much data as they had when
they were built but run as fast as they did when they were built years before.
I am hereby ending my participation in this thread. Flame me all
you want, I will just hit the delete key.
Tom in Austin
On 8/25/07, Jeremiah Wilton <jeremiah@ora-600.net> wrote:
Tom,
You say that the 'orphaned segments' caused a performance problem. What
was the database spending time doing to cause this performance problem?
If you had done nothing about the orphaned segments, what would have
prevented someone from taking the same steps to manually update the data
dictionary at the point that the database became so slow as to be unusable?
Your assertion that you saved Cingular $30MM seems to imply that had you
not taken action that there would have been complete loss of data. Can
you characterize how that data loss would have occurred?
This response actually is not very technical. My chief gripe is that it
doesn't say how a person like myself with no apparent psychic abilities
vis-a-vis Oracle databases might have detected and resolved the problem.
Most people on this list (hopefully) use wait events, preferably via
ASH, to detect the root cause of performance problems. How was the time
being accounted for in the wait event interface? DD reads are
accounted in that interface just as normal index and heap segment reads
are. So you can see why some people here who approach problems in an
empirical manner might have questions about the character of the problem.
My questions in no way are meant to invalidate the way that you solved
the problem. After all, if you solved it, regardless of how you
obtained the solution, wasn't the problem solved?
Thanks
Jeremiah Wilton
ORA-600 Consulting
http://www.ora-600.net
Tom Pall wrote:
> I did the traces, ran Statspack till I was blue in the face, set the
> events to trap deadlocks. I did all of the things a DBA would do but
> decided that there was something deeper than just two applications
> colliding, because as I worked the problem over a two week period, I
> noticed the database slowing down. Not waits slowing down, not I/O
> slowing down, just throughput slowing down. Slowing down in ways
> neither I nor Oracle Support could explain before my dream, research in
> Metalink and discovery of hcheck.sql in Metalink.
>
> Is this technical enough?