[suse-oracle] *** final conclusion - 10.2.0.2 patchset BROKE RAC cluster on SLES9 SP3 x86_64 Linux - Re: [suse-oracle] OCFSv2 and Oracle 10.2 @ iSCSI

Alexei_Roudnev

2006-04-27


I downgraded the RAC cluster to 10.2.0.1 (in reality, reinstalled it), and
everything runs well now.
Several earlier tests had already eliminated the other possible causes of
the system freeze and cluster failure.

So, applying the 10.2.0.2 patchset on a SLES9 SP3 x86_64 RAC cluster (ASM +
iSCSI + OCFS or NFS) broke the operating system, causing a periodic kernel
loop under heavy Oracle load. For some mysterious reason, it happens only if
- I have 2 databases in the cluster
- the second database is in archive mode (but I am not sure whether this is
necessary for the failure or just a coincidence)
- the second database is under average to heavy load

Then the kernel went into an internal loop for 20 - 60 seconds, freezing all
IO and all user processes; after the loop, the kernel woke up (the system
showed a load average of 100 - 200 at that moment). This caused the cluster
to recognize a failure and fence the node (either CSSD or OCFSv2 rebooted
it, depending on the test), and, on the other hand, allowed the delayed IO
to proceed (which can damage the OCFSv2 journal in some cases, because the
other nodes had ALREADY decided that the node was dead and taken control).
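
A freeze like the one described above can, in principle, be spotted from
user space with a heartbeat loop: a process that wakes up every second and
records the time will show a large gap whenever everything was frozen. The
sketch below is only an illustration under assumptions of mine; the function
names, the 1-second interval and the 5-second threshold are hypothetical and
are not something used in the tests above. It also assumes the monotonic
clock keeps counting while the kernel is stuck.

```python
import time
from typing import List, Tuple


def record_heartbeats(count: int, interval: float = 1.0) -> List[float]:
    """Sleep `interval` seconds `count` times and record a monotonic timestamp each time."""
    stamps = [time.monotonic()]
    for _ in range(count):
        time.sleep(interval)
        stamps.append(time.monotonic())
    return stamps


def find_stalls(
    timestamps: List[float], interval: float, threshold: float
) -> List[Tuple[float, float]]:
    """Return (gap_start, overrun_seconds) for each gap that exceeds
    the expected interval by more than `threshold` seconds."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        overrun = (cur - prev) - interval
        if overrun > threshold:
            stalls.append((prev, overrun))
    return stalls
```

For example, feeding find_stalls() the output of record_heartbeats() taken
during a load test would list every period in which the process was not
scheduled for noticeably longer than expected. It cannot tell a kernel loop
from any other reason for not being scheduled.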

So it's the worst-case scenario I can imagine. (I will configure
hangcheck_timer to reboot systems before it happens, but in any case no
cluster can survive such failures without the risk of split-brain or
delayed-IO damage.) It means that we have cleared OCFSv2 (as usual, a few
minor problems can be seen, but it was not the root cause), and I can also
clear Hugetlb (I reverted it back) and a few other things in SLES9 SP3 Linux.
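
The idea behind hangcheck_timer can be sketched as a simple rule: a timer is
armed to fire after a fixed tick, and if it actually fires later than tick
plus a margin, the node is treated as having hung and is optionally
rebooted. The Python below only models that rule; it is not the kernel
module's code. The argument names loosely follow the module's
hangcheck_tick, hangcheck_margin and hangcheck_reboot options as I
understand them, and the numbers in the usage note are illustrative
assumptions, not recommended settings.

```python
def hang_detected(armed_at: float, fired_at: float, tick: float, margin: float) -> bool:
    """True if the timer fired more than tick + margin seconds after it was armed."""
    return (fired_at - armed_at) > (tick + margin)


def hangcheck_action(
    armed_at: float, fired_at: float, tick: float, margin: float, reboot: bool
) -> str:
    """Return what the model would do: 'ok', 'log' or 'reboot'."""
    if not hang_detected(armed_at, fired_at, tick, margin):
        return "ok"
    return "reboot" if reboot else "log"
```

With an assumed tick of 1 second and margin of 10 seconds, a 20-second
freeze gives hangcheck_action(0, 20, 1, 10, True) == "reboot", while a
5-second delay is tolerated. The margin has to be shorter than the cluster's
own fencing timeouts, otherwise the other nodes take control before the
frozen node reboots.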
The final test was simple:
- run the DB on 10.2.0.2 and see that Linux hangs;
- remove Oracle;
- install Oracle RAC at 10.2.0.1;
- create the same databases and see that everything runs stably (nothing
tries to break the cluster, no load average of 100, etc.).

I will open a TAR with Oracle about it.

----- Original Message -----
From: "Kevin Closson" <kevinc@(protected)>
To: <suse-oracle@(protected)>
Sent: Thursday, April 27, 2006 5:50 AM
Subject: RE: [suse-oracle] OCFSv2 and Oracle 10.2 @ iSCSI


>>>> >>>Reading documentation and sources make me conclude that
>>>> >>>OCFSv2 cannot be used in current conditions at all, even if I saw a
>>>> >>>bug, not normal behavior (there is some indication that there was
>>>> >>>such bug in OCFS 1.18).
>>>>
>>>> ..so this is why open source is so "attractive" ? You can read the
>>>> code to determine something is completely broken even after your
>>>> testing showed the same? :-)
>>>
>>>Kevin, could you kindly stop that sneering?

Yes, bad humor. I apologize.

--
To unsubscribe, email: suse-oracle-unsubscribe@(protected)
For additional commands, email: suse-oracle-help@(protected)
Please see http://www.suse.com/oracle/ before posting


