Re: [suse-oracle] OCFSv2 and Oracle 10.2 @ iSCSI

Alexei_Roudnev

2006-04-25

No, the high load was on Oracle ASM, not on OCFSv2.

Here is the problem: after upgrading to Oracle 10.2.0.2 (which I guess I
should not do next time, because 10.2.0.2 looks broken in a few minor
places, which indicates poor testing as usual), running high-load tests on
Oracle (with moderate or low load on OCFSv2) causes OCFSv2 instability.

Reading the documentation and sources makes me conclude that OCFSv2 cannot
be used in its current condition at all, even if what I saw was a bug and
not normal behavior (there is some indication that there was such a bug in
OCFS 1.18). Adding OCFS to an Oracle RAC environment increases the chance of
sporadic system reboots (because of clueless fencing in OCFSv2), and so can
make the whole cluster useless or damage files (I have already had a udev
cache damaged by these reboots).

Moreover, after a few fencings I ran a full OCFSv2 fsck and found a few
inconsistencies (fixing them did not help with the fencing, btw). At some
point, the failed cluster caused a third server (which I used as a passive
mounter to maintain a quorum) to hit a kernel error (and hang on OCFSv2
forever). So, all together, it increases instability a thousandfold.

I wonder, btw, why I never saw anything like this in OCFSv1. Why, if OCFSv2
has no pending IO operations, does it even bother about quorum and
heartbeats? (If I have no IO, I can easily re-establish cluster relations
after any pause: I just need to expire the buffer cache and remount.) I
still blame my fencing problems on the (possible) bug in 1.18, but the
overall design looks so dangerous that I would rather rely on NetApp NFS as
shared storage for archive logs etc. (This causes another problem: you
cannot put the flash recovery area on NFS and the primary area on ASM and
still use the dbca defaults. dbca will create your control files in the
flash recovery area and will try to use direct_io, the default, which will
not work with NFS.)

So:

- Test your system under heavy Oracle load.
- Be very careful with 10.2.0.2 and clusters. It breaks at least one thing:
  * 10.2.0.1 allows (and it is the default) creating services (let someone
    explain to me what they are) with the same name as the DB name;
  * 10.2.0.2 prohibits it (why?!) and srvctl refuses to start/stop services
    if they have the same name (so you are trapped after the upgrade: CRS
    starts the services, but if you stop the DB and that causes the services
    to fail, you cannot start them again without a CRS restart).
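
One way to catch the service-name trap before upgrading is to compare your
service names against the DB name ahead of time. The sketch below is only an
illustration: the function name and the sample names are hypothetical, the
list of services has to come from your own records, and the case-insensitive
comparison is an assumption, not documented srvctl behavior.

```python
def services_named_like_db(db_name, service_names):
    """Return the service names that match the database name.

    The comparison ignores case and surrounding whitespace. That is an
    assumption of this sketch, not documented srvctl behavior.
    """
    target = db_name.strip().lower()
    return [s for s in service_names if s.strip().lower() == target]


if __name__ == "__main__":
    # Hypothetical names; the real list would come from your own
    # inventory of configured services.
    clashes = services_named_like_db("orcl", ["orcl", "oltp", "ORCL"])
    print("services to rename before upgrading:", clashes)
```

Anything this returns is a candidate for renaming while you are still on
10.2.0.1.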

The good news is that the 10.2.0.1 --> 10.2.0.2 upgrade works fine (after
you manage to compile the dumb README file into a readable TODO file).

PS. I am not (even) talking about EM. It is broken as usual: sometimes you
get lucky and it works, but when it fails, you are not surprised at all.

Another funny story. If you used EM to schedule backups (dumb idea, I know)
and then one node stops, your backups will be suspended until you bring the
node back. As a result, if you have archive logs and one node stops, you are
guaranteed to see a dead database because of _no space for archive logs_.
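
A cheap guard against the archive-log trap is an independent free-space check
that does not depend on EM. The sketch below assumes the archive log
destination is an ordinary mounted filesystem (for example an NFS mount); it
does not apply to an ASM disk group. The function names, the path in the
example and the 20% threshold are all hypothetical.

```python
import shutil


def fraction_free(total_bytes, free_bytes):
    """Return free space as a fraction of total (0.0 if total is unknown)."""
    if total_bytes <= 0:
        return 0.0
    return free_bytes / total_bytes


def archive_dest_is_low(path, min_free=0.20):
    """True if the filesystem holding `path` has less than `min_free` free.

    `path` must be a mounted filesystem path; the 20% default is an
    arbitrary example value, not a recommendation from Oracle.
    """
    usage = shutil.disk_usage(path)
    return fraction_free(usage.total, usage.free) < min_free


if __name__ == "__main__":
    # Hypothetical mount point; replace with your archive log destination.
    dest = "."
    if archive_dest_is_low(dest):
        print("WARNING: archive log destination is running out of space")
```

Run it from cron on every node, so the warning still fires when the node that
normally runs the backups is down.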

----- Original Message -----
From: "Kevin Hulse" <kevin.hulse@(protected)>
To: "Alexei_Roudnev" <Alexei_Roudnev@(protected)>
Cc: <suse-oracle@(protected)>
Sent: Tuesday, April 25, 2006 6:22 AM
Subject: Re: [suse-oracle] OCFSv2 and Oracle 10.2 @ iSCSI


>
>   While I do have the usual self-fencing issues when trying
> to bring down individual nodes, I have not experienced
> problems running OCFS2 under high IO loads. In fact, I have
> run a load test where the entire cluster was doing about
> 80MB/sec of physical reads with system load averages of
> 20-30. This was with 10.2.0.1 and SLES9R2.
>
>   This is a 3 node cluster using Dell 2850's and an IBM DS400.
> So this is FC SAN and not iSCSI. Linux multipath is being used
> for path redundancy.
>   I also have a 4th node that is partially configured. It is
> intended to be added to the cluster as a 4th node but currently
> only has OCFS2 configured. It will have CRS and RAC installed
> eventually. Now, it just runs a physical standby.
>   The whole system is still in testing and I have no problem
> with just running that 4th node as a backup target when
> I need to rebuild everything. I just mount the SAN volumes
> and copy happily away.
>
>   I also run SLES9R3 with 10.2.0.1 just fine.
>
>   If 10.2.0.2 is a dud, that would be very handy to know.
>
> Alexei_Roudnev wrote:
>
> >I ran a few stress tests on SLES9 SP3 with Oracle 10.2 (Async IO by
> >default), ASM and OCFS.
> >
> >Final conclusion: it is not a working combination. OCFSv2 adds great
> >instability to any system because of its very poor connectivity and
> >heartbeat detection: it fails (into panic, i.e. fencing) easily under
> >heavy load and kills the whole cluster.
> >
> >Worst of all:
> >
> >- OCFSv2 has no method for using several redundant links, so it cannot
> >be configured for an HA environment (except by using low-level IP
> >binding). Remember that both iSCSI and Oracle CRS/CSS have such
> >mechanisms (Oracle can be configured to use multiple interfaces, and
> >iSCSI has both multi-port and multi-path);
> >- OCFSv2 reboots the server even if there was no activity on the file
> >system. This makes it impossible to use as a near store (such as for
> >backups etc).
> >- OCFSv2 is extremely prone to various freezes. For example, if (it
> >happens sometimes) it experiences an internal failure while rerunning
> >the journal (on an idle node), then it hangs and you cannot stop it.
> >Since it requires unmounting on all nodes for every simple operation
> >(fsck, resize, changing cluster), this causes numerous reboots of the
> >whole cluster.
> >
> >I am not sure when it all started. I remember running the same tests on
> >an older OCFSv2 (SP3 beta) and it worked fine (at least it looked fine).
> >I will run a few more experiments to determine the point at which it all
> >became unstable, but for now it looks extremely unstable (in a test
> >environment, fortunately). (At least it is unstable after the 10.2.0.1 ->
> >10.2.0.2 upgrade, but we did not run stress tests between all upgrades,
> >so I cannot rule out an OCFSv2 version problem, which is more likely than
> >Oracle 10.2.0.2; I cannot imagine how Oracle could crash OCFS.)
> >

