[suse-oracle] Re: [Alexei_Roudnev@exigengroup.com: [suse-oracle] OCFSv2 stability - my results]

Alexei_Roudnev

2006-07-10


1) It is a test system, so I will be glad to rerun the tests. It takes
some time (I need to migrate the files from NFS, and there is a huge number
of them), but I'd like to try.

I am not sure whether I should wait for the next SLES update (which is the
standard route with them) or just compile OCFS into the kernel manually
(usually a bad idea, because it makes all updates more difficult and creates
an unsupported kernel).

2) Performance vs local file system.
It worked fast but created high CPU load.
For example, when I ran our system indexing (which reads a lot of files on
one node), I sometimes saw 30 - 40% CPU load from o2net on another node.
The system we tested was not perfect (that version had a flat directory
with a few thousand files; this is fixed already), so I cannot blame it on
ocfs2 alone, but
- it worked perfectly on a local FS
- it works perfectly on NetApp NFS
- different nodes never competed for the same files, which makes such a
high system load very suspicious.

3) Use cases and reliability.

In reality, this is what I don't like in the current OCFS2. If we do not
compete for resources (nodes never change files in the same directory,
never write the same files, and never even read the same files), the system
should behave like a local file system, without self-fencing and other
tricky things (it can wait longer on the very first attempt to share
resources, but it should behave very well under normal conditions).
This is a very (very) common usage scenario, for example:
- using OCFS2 for Oracle backups (only 1 node at a time writes files)
- using OCFSv2 in an application cluster (directories are never shared,
because different nodes work in different subdirectories)
- using OCFSv2 for an NFS cluster (the same, because only 1 node has an
active NFS service).

Maybe it would have to be a _special usage mode_, but offering such a
_special mode_ could increase reliability (which is now low because of the
very high chance of self-fencing - I saw a case where Oracle fenced one node
and OCFS fenced the other, making the whole cluster useless).

4) I was surprised by the results, because I managed to run OCFSv2 in an
Oracle RAC cluster (after kernel 255 was released, with the new OCFSv2
implementation) for a long time, in automated mode, and without any
problems. But I always suspected some instability because of the
self-fencing problems.

5) The worst scenario was _self destruction of the system disk_. The / disk
was erased (its first blocks), and I have no explanation other than OCFSv2
writing to the wrong place. Unfortunately, I did not find any messages in
the syslogs (syslog is on the network) - everything worked excellently, then
1 node died (it responded to pings and, I guess, to the ocfs2 heartbeat, but
did not allow any application to run); when we restarted the second node
(which kept working after the first node died), we found a damaged system
disk on it. I saw similar behavior (ocfs2 damaging other disks) some time
ago, on another version (we blamed it on iSCSI, but it now looks like an
OCFS2 problem).


A few questions (maybe this is the wrong list, but):
- in many cases, when the system has no outstanding blocks, self-fencing and
a system reboot do not make much sense. If the system has no outstanding
blocks in the cache and no active operations in the queues, a reboot is
equivalent to a simple _remount_.
Of course, a reboot can be necessary to resolve other things (sometimes it
reveals overall troubles), but if we use OCFSv2 for, say, Oracle backups, it
is a typical scenario where we want to remount, mount read-only (as a last
choice), or freeze access, but not to reboot.

- We all know that a 2-node OCFS2 cluster is not stable because of the known
quorum problem. In many cases, we can add a 3rd server to the cluster but
don't want to mount the file system on it. Why not allow a _passive member_
mount mode - such a member can maintain the 'heartbeat'
and take part in the quorum, but can be instructed to 'never reboot, just
remount' and to 'not expose the file system to the users'.

- Ethernet convergence time is 40 seconds by the STP standard. While it can
be decreased a little, it is still a typical timeout for many
systems, and I even found FibreChannel timeouts configured to 1 minute on
our old production system. NetApp failover time is almost the same (about
30 - 40 seconds). A system can survive a 30 - 40 second overall freeze, if
necessary, but if any 15 second network glitch causes servers to reboot, it
becomes extremely annoying.

- So, using multiple heartbeat channels becomes a must for a reliable
cluster. Most servers have a few IP interfaces (2 Ethernet ports is typical,
and an additional FibreChannel link which can be used for IP is typical
too), and allowing multiple interconnects could resolve this 'fencing'
problem (it eliminates network glitches as a reason for fencing, which makes
it possible to use much stricter settings).
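
To make the multiple-channel idea concrete, here is a minimal sketch in
Python. It is only an illustration of the policy "fence a peer only when
every heartbeat channel has gone silent"; it is not how o2net / o2hb are
implemented, and the class name, the channel names (eth0, eth1) and the
5 second timeout are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PeerHeartbeat:
    """Last heartbeat seen from one peer, per channel (hypothetical model)."""
    timeout_s: float
    # channel name -> time (in seconds) of the last heartbeat received
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, channel: str, now: float) -> None:
        """Note that a heartbeat arrived on `channel` at time `now`."""
        self.last_seen[channel] = now

    def silent_channels(self, now: float) -> List[str]:
        """Channels with no heartbeat for longer than the timeout."""
        return sorted(ch for ch, t in self.last_seen.items()
                      if now - t > self.timeout_s)

    def should_fence(self, now: float) -> bool:
        """Fence only if every known channel has exceeded the timeout."""
        return bool(self.last_seen) and all(
            now - t > self.timeout_s for t in self.last_seen.values())


# A 15 second glitch on eth0 while eth1 keeps delivering heartbeats.
peer = PeerHeartbeat(timeout_s=5.0)
peer.record("eth0", now=0.0)
peer.record("eth1", now=0.0)
peer.record("eth1", now=14.0)
print(peer.silent_channels(now=15.0))  # ['eth0']
print(peer.should_fence(now=15.0))     # False: eth1 is still alive
```

With a single channel the same glitch would trigger fencing, which is why a
one-channel setup needs a long timeout, while a multi-channel setup can
afford a short one.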

In addition, why not use the hangcheck_timer module for better reliability?
(And I am not even speaking about the 'don't reboot on panic' SLES default,
which is very annoying if you forget about it.)
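
Coming back to the quorum question above, the effect of a _passive member_
can be shown with a small Python sketch. It assumes a plain strict-majority
rule, which is a simplification and not necessarily the rule OCFS2 itself
applies; the Member type, the mounts_fs flag and the node names are
hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Member:
    name: str
    mounts_fs: bool  # False marks the hypothetical "passive member"


def has_quorum(partition: Sequence[Member], cluster: Sequence[Member]) -> bool:
    """Strict majority of voters; a passive member votes like any other."""
    return 2 * len(partition) > len(cluster)


def must_self_fence(member: Member, partition: Sequence[Member],
                    cluster: Sequence[Member]) -> bool:
    """In this sketch only members that mount the file system self-fence."""
    return member.mounts_fs and not has_quorum(partition, cluster)


a = Member("node-a", mounts_fs=True)
b = Member("node-b", mounts_fs=True)
w = Member("witness", mounts_fs=False)

# 2-node cluster, interconnect lost: neither side has a strict majority.
print(has_quorum([a], [a, b]), has_quorum([b], [a, b]))   # False False

# 3 voters, node-a isolated: node-b plus the witness keep the quorum.
cluster = [a, b, w]
print(has_quorum([b, w], cluster))        # True
print(must_self_fence(a, [a], cluster))   # True
print(must_self_fence(w, [w], cluster))   # False: passive member stays up
```

The point of the third voter is only to break the tie; it never has to
expose the file system to users.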


----- Original Message -----
From: "Sunil Mushran" <Sunil.Mushran@(protected)>
To: <Alexei_Roudnev@(protected)>
Cc: "Wim Coekaerts" <wim.coekaerts@(protected)>;
<suse-oracle@(protected)>
Sent: Monday, July 10, 2006 4:18 PM
Subject: Re: [Alexei_Roudnev@(protected): [suse-oracle] OCFSv2
stability - my results]


> The corruption could be the lvb problem we fixed in 1.2.2. We had
> been tracking that for 1.2.1 but could not get a reliable testcase in
> time.
>
> >>- performance is average - sometimes o2cb spent a lot of CPU.
> >>- works faster vs NFS (NFS on NetApp, OCFSv2 on NetApp / iSCSI).
>
> Care to expand. I cannot decide how to interpret that. Average compared
> to what?
>
> Yes, the default hb timeout is probably low for a lot of setups. While
> we cannot change the default, we have documented how one can
> change the value on one's setup.
>
> >>- if symlink is not resolved, it is reported to syslog - no need, just a
> >>noise.
>
>
> Resolved in 1.2.2.
>
> >>Conclusion - negative for document storage (for now).
>
> Sorry to hear that.
>
> >>Does anyone have opposite results (not with oracle DB or backups - they do
> >>work in my lab without any problems)?
>
> Good to hear that. :)
>
> Wim Coekaerts wrote:
>

