SP3 and OCFS2 are very stable. I have been running a stability test (RAC
cluster, ASM + OCFS2, permanent load) for a few weeks without any trouble so
far. That was never the case with SP2 and OCFS2; I had problems all the time.
So my advice: upgrade to SP3 first of all. Then RECREATE your OCFS2 file
systems. Then try again.
----- Original Message -----
From: "Kevin Lyons" <klyons@(protected)>
To: <suse-oracle@(protected)>
Sent: Tuesday, January 17, 2006 11:45 AM
Subject: Re: [suse-oracle] Problem with OCFS2 - Heartbeat write timeout to device
Hi Joao
>>> On Tue, Jan 17, 2006 at 8:07 am, in message
<001801c61b80$1361d3e0$d401290a@(protected)>, <jd-apinf@(protected)> wrote:
> Hi,
> Sometimes I have a problem related to OCFS2 on one of the nodes in my
> Oracle RAC. The RAC has two nodes.
>
Does it always happen on the same node or does it vary?
> The server hangs completely after the errors shown below.
>
> Can anyone help me with this problem?
>
> ERROR MESSAGES:
>
> - o2hb_write_timeout:165 ERROR: Heartbeat write timeout to device dm-1 after 12000 milliseconds
>
> - o2hb_stop_all_regions:1674 ERROR: stopping heartbeat on all active regions.
>
> - Kernel panic: ocfs2 is very sorry to be fencing this system by panicing
>
> Device dm-1 is formatted with OCFS2 and uses multipath to access my EMC
> storage. I use it for the voting disk (CRS) and the Oracle Cluster
> Registry (OCR). The storage itself is working fine.
>
It may be that the hardware is working fine, but OCFS2 thinks it cannot
reach the disk. What version of OCFS2 do you have on the system? If you
have what shipped with base SP2, you may want to upgrade to the latest
kernel and ocfs2-tools, as people have reported this type of problem with
the version that ships with SP2. SP3 and the latest kernels include a more
stable version.
Also, what type of load is the system under? Are there any multipath
failovers going on when the node panics?
You may want to try adjusting the O2CB_HEARTBEAT_THRESHOLD parameter in
the /etc/sysconfig/o2cb file and restarting o2cb. It defaults to 7,
which works out to 12000 milliseconds (I forget the formula). Increasing
it increases the time OCFS2 waits before fencing.
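For reference, the OCFS2 1.2 FAQ describes the heartbeat timeout as (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds, which gives exactly 12000 ms for the default of 7 and matches the log message in the original post. Treat that formula as an assumption to check against the documentation for your OCFS2 release. The minimal Python sketch below converts between threshold and timeout and reads the current value from the text of /etc/sysconfig/o2cb; the helper names (timeout_ms, threshold_for, read_threshold) are hypothetical and not part of ocfs2-tools.

```python
import re

# Assumption (OCFS2 1.2.x): the disk heartbeat runs every 2 seconds and a node
# is fenced after (O2CB_HEARTBEAT_THRESHOLD - 1) intervals without a write.
HEARTBEAT_INTERVAL_MS = 2000
DEFAULT_THRESHOLD = 7  # default value quoted in this thread


def timeout_ms(threshold: int) -> int:
    """Heartbeat write timeout, in milliseconds, for a given threshold."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_MS


def threshold_for(timeout_seconds: int) -> int:
    """Smallest threshold whose timeout is at least timeout_seconds."""
    wanted_ms = timeout_seconds * 1000
    # Integer ceiling division, then +1 to undo the "- 1" in timeout_ms().
    intervals = -(-wanted_ms // HEARTBEAT_INTERVAL_MS)
    return intervals + 1


def read_threshold(sysconfig_text: str) -> int:
    """Return the O2CB_HEARTBEAT_THRESHOLD set in the text of /etc/sysconfig/o2cb.

    Commented-out lines are ignored; if the variable is assigned more than once,
    the last assignment wins (as it would when the shell sources the file).
    Falls back to DEFAULT_THRESHOLD when no numeric value is set.
    """
    values = re.findall(
        r'^[ \t]*O2CB_HEARTBEAT_THRESHOLD[ \t]*=[ \t]*"?(\d+)"?',
        sysconfig_text,
        re.MULTILINE,
    )
    return int(values[-1]) if values else DEFAULT_THRESHOLD


if __name__ == "__main__":
    # Illustrative file contents, not a copy of the real file.
    sample = (
        "# heartbeat threshold (commented lines are skipped)\n"
        "O2CB_HEARTBEAT_THRESHOLD=7\n"
    )
    current = read_threshold(sample)
    print(current, timeout_ms(current))  # 7 12000
    print(threshold_for(60))             # 31
```

threshold_for() rounds up, so the suggested threshold never yields a shorter timeout than requested; under the assumed formula, a 60-second timeout corresponds to O2CB_HEARTBEAT_THRESHOLD=31.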
Thanks,
Kevin
--
To unsubscribe, email: suse-oracle-unsubscribe@(protected)
For additional commands, email: suse-oracle-help@(protected)
Please see http://www.suse.com/oracle/ before posting