First of all, try SP1 (ask for RC3). It has significant improvements over the
SLES10 release.
(
As I said before:
the SLES10 release is in reality a SLES10 open beta;
SLES10 SP1 will be in reality the first real SLES10 release.
It was the same with SLES9:
the SLES9 release had beta quality (it was not stable, had critical VM bugs,
and had compatibility problems);
SLES9 SP1 became the first production-ready version.
Why should we expect anything different with SLES10? We are in an unofficial
beta stage now until the middle of May, when SP1 will be released.
My experiments with both SLES10 and SLES10 SP1 confirmed this for me.
).
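To compare SLES10 and SP1 under the same load, the parallel "dd" test from
the quoted message below can be scripted roughly as follows. This is only a
sketch: the function name parallel_dd, the ddtest.N file names and the
number of writers are assumptions for illustration, not taken from the
original posts.

```shell
#!/bin/sh
# Hypothetical helper: start several dd writers in parallel in one directory.
# usage: parallel_dd DIR WRITERS BLOCKS
parallel_dd() {
    dir=$1
    writers=$2
    blocks=$3
    i=1
    while [ "$i" -le "$writers" ]; do
        # Same dd form as in the quoted mail: zero-filled file, 4k blocks.
        dd if=/dev/zero of="$dir/ddtest.$i" bs=4k count="$blocks" 2>/dev/null &
        i=$((i + 1))
    done
    # Block until every background dd has exited.
    wait
}
```

For example, "parallel_dd /z0/test 4 22000000" would start four writers with
the block count from the quoted test. That is roughly 90 GB per file, so
check the free space first.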
----- Original Message -----
From: "Peter Santos" <psantos@(protected)>
To: "Alexei_Roudnev" <Alexei_Roudnev@(protected)>;
<oracleasm-users@(protected)>
Cc: <suse-oracle@(protected)>
Sent: Wednesday, May 02, 2007 9:39 AM
Subject: Re: [suse-oracle] re: SELS 10 - Kernel 2.6.16.27.0.9 locks up -
Again.
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Alexei,
>
> so I decided to turn off everything that was "oracle" related and
> by running a couple of "dd" commands in parallel, I got the machine to
> lock up again.
>
> I know that you mentioned in a previous posting that SLES 10 is just
> not production ready .. and I'm wondering if I'm just hitting some sort
> of hardware issue.
>
> One thing I did notice was the following in the /var/log/messages ... which is some sort of
> incompatibility with the dvd-rom, but from my research I couldn't tell if this could cause
> the machine to lock up.
>
>
> May 2 11:37:53 s_dgram@(protected) confused (ireason = 0x01). Trying to recover by ending request.
> May 2 11:48:09 s_dgram@(protected) confused (ireason = 0x01). Trying to recover by ending request.
> May 2 11:53:56 s_dgram@(protected) confused (ireason = 0x01). Trying to recover by ending request.
> May 2 11:54:02 s_dgram@(protected) confused (ireason = 0x01). Trying to recover by ending request.
> May 2 11:54:29 s_dgram@(protected) confused (ireason = 0x01). Trying to recover by ending request.
> May 2 12:03:04 s_dgram@(protected) confused (ireason = 0x01). Trying to recover by ending request.
> May 2 12:06:32 s_dgram@(protected) confused (ireason = 0x01). Trying to recover by ending request.
>
> We have another 3 node RAC cluster on SLES 9 (SP3), so we just might go back to that ...
>
> - -peter
>
>
> Peter Santos wrote:
> > Alexei,
> > the reason we are using asmlib is because our experience with managing
> > raw devices is limited and we don't want to run into additional trouble
> > down the road.
> >
> > we've tried these tests over and over and it seems that the machine just
> > locks up when we run consecutive "dd" commands .. after about an hr the
> > machine locks up. When the oracleasm is down we can't reproduce this, but when
> > the service is up, we get the locking problem. The only thing that I'm
> > uncertain about is that when the raw service starts up the raw devices
> > are bound, but the permissions on those devices were root:root when
> > oracleasm started. Only after did I change the permissions. I'm going to
> > try this test one more time in this sequence.
> > 1. bind the raw devices.
> > 2. set the proper permissions on those devices
> > 3. start the oracleasm service.
> > 4. do "/etc/init.d/oracleasm status" and "/etc/init.d/oracleasm listdisks" to make sure that
> > everything looks correct.
> > 5. run a number of "dd" commands to some local storage and see if
> > machine locks up.
> > prompt> dd if=/dev/zero of=/z0/test/testthere3 bs=4k count=22000000
> >
> > The frustrating thing is that the machine just locks up and there is no logging. Also
> > it requires that we go to the data center to physically restart the machine.
> >
> > The other thing is that our hardware is certified on SLES 9 (SP3), but not on SLES 10. Again,
> > I'm not sure how important this is, but we can/might try SLES 9 if we can't get this resolved.
> > The certification bulletin for our hardware on SLES 9 is 83873.
> >
> > Here is the module information for ASM.
> >
> > dbt1:~ # modinfo oracleasm
> > filename: /lib/modules/2.6.16.27-0.9-smp/kernel/drivers/addon/oracleasm/oracleasm.ko
> > license: GPL
> > version: 2.0.3
> > author: Joel Becker <joel.becker@(protected)>
> > description: Kernel driver backing the Generic Linux ASM Library.
> > vermagic: 2.6.16.27-0.9-smp SMP gcc-4.1
> > depends:
> > srcversion: B35F9F20EF40931C318A5EA
> >
> > Any ideas on how to troubleshoot this would be great!
> >
> >
> > -peter
> >
> >
> > Alexei_Roudnev wrote:
> >>> Advice #1: drop asmlib and never use it. It is a useless piece of software.
> >>> Linux has 'raw', which does the same but is a standard component, not home-made
> >>> like asmlib.
> >>>
> >>> Then repeat tests again.
> >>>
> >>> ----- Original Message -----
> >>> From: "Peter Santos" <psantos@(protected)>
> >>> To: <suse-oracle@(protected)>
> >>> Sent: Monday, April 30, 2007 12:15 PM
> >>> Subject: [suse-oracle] re: SEL 10 - Kernel 2.6.16.27.0.9 locks up
> >>>
> >>>
> >>> Folks,
> >>> I'm trying to find out how to go about investigating an issue
> >>> where our test server running 10.2.0.3 (x86_64) is locking up when we run a
> >>> few dd commands sequentially (dd if=/dev/zero of=/z0/test/testthere2 bs=4k count=5000000) .. where /z0 was
> >>> just some local storage.
> >>>
> >>> He did a kernel upgrade to version 2.6.16.27.0.9 a couple of weeks ago. We then installed
> >>> the following ASM packages on top of that.
> >>>
> >>> oracleasmlib-2.0.2-1.x86_64.rpm
> >>> oracleasm-support-2.0.3-1.x86_64.rpm
> >>> oracleasm-2.6.16.27-0.9-smp-2.0.3-1.x86_64.rpm
> >>>
> >>> We are using SLES 10 + 10.2.0.3 + ASM via ASMLib.
> >>>
> >>> At random intervals the machine would crash with no information in the /var/log/messages. We ran a memory test
> >>> on it and it was fine. Finally our SA recompiled the latest kernel from source ( 2.6.21-smp) and after a number
> >>> of "dd" tests, the machine did NOT crash. With the latest kernel from source, ASM was not started because of
> >>> version mismatch!
> >>>
> >>> ASM may or may not be the problem, but what is the best way to troubleshoot this?
> >>> The machine has the following spec:
> >>> - Dell 6800 with 4 dual core CPUs (Intel(R) Xeon(TM) CPU 2.60GHz )
> >>> - Storage is DS4400
> >>> - Storage Driver: Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 02)
> >>> -peter
> >>>
> >>>
> > --
> > To unsubscribe, email: suse-oracle-unsubscribe@(protected)
> > For additional commands, email: suse-oracle-help@(protected)
> > Please see http://www.suse.com/oracle/ before posting
> >>>>
> >
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.1 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGOL63oyy5QBCjoT0RAsxaAJwLJsVT/W08N2l/C/gqRqUv/qONtQCePYqx
> uqmvU6kXkneqzsF08gFSbUk=
> =ZfIh
> -----END PGP SIGNATURE-----
>