Re: [suse-oracle] re: SLES 10 - Kernel 2.6.16.27.0.9 locks up - Again.

Arun Singh

2007-05-02

Peter,

As you wrote, this box is certified for SLES 9 (SP3): http://developer.novell.com/yes/83873.htm. I suggest opening a support request with either Novell or Dell to find out whether there is any known issue with SLES 10, and working with them to figure out why it locks up with a simple dd.

Another option is to try the latest build (RC3) of the upcoming SLES 10 SP1. You can request it from Novell support.

-Arun


>>> On 5/2/2007 at 9:39 AM, Peter Santos <psantos@(protected)> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Alexei,
>
> So I decided to turn off everything that was "oracle" related, and
> by running a couple of "dd" commands in parallel I got the machine to
> lock up again.
>
> I know that you mentioned in a previous posting that SLES 10 is just
> not production ready .. and I'm wondering if I'm just hitting some sort
> of hardware issue.
>
> One thing I did notice was the following in /var/log/messages, which is
> some sort of incompatibility with the dvd-rom, but from my research I
> couldn't tell if this could cause the machine to lock up.
>
>
> May 2 11:37:53 s_dgram@(protected) confused (ireason = 0x01). Trying to recover by ending request.
> May 2 11:48:09 s_dgram@(protected) confused (ireason = 0x01). Trying to recover by ending request.
> May 2 11:53:56 s_dgram@(protected) confused (ireason = 0x01). Trying to recover by ending request.
> May 2 11:54:02 s_dgram@(protected) confused (ireason = 0x01). Trying to recover by ending request.
> May 2 11:54:29 s_dgram@(protected) confused (ireason = 0x01). Trying to recover by ending request.
> May 2 12:03:04 s_dgram@(protected) confused (ireason = 0x01). Trying to recover by ending request.
> May 2 12:06:32 s_dgram@(protected) confused (ireason = 0x01). Trying to recover by ending request.
>
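To judge whether these messages line up with the lockups, it can help to count them and list their timestamps. The following is only a minimal sketch: it assumes the syslog file is /var/log/messages as mentioned above, and that the text "ireason = 0x01" is a reliable marker for this message. The function names are made up for illustration.

```shell
#!/bin/sh
# Count lines containing the "confused (ireason = 0x01)" message.
# Usage: count_ireason [logfile]   (defaults to /var/log/messages)
count_ireason() {
    logfile="${1:-/var/log/messages}"
    # -F: treat the pattern as a fixed string; -c: print the number of matching lines
    grep -F -c 'ireason = 0x01' "$logfile"
}

# Print the timestamp (first three syslog fields) of every matching line.
# Usage: list_ireason_times [logfile]
list_ireason_times() {
    logfile="${1:-/var/log/messages}"
    grep -F 'ireason = 0x01' "$logfile" | awk '{ print $1, $2, $3 }'
}
```

Comparing the listed times with the time of the last lockup would show whether the two are related at all, or whether the dvd-rom messages are just background noise.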
> We have another 3 node RAC cluster on SLES 9 (SP3), so we just might go back
> to that ...
>
> - -peter
>
>
> Peter Santos wrote:
>> Alexei,
>>  the reason we are using asmlib is because our experience with managing
>>  raw devices is limited and we don't want to run into additional trouble
>>  down the road.
>>  
>>  we've tried these tests over and over and it seems that the machine just
>>  locks up when we run consecutive "dd" commands .. after about an hr the
>>  machine locks up. When the oracleasm is down we can't reproduce this, but
>>  when the service is up, we get the locking problem. The only thing that I'm
>>  uncertain about is that when the raw service starts up the raw devices
>>  are bound, but the permissions on those devices were root:root when
>>  oracleasm started. Only afterwards did I change the permissions. I'm going to
>>  try this test one more time in this sequence.
>>    1. bind the raw devices.
>>    2. set the proper permissions on those devices
>>    3. start the oracleasm service.
>>    4. do /etc/init.d/oracleasm status and listdisks to make sure that
>>      everything looks correct.
>>    5. run a number of "dd" commands to some local storage and see if
>>      machine locks up.
>>      prompt> dd if=/dev/zero of=/z0/test/testthere3 bs=4k count=22000000
>>
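Step 5 can be scripted so that the same load is applied on every run. The sketch below starts several dd writers in parallel and waits for them. The function names, the output file names, and the default of 4 writers are assumptions for illustration, not what was actually run. At bs=4k, count=22000000 writes 22000000 x 4096 = 90112000000 bytes (about 84 GiB) per file, so free space should be checked before using the full size.

```shell
#!/bin/sh
# Start N dd writers in parallel, each writing COUNT blocks of 4 KiB of zeros
# into DIR, then wait for all of them to finish.
# Usage: run_parallel_dd DIR [N] [COUNT]
run_parallel_dd() {
    dir="$1"
    n="${2:-4}"               # number of parallel writers (assumed default)
    count="${3:-22000000}"    # block count taken from the test above
    i=1
    while [ "$i" -le "$n" ]; do
        dd if=/dev/zero of="$dir/ddtest.$i" bs=4k count="$count" 2>/dev/null &
        i=$((i + 1))
    done
    wait    # block until every background dd has exited
}

# Number of bytes written by one dd run of COUNT blocks at bs=4k.
# Usage: dd_bytes COUNT
dd_bytes() {
    echo $(( $1 * 4096 ))
}
```

For example, `run_parallel_dd /z0/test 4 22000000` would write four files of that size under /z0/test; a much smaller count is enough to check that the script itself works.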
>>  The frustrating thing is that the machine just locks up and there is no
>>  logging. Also it requires that we go to the data center to physically
>>  restart the machine.
>>
>>  The other thing is that our hardware is certified on SLES 9 (SP3), but not
>>  on SLES 10. Again, I'm not sure how important this is, but we can/might try
>>  SLES 9 if we can't get this resolved.
>>  The certification bulletin for our hardware on SLES 9 is 83873.
>>  
>>  Here is the module information for ASM.
>>
>>  dbt1:~ # modinfo oracleasm
>>  filename:    /lib/modules/2.6.16.27-0.9-smp/kernel/drivers/addon/oracleasm/oracleasm.ko
>>  license:     GPL
>>  version:     2.0.3
>>  author:      Joel Becker <joel.becker@(protected)>
>>  description:   Kernel driver backing the Generic Linux ASM Library.
>>  vermagic:     2.6.16.27-0.9-smp SMP gcc-4.1
>>  depends:
>>  srcversion:   B35F9F20EF40931C318A5EA
>>
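Since the module is built for one specific kernel (the vermagic line above), a quick check that it matches the running kernel can rule out a version mismatch before oracleasm is started. This is only a sketch, assuming `modinfo -F vermagic` and `uname -r` are available on the box; the helper names are invented for illustration.

```shell
#!/bin/sh
# Extract the kernel release (the first word) from a vermagic string such as
# "2.6.16.27-0.9-smp SMP gcc-4.1".
vermagic_release() {
    # ${1%% *} strips everything from the first space onwards
    echo "${1%% *}"
}

# Succeed (exit status 0) if the vermagic string names the given kernel release.
# Usage: module_matches_kernel "<vermagic string>" "<kernel release>"
module_matches_kernel() {
    [ "$(vermagic_release "$1")" = "$2" ]
}

# On a live system the two inputs would come from modinfo and uname:
#   module_matches_kernel "$(modinfo -F vermagic oracleasm)" "$(uname -r)" \
#       || echo "oracleasm module was built for a different kernel"
```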
>>  Any ideas on how to troubleshoot this would be great!
>>
>>
>> -peter
>>
>>
>> Alexei_Roudnev wrote:
>>>> Advice #1 - drop asmlib and never use it. It is a useless piece of software.
>>>> Linux has 'raw', which does the same but is a standard component, not home-made
>>>> like asmlib.
>>>>
>>>> Then repeat tests again.
>>>>
>>>> ----- Original Message -----
>>>> From: "Peter Santos" <psantos@(protected)>
>>>> To: <suse-oracle@(protected)>
>>>> Sent: Monday, April 30, 2007 12:15 PM
>>>> Subject: [suse-oracle] re: SEL 10 - Kernel 2.6.16.27.0.9 locks up
>>>>
>>>>
>>>> Folks,
>>>> I'm trying to find out how to go about investigating an issue
>>>> where our test server running 10.2.0.3 (x86_64) is locking up when we run a
>>>> few dd commands sequentially (dd if=/dev/zero of=/z0/test/testthere2 bs=4k
>>>> count=5000000) .. where /z0 was just some local storage.
>>>>
>>>> He did a kernel upgrade to version 2.6.16.27.0.9 a couple of weeks ago. We
>>>> then installed the following ASM packages on top of that.
>>>>
>>>> oracleasmlib-2.0.2-1.x86_64.rpm
>>>> oracleasm-support-2.0.3-1.x86_64.rpm
>>>> oracleasm-2.6.16.27-0.9-smp-2.0.3-1.x86_64.rpm
>>>>
>>>> We are using SLES 10 + 10.2.0.3 + ASM via ASMLib.
>>>>
>>>> At random intervals the machine would crash with no information in
>>>> /var/log/messages. We ran a memory test on it and it was fine. Finally our SA
>>>> recompiled the latest kernel from source (2.6.21-smp) and after a number of
>>>> "dd" tests, the machine did NOT crash. With the latest kernel from source, ASM
>>>> was not started because of a version mismatch!
>>>>
>>>> ASM may or may not be the problem, but what is the best way to troubleshoot
>>>> this?
>>>> The machine has the following spec:
>>>> - Dell 6800 with 4 dual core CPUs (Intel(R) Xeon(TM) CPU 2.60GHz )
>>>> - Storage is DS4400
>>>> - Storage Driver: Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter
>>>>   (rev 02)
>>>> -peter
>>>>
>>>>
>> --
>> To unsubscribe, email: suse-oracle-unsubscribe@(protected)
>> For additional commands, email: suse-oracle-help@(protected)
>> Please see http://www.suse.com/oracle/ before posting
>>>>>
>>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.1 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGOL63oyy5QBCjoT0RAsxaAJwLJsVT/W08N2l/C/gqRqUv/qONtQCePYqx
> uqmvU6kXkneqzsF08gFSbUk=
> =ZfIh
> -----END PGP SIGNATURE-----




