Alexei,
You have said a lot here, but I don't think you have a very good grasp
of why one would want to use ASM redundancy over another vendor's
product. I don't want to get into the marketing specifics, but most of
the scenarios you have listed have been tested. Yes, there were bugs
in the initial releases, where problems were seen that were specific to
certain platforms. These were really porting issues, not design
problems. In most of the cases I saw, the underlying problem was due to
behavior outside of Oracle's control and was actually fixed by OS vendor
patches.
Once again, I agree that external redundancy with a hardware RAID solution
is preferable. But you have not covered any of the benefits that you
get from ASM redundancy. Yes, ASM is a new, emerging technology and there
are issues that need to be sorted out. In some cases, though, it makes
total sense for someone to use ASM redundancy over another vendor's product.
Two reasons are cost and complexity. Keep in mind that ASM as it is now
with the 10g products is built by engineers who understand Oracle RDBMS
I/O inside and out. Also, don't assume that the development engineers
don't have a background in RAID technologies.
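
On the complexity point, here is a rough sketch of what the mirrored DATA
disk group from Klaus's post might look like in SQL, run in the ASM
instance. The disk group name, failure group names, and device paths come
from his description; the pairing of each device with a failure group is
my assumption based on the st01/st02 naming, and this is an illustration
rather than the statement he actually ran.

```sql
-- Sketch only: run in the ASM instance, connected AS SYSDBA.
-- Assumption: raw11 (served by st01) belongs to failure group st01_data
-- and raw12 (served by st02) belongs to failure group st02_data.
CREATE DISKGROUP data NORMAL REDUNDANCY
  FAILGROUP st01_data DISK '/dev/raw/raw11'
  FAILGROUP st02_data DISK '/dev/raw/raw12';
```

With normal redundancy, ASM keeps the mirror copy of each extent in a
different failure group from the primary copy, which is why one failure
group per storage node is the natural layout here.
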
Note that the poster states they are testing.
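
Since this is a test setup, it may help to look at how ASM itself sees the
disks after the iSCSI target node goes away. A minimal sketch, assuming a
10g ASM instance and a SYSDBA connection; V$ASM_DISK and V$ASM_OPERATION
are the standard views, but what they report for this particular failure
is not something the thread shows.

```sql
-- Sketch only: run in the ASM instance, connected AS SYSDBA.
-- Per-disk view: which failure group each disk is in and whether ASM
-- still considers it online.
SELECT group_number, disk_number, name, failgroup,
       mount_status, header_status, mode_status, state, path
  FROM v$asm_disk
 ORDER BY group_number, disk_number;

-- Any rebalance currently running (no rows means none is in progress).
SELECT group_number, operation, state, power, est_minutes
  FROM v$asm_operation;
```
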
-Bennett
Alexei_Roudnev wrote:
> ASM + normal redundancy == software RAID-1.
>
> First of all, is the author of the post sure that he needs software
> RAID0-1 over existing iSCSI LUNs (which, 90% of the time, are RAID
> already)?
>
> Second, software RAID cannot take advantage of a write-back cache
> (because system memory is power-dependent), so it is always slower than
> hardware RAID.
>
> Third, 90% of RAID features relate to _disk failure, automated
> rebuild, recovery after power failure, how to deal with 2 disks from
> the same RAID if 1 was disconnected and then connected again, how to
> rebuild the RAID automatically, and so on_. This is all
> hardware-specific and requires long experience to design properly.
> Oracle engineering is not well skilled in this area (simply because
> they have not had enough time), so I can never trust software RAID
> (ASM) from Oracle (and this is well confirmed by numerous problems
> reported on different forums). What happens if you remove disk1 and
> then insert it back: can the system recover properly? What happens if
> you lose the connection to LUN-1 for 1 minute, then it recovers, and
> then you lose the connection to LUN-2 for another minute: can the
> system recover from it? What happens if you lose LUN-1, then connect a
> new disk as LUN-3: how do you ask the system to rebuild the RAID? What
> happens if you shut down the system, replace LUN-1 with another disk,
> and then start it up? And so on... It took years for NetApp, for
> numerous RAID chip vendors, for EMC, and for other brands to
> troubleshoot all such scenarios and create really reliable RAIDs. It
> took years for Microsoft to create a reliable software RAID (and it
> is not really reliable in Windows 2000 Server, if you have worked with
> it in failure scenarios). So how can I trust Oracle ASM if they have
> had 0 experience in the RAID area and have almost a 0% user base, that
> is, users who run ASM with normal redundancy and have experienced real
> failures?
>
>
>
> ----- Original Message ----- From: "Bennett Leve"
> <bennett.leve@(protected)>
> To: "Alexei_Roudnev" <Alexei_Roudnev@(protected)>
> Cc: "Hahn, Klaus" <klaushahn@(protected)>
> Sent: Thursday, November 15, 2007 11:37 AM
> Subject: Re: [suse-oracle] >>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE
> LOCK!
>
>
>> Alexei,
>>
>> I agree that external redundancy is preferable, but I'm not sure what you
>> are getting at. What are you alluding to as far as trusting ASM as a RAID
>> system?
>>
>> -Bennett
>>
>> Alexei_Roudnev wrote:
>>> Advice: don't use normal redundancy; use external redundancy only.
>>> Oracle is not good at RAID development, so we cannot trust ASM as
>>> a RAID system. (In addition, hardware RAIDs are always much faster than
>>> software RAID, because of the power-independent cache on the hardware
>>> RAIDs.)
>>>
>>>
>>>
>>> ----- Original Message ----- From: "Bennett Leve"
>>> <bennett.leve@(protected)>
>>> To: "Hahn, Klaus" <klaushahn@(protected)>
>>> Cc: <suse-oracle@(protected)>
>>> Sent: Thursday, November 15, 2007 6:52 AM
>>> Subject: Re: [suse-oracle] >>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE
>>> LOCK!
>>>
>>>
>>>> Klaus,
>>>>
>>>> What version of the RDBMS are you using for the ASM instance and
>>>> for the database? There are several known issues that may be the
>>>> cause here. Also, how is raw11 presented to st02 and raw12 presented
>>>> to st01?
>>>>
>>>> -Bennett
>>>>
>>>> Hahn, Klaus wrote:
>>>>> Hi list,
>>>>>
>>>>> I'm testing RAC + ASM + iSCSI-shared-storage on SUSE SLES10 SP1.
>>>>>
>>>>> Configuration of RAC:
>>>>> 1. 2 nodes (a + b) with 2 instances TEST1 and TEST2
>>>>>
>>>>> Configuration of ASM:
>>>>> 1. normal redundancy
>>>>> 2. Diskgroup data with 2 failure groups st01_data and st02_data.
>>>>> Each failure group has 1 disk (/dev/raw/raw11 and
>>>>> /dev/raw/raw12)
>>>>> raw11 is an iSCSI-Target from node st01.
>>>>> raw12 is an iSCSI-Target from node st02.
>>>>> 3. Diskgroup reco (recovery area) with 2 failure groups st01_reco
>>>>> and st02_reco.
>>>>> Each failure group has 1 disk (/dev/raw/raw6 and /dev/raw/raw7)
>>>>> raw6 is an iSCSI-Target from node st01.
>>>>> raw7 is an iSCSI-Target from node st02.
>>>>>
>>>>> Taking the iSCSI target node st01 offline produces a strange error
>>>>> within the database:
>>>>> - no commit possible
>>>>> - alter system switch logfile not possible
>>>>> - database seems to be read-only
>>>>>
>>>>> srvctl says: All RAC-Services and Instances are running.
>>>>>
>>>>>
>>>>>
>>>>> alert.log of RAC-Instance TEST1:
>>>>>
>>>>>
>>>>>>>>>> ..
>>>>>>>>>>
>>>>>
>>>>> ORA-27091: unable to queue I/O
>>>>> ORA-27072: File I/O error
>>>>> Linux-x86_64 Error: 5: Input/output error
>>>>> Additional information: 4
>>>>> Additional information: 192544
>>>>> Additional information: -1
>>>>> Tue Oct 30 11:57:10 2007
>>>>> Errors in file /opt/oracle/admin/TEST/udump/test1_ora_30342.trc:
>>>>>
>>>>> ORA-27091: unable to queue I/O
>>>>> ORA-27072: File I/O error
>>>>> Linux-x86_64 Error: 5: Input/output error
>>>>> Additional information: 4
>>>>> Additional information: 192544
>>>>> Additional information: -1
>>>>> WARNING: offlining disk 2.2526723344 (RECO_0002) with mask 0x3
>>>>> Tue Oct 30 11:57:19 2007
>>>>> Errors in file /opt/oracle/admin/TEST/bdump/test1_ckpt_15122.trc:
>>>>>
>>>>> ORA-27091: unable to queue I/O
>>>>> ORA-27072: File I/O error
>>>>> Linux-x86_64 Error: 5: Input/output error
>>>>> Additional information: 4
>>>>> Additional information: 192608
>>>>> Additional information: -1
>>>>> WARNING: offlining disk 2.2526723344 (RECO_0002) with mask 0x3
>>>>> Tue Oct 30 12:34:16 2007
>>>>>
>>>>> >>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=16
>>>>> System State dumped to trace file
>>>>> /opt/oracle/admin/TEST/bdump/test1_mmon_15153.trc
>>>>> Tue Oct 30 12:47:03 2007
>>>>>
>>>>> >>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=37
>>>>> System State dumped to trace file
>>>>> /opt/oracle/admin/TEST/udump/test1_ora_23960.trc
>>>>> Tue Oct 30 12:50:53 2007
>>>>>
>>>>> >>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=54
>>>>> System State dumped to trace file
>>>>> /opt/oracle/admin/TEST/udump/test1_ora_14206.trc
>>>>>
>>>>> <<<<<<
>>>>>
>>>>> alert.log of ASM-Instance: +ASM1
>>>>>
>>>>> <<<<<<<<<<<<
>>>>>
>>>>> ORA-27091: unable to queue I/O
>>>>> ORA-27072: File I/O error
>>>>> Linux-x86_64 Error: 5: Input/output error
>>>>> Additional information: 4
>>>>> Additional information: 2056
>>>>> Additional information: -1
>>>>> Tue Oct 30 11:57:19 2007
>>>>> Errors in file /opt/oracle/admin/+ASM/bdump/+asm1_gmon_28227.trc:
>>>>>
>>>>> ORA-27091: unable to queue I/O
>>>>> ORA-27072: File I/O error
>>>>> Linux-x86_64 Error: 5: Input/output error
>>>>> Additional information: 4
>>>>> Additional information: 2048
>>>>> Additional information: -1
>>>>> Tue Oct 30 11:57:19 2007
>>>>> Errors in file /opt/oracle/admin/+ASM/bdump/+asm1_gmon_28227.trc:
>>>>>
>>>>> ORA-27091: unable to queue I/O
>>>>> ORA-27072: File I/O error
>>>>> Linux-x86_64 Error: 5: Input/output error
>>>>> Additional information: 4
>>>>> Additional information: 2056
>>>>> Additional information: -1
>>>>> NOTE: cache closing disk 2 of grp 2: RECO_0002
>>>>> Tue Oct 30 11:57:19 2007
>>>>> NOTE: group RECO: relocated PST to: disk 0000 (PST copy 0)
>>>>> Tue Oct 30 11:57:22 2007
>>>>> NOTE: PST update: grp = 2, dsk = 2, mode = 0x4
>>>>> Tue Oct 30 11:57:22 2007
>>>>> NOTE: group RECO: relocated PST to: disk 0000 (PST copy 0)
>>>>> NOTE: cache closing disk 2 of grp 2: RECO_0002
>>>>> Tue Oct 30 11:57:52 2007
>>>>> NOTE: PST refresh pending for group 1/0x175a4df9 (DATA)
>>>>> SUCCESS: refreshed PST for 1/0x175a4df9 (DATA)
>>>>> NOTE: PST refresh pending for group 2/0x506a4dfe (RECO)
>>>>> SUCCESS: refreshed PST for 2/0x506a4dfe (RECO)
>>>>> Tue Oct 30 11:59:24 2007
>>>>> SUCCESS: refreshed PST for 1/0x175a4df9 (DATA)
>>>>> SUCCESS: refreshed PST for 2/0x506a4dfe (RECO)
>>>>> <<<<<<<<<<<
>>>>>
>>>>> Reconfiguring ASM with "alter diskgroup add failgroup ..." (after
>>>>> deleting the ASM metadata on the raw device) has no effect.
>>>>>
>>>>> After restarting the RAC instances, everything works fine!
>>>>>
>>>>> Any hints?
>>>>>
>>>>> Regards
>>>>> Klaus