>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!
2007-11-15 - By Alexei_Roudnev
ASM + normal redundancy == software RAID-1.
First of all, is the author of the post sure that he needs software RAID0-1 on top of existing iSCSI LUNs (which, in 90% of cases, are already RAID-backed)?
Second, software RAID cannot take advantage of a write-back cache (because system memory does not survive a power loss), so it is always slower than hardware RAID.
Third, 90% of RAID features relate to _disk failure, automated rebuild, recovery after power failure, how to deal with 2 disks from the same RAID if 1 was disconnected and then connected again, how to rebuild the RAID automatically, and so on_. This is all hardware-specific and requires long experience to design properly. Oracle engineering is not well skilled in this area (simply because they have not had enough time), so I can never trust software RAID (ASM) from Oracle (and this is well confirmed by the numerous problems reported on different forums). What happens if you remove disk1 and then insert it back: can the system recover properly? What happens if you lose the connection to LUN-1 for 1 minute, it recovers, and then you lose the connection to LUN-2 for another minute: can the system recover from that? What happens if you lose LUN-1 and then connect a new disk as LUN-3: how do you ask the system to rebuild the RAID? What happens if you shut down the system, replace LUN-1 with another disk, and then start it up? And so on... It took years for NetApp, for numerous RAID chip vendors, for EMC and other brands to troubleshoot all such scenarios and create really reliable RAIDs. It took years for Microsoft to create a reliable software RAID (and it is not really reliable in Windows 2000 Server, if you have worked with it in failure scenarios). So how can I trust Oracle ASM if they had zero experience in the RAID area and have almost no user base, meaning users who run ASM with normal redundancy and have experienced real failures?
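To make the second scenario above concrete (LUN-1 drops out, comes back, then LUN-2 drops out), here is a toy Python sketch of a two-way mirror. It is not a model of ASM internals or of any real RAID product; the `Disk` and `Mirror` classes, the `track_stale` flag and the `run_scenario` helper are hypothetical names made up for illustration. It only shows why a mirror has to remember which blocks a disconnected side missed, and resync them, before it can trust that side again.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """One side of a toy mirror (hypothetical, for illustration only)."""
    name: str
    online: bool = True
    blocks: dict = field(default_factory=dict)
    stale: set = field(default_factory=set)  # blocks missed while offline


class Mirror:
    """Toy two-way mirror (RAID-1). Not a model of ASM or any real product."""

    def __init__(self, first, second, track_stale):
        self.disks = (first, second)
        self.track_stale = track_stale

    def write(self, block, value):
        if not any(d.online for d in self.disks):
            raise IOError("no mirror side is online")
        for d in self.disks:
            if d.online:
                d.blocks[block] = value
                d.stale.discard(block)
            elif self.track_stale:
                # Remember that the offline side missed this write.
                d.stale.add(block)

    def disconnect(self, disk):
        disk.online = False

    def reconnect(self, disk):
        disk.online = True
        partner = self.disks[1] if disk is self.disks[0] else self.disks[0]
        if self.track_stale and partner.online:
            # Resync only the blocks that were missed while offline.
            for block in list(disk.stale):
                disk.blocks[block] = partner.blocks[block]
            disk.stale.clear()

    def read(self, block):
        # Serve the read from the first online side that is not known stale.
        for d in self.disks:
            if d.online and block not in d.stale:
                return d.blocks.get(block)
        raise IOError("no current copy is online")


def run_scenario(track_stale):
    """LUN-1 drops out and returns, then LUN-2 drops out; read block 0."""
    lun1, lun2 = Disk("LUN-1"), Disk("LUN-2")
    mirror = Mirror(lun1, lun2, track_stale)
    mirror.write(0, "v1")
    mirror.disconnect(lun1)
    mirror.write(0, "v2")      # only LUN-2 receives this write
    mirror.reconnect(lun1)
    mirror.disconnect(lun2)
    return mirror.read(0)
```

In this toy model, `run_scenario(False)` silently returns the old value "v1" from LUN-1, while `run_scenario(True)` returns "v2" because LUN-1 was resynced when it came back. Real implementations also have to handle the remaining cases from the list above (replaced disks, power loss in the middle of a resync, and so on), which is where the years of experience come in.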
----- Original Message -----
From: "Bennett Leve" <bennett.leve@(protected)>
To: "Alexei_Roudnev" <Alexei_Roudnev@(protected)>
Cc: "Hahn, Klaus" <klaushahn@(protected)>; <suse-oracle@(protected)>
Sent: Thursday, November 15, 2007 11:37 AM
Subject: Re: [suse-oracle] >>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!
> Alexei,
>
> I agree that external redundancy is preferable but not sure what you are
> getting at. What are you alluding to as far as trusting ASM as a RAID
> system?
>
> -Bennett
>
> Alexei_Roudnev wrote:
>> Advice - don't use normal redundancy; use external redundancy only.
>> Oracle is not good at RAID development, so we cannot trust ASM as
>> a RAID system. (In addition, HW RAIDs are always much faster than
>> software RAID, because of the power-independent cache on hardware RAIDs.)
>>
>> ----- Original Message -----
>> From: "Bennett Leve" <bennett.leve@(protected)>
>> To: "Hahn, Klaus" <klaushahn@(protected)>
>> Cc: <suse-oracle@(protected)>
>> Sent: Thursday, November 15, 2007 6:52 AM
>> Subject: Re: [suse-oracle] >>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE
>> LOCK!
>>
>>> Klaus,
>>>
>>> What version of the RDBMS are you using for the ASM instance and for the
>>> database? There are several known issues that may be the cause here.
>>> Also how is raw11 presented to st02 and raw12 presented to st01?
>>>
>>> -Bennett
>>>
>>> Hahn, Klaus wrote:
>>>> Hi list,
>>>>
>>>> I'm testing RAC + ASM + iSCSI-shared-storage on SUSE SLES10 SP1.
>>>>
>>>> Configuration of RAC:
>>>> 1. 2 nodes (a + b) with 2 instances TEST1 and TEST2
>>>>
>>>> Configuration of ASM:
>>>> 1. normal redundancy
>>>> 2. Diskgroup data with 2 failure groups st01_data and st02_data.
>>>>    Each failure group has 1 disk (/dev/raw/raw11 and /dev/raw/raw12)
>>>>    raw11 is an iSCSI-Target from node st01.
>>>>    raw12 is an iSCSI-Target from node st02.
>>>> 3. Diskgroup reco (recovery area) with 2 failure groups st01_reco and
>>>>    st02_reco.
>>>>    Each failure group has 1 disk (/dev/raw/raw6 and /dev/raw/raw7)
>>>>    raw6 is an iSCSI-Target from node st01.
>>>>    raw7 is an iSCSI-Target from node st02.
>>>>
>>>> Taking the iSCSI-Target node st01 offline produces a strange error
>>>> within the database:
>>>> - no commit possible
>>>> - alter system switch logfile not possible
>>>> - database seems to be readonly
>>>>
>>>> srvctl says: All RAC-Services and Instances are running.
>>>>
>>>> alert.log of RAC-Instance TEST1:
>>>>
>>>>>>>>> ..
>>>>>>>>>
>>>> ORA-27091: I/O kann nicht in Queue gestellt werden
>>>> ORA-27072: Datei-I/O-Fehler
>>>> Linux-x86_64 Error: 5: Input/output error
>>>> Additional information: 4
>>>> Additional information: 192544
>>>> Additional information: -1
>>>> Tue Oct 30 11:57:10 2007
>>>> Errors in file /opt/oracle/admin/TEST/udump/test1_ora_30342.trc:
>>>> ORA-27091: I/O kann nicht in Queue gestellt werden
>>>> ORA-27072: Datei-I/O-Fehler
>>>> Linux-x86_64 Error: 5: Input/output error
>>>> Additional information: 4
>>>> Additional information: 192544
>>>> Additional information: -1
>>>> WARNING: offlining disk 2.2526723344 (RECO_0002) with mask 0x3
>>>> Tue Oct 30 11:57:19 2007
>>>> Errors in file /opt/oracle/admin/TEST/bdump/test1_ckpt_15122.trc:
>>>> ORA-27091: I/O kann nicht in Queue gestellt werden
>>>> ORA-27072: Datei-I/O-Fehler
>>>> Linux-x86_64 Error: 5: Input/output error
>>>> Additional information: 4
>>>> Additional information: 192608
>>>> Additional information: -1
>>>> WARNING: offlining disk 2.2526723344 (RECO_0002) with mask 0x3
>>>> Tue Oct 30 12:34:16 2007
>>>>
>>>>>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=16
>>>>>>>
>>>> System State dumped to trace file
>>>> /opt/oracle/admin/TEST/bdump/test1_mmon_15153.trc
>>>> Tue Oct 30 12:47:03 2007
>>>>
>>>>>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=37
>>>>>>>
>>>> System State dumped to trace file
>>>> /opt/oracle/admin/TEST/udump/test1_ora_23960.trc
>>>> Tue Oct 30 12:50:53 2007
>>>>
>>>>>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=54
>>>>>>>
>>>> System State dumped to trace file
>>>> /opt/oracle/admin/TEST/udump/test1_ora_14206.trc
>>>>
>>>> <<<<<<
>>>>
>>>> alert.log of ASM-Instance: +ASM1
>>>>
>>>> <<<<<<<<<<<<
>>>> ORA-27091: unable to queue I/O
>>>> ORA-27072: File I/O error
>>>> Linux-x86_64 Error: 5: Input/output error
>>>> Additional information: 4
>>>> Additional information: 2056
>>>> Additional information: -1
>>>> Tue Oct 30 11:57:19 2007
>>>> Errors in file /opt/oracle/admin/+ASM/bdump/+asm1_gmon_28227.trc:
>>>> ORA-27091: unable to queue I/O
>>>> ORA-27072: File I/O error
>>>> Linux-x86_64 Error: 5: Input/output error
>>>> Additional information: 4
>>>> Additional information: 2048
>>>> Additional information: -1
>>>> Tue Oct 30 11:57:19 2007
>>>> Errors in file /opt/oracle/admin/+ASM/bdump/+asm1_gmon_28227.trc:
>>>> ORA-27091: unable to queue I/O
>>>> ORA-27072: File I/O error
>>>> Linux-x86_64 Error: 5: Input/output error
>>>> Additional information: 4
>>>> Additional information: 2056
>>>> Additional information: -1
>>>> NOTE: cache closing disk 2 of grp 2: RECO_0002
>>>> Tue Oct 30 11:57:19 2007
>>>> NOTE: group RECO: relocated PST to: disk 0000 (PST copy 0)
>>>> Tue Oct 30 11:57:22 2007
>>>> NOTE: PST update: grp = 2, dsk = 2, mode = 0x4
>>>> Tue Oct 30 11:57:22 2007
>>>> NOTE: group RECO: relocated PST to: disk 0000 (PST copy 0)
>>>> NOTE: cache closing disk 2 of grp 2: RECO_0002
>>>> Tue Oct 30 11:57:52 2007
>>>> NOTE: PST refresh pending for group 1/0x175a4df9 (DATA)
>>>> SUCCESS: refreshed PST for 1/0x175a4df9 (DATA)
>>>> NOTE: PST refresh pending for group 2/0x506a4dfe (RECO)
>>>> SUCCESS: refreshed PST for 2/0x506a4dfe (RECO)
>>>> Tue Oct 30 11:59:24 2007
>>>> SUCCESS: refreshed PST for 1/0x175a4df9 (DATA)
>>>> SUCCESS: refreshed PST for 2/0x506a4dfe (RECO)
>>>> <<<<<<<<<<<
>>>>
>>>> Reconfiguration of ASM by " alter diskgroup add failgroup ..." (after
>>>> deleting the asm-metadata on the raw device)
>>>> has no effect.
>>>>
>>>> After a restart of the RAC instances everything works fine!
>>>>
>>>> Any hints?
>>>>
>>>> Regards
>>>> Klaus
>>>>
>>>
>>
>>> --
>>> To unsubscribe, email: suse-oracle-unsubscribe@(protected)
>>> For additional commands, email: suse-oracle-help@(protected)
>>> Please see http://www.suse.com/oracle/ before posting
>>
>