I ran a few stress tests on SLES9 SP3 with Oracle 10.2 (async I/O by default), ASM and OCFS.
Final conclusion: it is not a working combination. OCFSv2 adds great instability to any system because of its very poor connectivity and heartbeat
detection. Under heavy load it easily fails (into a panic, i.e. fencing) and kills the whole cluster.
Worst of all:
- OCFSv2 has no way to use multiple redundant links, so it cannot be configured for an HA environment (except by using low-level IP binding).
Remember that both iSCSI and Oracle CRS/CSS have such mechanisms (Oracle can be configured to use multiple interfaces, and iSCSI has
both multi-port and multipath);
- OCFSv2 reboots the server even if there was no activity on the file system. This makes it impossible to use it as a near-line store (for backups etc.).
- OCFSv2 is extremely prone to various freezes. For example, if it hits an internal failure while replaying the journal on an idle node (this happens sometimes), it gets stuck and you cannot stop it. Since it requires unmounting on all nodes for every simple operation (fsck, resize, cluster changes),
this causes numerous reboots of the whole cluster.
I am not sure when it all started. I remember running the same tests on an older OCFSv2 (SP3 beta), and it worked fine (at least it looked fine).
I will run a few more experiments to determine the point at which it all became unstable, but for now it looks extremely unstable (in a test environment, fortunately). At least it is unstable after the 10.2.0.1 -> 10.2.0.2 upgrade, but we did not run stress tests between all upgrades, so I cannot rule out an OCFSv2 version problem. That is more likely than Oracle 10.2.0.2 being the cause, since I cannot imagine how Oracle could crash OCFS.