Run pings. If you can ping with 8K packets between the servers, then TCP
and UDP (RAC uses UDP) will most likely work as well. In addition, transfer
a big (1 GB) file with ftp and check whether the bottleneck is the network
bandwidth or the disk system: I have seen about 80 MB/second if the disks
are very fast, or 10 - 30 MB/second on ordinary disks. If the network has a
problem, you will most likely see <= 1 MB/second transfer speed.
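
As a rough aid for reading the ftp result, here is a minimal Python sketch
that turns a transfer size and elapsed time into MB/second and compares it
with the ballpark figures above. The function names and the exact cut-offs
between the bands (including the 1 - 10 MB/second band, which is not
mentioned above) are assumptions for illustration; the figures are rules of
thumb, not hard limits.

```python
def throughput_mb_per_s(size_bytes, seconds):
    """Return transfer speed in MB/second (1 MB = 10**6 bytes here)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return size_bytes / 1_000_000 / seconds


def classify(mb_per_s):
    """Map a speed onto rough bands (hypothetical cut-offs)."""
    if mb_per_s <= 1:
        return "network problem likely"
    if mb_per_s < 10:
        return "slow: check both network and disks"
    if mb_per_s <= 30:
        return "typical for ordinary disks"
    return "fast disks or network-bound"


if __name__ == "__main__":
    # Example: a 1 GB file that took 40 seconds to transfer.
    speed = throughput_mb_per_s(1_000_000_000, 40.0)
    print(f"{speed:.1f} MB/s -> {classify(speed)}")
```
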
Most likely, your network admins had it configured but did not save the
configuration (I am both sys and network admin, so I can control it here
easily), or you ran the 9K pings before setting up the 9K MTU (in this case
ping will work through fragmentation).
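
To check that jumbo frames really work end to end, the ping has to be sent
with fragmentation prohibited, and the payload has to be sized so that the
whole packet equals the MTU. A minimal Python sketch, assuming Linux iputils
ping (where `-M do` sets the don't-fragment bit) and IPv4 without IP options;
the function names and the example address are made up for illustration:

```python
import subprocess

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu):
    """Largest ping payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def build_ping_cmd(host, mtu, count=3):
    """Build a ping command that forbids fragmentation (Linux iputils syntax)."""
    return ["ping", "-M", "do", "-c", str(count),
            "-s", str(icmp_payload_for_mtu(mtu)), host]


def jumbo_frames_ok(host, mtu=9000):
    """Return True if an MTU-sized, unfragmented ping gets replies."""
    result = subprocess.run(build_ping_cmd(host, mtu),
                            capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # 192.168.10.2 is a placeholder for the peer's interconnect address.
    print(" ".join(build_ping_cmd("192.168.10.2", 9000)))
```

Note that `-s` sets the payload size, so a plain `ping -s 9000` produces a
9028-byte IP packet that gets fragmented even with a 9000 MTU; that is why it
succeeds whether or not the switch passes jumbo frames. With fragmentation
forbidden and an 8972-byte payload, the ping fails if any hop cannot carry a
9000-byte packet.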
I always run many performance tests after the systems and network are
configured but before the RAC installation.
Btw, I saw some performance improvement with jumbo frames (significant on
iSCSI and some on the RAC interconnect).
----- Original Message -----
From: "Peter Santos" <psantos@(protected)>
To: <suse-oracle@(protected)>
Sent: Friday, August 10, 2007 9:24 PM
Subject: Re: [suse-oracle] re: 10.2.0.3 - root.sh fails when adding 2nd node ?
> We finally figured it out.
>
> Turns out that when our sa's gave us the configured machine, the MTU on
> the private interconnect was set to Jumbo Frames (MTU = 9000), and
> I remember running a check where I ping'd each private interconnect with
> a ping -s 9000 <ip of interconnect> .. and it worked. Maybe the
> switch wasn't setup to handle jumbo frames... anyway when we set the MTU
> back to 1500 it all worked like a charm .. only wasted about
> 3 days.
>
> Does anyone know of the best way to ensure that Jumbo Frames is working
> .. This is not the first time our SA's have told me that Jumbo Frames
> was configured on the switch when it wasn't. I wonder if there is a good
> way to test this without bringing up the web interface to the
> switch etc .. It seems like sending the interconnect IP's large packets
> will succeed even when Jumbo Frames is not configured?
>
> -peter
>
>
>
>
> Bart Goossens wrote:
>> How are you logged in to the 2nd node? If you use vnc, it is possible
>> to encounter problems when you run the root.sh and you didn't start
>> the vncserver as root.
>>
>> Peter Santos wrote:
>>> Folks,
>>> I'm trying to install a 2nd node on my 2 node test cluster and I
>>> can't seem to
>>> get past running the root.sh script on the 2nd node. Whenever I
>>> execute the root.sh
>>> on the newly added node (this is the very last step of the
>>> installer), the CSS daemon
>>> doesn't come up and eventually it reboots my original node.
>>>
>>> From the ocssd.log files, I can tell that it has something to do
>>> with the 2 nodes speaking
>>> to each other ... either via the ocr/vote disks or network
>>> connectivity.
>>>
>>> I've setup my raw partitions via fdisk, bound them in /etc/raw and
>>> setup permissions in udev.permissions.
>>> I've even cksum'd all raw devices from both nodes .. and it all
>>> looks good.
>>>
>>> Could I be missing something else? Any ideas?
>>>
>>> Here is what the ocssd.log complains about.
>>>
>>> [ CSSD]2007-08-09 15:40:27.547 >USER: CSS daemon log for node
>>> sdbe3, number 2, in cluster oracm_crs
>>> [ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=sdbe3DBG_CSSD))
>>> [ CSSD]2007-08-09 15:40:27.642 [2546082016] >TRACE: clssscmain:
>>> local-only set to false
>>> [ CSSD]2007-08-09 15:40:34.506 [2546082016] >TRACE:
>>> clssnmReadNodeInfo: added node 1 (sdbe1) to cluster
>>> [ CSSD]2007-08-09 15:40:34.543 [2546082016] >TRACE:
>>> clssnmReadNodeInfo: added node 2 (sdbe3) to cluster
>>> [ CSSD]2007-08-09 15:40:34.548 [1082145120] >TRACE:
>>> clssnm_skgxnmon: skgxn init failed, rc 1
>>> [ CSSD]2007-08-09 15:40:34.548 [2546082016] >TRACE:
>>> clssnm_skgxnonline: Using vacuous skgxn monitor
>>> [ CSSD]2007-08-09 15:40:37.912 [2546082016] >TRACE:
>>> clssnmInitNMInfo: misscount set to 60
>>> [ CSSD]2007-08-09 15:40:37.918 [2546082016] >TRACE:
>>> clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw1)
>>> [ CSSD]2007-08-09 15:40:37.979 [2546082016] >TRACE:
>>> clssnmDiskStateChange: state from 1 to 2 disk (1//dev/raw/raw3)
>>> [ CSSD]2007-08-09 15:40:37.981 [2546082016] >TRACE:
>>> clssnmDiskStateChange: state from 1 to 2 disk (2//dev/raw/raw5)
>>> [ CSSD]2007-08-09 15:40:40.816 [1084246368] >TRACE:
>>> clssnmDiskStateChange: state from 2 to 4 disk (1//dev/raw/raw3)
>>> [ CSSD]2007-08-09 15:40:40.825 [1082145120] >TRACE:
>>> clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw1)
>>> [ CSSD]2007-08-09 15:40:40.830 [1084246368] >TRACE:
>>> clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(483) LATS(0)
>>> Disk lastSeqNo(483)
>>> [ CSSD]2007-08-09 15:40:40.837 [1082145120] >TRACE:
>>> clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(483) LATS(0)
>>> Disk lastSeqNo(483)
>>> [ CSSD]2007-08-09 15:40:41.767 [1086347616] >TRACE:
>>> clssnmDiskStateChange: state from 2 to 4 disk (2//dev/raw/raw5)
>>> [ CSSD]2007-08-09 15:40:41.779 [1086347616] >TRACE:
>>> clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(484) LATS(0)
>>> Disk lastSeqNo(484)
>>> [ CSSD]2007-08-09 15:40:41.797 [2546082016] >TRACE:
>>> clssscSclsFatal: read value of disable
>>> [ CSSD]2007-08-09 15:40:41.797 [1090550112] >TRACE:
>>> clssnmFatalThread: spawned
>>> [ CSSD]2007-08-09 15:40:41.797 [2546082016] >TRACE:
>>> clssscSclsFatal: read value of disable
>>> [ CSSD]2007-08-09 15:40:41.798 [1092651360] >TRACE: clssnmconnect:
>>> connecting to node 2, flags 0x0001, connector 1
>>> [ CSSD]2007-08-09 15:40:41.798 [1092651360] >TRACE: clssnmconnect:
>>> connecting to node 0, flags 0x0000, connector 1
>>> [ CSSD]2007-08-09 15:40:41.799 [1092651360] >TRACE: clssnmconnect:
>>> connecting to node 1, flags 0x0001, connector 0
>>> [ CSSD]2007-08-09 15:40:41.801 [1094752608] >TRACE:
>>> clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)
>>> (KEY=Oracle_CSS_LclLstnr_oracm_crs_2))
>>> [ CSSD]2007-08-09 15:40:41.801 [1094752608] >TRACE:
>>> clssgmclientlsnr: listening on
>>> (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_sdbe3_oracm_crs))
>>> [ CSSD]2007-08-09 15:40:42.832 [1092651360] >TRACE:
>>> clssnmConnComplete: connected to node 1 (con 0x2a981016c0),
>>> state 3 birth 0, unique 1186687891/1186687891 prevConuni(0)
>>> [ CSSD]2007-08-09 15:40:43.307 [1105258848] >TRACE:
>>> clssnmSendingThread: Connection complete
>>> [ CSSD]2007-08-09 15:40:43.307 [1103157600] >TRACE:
>>> clssnmPollingThread: Connection complete
>>> [ CSSD]2007-08-09 15:40:43.307 [1107360096] >TRACE:
>>> clssnmRcfgMgrThread: Connection complete
>>> [ CSSD]2007-08-09 15:40:43.307 [1107360096] >TRACE:
>>> clssnmRcfgMgrThread: Local Join
>>> [ CSSD]2007-08-09 15:40:43.307 [1107360096] >TRACE:
>>> clssnmLocalJoinEvent: set node(1) inactive
>>> [ CSSD]2007-08-09 15:40:43.307 [1107360096] >WARNING:
>>> clssnmLocalJoinEvent: takeover aborted due to UNKNOWN nodes
>>> [ CSSD]2007-08-09 15:40:43.992 [1092651360] >TRACE:
>>> clssnmHandleSync: Acknowledging sync: src[1] srcName[sdbe1] seq[5]
>>> sync[2]
>>> [ CSSD]2007-08-09 15:40:44.309 [1107360096] >TRACE:
>>> clssnmRcfgMgrThread: lastleader(1) unique(1186688418)
>>> [ CSSD]2007-08-09 15:40:44.994 [1092651360] >TRACE:
>>> clssnmSendVoteInfo: node(1) syncSeqNo(2)
>>> [ CSSD]2007-08-09 15:40:46.998 [1092651360] >TRACE:
>>> clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0)
>>> birth (0/0)
>>> (old/new)
>>> [ CSSD]2007-08-09 15:40:46.998 [1092651360] >TRACE:
>>> clssnmDeactivateNode: node 0 () left cluster
>>>
>>> [ CSSD]2007-08-09 15:40:46.998 [1092651360] >TRACE:
>>> clssnmUpdateNodeState: node 1, state (4/3) unique
>>> (1186687891/1186687891)
>>> prevConuni(0) birth (0/1) (old/new)
>>> [ CSSD]2007-08-09 15:40:46.998 [1092651360] >TRACE:
>>> clssnmUpdateNodeState: node 2, state (1/2) unique
>>> (1186688418/1186688418)
>>> prevConuni(0) birth (0/2) (old/new)
>>> [ CSSD]2007-08-09 15:40:46.998 [1092651360] >USER:
>>> clssnmHandleUpdate: SYNC(2) from node(1) completed
>>> [ CSSD]2007-08-09 15:40:46.998 [1092651360] >USER:
>>> clssnmHandleUpdate: NODE 1 (sdbe1) IS ACTIVE MEMBER OF CLUSTER
>>> [ CSSD]2007-08-09 15:40:46.998 [1092651360] >USER:
>>> clssnmHandleUpdate: NODE 2 (sdbe3) IS ACTIVE MEMBER OF CLUSTER
>>> [ CSSD]2007-08-09 15:40:47.002 [2546082016] >USER: NMEVENT_SUSPEND
>>> [00][00][00][00]
>>> [ CSSD]2007-08-09 15:40:47.003 [1109461344] >TRACE:
>>> clssgmReconfigThread: started for reconfig (2)
>>> [ CSSD]2007-08-09 15:40:47.003 [1109461344] >USER:
>>> NMEVENT_RECONFIG [00][00][00][06]
>>> [ CSSD]2007-08-09 15:40:47.003 [1109461344] >TRACE:
>>> clssgmEstablishConnections: 2 nodes in cluster incarn 2
>>> [ CSSD]2007-08-09 15:40:47.075 [1101056352] >TRACE:
>>> clssgmInitialRecv: (0x774770) accepted a new
>>> connection from node 1 born at 1 active (2, 2), vers (10,3,1,2)
>>> [ CSSD]2007-08-09 15:40:47.075 [1101056352] >TRACE:
>>> clssgmInitialRecv: conns done (2/2)
>>> [ CSSD]2007-08-09 15:40:47.075 [1109461344] >TRACE:
>>> clssgmEstablishMasterNode: MASTER for 2 is node(1) birth(1)
>>> [ CSSD]2007-08-09 15:40:47.075 [1109461344] >TRACE:
>>> clssgmChangeMasterNode: requeued 0 RPCs
>>> [ CSSD]2007-08-09 15:40:47.590 [1084246368] >TRACE:
>>> clssnmvFatalCheck: extra node 1
>>> [ CSSD]2007-08-09 15:40:47.590 [1084246368] >TRACE:
>>> clssnmvFatalCheck: fatal 1, sclsfatal 0
>>> [ CSSD]2007-08-09 15:40:47.593 [1086347616] >TRACE:
>>> clssnmvFatalCheck: extra node 1
>>> [ CSSD]2007-08-09 15:40:47.593 [1086347616] >TRACE:
>>> clssnmvFatalCheck: fatal 1, sclsfatal 0
>>> [ CSSD]2007-08-09 15:40:47.600 [1082145120] >TRACE:
>>> clssnmvFatalCheck: extra node 1
>>> [ CSSD]2007-08-09 15:40:47.600 [1082145120] >TRACE:
>>> clssnmvFatalCheck: fatal 1, sclsfatal 0
>>> [ CSSD]2007-08-09 15:40:47.824 [1090550112] >TRACE:
>>> clssnmFatalThread: Fatal mode enabled
>>> [ CSSD]2007-08-09 15:40:48.045 [1092651360] >TRACE:
>>> clssnmSendFatalOn: req to syncLeader(1)
>>> [ CSSD]2007-08-09 15:40:51.322 [1103157600] >TRACE:
>>> clssnmPollingThread: node sdbe3 (2) missed(2) checkin(s)
>>> [ CSSD]2007-08-09 15:41:17.132 [1109461344] >ERROR:
>>> clssgmSlaveCMSync: reconfig timeout on master 1
>>>
>>> [ CSSD]2007-08-09 15:41:17.132 [1109461344] >TRACE:
>>> clssgmReconfigThread: completed for reconfig(2), with status(0)
>>> [ CSSD]2007-08-09 15:41:17.190 [2546082016] >ERROR:
>>> clssgmStartNMMon: reconfig incarn 2 failed. Retrying.
>>>
>>>
>>>
>>>
>>
>>
>
> --
> To unsubscribe, email: suse-oracle-unsubscribe@(protected)
> For additional commands, email: suse-oracle-help@(protected)
> Please see http://www.suse.com/oracle/ before posting
>
>