

Re: [suse-oracle] re: 10.2.0.3 - root.sh fails when adding 2nd node

Peter Santos

2007-08-11


We finally figured it out.

It turns out that when our SAs handed us the configured machine, the MTU on
the private interconnect was set for jumbo frames (MTU = 9000). I
remember running a check where I pinged each private interconnect with
ping -s 9000 <ip of interconnect>, and it worked. Maybe the
switch wasn't set up to handle jumbo frames. Anyway, when we set the MTU
back to 1500 it all worked like a charm. We only wasted about
3 days.

Does anyone know the best way to verify that jumbo frames are actually
working? This is not the first time our SAs have told me that jumbo frames
were configured on the switch when they weren't. I wonder if there is a good
way to test this without bringing up the switch's web interface.
It seems that sending large packets to the interconnect IPs
succeeds even when jumbo frames are not configured?
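
One commonly used check, sketched here under assumptions rather than taken
from this thread, is to send a ping that is not allowed to fragment. A plain
ping -s 9000 can be fragmented by the IP layer, so its success does not by
itself prove that full-size frames get through. On Linux with iputils ping,
-M do sets the Don't Fragment bit, and the largest ICMP payload that fits in
one IPv4 packet is the MTU minus 28 bytes (a 20-byte IPv4 header without
options plus an 8-byte ICMP header), which is 8972 for MTU 9000. The helper
names and the example address below are hypothetical.

```python
import subprocess

IPV4_HEADER = 20  # bytes, assuming an IPv4 header with no options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def build_ping_command(ip: str, mtu: int, count: int = 3) -> list[str]:
    """Build a Linux iputils ping command with Don't Fragment set (-M do)."""
    return ["ping", "-M", "do", "-c", str(count),
            "-s", str(icmp_payload_for_mtu(mtu)), ip]


def jumbo_path_ok(ip: str, mtu: int = 9000) -> bool:
    """Return True if ping exits successfully for full-size, unfragmented packets."""
    result = subprocess.run(build_ping_command(ip, mtu),
                            capture_output=True, text=True)
    return result.returncode == 0
```

For example, jumbo_path_ok("192.0.2.10") (a placeholder address) would run
ping -M do -c 3 -s 8972 192.0.2.10. Run it from each node toward the other
node's interconnect IP. If the check passes with mtu=1500 but fails with
mtu=9000, something on the path is probably not passing jumbo frames.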

-peter




Bart Goossens wrote:
> How are you logged in to the 2nd node? If you use vnc, it is possible
> to encounter problems when you run the root.sh and you didn't start
> the vncserver as root.
>
> Peter Santos wrote:
>> Folks,
>>   I'm trying to install a 2nd node on my 2 node test cluster and I
>> can't seem to
>>   get past running the root.sh script on the 2nd node. Whenever I
>> execute the root.sh
>>   on the newly added node (this is the very last step of the
>> installer), the CSS daemon
>>   doesn't come up and eventually it reboots my original node.
>>
>>   From the ocssd.log files, I can tell that it has something to do
>> with the 2 nodes speaking
>>   to each other ... either via the ocr/vote disks or network
>> connectivity.
>>
>>   I've set up my raw partitions via fdisk, bound them in /etc/raw and
>> set up permissions in udev.permissions.
>>   I've even cksum'd all raw devices from both nodes .. and it all
>> looks good.
>>
>>   Could I be missing something else? Any ideas?
>>
>>   Here is what the ocssd.log complains about.
>>
>> [   CSSD]2007-08-09 15:40:27.547 >USER:   CSS daemon log for node
>> sdbe3, number 2, in cluster oracm_crs
>> [ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=sdbe3DBG_CSSD))
>> [   CSSD]2007-08-09 15:40:27.642 [2546082016] >TRACE:  clssscmain:
>> local-only set to false
>> [   CSSD]2007-08-09 15:40:34.506 [2546082016] >TRACE:
>> clssnmReadNodeInfo: added node 1 (sdbe1) to cluster
>> [   CSSD]2007-08-09 15:40:34.543 [2546082016] >TRACE:
>> clssnmReadNodeInfo: added node 2 (sdbe3) to cluster
>> [   CSSD]2007-08-09 15:40:34.548 [1082145120] >TRACE:
>> clssnm_skgxnmon: skgxn init failed, rc 1
>> [   CSSD]2007-08-09 15:40:34.548 [2546082016] >TRACE:
>> clssnm_skgxnonline: Using vacuous skgxn monitor
>> [   CSSD]2007-08-09 15:40:37.912 [2546082016] >TRACE:
>> clssnmInitNMInfo: misscount set to 60
>> [   CSSD]2007-08-09 15:40:37.918 [2546082016] >TRACE:
>> clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw1)
>> [   CSSD]2007-08-09 15:40:37.979 [2546082016] >TRACE:
>> clssnmDiskStateChange: state from 1 to 2 disk (1//dev/raw/raw3)
>> [   CSSD]2007-08-09 15:40:37.981 [2546082016] >TRACE:
>> clssnmDiskStateChange: state from 1 to 2 disk (2//dev/raw/raw5)
>> [   CSSD]2007-08-09 15:40:40.816 [1084246368] >TRACE:
>> clssnmDiskStateChange: state from 2 to 4 disk (1//dev/raw/raw3)
>> [   CSSD]2007-08-09 15:40:40.825 [1082145120] >TRACE:
>> clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw1)
>> [   CSSD]2007-08-09 15:40:40.830 [1084246368] >TRACE:
>> clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(483) LATS(0)
>> Disk lastSeqNo(483)
>> [   CSSD]2007-08-09 15:40:40.837 [1082145120] >TRACE:
>> clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(483) LATS(0)
>> Disk lastSeqNo(483)
>> [   CSSD]2007-08-09 15:40:41.767 [1086347616] >TRACE:
>> clssnmDiskStateChange: state from 2 to 4 disk (2//dev/raw/raw5)
>> [   CSSD]2007-08-09 15:40:41.779 [1086347616] >TRACE:
>> clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(484) LATS(0)
>> Disk lastSeqNo(484)
>> [   CSSD]2007-08-09 15:40:41.797 [2546082016] >TRACE:
>> clssscSclsFatal: read value of disable
>> [   CSSD]2007-08-09 15:40:41.797 [1090550112] >TRACE:
>> clssnmFatalThread: spawned
>> [   CSSD]2007-08-09 15:40:41.797 [2546082016] >TRACE:
>> clssscSclsFatal: read value of disable
>> [   CSSD]2007-08-09 15:40:41.798 [1092651360] >TRACE:  clssnmconnect:
>> connecting to node 2, flags 0x0001, connector 1
>> [   CSSD]2007-08-09 15:40:41.798 [1092651360] >TRACE:  clssnmconnect:
>> connecting to node 0, flags 0x0000, connector 1
>> [   CSSD]2007-08-09 15:40:41.799 [1092651360] >TRACE:  clssnmconnect:
>> connecting to node 1, flags 0x0001, connector 0
>> [   CSSD]2007-08-09 15:40:41.801 [1094752608] >TRACE:
>> clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)
>>    (KEY=Oracle_CSS_LclLstnr_oracm_crs_2))
>> [   CSSD]2007-08-09 15:40:41.801 [1094752608] >TRACE:
>> clssgmclientlsnr: listening on
>> (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_sdbe3_oracm_crs))
>> [   CSSD]2007-08-09 15:40:42.832 [1092651360] >TRACE:
>> clssnmConnComplete: connected to node 1 (con 0x2a981016c0),
>>    state 3 birth 0, unique 1186687891/1186687891 prevConuni(0)
>> [   CSSD]2007-08-09 15:40:43.307 [1105258848] >TRACE:
>> clssnmSendingThread: Connection complete
>> [   CSSD]2007-08-09 15:40:43.307 [1103157600] >TRACE:
>> clssnmPollingThread: Connection complete
>> [   CSSD]2007-08-09 15:40:43.307 [1107360096] >TRACE:
>> clssnmRcfgMgrThread: Connection complete
>> [   CSSD]2007-08-09 15:40:43.307 [1107360096] >TRACE:
>> clssnmRcfgMgrThread: Local Join
>> [   CSSD]2007-08-09 15:40:43.307 [1107360096] >TRACE:
>> clssnmLocalJoinEvent: set node(1) inactive
>> [   CSSD]2007-08-09 15:40:43.307 [1107360096] >WARNING:
>> clssnmLocalJoinEvent: takeover aborted due to UNKNOWN nodes
>> [   CSSD]2007-08-09 15:40:43.992 [1092651360] >TRACE:
>> clssnmHandleSync: Acknowledging sync: src[1] srcName[sdbe1] seq[5]
>> sync[2]
>> [   CSSD]2007-08-09 15:40:44.309 [1107360096] >TRACE:
>> clssnmRcfgMgrThread: lastleader(1) unique(1186688418)
>> [   CSSD]2007-08-09 15:40:44.994 [1092651360] >TRACE:
>> clssnmSendVoteInfo: node(1) syncSeqNo(2)
>> [   CSSD]2007-08-09 15:40:46.998 [1092651360] >TRACE:
>> clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0)
>> birth (0/0)
>>    (old/new)
>> [   CSSD]2007-08-09 15:40:46.998 [1092651360] >TRACE:
>> clssnmDeactivateNode: node 0 () left cluster
>>
>> [   CSSD]2007-08-09 15:40:46.998 [1092651360] >TRACE:
>> clssnmUpdateNodeState: node 1, state (4/3) unique
>> (1186687891/1186687891)
>>     prevConuni(0) birth (0/1) (old/new)
>> [   CSSD]2007-08-09 15:40:46.998 [1092651360] >TRACE:
>> clssnmUpdateNodeState: node 2, state (1/2) unique
>> (1186688418/1186688418)
>>    prevConuni(0) birth (0/2) (old/new)
>> [   CSSD]2007-08-09 15:40:46.998 [1092651360] >USER:
>> clssnmHandleUpdate: SYNC(2) from node(1) completed
>> [   CSSD]2007-08-09 15:40:46.998 [1092651360] >USER:
>> clssnmHandleUpdate: NODE 1 (sdbe1) IS ACTIVE MEMBER OF CLUSTER
>> [   CSSD]2007-08-09 15:40:46.998 [1092651360] >USER:
>> clssnmHandleUpdate: NODE 2 (sdbe3) IS ACTIVE MEMBER OF CLUSTER
>> [   CSSD]2007-08-09 15:40:47.002 [2546082016] >USER:   NMEVENT_SUSPEND
>> [00][00][00][00]
>> [   CSSD]2007-08-09 15:40:47.003 [1109461344] >TRACE:
>> clssgmReconfigThread: started for reconfig (2)
>> [   CSSD]2007-08-09 15:40:47.003 [1109461344] >USER:
>> NMEVENT_RECONFIG [00][00][00][06]
>> [   CSSD]2007-08-09 15:40:47.003 [1109461344] >TRACE:
>> clssgmEstablishConnections: 2 nodes in cluster incarn 2
>> [   CSSD]2007-08-09 15:40:47.075 [1101056352] >TRACE:
>> clssgmInitialRecv: (0x774770) accepted a new
>>     connection from node 1 born at 1 active (2, 2), vers (10,3,1,2)
>> [   CSSD]2007-08-09 15:40:47.075 [1101056352] >TRACE:
>> clssgmInitialRecv: conns done (2/2)
>> [   CSSD]2007-08-09 15:40:47.075 [1109461344] >TRACE:
>> clssgmEstablishMasterNode: MASTER for 2 is node(1) birth(1)
>> [   CSSD]2007-08-09 15:40:47.075 [1109461344] >TRACE:
>> clssgmChangeMasterNode: requeued 0 RPCs
>> [   CSSD]2007-08-09 15:40:47.590 [1084246368] >TRACE:
>> clssnmvFatalCheck: extra node 1
>> [   CSSD]2007-08-09 15:40:47.590 [1084246368] >TRACE:
>> clssnmvFatalCheck: fatal 1, sclsfatal 0
>> [   CSSD]2007-08-09 15:40:47.593 [1086347616] >TRACE:
>> clssnmvFatalCheck: extra node 1
>> [   CSSD]2007-08-09 15:40:47.593 [1086347616] >TRACE:
>> clssnmvFatalCheck: fatal 1, sclsfatal 0
>> [   CSSD]2007-08-09 15:40:47.600 [1082145120] >TRACE:
>> clssnmvFatalCheck: extra node 1
>> [   CSSD]2007-08-09 15:40:47.600 [1082145120] >TRACE:
>> clssnmvFatalCheck: fatal 1, sclsfatal 0
>> [   CSSD]2007-08-09 15:40:47.824 [1090550112] >TRACE:
>> clssnmFatalThread: Fatal mode enabled
>> [   CSSD]2007-08-09 15:40:48.045 [1092651360] >TRACE:
>> clssnmSendFatalOn: req to syncLeader(1)
>> [   CSSD]2007-08-09 15:40:51.322 [1103157600] >TRACE:
>> clssnmPollingThread: node sdbe3 (2) missed(2) checkin(s)
>> [   CSSD]2007-08-09 15:41:17.132 [1109461344] >ERROR:
>> clssgmSlaveCMSync: reconfig timeout on master 1
>>
>> [   CSSD]2007-08-09 15:41:17.132 [1109461344] >TRACE:
>> clssgmReconfigThread: completed for reconfig(2), with status(0)
>> [   CSSD]2007-08-09 15:41:17.190 [2546082016] >ERROR:
>> clssgmStartNMMon: reconfig incarn 2 failed. Retrying.
>>
>>
>>
>>  
>
>

