Checklist:
-- Do all names resolve to the correct IP, on the second node as well as on the first one:
host1, host1-priv, host1-vip
host2, host2-priv, host2-vip
(use your names)
-- Are the private and public networks on the same interfaces on all nodes?
-- Are all host names short enough? (I'd avoid names longer than 8 characters.)
-- For the OCRFile, is the output of 'od <your-OCR-file> | head -100' the same on
all nodes?
-- For the CSSFile, the same check.
-- Is the CSSFile writable by the oracle user on all nodes? (The 'disk' group is not
enough.)
-- Is Linux the same on all nodes ('uname -a')?
-- Can you ping host1-priv from host2? And vice versa?
-- Can the oracle user slogin from host1 to host2? And vice versa?
(I don't remember the exact command, but check the OCRFile configuration and CSSFile
configuration on all nodes.)
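
A few of the checks above can be scripted. What follows is only a rough sketch, not a tested procedure for this cluster: the function names are made up for illustration, the host names are the placeholders from the checklist, and it assumes a Linux box where 'getent' is available. Run it as the oracle user on each node and compare the output between nodes by eye.

```shell
#!/bin/sh
# Illustrative helpers only; function names and host names are assumptions.

# Print a checksum of the first 100 lines of 'od' output for a file or
# raw device, so the result can be compared between nodes.
od_head_sum() {
    od "$1" | head -100 | cksum
}

# Print the address each given name resolves to on this node,
# or UNRESOLVED when the lookup returns nothing.
show_resolution() {
    for name in "$@"; do
        addr=$(getent hosts "$name" | awk '{print $1; exit}')
        echo "$name -> ${addr:-UNRESOLVED}"
    done
}

# Report whether the current user can write to the given path.
check_writable() {
    if [ -w "$1" ]; then
        echo "$1 writable"
    else
        echo "$1 NOT writable"
    fi
}

# Example usage (placeholders, substitute your own names and paths):
#   show_resolution host1 host1-priv host1-vip host2 host2-priv host2-vip
#   od_head_sum <your-OCR-file>
#   check_writable <your-CSS-file>
```

If the checksum lines differ between nodes for the same OCR or CSS file, the nodes are probably not looking at the same device.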
----- Original Message -----
From: "Peter Santos" <psantos@(protected)>
To: <suse-oracle@(protected)>
Sent: Friday, August 10, 2007 6:38 AM
Subject: [suse-oracle] re: 10.2.0.3 - root.sh fails when adding 2nd node ?
> Folks,
> I'm trying to install a 2nd node on my 2 node test cluster and I
> can't seem to
> get past running the root.sh script on the 2nd node. Whenever I
> execute the root.sh
> on the newly added node (this is the very last step of the
> installer), the CSS daemon
> doesn't come up and eventually it reboots my original node.
>
> From the ocssd.log files, I can tell that it has something to do
> with the 2 nodes speaking
> to each other ... either via the ocr/vote disks or network
> connectivity.
>
> I've set up my raw partitions via fdisk, bound them in /etc/raw and
> set up permissions in udev.permissions.
> I've even cksum'd all raw devices from both nodes .. and it all
> looks good.
>
> Could I be missing something else? Any ideas?
>
> Here is what the ocssd.log complains about.
>
> [ CSSD]2007-08-09 15:40:27.547 >USER: CSS daemon log for node
> sdbe3, number 2, in cluster oracm_crs
> [ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=sdbe3DBG_CSSD))
> [ CSSD]2007-08-09 15:40:27.642 [2546082016] >TRACE: clssscmain:
> local-only set to false
> [ CSSD]2007-08-09 15:40:34.506 [2546082016] >TRACE:
> clssnmReadNodeInfo: added node 1 (sdbe1) to cluster
> [ CSSD]2007-08-09 15:40:34.543 [2546082016] >TRACE:
> clssnmReadNodeInfo: added node 2 (sdbe3) to cluster
> [ CSSD]2007-08-09 15:40:34.548 [1082145120] >TRACE:
> clssnm_skgxnmon: skgxn init failed, rc 1
> [ CSSD]2007-08-09 15:40:34.548 [2546082016] >TRACE:
> clssnm_skgxnonline: Using vacuous skgxn monitor
> [ CSSD]2007-08-09 15:40:37.912 [2546082016] >TRACE:
> clssnmInitNMInfo: misscount set to 60
> [ CSSD]2007-08-09 15:40:37.918 [2546082016] >TRACE:
> clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw1)
> [ CSSD]2007-08-09 15:40:37.979 [2546082016] >TRACE:
> clssnmDiskStateChange: state from 1 to 2 disk (1//dev/raw/raw3)
> [ CSSD]2007-08-09 15:40:37.981 [2546082016] >TRACE:
> clssnmDiskStateChange: state from 1 to 2 disk (2//dev/raw/raw5)
> [ CSSD]2007-08-09 15:40:40.816 [1084246368] >TRACE:
> clssnmDiskStateChange: state from 2 to 4 disk (1//dev/raw/raw3)
> [ CSSD]2007-08-09 15:40:40.825 [1082145120] >TRACE:
> clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw1)
> [ CSSD]2007-08-09 15:40:40.830 [1084246368] >TRACE:
> clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(483) LATS(0)
> Disk lastSeqNo(483)
> [ CSSD]2007-08-09 15:40:40.837 [1082145120] >TRACE:
> clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(483) LATS(0)
> Disk lastSeqNo(483)
> [ CSSD]2007-08-09 15:40:41.767 [1086347616] >TRACE:
> clssnmDiskStateChange: state from 2 to 4 disk (2//dev/raw/raw5)
> [ CSSD]2007-08-09 15:40:41.779 [1086347616] >TRACE:
> clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(484) LATS(0)
> Disk lastSeqNo(484)
> [ CSSD]2007-08-09 15:40:41.797 [2546082016] >TRACE:
> clssscSclsFatal: read value of disable
> [ CSSD]2007-08-09 15:40:41.797 [1090550112] >TRACE:
> clssnmFatalThread: spawned
> [ CSSD]2007-08-09 15:40:41.797 [2546082016] >TRACE:
> clssscSclsFatal: read value of disable
> [ CSSD]2007-08-09 15:40:41.798 [1092651360] >TRACE: clssnmconnect:
> connecting to node 2, flags 0x0001, connector 1
> [ CSSD]2007-08-09 15:40:41.798 [1092651360] >TRACE: clssnmconnect:
> connecting to node 0, flags 0x0000, connector 1
> [ CSSD]2007-08-09 15:40:41.799 [1092651360] >TRACE: clssnmconnect:
> connecting to node 1, flags 0x0001, connector 0
> [ CSSD]2007-08-09 15:40:41.801 [1094752608] >TRACE:
> clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)
> (KEY=Oracle_CSS_LclLstnr_oracm_crs_2))
> [ CSSD]2007-08-09 15:40:41.801 [1094752608] >TRACE:
> clssgmclientlsnr: listening on
> (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_sdbe3_oracm_crs))
> [ CSSD]2007-08-09 15:40:42.832 [1092651360] >TRACE:
> clssnmConnComplete: connected to node 1 (con 0x2a981016c0),
> state 3 birth 0, unique 1186687891/1186687891 prevConuni(0)
> [ CSSD]2007-08-09 15:40:43.307 [1105258848] >TRACE:
> clssnmSendingThread: Connection complete
> [ CSSD]2007-08-09 15:40:43.307 [1103157600] >TRACE:
> clssnmPollingThread: Connection complete
> [ CSSD]2007-08-09 15:40:43.307 [1107360096] >TRACE:
> clssnmRcfgMgrThread: Connection complete
> [ CSSD]2007-08-09 15:40:43.307 [1107360096] >TRACE:
> clssnmRcfgMgrThread: Local Join
> [ CSSD]2007-08-09 15:40:43.307 [1107360096] >TRACE:
> clssnmLocalJoinEvent: set node(1) inactive
> [ CSSD]2007-08-09 15:40:43.307 [1107360096] >WARNING:
> clssnmLocalJoinEvent: takeover aborted due to UNKNOWN nodes
> [ CSSD]2007-08-09 15:40:43.992 [1092651360] >TRACE:
> clssnmHandleSync: Acknowledging sync: src[1] srcName[sdbe1] seq[5] sync[2]
> [ CSSD]2007-08-09 15:40:44.309 [1107360096] >TRACE:
> clssnmRcfgMgrThread: lastleader(1) unique(1186688418)
> [ CSSD]2007-08-09 15:40:44.994 [1092651360] >TRACE:
> clssnmSendVoteInfo: node(1) syncSeqNo(2)
> [ CSSD]2007-08-09 15:40:46.998 [1092651360] >TRACE:
> clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0)
> birth (0/0)
> (old/new)
> [ CSSD]2007-08-09 15:40:46.998 [1092651360] >TRACE:
> clssnmDeactivateNode: node 0 () left cluster
>
> [ CSSD]2007-08-09 15:40:46.998 [1092651360] >TRACE:
> clssnmUpdateNodeState: node 1, state (4/3) unique (1186687891/1186687891)
> prevConuni(0) birth (0/1) (old/new)
> [ CSSD]2007-08-09 15:40:46.998 [1092651360] >TRACE:
> clssnmUpdateNodeState: node 2, state (1/2) unique (1186688418/1186688418)
> prevConuni(0) birth (0/2) (old/new)
> [ CSSD]2007-08-09 15:40:46.998 [1092651360] >USER:
> clssnmHandleUpdate: SYNC(2) from node(1) completed
> [ CSSD]2007-08-09 15:40:46.998 [1092651360] >USER:
> clssnmHandleUpdate: NODE 1 (sdbe1) IS ACTIVE MEMBER OF CLUSTER
> [ CSSD]2007-08-09 15:40:46.998 [1092651360] >USER:
> clssnmHandleUpdate: NODE 2 (sdbe3) IS ACTIVE MEMBER OF CLUSTER
> [ CSSD]2007-08-09 15:40:47.002 [2546082016] >USER: NMEVENT_SUSPEND
> [00][00][00][00]
> [ CSSD]2007-08-09 15:40:47.003 [1109461344] >TRACE:
> clssgmReconfigThread: started for reconfig (2)
> [ CSSD]2007-08-09 15:40:47.003 [1109461344] >USER:
> NMEVENT_RECONFIG [00][00][00][06]
> [ CSSD]2007-08-09 15:40:47.003 [1109461344] >TRACE:
> clssgmEstablishConnections: 2 nodes in cluster incarn 2
> [ CSSD]2007-08-09 15:40:47.075 [1101056352] >TRACE:
> clssgmInitialRecv: (0x774770) accepted a new
> connection from node 1 born at 1 active (2, 2), vers (10,3,1,2)
> [ CSSD]2007-08-09 15:40:47.075 [1101056352] >TRACE:
> clssgmInitialRecv: conns done (2/2)
> [ CSSD]2007-08-09 15:40:47.075 [1109461344] >TRACE:
> clssgmEstablishMasterNode: MASTER for 2 is node(1) birth(1)
> [ CSSD]2007-08-09 15:40:47.075 [1109461344] >TRACE:
> clssgmChangeMasterNode: requeued 0 RPCs
> [ CSSD]2007-08-09 15:40:47.590 [1084246368] >TRACE:
> clssnmvFatalCheck: extra node 1
> [ CSSD]2007-08-09 15:40:47.590 [1084246368] >TRACE:
> clssnmvFatalCheck: fatal 1, sclsfatal 0
> [ CSSD]2007-08-09 15:40:47.593 [1086347616] >TRACE:
> clssnmvFatalCheck: extra node 1
> [ CSSD]2007-08-09 15:40:47.593 [1086347616] >TRACE:
> clssnmvFatalCheck: fatal 1, sclsfatal 0
> [ CSSD]2007-08-09 15:40:47.600 [1082145120] >TRACE:
> clssnmvFatalCheck: extra node 1
> [ CSSD]2007-08-09 15:40:47.600 [1082145120] >TRACE:
> clssnmvFatalCheck: fatal 1, sclsfatal 0
> [ CSSD]2007-08-09 15:40:47.824 [1090550112] >TRACE:
> clssnmFatalThread: Fatal mode enabled
> [ CSSD]2007-08-09 15:40:48.045 [1092651360] >TRACE:
> clssnmSendFatalOn: req to syncLeader(1)
> [ CSSD]2007-08-09 15:40:51.322 [1103157600] >TRACE:
> clssnmPollingThread: node sdbe3 (2) missed(2) checkin(s)
> [ CSSD]2007-08-09 15:41:17.132 [1109461344] >ERROR:
> clssgmSlaveCMSync: reconfig timeout on master 1
>
> [ CSSD]2007-08-09 15:41:17.132 [1109461344] >TRACE:
> clssgmReconfigThread: completed for reconfig(2), with status(0)
> [ CSSD]2007-08-09 15:41:17.190 [2546082016] >ERROR:
> clssgmStartNMMon: reconfig incarn 2 failed. Retrying.
>
>
>
> --
> To unsubscribe, email: suse-oracle-unsubscribe@(protected)
> For additional commands, email: suse-oracle-help@(protected)
> Please see http://www.suse.com/oracle/ before posting
>
>