Build Your Own Oracle RAC 10g Release 2 Cluster on Linux and FireWire
by Jeffrey Hunter - OTN

Install Oracle 10g Clusterware Software

Perform the following installation procedures on only one node in the cluster! The Oracle Clusterware software will be installed to all other nodes in the cluster by the Oracle Universal Installer.

You are now ready to install the "cluster" part of the environment: the Oracle Clusterware. In the previous section, you downloaded and extracted the install files for Oracle Clusterware to linux1 in the directory /u01/app/oracle/orainstall/clusterware. This is the only node from which you need to perform the install.

During the installation of Oracle Clusterware, you will be asked which nodes to include and configure in the RAC cluster. Once the actual installation starts, it will copy the required software to all nodes using the remote access we configured in Section 13 ("Configure RAC Nodes for Remote Access").

So, what exactly is the Oracle Clusterware responsible for?

It contains all of the cluster and database configuration metadata along with several system management features for RAC. It allows the DBA to register and invite an Oracle instance (or instances) to the cluster. During normal operation, Oracle Clusterware will send messages (via a special ping operation) to all nodes configured in the cluster, often called the "heartbeat." If the heartbeat fails for any of the nodes, it checks with the Oracle Clusterware configuration files (on the shared disk) to distinguish between a real node failure and a network failure.

After installing Oracle Clusterware, the Oracle Universal Installer (OUI) used to install the Oracle 10g database software (next section) will automatically recognize these nodes. Like the Oracle Clusterware install you will be performing in this section, the Oracle Database 10g installer only needs to be run from one node. The OUI will copy the software packages to all nodes configured in the RAC cluster.

Oracle Clusterware Shared Files

The two shared files used by Oracle Clusterware will be stored on the OCFS2 filesystem we created earlier. The two shared Oracle Clusterware files are:

  • Oracle Cluster Registry (OCR)
    • Location: /u02/oradata/orcl/OCRFile
    • Size: ~ 100MB
  • CRS Voting Disk
    • Location: /u02/oradata/orcl/CSSFile
    • Size: ~ 20MB

Note: For our installation here, it is not possible to use ASM for the two Oracle Clusterware files (OCR or CRS Voting Disk). The problem is that these files need to be in place and accessible before any Oracle instances can be started. For ASM to be available, the ASM instance would need to be running first. The two shared files could be stored on OCFS2, shared RAW devices, or another vendor's clustered file system.

Verifying Environment Variables

Before starting the OUI, you should first run the xhost command as root from the console to allow X Server connections. Then unset the ORACLE_HOME variable and verify that each of the nodes in the RAC cluster defines a unique ORACLE_SID. We also should verify that we are logged in as the oracle user account:

Login as oracle

# xhost +
access control disabled, clients can connect from any host

# su - oracle

Unset ORACLE_HOME

$ unset ORA_CRS_HOME
$ unset ORACLE_HOME
$ unset ORA_NLS10
$ unset TNS_ADMIN

Verify Environment Variables on linux1

$ env | grep ORA
ORACLE_SID=orcl1
ORACLE_BASE=/u01/app/oracle
ORACLE_TERM=xterm

Verify Environment Variables on linux2

$ env | grep ORA
ORACLE_SID=orcl2
ORACLE_BASE=/u01/app/oracle
ORACLE_TERM=xterm
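
The checks above can be bundled into a small function and run on each node before starting the OUI. This is only a sketch: the function name check_oui_env is made up for illustration and is not part of the Oracle software. It reports any of the four variables that are still set, a missing ORACLE_SID, and a login other than oracle.

```shell
# Hypothetical helper (not from the original article): report anything in the
# current shell that contradicts the pre-install checks described above.
check_oui_env() {
    status=0
    # These four variables are unset in the steps above.
    for var in ORA_CRS_HOME ORACLE_HOME ORA_NLS10 TNS_ADMIN; do
        if eval "[ -n \"\${$var+set}\" ]"; then
            echo "WARNING: $var is still set; run: unset $var"
            status=1
        fi
    done
    # Each node needs its own ORACLE_SID (orcl1 on linux1, orcl2 on linux2).
    if [ -z "${ORACLE_SID:-}" ]; then
        echo "WARNING: ORACLE_SID is not set"
        status=1
    fi
    # The installer should be started from the oracle user account.
    if [ "$(id -un)" != "oracle" ]; then
        echo "WARNING: logged in as $(id -un), expected oracle"
        status=1
    fi
    return $status
}
```

Calling check_oui_env with no arguments prints nothing and returns 0 when the shell is ready. It does not compare ORACLE_SID across nodes, so uniqueness still has to be confirmed by eye.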

Installing Cluster Ready Services

Note: CSS Timeout Computation in Oracle RAC 10g 10.1.0.3

Please note that after the Oracle Clusterware software is installed, you will need to modify the CSS timeout value for Clusterware. This is especially true for 10.1.0.3 and later, as the CSS timeout is computed differently than with 10.1.0.2. Several problems have been documented as a result of the CSS daemon timing out starting with Oracle 10.1.0.3 on the Linux platform (including IA32, IA64, and x86-64). This has been a big problem for me in the past, especially during database creation (DBCA). For example, it was not uncommon for the database creation process to fail with the error: ORA-03113: end-of-file on communication channel. The key error was reported in the log file $ORA_CRS_HOME/css/log/ocssd1.log as:

clssnmDiskPingMonitorThread: voting device access hanging (45010 miliseconds)

The problem is essentially slow disks combined with the default value for CSS misscount. The CSS misscount value is the number of heartbeats missed before CSS evicts a node. CSS uses this number to calculate the time after which an I/O to the voting disk is considered timed out, at which point CSS terminates itself to prevent split-brain conditions. The default value for CSS misscount on Linux for Oracle 10.1.0.2 and higher is 60. The formula for calculating the timeout value (in seconds), however, did change from release 10.1.0.2 to 10.1.0.3.

With 10.1.0.2, the timeout value was calculated as follows:

time_in_secs > CSS misscount, then EXIT

With the default value of 60, for example, the timeout period would be 60 seconds.

Starting with 10.1.0.3, the formula was changed to:

disktimeout_in_secs = MAX((3 * CSS misscount)/4, CSS misscount - 15)

Again, using the default CSS misscount value of 60, this would result in a timeout of 45 seconds.

This change was motivated mainly by the need for faster cluster reconfiguration after a node failure. With the default CSS misscount value of 60 in 10.1.0.2, we would have to wait at least 60 seconds for a timeout, whereas starting with 10.1.0.3 the same default value yields a timeout 15 seconds shorter, at 45 seconds.

OK, so why all the talk about CSS misscount? As I mentioned earlier, the database creation process (or any other operation that puts a high I/O load on the system) would often fail because Oracle Clusterware crashed. The high I/O would cause lengthy timeouts for CSS while attempting to query the voting disk. When the calculated timeout was exceeded, Oracle Clusterware crashed. This is common with the configuration in this article, as the FireWire drives we are using are not the fastest. The slower the drive, the more often this will occur.

Well, the good news is that you can modify the CSS misscount value from its default value of 60 (for Linux) to allow for lengthier timeouts. For the drives you have been using with this article, you can get away with a CSS misscount value of 360. Although I haven't been able to verify this, I believe the CSS misscount can be set as large as 600.
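
To see what a given misscount buys you, the 10.1.0.3 formula quoted earlier can be evaluated directly in the shell. The function name css_disk_timeout is my own, and the sketch assumes the formula applies exactly as written above, using integer arithmetic.

```shell
# Disk I/O timeout in seconds for a given CSS misscount, using the
# 10.1.0.3-and-later formula: MAX((3 * misscount) / 4, misscount - 15).
# Illustrative helper only; it does not query or change the cluster.
css_disk_timeout() {
    misscount=$1
    a=$(( (3 * misscount) / 4 ))
    b=$(( misscount - 15 ))
    if [ "$a" -gt "$b" ]; then
        echo "$a"
    else
        echo "$b"
    fi
}

css_disk_timeout 60     # default misscount: prints 45
css_disk_timeout 360    # value suggested for the FireWire drives: prints 345
```

Under that formula, raising the misscount from 60 to 360 moves the voting disk timeout from 45 seconds to 345 seconds.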

So how do you modify the default value for CSS misscount? Well, there are several ways. The easiest way is to modify the root.sh for Oracle Clusterware before running it on each node in the cluster. (The instructions for modifying the root.sh script for Oracle Clusterware can be found in the "Execute Configuration Scripts" step of the installation below.)

If Oracle Clusterware is already installed, you can still modify the CSS misscount value using the $ORA_CRS_HOME/bin/crsctl command. (The instructions for verifying and modifying the CSS misscount using crsctl can be found in the section "Verify Oracle Clusterware / CSS misscount value".)

Perform the following tasks to install the Oracle Clusterware:

$ cd ~oracle
$ /u01/app/oracle/orainstall/clusterware/runInstaller -ignoreSysPrereqs

 

Screen Name: Welcome Screen
Response: Click Next

Screen Name: Specify Inventory directory and credentials
Response: Accept the default values:
   Inventory directory: /u01/app/oracle/oraInventory
   Operating System group name: dba

Screen Name: Specify Home Details
Response: Leave the default value for the Source directory. Set the destination for the ORACLE_HOME name (actually the $ORA_CRS_HOME that I will be using in this article) and location as follows:
   Name: OraCrs10g_home
   Location: /u01/app/oracle/product/crs

Screen Name: Product-Specific Prerequisite Checks
Response: The installer will run through a series of checks to determine if the node meets the minimum requirements for installing and configuring the Oracle Clusterware software. If any of the checks fail, you will need to manually verify the check that failed by clicking on the checkbox. For my installation, all checks passed with no problems.

Click Next to continue.

Screen Name: Specify Cluster Configuration
Response: Cluster Name: crs

   Public Node Name   Private Node Name   Virtual Node Name
   linux1             int-linux1          vip-linux1
   linux2             int-linux2          vip-linux2

Screen Name: Specify Network Interface Usage
Response:

   Interface Name   Subnet         Interface Type
   eth0             192.168.1.0    Public
   eth1             192.168.2.0    Private

Screen Name: Specify OCR Location
Response: Starting with Oracle Database 10g Release 2 (10.2) with RAC, Oracle Clusterware provides for the creation of a mirrored OCR file, enhancing cluster reliability. For this example, I chose to mirror the OCR file by keeping the default option of "Normal Redundancy":

Specify OCR Location: /u02/oradata/orcl/OCRFile
Specify OCR Mirror Location: /u02/oradata/orcl/OCRFile_mirror

Screen Name: Specify Voting Disk Location
Response: Starting with Oracle Database 10g Release 2 (10.2) with RAC, CSS has been modified to allow you to configure CSS with multiple voting disks. In Release 1 (10.1), you could configure only one voting disk. With multiple voting disks, you can place redundant voting disks on independent shared physical disks. This option facilitates the use of the iSCSI network protocol and other Network Attached Storage (NAS) solutions. Note that to take advantage of the benefits of multiple voting disks, you must configure at least three voting disks. For this example, I chose to mirror the voting disk by keeping the default option of "Normal Redundancy":

Voting Disk Location: /u02/oradata/orcl/CSSFile
Additional Voting Disk 1 Location: /u02/oradata/orcl/CSSFile_mirror1
Additional Voting Disk 2 Location: /u02/oradata/orcl/CSSFile_mirror2

Screen Name: Summary
Response: For some reason, the OUI fails to create the directory "$ORA_CRS_HOME/log" before starting the installation. You should manually create this directory before clicking the "Install" button.

For this installation, manually create the directory /u01/app/oracle/product/crs/log on all nodes in the cluster. The OUI will log all errors to a log file in this directory only if it exists.
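
Creating the log directory can be sketched as below. The function name create_crs_log_dir is hypothetical; the path passed to it is the Clusterware home chosen on the "Specify Home Details" screen. Run it as the oracle user on each node in the cluster.

```shell
# Illustrative helper: create <crs_home>/log and confirm it is a directory.
# mkdir -p also creates any missing parent directories and does not fail
# if the directory already exists.
create_crs_log_dir() {
    home=$1
    mkdir -p "$home/log" || return 1
    [ -d "$home/log" ] && echo "created $home/log"
}

# Example call, using the location from this article:
#   create_crs_log_dir /u01/app/oracle/product/crs
```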

Click Install to start the installation!

Screen Name: Execute Configuration Scripts
Response: After the installation has completed, you will be prompted to run the orainstRoot.sh and root.sh scripts. Open a new console window on each node in the RAC cluster (starting with the node you are performing the install from), logged in as the "root" user account.

Navigate to the /u01/app/oracle/oraInventory directory and run orainstRoot.sh ON ALL NODES in the RAC cluster.

Within the same new console window on each node in the RAC cluster (starting with the node you are performing the install from), stay logged in as the "root" user account.

As mentioned earlier in the "CSS Timeout Computation in Oracle RAC 10g 10.1.0.3" section, you should modify the entry for CSS misscount from 60 to 360 in the file $ORA_CRS_HOME/install/rootconfig as follows (on each node in the cluster). Change the following entry that can be found on line 356:

CLSCFG_MISCNT="-misscount 60"

to

CLSCFG_MISCNT="-misscount 360"
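
If you would rather not edit the file by hand, the same change can be scripted. The following is a sketch only: set_misscount is a name I made up, and it assumes the entry looks exactly like the line shown above. It keeps a copy of the original file with a .orig suffix and prints the resulting entry so you can confirm the change.

```shell
# Illustrative helper: rewrite the CLSCFG_MISCNT entry in a rootconfig file.
#   $1 = path to rootconfig, $2 = new misscount value
set_misscount() {
    file=$1
    value=$2
    # Keep a backup of the untouched file next to it.
    cp -p "$file" "$file.orig" || return 1
    # Replace only the number inside the CLSCFG_MISCNT="-misscount N" entry.
    sed 's/^\([[:space:]]*CLSCFG_MISCNT=\)"-misscount [0-9][0-9]*"/\1"-misscount '"$value"'"/' \
        "$file.orig" > "$file" || return 1
    # Show the entry as it now stands (fails if the entry is not present).
    grep -n 'CLSCFG_MISCNT=' "$file"
}

# Example call, as root on each node, before running root.sh:
#   set_misscount "$ORA_CRS_HOME/install/rootconfig" 360
```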

Now, navigate to the /u01/app/oracle/product/crs directory and locate the root.sh file on each node in the cluster (starting with the node you are performing the install from). Run the root.sh file ON ALL NODES in the RAC cluster ONE AT A TIME.

You will receive several warnings while running the root.sh script on all nodes. These warnings can be safely ignored.

The root.sh script may take a while to run. When running root.sh on the last node, you will receive a critical error and the output should look like:

...
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
    linux1
    linux2
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
The given interface(s), "eth0" is not public. Public interfaces should be used to configure virtual IPs.

This issue is specific to Oracle 10.2.0.1 (noted in bug 4437727) and needs to be resolved before continuing. The easiest workaround is to re-run vipca (GUI) manually as root from the last node, where the error occurred. Please keep in mind that vipca is a GUI, so you will need to set your DISPLAY variable to point to your X server:

# $ORA_CRS_HOME/bin/vipca

When the "VIP Configuration Assistant" appears, this is how I answered the screen prompts:

   Welcome: Click Next
   Network interfaces: Select both interfaces - eth0 and eth1
   Virtual IPs for cluster nodes:
       Node Name: linux1
       IP Alias Name: vip-linux1
       IP Address: 192.168.1.200
       Subnet Mask: 255.255.255.0

       Node Name: linux2
       IP Alias Name: vip-linux2
       IP Address: 192.168.1.201
       Subnet Mask: 255.255.255.0

   Summary: Click Finish
   Configuration Assistant Progress Dialog: Click OK after configuration is complete.
   Configuration Results: Click Exit

Go back to the OUI and acknowledge the "Execute Configuration scripts" dialog window.

Screen Name: End of installation
Response: At the end of the installation, exit from the OUI.

Verify Oracle Clusterware / CSS misscount value

In the section "CSS Timeout Computation in Oracle RAC 10g 10.1.0.3", I mentioned the need to modify the CSS misscount value from its default value of 60 to 360 (or higher). Within that section I explained how to accomplish that by modifying the root.sh script before running it on each node in the cluster. If you were not able to modify the CSS misscount value within the root.sh script, you can still perform this action by using the $ORA_CRS_HOME/bin/crsctl program. For example, to obtain the current value for CSS misscount, use the following:

$ORA_CRS_HOME/bin/crsctl get css misscount
360

If you get back a value of 60, you will want to modify it to 360 as follows:
  • Start only one node in the cluster. For my example, I would shutdown linux2 and startup only linux1.
  • From the one node (linux1), login as the root user account and type:
    $ORA_CRS_HOME/bin/crsctl set css misscount 360
  • Reboot the single node (linux1).
  • Start all other nodes in the cluster.
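
The check itself can be wrapped in a small function that is easy to repeat on each node. This is a sketch under two assumptions: that crsctl get css misscount prints the bare number as shown above, and that the CRSCTL variable (my own name) points at the crsctl binary. The function only reads the value; it never changes it.

```shell
# Path to crsctl; defaults to the Clusterware home used in this article.
CRSCTL=${CRSCTL:-${ORA_CRS_HOME:-/u01/app/oracle/product/crs}/bin/crsctl}

# Illustrative helper: compare the current CSS misscount with the wanted value.
#   $1 = wanted misscount (defaults to 360)
# Returns 0 on a match, 1 on a mismatch, 2 if crsctl could not be run.
check_misscount() {
    want=${1:-360}
    have=$("$CRSCTL" get css misscount) || { echo "could not run $CRSCTL"; return 2; }
    if [ "$have" = "$want" ]; then
        echo "OK: CSS misscount is $have"
    else
        echo "CSS misscount is $have, expected $want"
        return 1
    fi
}
```

A return code of 1 means the value still needs to be changed using the steps listed above.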

Verify Oracle Clusterware Installation

After the installation of Oracle Clusterware, we can run through several tests to verify the install was successful. Run the following commands on all nodes in the RAC cluster.

Check cluster nodes

$ /u01/app/oracle/product/crs/bin/olsnodes -n
linux1  1
linux2  2
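
To avoid comparing the node list by eye on every node, the output can be piped through a small filter. The function name check_cluster_nodes is hypothetical, and the sketch assumes the "name number" layout shown above.

```shell
# Illustrative helper: read olsnodes -n output on stdin and check that every
# node name given as an argument appears in the first column.
check_cluster_nodes() {
    listed=$(awk '{print $1}')
    missing=0
    for node in "$@"; do
        # -x matches the whole line, -F treats the name as a literal string.
        if ! printf '%s\n' "$listed" | grep -qxF "$node"; then
            echo "missing node: $node"
            missing=1
        fi
    done
    return $missing
}

# Example call:
#   /u01/app/oracle/product/crs/bin/olsnodes -n | check_cluster_nodes linux1 linux2
```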

Check Oracle Clusterware Auto-Start Scripts

$ ls -l /etc/init.d/init.*
-r-xr-xr-x  1 root root  1951 Oct  4 14:21 /etc/init.d/init.crs*
-r-xr-xr-x  1 root root  4714 Oct  4 14:21 /etc/init.d/init.crsd*
-r-xr-xr-x  1 root root 35394 Oct  4 14:21 /etc/init.d/init.cssd*
-r-xr-xr-x  1 root root  3190 Oct  4 14:21 /etc/init.d/init.evmd*
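
The same check can be scripted so that a missing script stands out. The function name check_init_scripts is my own; the four file names are the ones from the listing above, and the directory defaults to /etc/init.d.

```shell
# Illustrative helper: confirm the four Clusterware auto-start scripts exist
# and are executable in the given directory (default: /etc/init.d).
check_init_scripts() {
    dir=${1:-/etc/init.d}
    rc=0
    for name in init.crs init.crsd init.cssd init.evmd; do
        if [ -x "$dir/$name" ]; then
            echo "found   $dir/$name"
        else
            echo "MISSING $dir/$name"
            rc=1
        fi
    done
    return $rc
}
```

Running check_init_scripts with no arguments on each node returns a non-zero status if any of the four scripts is absent.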

 Copyright © 1996 -2009 by Burleson Enterprises, Inc. All rights reserved.

