Build Your Own RAC Cluster on Linux and FireWire
by Jeffrey Hunter - OTN - June 2004
Jeffrey Hunter is the author of Conducting the Java Job Interview and Conducting the J2EE Job Interview by Rampant TechPress.
Learn how to set up and configure an Oracle Real Application Clusters system for less than $1,500 (for development and testing only)
Overview
One of the
most efficient ways to become familiar
with Oracle Real Application
Clusters (RAC) technology is to have
access to an actual Oracle RAC cluster.
As you learn this technology, you will soon come to appreciate the benefits Oracle RAC has to offer, such as fault tolerance, new levels of security, load balancing, and the ease of upgrading capacity. The challenge, however, is the price of the hardware required for a typical production RAC configuration. A small two-node cluster, for example, can run anywhere from $10,000 to well over $20,000, and that figure does not even include shared storage, the heart of a production RAC environment.
For those who simply want to become familiar with Oracle RAC, this article provides a low-cost alternative for configuring an Oracle9i RAC system using commercial off-the-shelf components and downloadable software. The estimated cost for this configuration could be anywhere from $1,000 to $1,500. The system will consist of a dual-node cluster, with both nodes running Linux (Red Hat Linux Fedora Core 1 in this example) and sharing a disk array based on IEEE1394 (FireWire) drive technology.
Please note that this is not the only way to build a low-cost Oracle9i RAC system. I have seen other solutions that use SCSI rather than FireWire for shared storage. In most cases, SCSI will cost more than our FireWire solution: a typical SCSI card is priced around $70, and an 80GB external SCSI drive will cost $700-$1,000. Keep in mind that some motherboards may already include built-in SCSI controllers.
It is important to note that this configuration should never be run in a production environment. In production, fiber channel is the technology of choice: it is a high-speed serial-transfer interface that can connect systems and storage devices in either point-to-point or switched topologies. FireWire offers a low-cost alternative to fiber channel for testing and development, but it is not ready for production.
NOTE: At the time of
this writing, I had not verified that
these instructions will work with Oracle
Database 10g. I will be
providing a separate article in the next
several months on how to perform a
similar install using 10g.
Oracle9i
Real Application Clusters (RAC)
Introduction
Oracle Real Application Clusters (RAC) is the successor to Oracle Parallel Server (OPS). RAC allows multiple instances to access the same database (storage) simultaneously. It provides fault tolerance, load balancing, and performance benefits by allowing the system to scale out. At the same time, because all nodes access the same database, the failure of one instance will not cause the loss of access to the database.
At the heart of Oracle RAC is a shared disk subsystem. All nodes in the cluster must be able to access all of the data, redo log files, control files, and parameter files for all nodes in the cluster. The data disks must be globally available in order to allow all nodes to access the database. Each node has its own redo log files, but the other nodes must be able to access them in order to recover that node in the event of a system failure.
Not all
clustering solutions use shared storage.
Some vendors use an approach known as a
federated cluster, in which data is
spread across several machines rather
than shared by all. With Oracle RAC,
however, multiple nodes use the same set
of disks for storing data. With Oracle
RAC, the data, redo log, control, and
archived log files reside on shared
storage on raw-disk devices or on a
clustered file system. Oracle's approach
to clustering leverages the collective
processing power of all the nodes in the
cluster and at the same time provides
failover security.
Although it is not absolutely necessary, Oracle recommends that you install the Oracle Cluster File System (OCFS). OCFS makes disk management much easier by creating the same file system on all the nodes; without OCFS, you will have to create all partitions manually. (NOTE: This article does not go into the details of installing or using OCFS, but instead uses manual methods for creating partitions and binding raw devices to those partitions.)
One of the
main reasons why I do not use the Oracle
Cluster File System for Red Hat Linux is
that OCFS comes in the form of RPMs. All
the RPM modules and the precompiled
modules are tied to the Red Hat
Enterprise Linux AS ($1,200)
kernel-naming standard and will not load
in the supplied 2.4.20 linked kernel.
The biggest difference between Oracle RAC and OPS is the addition of Cache Fusion. With OPS, a request for data from one node to another required the data to be written to disk first; only then could the requesting node read it. With Cache Fusion, data blocks are passed between instances over the interconnect along with the locks, without first being written to disk.
Pre-configured Oracle9i RAC solutions are available from vendors such as Dell, IBM, and HP for production environments. This article, however, focuses on putting together your own Oracle9i RAC environment for development and testing using Linux servers and a low-cost shared disk solution: FireWire.
What software
is necessary for RAC? Does it have a
separate installation CD to order?
RAC is
contained within the Oracle9i
Database Enterprise Edition. (Oracle
recently announced that RAC is now
available in Oracle Database 10g
Standard Edition as well.) If you
install Oracle9i Enterprise
Edition onto a cluster, and the Oracle
Universal Installer (OUI) recognizes the
cluster, you will be provided the option
of installing RAC. Most UNIX platforms
require an OSD installation for the
necessary clusterware. For Intel
platforms (Linux and Windows), Oracle
provides the OSD software within the
Oracle9i Enterprise Edition
release.
Shared
Storage Overview
Today, fiber channel is one of the most popular solutions for shared storage. As mentioned earlier, fiber channel is a high-speed serial-transfer interface that is used to connect systems and storage devices in either point-to-point or switched topologies. Protocols supported by fiber channel include SCSI and IP. Fiber channel configurations can support as many as 127 nodes and have a throughput of up to 2.12 gigabits per second. Fiber channel, however, is very expensive. The fiber-channel switch alone can cost as much as $1,000. This does not even include the fiber-channel storage array and high-end drives, which can reach prices of about $300 for a 36GB drive. A basic fiber-channel setup, including fiber-channel cards for the servers, costs roughly $5,000, and that does not include the cost of the servers that make up the cluster.
A less
expensive alternative to fiber-channel
is SCSI. SCSI technology provides
acceptable performance for shared
storage, but for administrators and
developers who are accustomed to
GPL-based Linux prices, even SCSI can
come in over budget, at around $1,000 to
$2,000 for a two-node cluster.
Another
popular solution is the Sun NFS (Network
File System). It can be used for shared
storage but only if you are using a
network appliance or something similar.
Specifically, you need servers that
guarantee direct I/O over NFS.
FireWire
Technology
Developed by Apple Computer and Texas Instruments, FireWire is a cross-platform implementation of a high-speed serial data bus. With its high bandwidth, long distances (up to 100 meters in length), and high-powered bus, FireWire is being used in applications such as digital video (DV), professional audio, hard drives, high-end digital still cameras, and home entertainment devices. Today, FireWire operates at transfer rates of up to 800 megabits per second, while next-generation FireWire calls for theoretical bit rates of 1,600 Mbps and then up to a staggering 3,200 Mbps. That's 3.2 gigabits per second. This speed will make FireWire indispensable for transferring massive data files and for even the most demanding video applications, such as working with uncompressed high-definition (HD) video or multiple standard-definition (SD) video streams.
The following
chart shows speed comparisons of the
various types of disk interface. For
each interface, I provide the maximum
transfer rates in kilobits (kb),
kilobytes (KB), megabits (Mb), and
megabytes (MB) per second. As you can
see, the capabilities of IEEE1394
compare very favorably with other
available disk interface technologies.
| Disk Interface | Speed |
| Serial | 115 kb/s (0.115 Mb/s) |
| Parallel (standard) | 115 KB/s (0.115 MB/s) |
| USB 1.1 | 12 Mb/s (1.5 MB/s) |
| Parallel (ECP/EPP) | 3.0 MB/s |
| IDE | 3.3 - 16.7 MB/s |
| ATA | 3.3 - 66.6 MB/s |
| SCSI-1 | 5 MB/s |
| SCSI-2 (Fast SCSI / Fast Narrow SCSI) | 10 MB/s |
| Fast Wide SCSI (Wide SCSI) | 20 MB/s |
| Ultra SCSI (SCSI-3 / Fast-20 / Ultra Narrow) | 20 MB/s |
| Ultra IDE | 33 MB/s |
| Wide Ultra SCSI (Fast Wide 20) | 40 MB/s |
| Ultra2 SCSI | 40 MB/s |
| IEEE1394 / 1394a (FireWire 400) | 100 - 400 Mb/s (12.5 - 50 MB/s) |
| USB 2.x | 480 Mb/s (60 MB/s) |
| Wide Ultra2 SCSI | 80 MB/s |
| Ultra3 SCSI | 80 MB/s |
| Wide Ultra3 SCSI | 160 MB/s |
| FC-AL Fiber Channel | 100 - 400 MB/s |
Hardware &
Costs
The hardware used to
build our example Oracle9i RAC
environment consists of two Linux
servers and components that can be
purchased at any local computer store or
over the Internet.
Server 1 (linux1)
| Dell Dimension XPS D266 Computer (266MHz Pentium II, 384MB RAM, 60GB internal HD, CD-ROM and floppy) | $400 |
| 2 - Ethernet LAN Cards: Linksys 10/100 Mbps (to public network); Linksys 10/100 Mbps (used for interconnect to linux2) | $20 + $20 |
| 1 - FireWire Card: SIIG, Inc. 3-Port 1394 I/O Card (note: cards with chipsets made by VIA or TI are known to work) | $30 |
Server 2 (linux2)
| Pentium IV Computer (1.8GHz Pentium IV, 300W power supply, 512MB RAM, 40GB internal HD, 32MB AGP video card, CD-ROM and floppy) | $600 |
| 2 - Ethernet LAN Cards: Linksys 10/100 Mbps (to public network); Linksys 10/100 Mbps (used for interconnect to linux1) | $20 + $20 |
| 1 - FireWire Card: Belkin FireWire 3-Port 1394 PCI Card (note: cards with chipsets made by VIA or TI are known to work) | $40 |
Miscellaneous Components
| FireWire Hard Drive: Maxtor One Touch 200GB USB 2.0 / FireWire External Hard Drive (see note below) | $270 |
| 1 - Extra FireWire Cable: Belkin 6-pin to 6-pin 1394 Cable | $15 |
| 1 - Ethernet hub or switch: Linksys EtherFast 10/100 5-port Ethernet Switch (used for interconnect int-linux1 / int-linux2) | $40 |
| 4 - Network Cables: Category 5e patch cables (linux1 to public network; linux2 to public network; linux1 to interconnect Ethernet switch; linux2 to interconnect Ethernet switch) | $5 each |
| Total | $1,495 |
Note on the FireWire drive: Ensure that the FireWire drive you purchase supports multiple logins. If the drive has a chipset that does not allow concurrent access by more than one server, the disk and its partitions can only be seen by one server at a time. Disks with the Oxford 911 chipset are known to work. Here are the details of the disk that I purchased for this test:
Vendor: Maxtor
Model: OneTouch
Mfg. Part No. or KIT No.: A01A200 or A01A250
Capacity: 200GB or 250GB
Cache Buffer: 8MB
Spin Rate: 7200 RPM
"Combo" Interface: IEEE 1394 and SBP-2 compliant (100 to 400 Mbits/sec) plus USB 2.0 and USB 1.1 compatible
A Brief Walk
Through the Process
Before
presenting the details of building our
Oracle9i RAC system, I thought
it would be beneficial to take a brief
walk through the steps involved in
building the environment. (See Figure
1.)
Our
implementation describes a dual node
cluster (each with a single processor),
each server running Red Hat Linux Fedora
Core 1. Note that most of the tasks
within this document will need to be
performed on both servers. I will
indicate at the beginning of each
section whether or not the task(s)
should be performed on both nodes.
1. Install Red Hat Linux / Fedora Core 1 (on both nodes)
For this example configuration, you will install Red Hat Linux (Fedora Core 1) on both nodes that make up the RAC cluster.
2. Configure network settings (on both nodes)
After installing the Red Hat Linux software on both nodes, you will need to configure the network on each of them. This includes configuring the public network as well as the interconnect for the cluster. You should also adjust the default and maximum send and receive buffer sizes for the interconnect to improve the performance of Cache Fusion buffer transfers between instances. These settings go in your /etc/sysctl.conf file.
3. Obtain and Install a proper Linux Kernel (on both nodes)
In this section, we download and install a new Linux kernel, one that supports multiple logins to the FireWire storage device. The kernel can be downloaded from Oracle's Linux Projects development group (http://oss.oracle.com). Once the new kernel is installed, several configuration steps are needed to load the FireWire stack.
4. Create UNIX oracle user account (dba group) (on both nodes)
We will then create an oracle UNIX user ID on all nodes within the RAC cluster. This section also provides an example login script (.bash_profile) that can be used to set all required environment variables for the oracle user.
5. Create Partitions on the Shared FireWire Storage Device (run once only, from a single node)
This is where we create the physical and logical volumes using Logical Volume Manager (LVM). Instructions are provided on how to remove all partitions from our FireWire drive and then how to use LVM to create all of our logical partitions.
6. Create RAW Bindings (on both nodes)
After creating our logical partitions, we need to configure raw devices on our FireWire shared storage to be used for all physical Oracle database files.
7. Create Symbolic Links From RAW Volumes (on both nodes)
It is helpful to create symbolic links from the raw volumes to human-readable names to make the files easier to recognize. Although this step is optional, it is highly recommended.
8. Configuring the Linux Servers (on both nodes)
This section details the steps involved in configuring both Linux machines to prepare them for an Oracle9i RAC install.
9. Configuring the hangcheck-timer Kernel Module (on both nodes)
Oracle9i RAC uses a kernel module called hangcheck-timer to monitor the health of the cluster and to restart a RAC node in case of a failure. This section explains the steps required to configure the hangcheck-timer kernel module. Although the hangcheck-timer module is not required for Oracle Cluster Manager operation, it is highly recommended by Oracle.
10. Configuring RAC Nodes for Remote Access (on both nodes)
When installing Oracle9i RAC, the Oracle Installer uses the rsh command to copy the Oracle software to all other nodes within the RAC cluster. This section includes instructions for configuring all nodes within your RAC cluster so that r* commands such as rsh, rcp, and rlogin can be run on one RAC node against the other RAC nodes without a password.
11. Configuring a Machine Startup Script (on both nodes)
Up to this point, we have talked in great detail about the parameters and resources that need to be configured on both nodes for our Oracle9i RAC configuration. This section takes a breather and recaps the parameters and commands (from previous sections of this document) that need to run on each node whenever the machine is cycled. Although there are several ways to do this, I simply provide a listing of the commands that you can put into a startup script (for example, /etc/rc.local) to set up all required resources (disks, memory, etc.) each time the machine is booted. Other startup scripts are also included in this section so that you can check whether you have updated all required scripts when each machine in the cluster is booted.
12. Update Red Hat Linux System (on both nodes)
Several RPMs need to be applied to all nodes within the RAC cluster in preparation for the Oracle install. All the RPMs are included on the Fedora Core 1 CDs, and I also provide links to the files from this article. After applying all of the RPMs, you will then need to apply Oracle/Linux Patch 3006854. After applying all required patches, you should reboot all nodes within the RAC cluster.
13. Download / Unpack the Oracle9i Installation Files (from a single node)
This section includes the steps to download and unpack the Oracle9i software distribution. The software can be downloaded from http://otn.oracle.com.
14. Install Oracle9i Cluster Manager (from a single node)
Installing Oracle9i RAC is a two-step process: (1) install the Oracle9i Cluster Manager and (2) install the Oracle9i RDBMS software. In this section, we go through the steps to install, configure, and start the Oracle Cluster Manager software. Keep in mind that the installation of Oracle Cluster Manager only needs to be performed on one of the nodes (the installation process uses rsh to copy the files to all other nodes in the cluster), but configuring and starting the Cluster Manager must be performed on both nodes.
15. Install Oracle9i RAC (only needs to be performed from a single node)
After installing Oracle Cluster Manager, it is time to install the RAC software. This section covers the tasks involved in installing the software, as well as many post-installation tasks that should be performed before creating the Oracle cluster database.
16. Create the Oracle Database (from a single node)
After all the software has been installed, we use the Oracle Database Configuration Assistant (DBCA) to create our clustered database on the shared storage (FireWire) device.
17. Creating TNS Networking Files (on both nodes)
This section simply provides an example listing of my listener.ora and tnsnames.ora files. These need to be configured for each node in the RAC cluster. The Oracle Installer and Oracle Database Configuration Assistant do a great job of keeping these files up to date. I do, however, like to make a few changes to the tnsnames.ora file.
18. Verify the RAC Cluster / Database Configuration (on both nodes)
After the Oracle Database Configuration Assistant has finished creating the clustered database, you should have a fully functional Oracle9i RAC cluster running. This section provides several commands and SQL queries that can be used to validate your Oracle9i RAC configuration.
19. Starting & Stopping the Cluster (from a single node)
This section gives examples of how to start and stop the cluster, including how to bring the entire cluster fully up or down and how to start up and shut down individual instances within the cluster.
20. Transparent Application Failover (TAF) (on one or both nodes)
Now that the cluster is up and running, this section provides an example of how to test the Transparent Application Failover features of Oracle9i RAC. I will demonstrate how session failover works and how to set up your TNS configuration to take advantage of TAF.
Install Red
Hat Linux (Fedora Core 1)
After procuring the
required hardware, it is time to start
the configuration process. The first
step in the process is to install the
Red Hat Linux Fedora Core 1 software on
both servers.
NOTE: This article does not provide detailed instructions for installing Red Hat Linux Fedora Core 1. For the purpose of this article, I chose to perform a Custom installation and then selected "Install Everything" when prompted for which products to install. Documentation for installing Red Hat Linux can be found at http://www.redhat.com/docs/manuals/.
Configure
Network Settings
Configuring Public and Private Network
Let's start our
Oracle RAC Linux configuration by
ensuring the correct network
configuration. In our two-node example,
we will need to configure the network on
both nodes.
The easiest way to configure network settings in Red Hat Linux is with the Network Configuration program. This application can be started from the command line as the "root" user as follows:
# su -
# /usr/bin/redhat-config-network &
NOTE: Do not use DHCP; the interconnects need static IP addresses!
Using the
Network Configuration application, you
will need to configure both NIC devices
as well as the /etc/hosts file.
Both of these tasks can be completed
using the Network Configuration GUI.
Notice that the /etc/hosts
settings are the same for both nodes.
Our example
configuration will use the following
settings:
Server 1 (linux1)
| Device | IP Address | Subnet | Purpose |
| eth0 | 192.168.1.100 | 255.255.255.0 | Connects linux1 to the public network |
| eth1 | 192.168.2.100 | 255.255.255.0 | Connects linux1 (interconnect) to linux2 (int-linux2) |
/etc/hosts (linux1):
127.0.0.1 localhost loopback
192.168.1.100 linux1
192.168.2.100 int-linux1
192.168.1.101 linux2
192.168.2.101 int-linux2
Server 2 (linux2)
| Device | IP Address | Subnet | Purpose |
| eth0 | 192.168.1.101 | 255.255.255.0 | Connects linux2 to the public network |
| eth1 | 192.168.2.101 | 255.255.255.0 | Connects linux2 (interconnect) to linux1 (int-linux1) |
/etc/hosts (linux2):
127.0.0.1 localhost loopback
192.168.1.100 linux1
192.168.2.100 int-linux1
192.168.1.101 linux2
192.168.2.101 int-linux2
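If you would rather script the /etc/hosts entries than type them into the GUI on each node, the following is a minimal sketch, assuming a POSIX shell and awk. The function name add_cluster_hosts is hypothetical (it is not part of any Red Hat tool); it appends only the entries from the table above that are missing, so it is safe to run more than once. The file path is a parameter so you can try it on a scratch copy before touching /etc/hosts.

```shell
#!/bin/sh
# add_cluster_hosts [HOSTS_FILE]
# Hypothetical helper: appends the public and interconnect entries from
# the table above to HOSTS_FILE (default /etc/hosts), skipping any
# address/name pair that is already present.
add_cluster_hosts() {
    hosts_file="${1:-/etc/hosts}"
    while read -r ip name; do
        # Compare fields exactly with awk so the dots in the IP address
        # are not treated as regex wildcards. Exit status 0 means "found".
        if awk -v ip="$ip" -v n="$name" '
            $1 == ip { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file" 2>/dev/null; then
            continue
        fi
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    done <<'EOF'
192.168.1.100 linux1
192.168.2.100 int-linux1
192.168.1.101 linux2
192.168.2.101 int-linux2
EOF
}
```

As root on each node, source the file and call add_cluster_hosts with no argument to update /etc/hosts; because the entries are identical on both nodes, the same script works on linux1 and linux2.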
In the screenshots below, only node 1 (linux1) is shown. Be sure to make all the proper network settings on both nodes.
Figure 1: Network
Configuration Screen, Node 1 (linux1)
Figure 2:
Ethernet Device Screen, eth0 (linux1)
Figure 3:
Ethernet Device Screen, eth1 (linux1)
Figure 4: Network
Configuration Screen, /etc/hosts
(linux1)
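Behind the GUI, Red Hat keeps each interface's settings in /etc/sysconfig/network-scripts/ifcfg-<device>. As a rough sketch of what a static interconnect definition for linux1 (eth1, 192.168.2.100) might look like, assuming the standard initscripts variable names, compare the following with the file the Network Configuration tool actually writes on your system rather than treating it as authoritative:

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth1 on linux1 (sketch)
# Static addressing for the private interconnect; no DHCP.
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.2.100
NETMASK=255.255.255.0
ONBOOT=yes
```

On linux2 the same file would use IPADDR=192.168.2.101. After editing by hand, running service network restart as root applies the change.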
Adjusting
Network Settings
With Oracle
9.2.0.1 and above, Oracle uses UDP as
the default protocol on Linux for
interprocess communication (IPC), such
as cache fusion buffer transfers
between instances within the RAC
cluster.
Oracle strongly suggests adjusting the default and maximum send buffer size (SO_SNDBUF socket option) to 256KB, and the default and maximum receive buffer size (SO_RCVBUF socket option) to 256KB.
The receive buffers are used by TCP and UDP to hold received data until it is read by the application. For TCP, the receive buffer cannot overflow because the peer is not allowed to send data beyond the advertised window. UDP has no such flow control, so datagrams that do not fit in the socket receive buffer are discarded. This means that a fast sender can overwhelm the receiver.
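Before changing anything, you may want to see what each node is currently using. The following is a minimal sketch, assuming the values are exposed under /proc/sys/net/core as they are on 2.4 kernels. The function check_net_core is hypothetical: it prints the four settings and returns non-zero if any of them is below 262144 bytes (256KB). The directory is a parameter only so that the function can be exercised against sample files.

```shell
#!/bin/sh
# check_net_core [PROC_DIR]
# Hypothetical helper: reports the socket buffer settings discussed above
# and whether each has reached 262144 bytes. PROC_DIR defaults to
# /proc/sys/net/core. Returns 1 if any value is missing or too small.
check_net_core() {
    dir="${1:-/proc/sys/net/core}"
    want=262144
    rc=0
    for f in rmem_default wmem_default rmem_max wmem_max; do
        if [ ! -r "$dir/$f" ]; then
            echo "net.core.$f: cannot read $dir/$f"
            rc=1
            continue
        fi
        have=$(cat "$dir/$f")
        if [ "$have" -ge "$want" ]; then
            echo "net.core.$f = $have (ok)"
        else
            echo "net.core.$f = $have (below $want)"
            rc=1
        fi
    done
    return $rc
}
```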
NOTE:
The default and maximum window size can
be changed in the /proc file
system without reboot:
su - root
# Default setting in bytes of the socket receive buffer
sysctl -w net.core.rmem_default=262144
# Default setting in bytes of the socket send buffer
sysctl -w net.core.wmem_default=262144
# Maximum socket receive buffer size which may be set by using
# the SO_RCVBUF socket option
sysctl -w net.core.rmem_max=262144
# Maximum socket send buffer size which may be set by using
# the SO_SNDBUF socket option
sysctl -w net.core.wmem_max=262144
You should make the
above changes permanent by adding the
following lines to the
/etc/sysctl.conf file for each node
in your RAC cluster:
net.core.rmem_default=262144
net.core.wmem_default=262144
net.core.rmem_max=262144
net.core.wmem_max=262144
Listing 1
select event,
total_waits,
round(100 * (total_waits / sum_waits),2) pct_waits,
time_wait_sec,
round(100 * (time_wait_sec /
greatest(sum_time_waited,1)),2)
pct_time_waited,
total_timeouts,
round(100 * (total_timeouts /
greatest(sum_timeouts,1)),2)
pct_timeouts,
average_wait_sec
from
(select event,
total_waits,
round((time_waited / 100),2) time_wait_sec,
total_timeouts,
round((average_wait / 100),2) average_wait_sec
from sys.v_$system_event
where event not in
('lock element cleanup',
'pmon timer',
'rdbms ipc message',
'rdbms ipc reply',
'smon timer',
'SQL*Net message from client',
'SQL*Net break/reset to client',
'SQL*Net message to client',
'SQL*Net more data from client',
'dispatcher timer',
'Null event',
'parallel query dequeue wait',
'parallel query idle wait - Slaves',
'pipe get',
'PL/SQL lock timer',
'slave wait',
'virtual circuit status',
'WMON goes to sleep',
'jobq slave wait',
'Queue Monitor Wait',
'wakeup time manager',
'PX Idle Wait') AND
event not like 'DFS%' AND
event not like 'KXFX%'),
(select sum(total_waits) sum_waits,
sum(total_timeouts) sum_timeouts,
sum(round((time_waited / 100),2)) sum_time_waited
from sys.v_$system_event
where event not in
('lock element cleanup',
'pmon timer',
'rdbms ipc message',
'rdbms ipc reply',
'smon timer',
'SQL*Net message from client',
'SQL*Net break/reset to client',
'SQL*Net message to client',
'SQL*Net more data from client',
'dispatcher timer',
'Null event',
'parallel query dequeue wait',
'parallel query idle wait - Slaves',
'pipe get',
'PL/SQL lock timer',
'slave wait',
'virtual circuit status',
'WMON goes to sleep',
'jobq slave wait',
'Queue Monitor Wait',
'wakeup time manager',
'PX Idle Wait') AND
event not like 'DFS%' AND
event not like 'KXFX%')
order by 4 desc, 1 asc
Listing 2
SELECT sid,
username,
event,
total_waits,
100 * round((total_waits / sum_waits),2)
pct_of_total_waits,
time_wait_sec,
total_timeouts,
average_wait_sec,
max_wait_sec
FROM
(SELECT a.event,
b.sid sid,
decode (b.username,null,c.name,b.username) username,
a.total_waits total_waits,
round((a.time_waited / 100),2) time_wait_sec,
a.total_timeouts total_timeouts,
round((average_wait / 100),2)
average_wait_sec,
round((a.max_wait / 100),2) max_wait_sec
FROM sys.v_$session_event a,
sys.v_$session b,
sys.v_$bgprocess c,
sys.v_$process d
WHERE a.event NOT IN
('lock element cleanup',
'pmon timer',
'rdbms ipc message',
'smon timer',
'SQL*Net message from client',
'SQL*Net break/reset to client',
'SQL*Net message to client',
'SQL*Net more data from client',
'dispatcher timer',
'Null event',
'parallel query dequeue wait',
'parallel query idle wait - Slaves',
'pipe get',
'PL/SQL lock timer',
'slave wait',
'virtual circuit status',
'WMON goes to sleep'
)
AND a.event NOT LIKE 'DFS%'
AND a.event NOT LIKE 'KXFX%'
AND a.sid = b.sid
AND d.addr = b.paddr
AND c.paddr (+) = b.paddr
),
(select sum(total_waits) sum_waits
FROM sys.v_$session_event a,
sys.v_$session b
WHERE a.event NOT IN
('lock element cleanup',
'pmon timer',
'rdbms ipc message',
'smon timer',
'SQL*Net message from client',
'SQL*Net break/reset to client',
'SQL*Net more data from client',
'SQL*Net message to client',
'dispatcher timer',
'Null event',
'parallel query dequeue wait',
'parallel query idle wait - Slaves',
'pipe get',
'PL/SQL lock timer',
'slave wait',
'virtual circuit status',
'WMON goes to sleep'
)
AND a.event NOT LIKE 'DFS%'
AND a.event NOT LIKE 'KXFX%'
AND a.sid = b.sid)
order by 6 desc, 1 asc
Obtain and Install a Proper Linux Kernel
Overview
The next step is to
obtain and install a new Linux kernel that
supports the use of IEEE1394 devices with
multiple logins. In previous releases of
this article, I included the steps to
download a patched version of the Linux
kernel and then compile it. Thanks to
Oracle's Linux Projects development group,
this is no longer a requirement. They
provide a pre-compiled kernel for Red Hat
Enterprise Linux 3.0 (which also works with
Fedora) that can simply be downloaded and
installed. The instructions for downloading
and installing the kernel are included in
this section. Before going into the details
of how to perform these actions, however,
let's take a moment to discuss the changes
that are required in the new kernel.
While FireWire drivers already exist for Linux, they often do not support shared storage. Normally, when you log on to an OS, the OS associates the driver with a specific drive for that machine alone. This implementation simply will not work for our RAC configuration: the shared storage (our FireWire hard drive) needs to be accessed by more than one node. We need to enable the FireWire driver to provide nonexclusive access to the drive so that multiple servers (the nodes that make up the cluster) will be able to access the same storage. This is accomplished by removing, in the driver source code, the bit mask that identifies the machine during login, which allows nonexclusive access to the FireWire hard drive. All other nodes in the cluster log in to the same drive during their logon sessions using the same modified driver, so they too have nonexclusive access to the drive.
I'm probably getting ahead of myself, but I want to cover several topics before diving into the details of installing our new Linux kernel. When we install our new Linux kernel (one that supports multiple logins to the FireWire drive), the system will detect and recognize the FireWire-attached drive as a SCSI device. You will be able to use standard OS tools to partition the disk, create a file system, and so on. For Oracle9i RAC, you must make partitions for all the files and bind raw devices to those partitions. This article uses Logical Volume Manager (LVM) to make all needed partitions (known as logical partitions) on the FireWire shared drive.
Our implementation
describes a dual node cluster (each with a
single processor), each server running Red
Hat Linux Fedora Core 1. Keep in mind that
the process of installing the patched Linux
kernel will need to be performed on
both Linux nodes. Red Hat Linux
Fedora Core 1 includes kernel
linux-2.4.22-1.2115.nptl; we will need to
download the Oracle-supplied 2.4.21-9.0.1
Linux kernel from the following URL:
http://oss.oracle.com/projects/firewire/files.
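Which of the two RPMs listed below you need depends on the number of processors. As a minimal sketch, assuming /proc/cpuinfo has one "processor" entry per CPU, the hypothetical function kernel_rpm_for prints the matching file name. Note that with Hyper-Threading enabled, /proc/cpuinfo also counts logical processors, in which case the SMP kernel is the one that can use them.

```shell
#!/bin/sh
# kernel_rpm_for [CPU_COUNT]
# Hypothetical helper: prints the Oracle FireWire kernel RPM to download.
# With no argument, counts the "processor" lines in /proc/cpuinfo.
kernel_rpm_for() {
    cpus="${1:-$(grep -c '^processor' /proc/cpuinfo)}"
    version="2.4.21-9.0.1.ELorafw1.i686"
    if [ "$cpus" -gt 1 ]; then
        echo "kernel-smp-${version}.rpm"    # multiple processors
    else
        echo "kernel-${version}.rpm"        # single processor
    fi
}
```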
Perform the
following procedures on both nodes in the
cluster:
- Download one of the following files:
kernel-2.4.21-9.0.1.ELorafw1.i686.rpm
- for single processor
- OR -
kernel-smp-2.4.21-9.0.1.ELorafw1.i686.rpm
- for multiple processors
- Make a backup of your GRUB
configuration file:
In most cases you will be using GRUB as your boot loader. Before installing the new kernel, be sure to back up a copy of your /etc/grub.conf file:
# cp /etc/grub.conf /etc/grub.conf.original
- Install the new kernel, as user
root:
For a single processor:
# rpm -ivh --force kernel-2.4.21-9.0.1.ELorafw1.i686.rpm
- OR - for multiple processors:
# rpm -ivh --force kernel-smp-2.4.21-9.0.1.ELorafw1.i686.rpm
NOTE: Installing the new kernel using RPM will also update your GRUB or LILO configuration with the appropriate stanza. There is no need to add any new stanza to your boot loader configuration unless you want to keep your old kernel image available.
The following is a listing of my /etc/grub.conf file before and after the kernel install. As you can see, the install added another stanza for the 2.4.21-9.0.1.ELorafw1 kernel. GRUB numbers the title stanzas from 0 in the order they appear, and the default entry selects which one is booted. If the installer keeps your old kernel as the default by setting default=1, change it to default=0 so that the new kernel (now the first stanza) is booted. In my case, shown below, default=0 already points to the new kernel.
Original
/etc/grub.conf File for Fedora Core
1
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
# all kernel and initrd paths are relative to /boot/, eg.
# root (hd0,0)
# kernel /vmlinuz-version ro root=/dev/hda3
# initrd /initrd-version.img
#boot=/dev/hda
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title Fedora Core (2.4.22-1.2115.nptl)
root (hd0,0)
kernel /vmlinuz-2.4.22-1.2115.nptl ro root=LABEL=/ rhgb
initrd /initrd-2.4.22-1.2115.nptl.img
Newly Configured /etc/grub.conf File for Fedora Core 1 After Kernel Install
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
# all kernel and initrd paths are relative to /boot/, eg.
# root (hd0,0)
# kernel /vmlinuz-version ro root=/dev/hda3
# initrd /initrd-version.img
#boot=/dev/hda
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title Fedora Core (2.4.21-9.0.1.ELorafw1)
root (hd0,0)
kernel /vmlinuz-2.4.21-9.0.1.ELorafw1 ro root=LABEL=/ rhgb
initrd /initrd-2.4.21-9.0.1.ELorafw1.img
title Fedora Core (2.4.22-1.2115.nptl)
root (hd0,0)
kernel /vmlinuz-2.4.22-1.2115.nptl ro root=LABEL=/ rhgb
initrd /initrd-2.4.22-1.2115.nptl.img
- Add module options:
Add the following lines to
/etc/modules.conf:
options sbp2 sbp2_exclusive_login=0
post-install sbp2 insmod sd_mod
post-remove sbp2 rmmod sd_mod
It is vital that the sbp2_exclusive_login parameter of the Serial Bus Protocol module (sbp2) be set to zero so that multiple hosts can log in to and access the FireWire disk concurrently. The second line ensures that the SCSI disk driver module (sd_mod) is loaded along with sbp2, since sbp2 requires the SCSI layer; the third line unloads sd_mod again when sbp2 is removed. The core SCSI support module (scsi_mod) is loaded automatically when sd_mod is loaded, so there is no need to make a separate entry for it.
- Reboot machine
Reboot your machine into the new kernel. Make sure the FireWire (IEEE 1394) PCI cards are installed in the machine!
- Load the FireWire stack
In most cases, loading of the FireWire stack is already configured in the /etc/rc.sysinit file. The commands in this file that load the FireWire stack are:
# modprobe ohci1394
# modprobe sbp2
In older versions of Red Hat this was not the case, and these commands had to be run manually or placed in a startup file. With Fedora Core 1 and higher, these commands are already included in /etc/rc.sysinit and run on each boot.
- Rescan SCSI bus
In older versions of the kernel, I needed to run the rescan-scsi-bus.sh script to detect the FireWire drive. The purpose of this script was to create the SCSI entry for the device with the following command, where the four numbers are the host, channel, ID, and LUN:
echo "scsi add-single-device 0 0 0 0" > /proc/scsi/scsi
With Fedora Core 1,
the disk should be detected
automatically.
- Check for SCSI Device
After you have rebooted the machine, the kernel should automatically detect the disk as a SCSI device (/dev/sdXX). This section provides several commands that should be run on both nodes in the cluster to ensure the FireWire drive was successfully detected.
For this configuration, I performed the above procedures on both nodes at the same time. When complete, I shut down both machines, started linux1 first, and then linux2. The following commands and results are from my linux2 machine. Again, make sure that you run the following commands on both nodes to ensure that both machines can log in to the shared drive.
Let's first
check to see that the FireWire adapter
was successfully detected:
# lspci
00:00.0 Host bridge: Intel Corp. 82845 845 (Brookdale) Chipset Host Bridge (rev 11)
00:01.0 PCI bridge: Intel Corp. 82845 845 (Brookdale) Chipset AGP Bridge (rev 11)
00:1d.0 USB Controller: Intel Corp. 82801DB USB (Hub #1) (rev 01)
00:1d.1 USB Controller: Intel Corp. 82801DB USB (Hub #2) (rev 01)
00:1d.2 USB Controller: Intel Corp. 82801DB USB (Hub #3) (rev 01)
00:1d.7 USB Controller: Intel Corp. 82801DB USB2 (rev 01)
00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB PCI Bridge (rev 81)
00:1f.0 ISA bridge: Intel Corp. 82801DB LPC Interface Controller (rev 01)
00:1f.1 IDE interface: Intel Corp. 82801DB Ultra ATA Storage Controller (rev 01)
00:1f.3 SMBus: Intel Corp. 82801DB/DBM SMBus Controller (rev 01)
01:00.0 VGA compatible controller: nVidia Corporation NV34 [GeForce FX 5200] (rev a1)
02:00.0 Ethernet controller: Linksys Network Everywhere Fast Ethernet 10/100 model NC100 (rev 11)
02:01.0 FireWire (IEEE 1394): Texas Instruments TSB12LV26 IEEE-1394 Controller (Link)
02:05.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
02:07.0 Multimedia audio controller: C-Media Electronics Inc CM8738 (rev 10)
Second, let's
check to see that the modules are
loaded:
# lsmod |egrep "ohci1394|sbp2|ieee1394|sd_mod|scsi_mod"
sd_mod 13808 0
sbp2 20556 0
scsi_mod 109864 3 [sg sd_mod sbp2]
ohci1394 28904 0 (unused)
ieee1394 63652 0 [sbp2 ohci1394]
Third, let's make
sure the disk was detected and an entry
was made by the kernel:
# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: Maxtor Model: OneTouch Rev: 0200
Type: Direct-Access
Now let's ensure
the FireWire drive is accessible for
multiple logins and shows a valid login:
# dmesg | grep sbp2
ieee1394: sbp2: Query logins to SBP-2 device successful
ieee1394: sbp2: Maximum concurrent logins supported: 3
ieee1394: sbp2: Number of active logins: 2
ieee1394: sbp2: Logged into SBP-2 device
ieee1394: sbp2: Node[01:1023]: Max speed [S400] - Max payload [2048]
ieee1394: sbp2: Reconnected to SBP-2 device
ieee1394: sbp2: Node[01:1023]: Max speed [S400] - Max payload [2048]
From the above output, you can see that our FireWire drive supports concurrent logins from up to 3 servers. It is vital that you have a drive whose chipset supports concurrent access for all nodes within the RAC cluster.
- Troubleshoot SCSI Device Detection
If the procedures above fail to detect the SCSI device, try unloading and then reloading the FireWire and SCSI disk modules:
# modprobe -r sbp2
# modprobe -r sd_mod
# modprobe -r ohci1394
# modprobe ohci1394
# modprobe sd_mod
# modprobe sbp2
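Because the checks above have to be repeated on each node (and after each reboot), it can help to script them. The following is a minimal bash sketch, not part of the original procedure: the function names (is_orafw_kernel, sbp2_option_set, firewire_modules_loaded, sbp2_max_logins, enough_logins) are hypothetical, and it assumes the kernel release string, module names, /etc/modules.conf line, and dmesg message text shown in this section. Each function takes its input as an argument or on standard input, so it can also be tried against saved output.

```shell
#!/bin/bash
# Sketch of post-reboot checks for the shared FireWire disk.
# All function names are hypothetical helpers, not part of any Oracle tool.

# Succeeds if the given kernel release is the Oracle FireWire kernel
# (matches both the UP and SMP builds, e.g. 2.4.21-9.0.1.ELorafw1smp).
is_orafw_kernel() {
    case "$1" in
        *ELorafw1*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Succeeds if the given modules.conf sets sbp2_exclusive_login=0 for sbp2.
sbp2_option_set() {
    grep -Eq '^[[:space:]]*options[[:space:]]+sbp2[[:space:]].*sbp2_exclusive_login=0([[:space:]]|$)' "$1"
}

# Reads `lsmod` output on stdin (with or without the header line) and
# succeeds only if every module used in this article is listed.
firewire_modules_loaded() {
    local loaded mod
    loaded=$(awk '{print $1}')
    for mod in ohci1394 ieee1394 sbp2 sd_mod scsi_mod; do
        if ! printf '%s\n' "$loaded" | grep -qx "$mod"; then
            echo "missing module: $mod" >&2
            return 1
        fi
    done
}

# Reads `dmesg` output on stdin and prints the number of concurrent
# logins reported by the SBP-2 device (prints nothing if not found).
sbp2_max_logins() {
    sed -n 's/.*Maximum concurrent logins supported: *\([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Reads `dmesg` output on stdin; succeeds if the drive supports at
# least as many concurrent logins as the given number of nodes.
enough_logins() {
    local nodes=$1 max
    max=$(sbp2_max_logins)
    [ -n "$max" ] && [ "$max" -ge "$nodes" ]
}
```

On each node, as root, you could source the file and run, for example:
# is_orafw_kernel "$(uname -r)" && echo "FireWire kernel running"
# sbp2_option_set /etc/modules.conf && echo "sbp2 option set"
# lsmod | firewire_modules_loaded && echo "modules loaded"
# dmesg | enough_logins 2 && echo "drive accepts logins from both nodes"
If the boot messages have already scrolled out of the kernel ring buffer, the last check fails even when the drive is fine.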
Create "oracle" User and Directories (both nodes)
Let's continue our example by creating the UNIX dba group and oracle user ID along with all appropriate directories. The -p option of useradd expects an already-encrypted password, so set the oracle password with passwd instead:
# mkdir /u01
# mkdir /u01/app
# groupadd -g 115 dba
# useradd -u 175 -g 115 -d /u01/app/oracle -s /bin/bash -c "Oracle Software Owner" oracle
# passwd oracle
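Because the account is created separately on each node, it is worth confirming that both nodes ended up with the same numeric IDs (175 for oracle and 115 for dba in this example), since RAC installations generally expect the software owner and its group to be identical across nodes. Below is a minimal bash sketch; ids_match is a hypothetical helper that compares the IDs reported by id with the expected values.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if USER exists with the expected
# numeric UID and primary GID (defaults: the values used in this article).
ids_match() {
    local user=$1 want_uid=${2:-175} want_gid=${3:-115}
    local uid gid
    uid=$(id -u "$user" 2>/dev/null) || { echo "no such user: $user" >&2; return 1; }
    gid=$(id -g "$user")
    if [ "$uid" != "$want_uid" ] || [ "$gid" != "$want_gid" ]; then
        echo "$user has uid=$uid gid=$gid, expected uid=$want_uid gid=$want_gid" >&2
        return 1
    fi
}
```

Run it on both linux1 and linux2, for example: # ids_match oracle && echo "oracle IDs OK"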
NOTE: When you set the Oracle environment variables for each RAC node, be sure to assign each node a unique Oracle SID! For this example, I used:
- linux1 : ORACLE_SID=orcl1
- linux2 : ORACLE_SID=orcl2
NOTE: The Oracle Universal Installer (OUI) can require up to 400MB of free space in the /tmp directory.
You can check the
available space in /tmp by running
the following command:
# df -k /tmp
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/hda3 36384656 6224240 28312140 19% /
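To compare the free space against the 400MB figure in a script, a minimal bash sketch (has_free_mb is a hypothetical helper) can parse df. It uses df -P so that each file system is reported on a single line even when the device name is long, and -k so the Available column is in 1K blocks.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the file system holding DIR has at
# least NEED_MB megabytes available. POSIX output (-P) in 1K blocks (-k)
# keeps "Available" in field 4 of the second line.
has_free_mb() {
    local dir=$1 need_mb=$2 avail_kb
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 {print $4}')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $((need_mb * 1024)) ]
}
```

For example: # has_free_mb /tmp 400 || echo "less than 400MB free in /tmp"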
If for some reason you do not have enough space in /tmp, you can temporarily create space in another file system and point TEMP and TMPDIR to it for the duration of the install. Starting from the shell of the oracle user who will run the installer, create the directory as root, return to the oracle shell, and set the two variables there (exporting them in root's shell would not affect the installer):
# su -
# mkdir /<AnotherFilesystem>/tmp
# chown root:root /<AnotherFilesystem>/tmp
# chmod 1777 /<AnotherFilesystem>/tmp
# exit
$ export TEMP=/<AnotherFilesystem>/tmp     # used by Oracle
$ export TMPDIR=/<AnotherFilesystem>/tmp   # used by Linux programs like the linker "ld"
When the installation of Oracle is complete, unset the variables in the oracle user's shell and remove the temporary directory as root:
$ unset TEMP
$ unset TMPDIR
$ su -
# rmdir /<AnotherFilesystem>/tmp
After creating the "oracle" UNIX user ID on both nodes, ensure that the environment is set up correctly by using the following .bash_profile:
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
alias ls="ls -FA"
# User specific environment and startup programs
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/9.2.0
# Each RAC node must have a unique ORACLE_SID (e.g. orcl1 on linux1, orcl2 on linux2).
export ORACLE_SID=orcl1
export PATH=.:${PATH}:$HOME/bin:$ORACLE_HOME/bin
export PATH=${PATH}:/usr/bin:/bin:/usr/bin/X11:/usr/local/bin
export ORACLE_TERM=xterm
export TNS_ADMIN=$ORACLE_HOME/network/admin
export ORA_NLS33=$ORACLE_HOME/ocommon/nls/admin/data
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$ORACLE_HOME/oracm/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/lib:/usr/lib:/usr/local/lib
export CLASSPATH=$ORACLE_HOME/JRE
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/jlib
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/rdbms/jlib
export CLASSPATH=${CLASSPATH}:$ORACLE_HOME/network/jlib
export THREADS_FLAG=native
export TEMP=/tmp
export TMPDIR=/tmp
export LD_ASSUME_KERNEL=2.4.1
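Because ORACLE_SID is the only line in this profile that differs between the nodes, one option (an alternative to editing the file on each node, not the method used in this article) is to derive it from the host name so the same .bash_profile can be copied to both machines. This minimal sketch assumes the host names linux1 and linux2 used in this article; sid_for_host is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: print the ORACLE_SID for a node's short host name.
# The linux1/linux2 -> orcl1/orcl2 mapping matches this article's example.
sid_for_host() {
    case "$1" in
        linux1) echo orcl1 ;;
        linux2) echo orcl2 ;;
        *)      echo "no ORACLE_SID defined for host '$1'" >&2
                return 1 ;;
    esac
}

# In .bash_profile, this would replace the hard-coded "export ORACLE_SID=orcl1":
#   if ORACLE_SID=$(sid_for_host "$(hostname -s)"); then export ORACLE_SID; fi
```

With this approach, an unknown host name produces a warning instead of silently picking up another node's SID.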
Creating Partitions on the Shared FireWire Storage Device (one node)
Overview
It is time to create the physical and logical volumes to be used by the Logical Volume Manager (LVM). (For a more detailed view of managing the LVM, see my article Managing Physical & Logical Volumes.) The following table maps each logical volume to the raw device, symbolic link, and tablespace or file that we will set up in this section:
| Logical Volume | RAW Volume | Symbolic Link | Tablespace/File Name | Tablespace/File Size | Partition Size |
| /dev/pv1/lvol1 | /dev/raw/raw1 | /u01/app/oracle/oradata/orcl/CMQuorumFile | Cluster Manager Quorum File | - | 5MB |
| /dev/pv1/lvol2 | /dev/raw/raw2 | /u01/app/oracle/oradata/orcl/SharedSrvctlConfigFile | Shared Configuration File | - | 100MB |
| /dev/pv1/lvol3 | /dev/raw/raw3 | /u01/app/oracle/oradata/orcl/spfileorcl.ora | Server Parameter File | - | 10MB |
| /dev/pv1/lvol4 | /dev/raw/raw4 | /u01/app/oracle/oradata/orcl/control01.ctl | Control File 1 | - | 200MB |
| /dev/pv1/lvol5 | /dev/raw/raw5 | /u01/app/oracle/oradata/orcl/control02.ctl | Control File 2 | - | 200MB |
| /dev/pv1/lvol6 | /dev/raw/raw6 | /u01/app/oracle/oradata/orcl/control03.ctl | Control File 3 | - | 200MB |
| /dev/pv1/lvol7 | /dev/raw/raw7 | /u01/app/oracle/oradata/orcl/cwmlite01.dbf | CWMLITE | 50MB | 55MB |
| /dev/pv1/lvol8 | /dev/raw/raw8 | /u01/app/oracle/oradata/orcl/drsys01.dbf | DRSYS | 20MB | 25MB |
| /dev/pv1/lvol9 | /dev/raw/raw9 | /u01/app/oracle/oradata/orcl/example01.dbf | EXAMPLE | 250MB | 255MB |
| /dev/pv1/lvol10 | /dev/raw/raw10 | /u01/app/oracle/oradata/orcl/indx01.dbf | INDX | 100MB | 105MB |
| /dev/pv1/lvol11 | /dev/raw/raw11 | /u01/app/oracle/oradata/orcl/odm01.dbf | ODM | 50MB | 55MB |
| /dev/pv1/lvol12 | /dev/raw/raw12 | /u01/app/oracle/oradata/orcl/system01.dbf | SYSTEM | 800MB | 805MB |
| /dev/pv1/lvol13 | /dev/raw/raw13 | /u01/app/oracle/oradata/orcl/temp01.dbf | TEMP | 250MB | 255MB |
| /dev/pv1/lvol14 | /dev/raw/raw14 | /u01/app/oracle/oradata/orcl/tools01.dbf | TOOLS | 100MB | 105MB |
| /dev/pv1/lvol15 | /dev/raw/raw15 | /u01/app/oracle/oradata/orcl/undotbs01.dbf | UNDOTBS1 | 400MB | 405MB |
| /dev/pv1/lvol16 | /dev/raw/raw16 | /u01/app/oracle/oradata/orcl/undotbs02.dbf | UNDOTBS2 | 400MB | 405MB |
| /dev/pv1/lvol17 |
/dev/raw/raw17 |
/u01/app/ | | | | | | |