O2CB Cluster Service
Before we can do anything with OCFS2, like formatting or mounting the file system, we need to first have OCFS2's cluster stack, O2CB, running (which it will be as a result of the configuration process performed above). The stack includes the following services:
- NM: Node Manager that keeps track of all the nodes listed in the cluster.conf file
- HB: Heartbeat service that issues up/down notifications when nodes join or leave the cluster
- TCP: Handles communication between the nodes
- DLM: Distributed lock manager that keeps track of all locks, their owners, and their status
- CONFIGFS: User space driven configuration file system mounted at /config
- DLMFS: User space interface to the kernel space DLM
All of the above cluster services have been packaged in the o2cb system service (/etc/init.d/o2cb). Here is a short listing of some of the more useful commands and options for the o2cb system service.
- /etc/init.d/o2cb status
Module "configfs": Not loaded
Filesystem "configfs": Not mounted
Module "ocfs2_nodemanager": Not loaded
Module "ocfs2_dlm": Not loaded
Module "ocfs2_dlmfs": Not loaded
Filesystem "ocfs2_dlmfs": Not mounted
Note that in this example none of the services are loaded, because I did an "unload" right before executing the "status" option. If you were to check the status of the o2cb service immediately after configuring OCFS2 using the ocfs2console utility, they would all be loaded.
- /etc/init.d/o2cb load
Loading module "configfs": OK
Mounting configfs filesystem at /config: OK
Loading module "ocfs2_nodemanager": OK
Loading module "ocfs2_dlm": OK
Loading module "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Loads all OCFS2 modules.
- /etc/init.d/o2cb online ocfs2
Starting cluster ocfs2: OK
The above command will bring the cluster we created, ocfs2, online.
- /etc/init.d/o2cb offline ocfs2
Cleaning heartbeat on ocfs2: OK
Stopping cluster ocfs2: OK
The above command will take the cluster we created, ocfs2, offline.
- /etc/init.d/o2cb unload
Unmounting ocfs2_dlmfs filesystem: OK
Unloading module "ocfs2_dlmfs": OK
Unmounting configfs filesystem: OK
Unloading module "configfs": OK
The above command will unload all OCFS2 modules.
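When these commands are scripted, it can be handy to test the status output before formatting or mounting anything. The following is a minimal sketch, assuming the status lines use the "Not loaded" / "Not mounted" wording shown in the example above; the function name o2cb_stack_ready is hypothetical and is not part of the o2cb service.

```shell
#!/bin/bash
# Hypothetical helper (not part of the o2cb service): read the output of
# "/etc/init.d/o2cb status" on stdin and succeed only if no line reports
# "Not loaded" or "Not mounted", the wording used in the sample output above.
o2cb_stack_ready() {
    local status problems
    status=$(cat)
    problems=$(grep -E 'Not loaded|Not mounted' <<< "$status")
    if [ -n "$problems" ]; then
        # Show which modules/filesystems are missing.
        echo "$problems" >&2
        return 1
    fi
    return 0
}

# Usage (as root):
#   /etc/init.d/o2cb status | o2cb_stack_ready || /etc/init.d/o2cb load
```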
Configure O2CB to Start on Boot
You now need to configure the on-boot properties of the O2CB driver so that the cluster stack services will start on each boot. All the tasks within this section will need to be performed on both nodes in the cluster.
Note: At the time of writing this guide, OCFS2 contains a bug wherein the driver does not get loaded on each boot even after configuring the on-boot properties to do so. After attempting to configure the on-boot properties to start on each boot according to the official OCFS2 documentation, you will still get the following error on each boot:
...
Mounting other filesystems:
mount.ocfs2: Unable to access cluster service
Cannot initialize cluster mount.ocfs2:
Unable to access cluster service Cannot initialize cluster [FAILED]
...
Red Hat changed the way the service is registered between chkconfig-1.3.11.2-1 and chkconfig-1.3.13.2-1. The O2CB script worked with the former but not with the latter.
Before attempting to configure the on-boot properties:
- REMOVE the following lines in /etc/init.d/o2cb:
### BEGIN INIT INFO
# Provides: o2cb
# Required-Start:
# Should-Start:
# Required-Stop:
# Default-Start: 2 3 5
# Default-Stop:
# Description: Load O2CB cluster services at system boot.
### END INIT INFO
- Re-register the o2cb service.
# chkconfig --del o2cb
# chkconfig --add o2cb
# chkconfig --list o2cb
o2cb 0:off 1:off 2:on 3:on 4:on 5:on 6:off
# ll /etc/rc3.d/*o2cb*
lrwxrwxrwx 1 root root 14 Sep 29 11:56 /etc/rc3.d/S24o2cb -> ../init.d/o2cb
The service should be S24o2cb in the default runlevel.
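The edit in the first step can also be made non-interactively. This is a minimal sketch using sed, assuming the block starts and ends with exactly the "### BEGIN INIT INFO" and "### END INIT INFO" lines shown above; strip_init_info is a hypothetical helper name. It keeps a .bak copy of the original script.

```shell
#!/bin/bash
# Hypothetical helper: delete the "### BEGIN INIT INFO" ... "### END INIT INFO"
# block from an init script, keeping a .bak copy of the original.
# Defaults to /etc/init.d/o2cb; pass another path to try it on a copy first.
strip_init_info() {
    local script="${1:-/etc/init.d/o2cb}"
    [ -f "$script" ] || { echo "no such file: $script" >&2; return 1; }
    # The address range matches from the BEGIN line through the END line, inclusive.
    sed -i.bak '/^### BEGIN INIT INFO/,/^### END INIT INFO/d' "$script"
}

# Usage (as root), then re-register the service:
#   strip_init_info /etc/init.d/o2cb
#   chkconfig --del o2cb && chkconfig --add o2cb && chkconfig --list o2cb
```

Since the backup is written next to the script, you may want to move o2cb.bak out of /etc/init.d once you have confirmed the edit.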
After resolving this bug, you can continue to set the on-boot properties as follows:
# /etc/init.d/o2cb offline ocfs2
# /etc/init.d/o2cb unload
# /etc/init.d/o2cb configure
Configuring the O2CB driver.
This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot. The current values will be shown in brackets ('[]'). Hitting
<ENTER> without typing an answer will keep that current value. Ctrl-C
will abort.
Load O2CB driver on boot (y/n) [n]: y
Cluster to start on boot (Enter "none" to clear) [ocfs2]: ocfs2
Writing O2CB configuration: OK
Loading module "configfs": OK
Mounting configfs filesystem at /config: OK
Loading module "ocfs2_nodemanager": OK
Loading module "ocfs2_dlm": OK
Loading module "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Starting cluster ocfs2: OK
Format the OCFS2 Filesystem
If the O2CB cluster is offline, start it. The format operation needs the cluster to be online, as it needs to ensure that the volume is not mounted on some node in the cluster.
Create the OCFS2 Filesystem
Unlike the other tasks in this section, creating the OCFS2 filesystem should only be executed on one node in the RAC cluster. You will be executing all commands in this section from linux1 only.
Note that it is possible to create and mount the OCFS2 file system using either the GUI tool ocfs2console or the command-line tools (mkfs.ocfs2 and mount). From the ocfs2console utility, use the menu [Tasks] - [Format].
See the instructions below on how to create the OCFS2 file system using the command-line tool mkfs.ocfs2.
To create the filesystem, use the Oracle executable mkfs.ocfs2. For the purpose of this example, I run the following command only from linux1 as the root user account:
$ su -
# mkfs.ocfs2 -b 4K -C 32K -N 4 -L oradatafiles /dev/sda1
mkfs.ocfs2 1.0.2
Filesystem label=oradatafiles
Block size=4096 (bits=12)
Cluster size=32768 (bits=15)
Volume size=1011675136 (30873 clusters) (246984 blocks)
1 cluster groups (tail covers 30873 clusters, rest cover 30873 clusters)
Journal size=16777216
Initial number of node slots: 4
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing lost+found: done
mkfs.ocfs2 successful
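The counts in this output follow directly from the -b 4K and -C 32K options. As a quick arithmetic check (a sketch, assuming the reported cluster count is the volume size divided by the cluster size, rounded down, which the figures above are consistent with), integer division reproduces both numbers; ocfs2_geometry is a hypothetical name.

```shell
#!/bin/bash
# Reproduce the cluster and block counts printed by mkfs.ocfs2 above from the
# reported volume size and the -b (block) / -C (cluster) sizes.
# Shell arithmetic is integer arithmetic, so a partial trailing cluster is dropped.
ocfs2_geometry() {
    local volume_bytes=$1 block_size=$2 cluster_size=$3
    local clusters=$(( volume_bytes / cluster_size ))
    local blocks=$(( clusters * (cluster_size / block_size) ))
    echo "$clusters clusters, $blocks blocks"
}

ocfs2_geometry 1011675136 4096 32768   # prints: 30873 clusters, 246984 blocks
```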
Mount the OCFS2 Filesystem
Now that the file system is created, you can mount it. Let's first do it using the command line, then I'll show how to include it in /etc/fstab to have it mount on each boot. Mounting the filesystem will need to be performed on all nodes in the Oracle RAC cluster as the root user account.
First, here is how to manually mount the OCFS2 file system from the command line. Remember, this needs to be performed as the root user account:
$ su -
# mount -t ocfs2 -o datavolume /dev/sda1 /u02/oradata/orcl
If the mount was successful, you will simply get your prompt back. You should, however, run the following checks to ensure the file system is mounted correctly.
Let's use the mount command to ensure that the new filesystem is really mounted. This should be performed on all nodes in the RAC cluster:
# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/hda1 on /boot type ext3 (rw)
none on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
cartman:SHARE2 on /cartman type nfs (rw,addr=192.168.1.120)
configfs on /config type configfs (rw)
ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
/dev/sda1 on /u02/oradata/orcl type ocfs2 (rw,_netdev,datavolume)
Note: You are using the datavolume option to mount the new filesystem here. Oracle database users must mount any volume that will contain the Voting Disk file, Cluster Registry (OCR), Data files, Redo logs, Archive logs, and Control files with the datavolume mount option so as to ensure that the Oracle processes open the files with the O_DIRECT flag.
Any other type of volume, including an Oracle home (not used in this guide), should not be mounted with this mount option.
The volume will mount after a short delay, usually around five seconds. It does so to let the heartbeat thread stabilize. In a future release, Oracle plans to add support for a global heartbeat, which will make most mounts instantaneous.
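Because the datavolume option matters for the files Oracle will place on this volume, it is worth confirming on each node that the mount came up with the expected options. The function below is a minimal sketch that parses the mount output format shown above ("DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)"); ocfs2_mounted_with is a hypothetical name, and mount points containing spaces are not handled.

```shell
#!/bin/bash
# Hypothetical helper: read `mount` output on stdin and succeed only if
# MOUNTPOINT is mounted as ocfs2 with every option named after it.
# Expected line format: DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)
ocfs2_mounted_with() {
    local mnt=$1; shift
    local opts want
    opts=$(awk -v m="$mnt" '$2 == "on" && $3 == m && $4 == "type" && $5 == "ocfs2" {
               gsub(/[()]/, "", $6); print $6; exit }')
    if [ -z "$opts" ]; then
        echo "$mnt is not mounted as ocfs2" >&2
        return 1
    fi
    for want in "$@"; do
        # Surround with commas so only a whole option name matches.
        case ",$opts," in
            *",$want,"*) ;;
            *) echo "$mnt is missing mount option: $want" >&2; return 1 ;;
        esac
    done
    return 0
}

# Usage (on each node):
#   mount | ocfs2_mounted_with /u02/oradata/orcl datavolume _netdev
```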
Configure OCFS2 to Mount Automatically at Startup
Let's review what you've done so far. You downloaded and installed OCFS2, which will be used to store the files needed by Oracle Clusterware. After going through the install, you loaded the OCFS2 module into the kernel and then formatted the clustered filesystem. Finally, you mounted the newly created filesystem. This section walks through the steps responsible for mounting the new OCFS2 file system each time the machines are booted.
Start by adding the following line to the /etc/fstab file on all nodes in the RAC cluster:
/dev/sda1 /u02/oradata/orcl ocfs2 _netdev,datavolume 0 0
Notice the _netdev option for mounting this filesystem. The _netdev mount option is a must for OCFS2 volumes; it indicates that the volume is to be mounted after the network is started and dismounted before the network is shut down.
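A small check can catch an fstab entry that is missing _netdev before the next reboot. This is a sketch under the assumption that the entry uses the standard whitespace-separated fstab fields, as in the line above; check_fstab_netdev is a hypothetical helper, and the fstab path is a parameter so it can be tried on a copy.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if FSTAB (default /etc/fstab) has an
# uncommented ocfs2 entry for MOUNTPOINT whose options include _netdev.
check_fstab_netdev() {
    local mnt=$1 fstab=${2:-/etc/fstab}
    local opts
    # Fields: device, mount point, type, options, dump, pass.
    opts=$(awk -v m="$mnt" '$1 !~ /^#/ && $2 == m && $3 == "ocfs2" { print $4; exit }' "$fstab")
    if [ -z "$opts" ]; then
        echo "no ocfs2 entry for $mnt in $fstab" >&2
        return 1
    fi
    # Surround with commas so only a whole option name matches.
    case ",$opts," in
        *,_netdev,*) return 0 ;;
        *) echo "$mnt: _netdev missing from options '$opts'" >&2; return 1 ;;
    esac
}

# Usage (on each node):
#   check_fstab_netdev /u02/oradata/orcl
```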
Now, let's make sure that the ocfs2.ko kernel module is being loaded and that the file system will be mounted during the boot process.
If you have been following along with the examples in this article, the actions to load the kernel module and mount the OCFS2 file system should already be enabled. However, you should still check those options by running the following on all nodes in the RAC cluster as the root user account:
$ su -
# chkconfig --list o2cb
o2cb 0:off 1:off 2:on 3:on 4:on 5:on 6:off
The flags for runlevels 2 through 5 should be set to "on".
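If you prefer not to eyeball the output, the runlevel flags can be checked mechanically. A minimal sketch, assuming the "N:on" / "N:off" format shown above; o2cb_boot_enabled is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: read a "chkconfig --list o2cb" line on stdin and
# succeed only if runlevels 2 through 5 are all "on".
o2cb_boot_enabled() {
    awk '{
        # Fields after the service name look like "3:on" or "6:off".
        for (i = 2; i <= NF; i++) {
            split($i, kv, ":")
            state[kv[1]] = kv[2]
        }
    }
    END {
        for (rl = 2; rl <= 5; rl++)
            if (state[rl] != "on") exit 1
    }'
}

# Usage (as root, on each node):
#   chkconfig --list o2cb | o2cb_boot_enabled && echo "o2cb starts at boot"
```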
Check Permissions on New OCFS2 Filesystem
Use the ls command to check ownership. The permissions should be set to 0775 with owner "oracle" and group "dba". If this is not the case for all nodes in the cluster (which was the case for me), then it is very possible that the "oracle" UID (175 in this example) and/or the "dba" GID (115 in this example) are not the same across all nodes. Ownership on the shared volume is recorded as numeric IDs, so each node displays whichever local user and group happen to have those numbers.
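Comparing the numeric IDs across nodes is a quick way to confirm or rule out this cause. The sketch below assumes you have already gathered one "NODE UID GID" line per node; the collection loop in the comment, which uses ssh, is only one hypothetical way to do that, and ids_consistent is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: read "NODE UID GID" lines on stdin (one per node) and
# succeed only if every node reports the same UID and GID as the first one.
ids_consistent() {
    awk '
        NR == 1 { ref = $1; uid = $2; gid = $3; next }
        $2 != uid || $3 != gid {
            printf "%s (uid=%s gid=%s) differs from %s (uid=%s gid=%s)\n", $1, $2, $3, ref, uid, gid
            bad = 1
        }
        END { exit bad }'
}

# One hypothetical way to collect the input from linux1:
#   for n in linux1 linux2; do
#     echo "$n $(ssh "$n" id -u oracle) $(ssh "$n" getent group dba | cut -d: -f3)"
#   done | ids_consistent
```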
Let's first check the permissions:
# ls -ld /u02/oradata/orcl
drwxr-xr-x 3 root root 4096 Sep 29 12:11 /u02/oradata/orcl
As you can see from the listing above, the oracle user account (and the dba group) will not be able to write to this directory. Let's fix that:
# chown oracle.dba /u02/oradata/orcl
# chmod 775 /u02/oradata/orcl
Let's now go back and re-check that the permissions are correct for each node in the cluster:
# ls -ld /u02/oradata/orcl
drwxrwxr-x 3 oracle dba 4096 Sep 29 12:11 /u02/oradata/orcl
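The same check can be scripted so that it gives a pass/fail answer on each node. This is a sketch assuming GNU coreutils stat (for its -c format option); check_dir_perms is a hypothetical name, and its defaults (775, oracle, dba) are the values used in this guide.

```shell
#!/bin/bash
# Hypothetical helper: check that DIR has the expected octal mode, owner and
# group (defaults: 775, oracle, dba). Relies on GNU stat's -c format option.
check_dir_perms() {
    local dir=$1 mode=${2:-775} owner=${3:-oracle} group=${4:-dba}
    local actual
    actual=$(stat -c '%a %U %G' "$dir") || return 1
    if [ "$actual" != "$mode $owner $group" ]; then
        echo "$dir: expected '$mode $owner $group', found '$actual'" >&2
        return 1
    fi
}

# Usage (as root, on each node):
#   check_dir_perms /u02/oradata/orcl || ls -ld /u02/oradata/orcl
```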
Adjust the O2CB Heartbeat Threshold
This is a very important section when configuring OCFS2 for use by Oracle Clusterware's two shared files on our FireWire drive.
During testing, I was able to install and configure OCFS2, format the new volume, and finally install Oracle Clusterware with its two required shared files (the voting disk and the OCR file) located on the new OCFS2 volume. I was able to install Oracle Clusterware and see the shared drive; however, during my evaluation I was receiving many lock-ups and hangs after about 15 minutes when the Clusterware software was running on both nodes. The node that hung varied (either linux1 or linux2 in my example). It also didn't matter whether there was a high I/O load or none at all for it to crash (hang).
Keep in mind that the configuration you are creating is a rather low-end setup with slow disk access to the FireWire drive. This is by no means a high-end setup, and it is susceptible to bogus timeouts.
After looking through the trace files for OCFS2, it was apparent that access to the voting disk was too slow (exceeding the O2CB heartbeat threshold) and causing the Oracle Clusterware software (and the node) to crash.
The solution I used was to simply increase the O2CB heartbeat threshold from its default setting of 7 to 301 (and in some cases as high as 901). This is a configurable parameter that is used to compute the time it takes for a node to "fence" itself.
First, let's see how to determine what the O2CB heartbeat threshold is currently set to. This can be done by querying the /proc file system as follows:
# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
7
The value is 7, but what does this value represent? Well, it is used in the formula below to determine the fence time (in seconds):
[fence time in seconds] = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2
So, with an O2CB heartbeat threshold of 7, you would have a fence time of:
(7 - 1) * 2 = 12 seconds
You need a much larger fence time (600 seconds, to be exact) given your slower FireWire disks. For 600 seconds, you will want an O2CB_HEARTBEAT_THRESHOLD of 301, as shown below:
(301 - 1) * 2 = 600 seconds
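The formula and its inverse are simple enough to capture in two shell functions, which makes it easy to work out a threshold for any target fence time. This is a sketch of the arithmetic only; fence_time and threshold_for are hypothetical names. threshold_for rounds odd targets up so the resulting fence time is never shorter than requested.

```shell
#!/bin/bash
# Fence time formula from the text:
#   fence_time = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2
fence_time() {
    echo $(( ($1 - 1) * 2 ))
}

# Inverse: smallest threshold whose fence time is at least the requested seconds.
threshold_for() {
    echo $(( ($1 + 1) / 2 + 1 ))
}

fence_time 7        # 12
fence_time 301      # 600
threshold_for 600   # 301
```

Applied to the values in the chart later in this section, a threshold of 301 gives 600 seconds, 600 gives 1198 seconds, and 901 gives 1800 seconds.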
Let's see now how to increase the O2CB heartbeat threshold from 7 to 301. This will need to be performed on both nodes in the cluster. You first need to modify the file /etc/sysconfig/o2cb and set O2CB_HEARTBEAT_THRESHOLD to 301:
# O2CB_ENABLED: 'true' means to load the driver on boot.
O2CB_ENABLED=true
# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=ocfs2
# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=301
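The same change can be applied with a small script, which is convenient when it has to be repeated on every node. A minimal sketch, assuming /etc/sysconfig/o2cb uses plain NAME=value lines as above; set_hb_threshold is a hypothetical helper that replaces an existing O2CB_HEARTBEAT_THRESHOLD line or appends one.

```shell
#!/bin/bash
# Hypothetical helper: set O2CB_HEARTBEAT_THRESHOLD in the o2cb sysconfig file
# (default /etc/sysconfig/o2cb), replacing an existing line or appending one.
set_hb_threshold() {
    local value=$1 conf=${2:-/etc/sysconfig/o2cb}
    case $value in
        ''|*[!0-9]*) echo "threshold must be an integer" >&2; return 1 ;;
    esac
    if grep -q '^O2CB_HEARTBEAT_THRESHOLD=' "$conf"; then
        sed -i "s/^O2CB_HEARTBEAT_THRESHOLD=.*/O2CB_HEARTBEAT_THRESHOLD=$value/" "$conf"
    else
        echo "O2CB_HEARTBEAT_THRESHOLD=$value" >> "$conf"
    fi
}

# Usage (as root, on every node), then unload and reconfigure o2cb:
#   set_hb_threshold 301
```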
After modifying the file /etc/sysconfig/o2cb, you need to reconfigure the o2cb service so the new value takes effect. Again, this should be performed on all nodes in the cluster.
# umount /u02/oradata/orcl/
# /etc/init.d/o2cb unload
# /etc/init.d/o2cb configure
Load O2CB driver on boot (y/n) [y]: y
Cluster to start on boot (Enter "none" to clear) [ocfs2]: ocfs2
Writing O2CB configuration: OK
Loading module "configfs": OK
Mounting configfs filesystem at /config: OK
Loading module "ocfs2_nodemanager": OK
Loading module "ocfs2_dlm": OK
Loading module "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Starting cluster ocfs2: OK
You can now check again to make sure the settings took effect for the o2cb cluster stack:
# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
301
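This check is worth repeating on each node after any reconfiguration or reboot. The sketch below, assuming the /proc path used in this guide, compares the running value with the one you expect; check_hb_threshold is a hypothetical name, and the optional second argument lets you point it at a different file.

```shell
#!/bin/bash
# Hypothetical helper: compare the heartbeat threshold reported by the running
# o2cb stack with the expected value.
check_hb_threshold() {
    local expected=$1
    local proc=${2:-/proc/fs/ocfs2_nodemanager/hb_dead_threshold}
    local actual
    actual=$(cat "$proc" 2>/dev/null) || { echo "cannot read $proc (is o2cb loaded?)" >&2; return 1; }
    if [ "$actual" != "$expected" ]; then
        echo "hb_dead_threshold is $actual, expected $expected" >&2
        return 1
    fi
}

# Usage (on each node):
#   check_hb_threshold 301 && echo "threshold OK"
```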
Important Note: The value of 301 used for the O2CB heartbeat threshold will not work for all the FireWire drives listed in this guide. Use the following chart to determine the O2CB heartbeat threshold value that should be used.
| FireWire Drive | O2CB Heartbeat Threshold Value |
|---|---|
| Maxtor OneTouch II 300GB USB 2.0 / IEEE 1394a External Hard Drive - (E01G300) | 301 |
| Maxtor OneTouch II 250GB USB 2.0 / IEEE 1394a External Hard Drive - (E01G250) | 301 |
| Maxtor OneTouch II 200GB USB 2.0 / IEEE 1394a External Hard Drive - (E01A200) | 301 |
| LaCie Hard Drive, Design by F.A. Porsche 250GB, FireWire 400 - (300703U) | 600 |
| LaCie Hard Drive, Design by F.A. Porsche 160GB, FireWire 400 - (300702U) | 600 |
| LaCie Hard Drive, Design by F.A. Porsche 80GB, FireWire 400 - (300699U) | 600 |
| Dual Link Drive Kit, FireWire Enclosure, ADS Technologies - (DLX185) | 901 |
| Maxtor OneTouch 250GB USB 2.0 / IEEE 1394a External Hard Drive - (A01A250) | 600 |
| Maxtor OneTouch 200GB USB 2.0 / IEEE 1394a External Hard Drive - (A01A200) | 600 |
Reboot Both Nodes
Before starting the next section, this would be a good place to reboot all of the nodes in the RAC cluster. When the machines come up, ensure that the cluster stack services are being loaded and the new OCFS2 file system is being mounted:
# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/hda1 on /boot type ext3 (rw)
none on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
cartman:SHARE2 on /cartman type nfs (rw,addr=192.168.1.120)
configfs on /config type configfs (rw)
ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
/dev/sda1 on /u02/oradata/orcl type ocfs2 (rw,_netdev,datavolume)
You should also verify that the O2CB heartbeat threshold is set correctly (to our new value of 301):
# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
301
How to Determine OCFS2 Version
To determine which version of OCFS2 is running, use:
# cat /proc/fs/ocfs2/version
OCFS2 1.0.4 Fri Aug 26 12:31:58 PDT 2005 (build 0a22e88ab648dc8d2a1f9d7796ad101c)
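To capture only the version number (for example, to compare nodes side by side), the first two fields of that line are enough. A minimal sketch, assuming the file's first line starts with "OCFS2" followed by the version, as above; ocfs2_version is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: print just the version number from the OCFS2 version
# file, whose first line looks like "OCFS2 1.0.4 Fri Aug 26 ... (build ...)".
ocfs2_version() {
    local vfile=${1:-/proc/fs/ocfs2/version}
    awk 'NR == 1 && $1 == "OCFS2" { print $2; found = 1 } END { exit !found }' "$vfile"
}

# Usage (on each node):
#   ocfs2_version
```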