Oracle disk RAID Technology
The need for the DBA to manually stripe data files across
multiple disk spindles has been lessened with the advent of
hardware and software RAID. The two most common levels of RAID
are RAID 10 (mirroring and striping) and RAID 5.
For all systems experiencing significant updates, Oracle
recommends using SAME (Stripe and Mirror Everywhere, a.k.a., RAID 1+0 ) due to the update penalty with RAID 5 architectures. The SAME approach is the foundation of the
Oracle10g Automatic Storage Management (ASM) approach.
The Horrors of RAID 5
Many disk vendors persist in pushing RAID 5 as a
viable solution for highly updated systems even though using
RAID 5 for an Oracle system with a high update rate can be
disastrous to performance.
In short, it is clearly troublesome for any company with high
volumes of updates to use RAID 5, and the goal of
Oracle10g with Automatic Storage Management (ASM) is to urge
Oracle customers to use RAID 1+0 .
The Oracle DBA will nevertheless be concerned about their most
critical performance area, the disk I/O sub-system, until such
time that solid-state disks are cheap enough to fully cache
large databases. The main points of this section include the
The old-fashion file placement rules still apply
if RAID is not being used, and Oracle data files must be
manually placed across the disk spindles to relieve I/O
RAID 5 is not recommended for high
update Oracle systems. The performance penalty from the parity
checking will greatly diminish Oracle performance.
Using a RAID 10 approach (striping
and mirroring) distributes data blocks across all of the disk
spindles, making hot disks a random and transient event.
Oracle10g continues to expand support for very
large RAM data buffers and buffer monitoring with the
v$db_cache_advice utility and Oracle10g Automatic
Memory Management (AMM).
Improving disk access speed will not help if disk
is not the source of the bottleneck. The top-5 STATSPACK wait events should be checked to ensure that disk
I/O is the bottleneck prior to undergoing expensive changes to
the disk I/O subsystem.
Many Oracle customers are using solid-state disk
for high I/O data files such as TEMP, UNDO, and REDO files.
Solid-state disk is becoming less expensive and may soon replace
traditional disk devices.
In sum, Oracle DBAs must be steadfast in attempting to
understand their disk I/O sub-system and making sure that disk
I/O does not impede the high performance of their systems.
RAID 5 can be deadly for high-update databases!
Even with lots of caching, the goal of almost all Oracle tuning
activities is to reduce I/O. Tuning the instance parameters,
sizing the library cache, tuning SQL statements all have the
common focus of reducing I/O. Even when physical disk I/O (POI)
is minimized, logical I/O (LIO) is still a major contributor to
response time. Even fully cached databases will run slowly if
sub-optimal SQL statements force thousands of unnecessary LIOs
against the buffer cache.
Time and time again, DBAs spend time and energy tuning a
component of their database that is not a top wait event, and
they are surprised to find that their change did not make a huge
difference in performance. For example, a faster CPU does not
help an I/O-bound system, and moving to faster disk does not
help a CPU-bound system.
All Oracle databases have some physical constraint, and it is
not always disk. The best way to find the constraints for the
system is to examine the top five wait events on the STATSPACK report.
The system is disk I/O bound if the majority of the wait time is
spent accessing data blocks. This can be
db file sequential read
waits, usually index
access, and db file
scattered read waits,
usually full-table scans:
Top 5 Timed Events
Event Waits Ela Time
--------------------------- ------------ ----------- --------
db file sequential read 2,598 48.54
db file scattered read 25,519 22.04
library cache load lock 673 9.26
CPU time 44 7.83
log file parallel write 19,157 5.68
Once the determination has been made that the Oracle database is
I/O-bound, the DBA should get a full grasp on the internals of
disk devices, layers of caching, and the configuration of disk
controllers and physical disk spindles.
An I/O bottleneck may also manifest itself with high CPU run
queue values combined with high idle times.
SERVER_NAME date hour runq pg_in pg_ot usr
------------- ------- ------ ---- ----- ---- -----
CSS-HP1 04/06/15 22 10 4 0 11
CSS-HP1 04/06/15 09 9 4 0 11
CSS-HP1 04/06/14 22 10 2 0 46
CSS-HP1 04/06/14 07 9 4 0 10
Normally, a high vmstat run queue, where the run queue values exceed
the number of CPUs on the server, indicates CPU overload.
However, when combined with high CPU idle times, the run queue
values indicate that the CPU is backed up waiting for I/O to
SEE CODE DEPOT FOR FULL SCRIPTS