|
A Closer Look at
ADDM
By Kimberly Floss
New diagnostics
engine in Oracle 10g
helps detect and
diagnose performance
problems
In my last column, I
focused on one of
Oracle Database 10g's
quick wins for DBAs—using
the new SQL Tuning
Advisor to quickly
tune poorly
performing SQL
statements—and
touched only briefly
on the new built-in
diagnostic engine,
Automatic Database
Diagnostic Monitor (ADDM),
which helps you
easily identify
problematic SQL
statements (see "Advice
and Consent"
in the March/April
2004 issue for more
information about
using the SQL Tuning
Advisor and the
DBMS_SQLTUNE
package).
Identifying
high-load SQL
statements is just
one of ADDM's many
capabilities. A
self-diagnostic
engine built into
the Oracle Database
10g kernel,
ADDM automatically
detects and
diagnoses common
performance
problems, including:
-
Hardware issues
related to
excessive I/O
-
CPU bottlenecks
-
Connection
management
issues
-
Excessive
parsing
-
Concurrency
issues, such as
contention for
locks
-
PGA,
buffer-cache,
and
log-buffer-sizing
issues
-
Issues specific
to Oracle Real
Application
Clusters (RAC)
deployments,
such as global
cache hot blocks
and objects and
interconnect
latency issues
Let's continue
exploring the new
performance tuning
capabilities of
Oracle Database 10g
by taking a look at
ADDM.
Automatic, Effective
Problem Diagnosis
ADDM automatically
analyzes the
performance of the
database at regular
intervals (once an
hour by default) and
identifies any
performance issues.
ADDM's diagnostic
engine is modeled as
a classification
system that quickly
and effectively
sifts through the
statistical data in
Automatic Workload
Repository (AWR)—a
newly introduced
built-in repository
in Oracle Database
10g
containing
performance
statistics and
workload
information—and
evaluates problem
areas, based on
their impact on
overall database
performance. ADDM's
processing logic
encompasses best
practices and the
accumulated
expertise of tuning
professionals
throughout the
industry. Ultimately
ADDM presents a set
of "findings" that
not only identifies
the cause of
performance problems
but can also inform
administrators that
certain database
subcomponents, such
as I/O, are
functioning properly
and do not require
any further
investigation.
Automation is just
one of several
unique benefits of
ADDM. Another
important benefit is
ADDM's ability to
identify the root
cause of problems.
Just as physicians
achieve better
results for their
patients by treating
causes rather than
symptoms, DBAs can
achieve better
database performance
by finding the root
cause of performance
issues before making
changes to the
system. However,
finding a root cause
can be difficult and
time-consuming,
given that symptoms
can sometimes mask
the real problem.
ADDM can distinguish
between the cause
and symptoms of a
problem. It's able
to do so, in part,
by using the
extensive new data
(events, statistics)
generated by the
database at runtime.
Oracle database
kernel code has
always been
instrumented to
provide raw
performance data,
but in this release,
the instrumentation
is more
comprehensive.
For example, wait
events are more
granular, with
numerous locks and
latches now
separated into
distinct wait
events. Furthermore,
wait events are now
grouped into
classes—Application,
Administration,
Commit, Concurrency,
Network, User I/O,
System I/O, and
Configuration, for
example—to
facilitate
processing through
ADDM.
In addition to the
new wait-event
model, Oracle
Database 10g
also includes an
array of new
statistics that
provide performance
data on everything
from operating
system
performance—hard
disk I/O statistics,
CPU utilization, and
network statistics,
for example—to
cumulative
statistics at the
system and session
levels. One of the
most important new
statistics is
database time, or DB
time—the amount of
time the database is
actually working on
database calls.
It's About Time
ADDM uses DB time to
measure throughput;
it's the total time
spent by foreground
processes—such as a
read I/O—waiting for
a database resource,
running on the CPU,
and waiting for a
free CPU. The
overarching goal of
ADDM is to reduce
the value of DB time
across the system,
thereby improving
overall throughput.
Operational
information about
active
sessions—those using
the CPU—as well as
sessions waiting for
the CPU, is sampled
every second and
held in a scrolling
buffer in the server
memory—Active
Session History
(ASH), an important
new feature of
Oracle Database 10g.
ASH (V$ACTIVE_SESSION_HISTORY)
maintains historical
data; in previous
Oracle releases,
DBAs had information
about currently
active sessions
only— not historical
data—so figuring out
the cause of
performance problems
after the fact
wasn't easy.
Data from ASH, along
with other important
database statistics,
persists on the disk
in the form of
"snapshots."
Snapshots are taken
every hour and
stored for seven
days, by default,
but DBAs can change
settings for both
snapshot frequency
and storage
duration. AWR
provides the raw
information for ADDM
analysis, which
begins automatically
as soon as a new AWR
snapshot is taken.
Getting Started with
ADDM
Because ADDM runs
automatically after
each new AWR
snapshot is taken,
no manual steps are
required to generate
its findings. But
you can run ADDM on
demand by creating a
new snapshot
manually, by using
either Oracle
Enterprise Manager
(OEM) or the
command-line
interface. The
following shows
creation of a
snapshot from the
command line:
SQL> exec dbms_workload_repository.create_
snapshot();
PL/SQL procedure successfully completed.
After the new
snapshot is created,
its information (snap_id,
begin_interval_
time, and the like)
is populated to the
DBA_HIST_SNAPSHOT
dictionary view. A
few seconds after
you take the
snapshot, ADDM
brings new findings
to the surface,
based on analysis of
that snapshot. The
findings are
available on the new
OEM Web-based
console, on the
Database home page,
as shown in
Figure 1.
Figure 1: The Oracle
Enterprise Manager
Database home page
provides a quick
overview of your
database's
operation.
When you access the
database instance in
the new OEM console,
you see at a glance
how the database is
operating in
general, on a
dashboard-style home
page. You can
quickly verify that
the server has
adequate CPU and
memory resources and
get a general
overview of the
system from this one
page. The Active
Sessions pie chart
shows the current
distribution across
the server in three
key areas: Using
CPU, Waiting I/O,
and Waiting Other.
You can drill down
on any of these
items to see a line
graph of activity
over the last 24
hours (and change
this to values such
as 31 days, 7 days,
or just a few
minutes).
The results of the
latest ADDM run are
displayed in the
Performance Analysis
section of the home
page, as shown in
Figure 2,
which provides a
summary view of the
top ADDM findings:
the impact of the
diagnosed problem as
a percentage of
overall database
performance, a
description of the
finding, and a
recommendations
summary. The text of
each finding is a
hyperlink to
more-detailed
information on that
particular finding.
ADDM recommendations
can include running
one of the new
advisors—SQL Tuning
Advisor, SQL Access
Advisor, Space
Management Advisor,
and so on—when
applicable. For
example, a solution
for a particular
finding can involve
using SQL Tuning
Advisor to tune a
particular SQL
statement or adding
more resources to
the machine.
Clicking on the
recommendation link
displays the
detailed finding
view, which includes
the full text of any
recommendations.
Figure 2: A summary
of ADDM findings
displayed on the OEM
Database home page
Figure 3:
Performance details
provide recommended
solutions for quick
problem resolution.
You can also
generate an ADDM
report that
summarizes
performance data and
provides a list of
all findings and
recommendations, as
shown in
Figure 3.
You can access ADDM
reports through the
Web-based OEM
console or from a
SQL*Plus command
line by using the
new
DBMS_ADVISOR
built-in package.
For example, here's
how to use the
command line to
create an ADDM
report quickly
(based on the most
recent snapshot):
set long 1000000
set pagesize 50000
column get_clob format a80
select dbms_advisor.get_task_report(
task_name, 'TEXT', 'ALL')
as ADDM_report
from dba_advisor_tasks
where task_id=(
select max(t.task_id)
from dba_advisor_tasks t, dba_advisor_log l
where t.task_id = l.task_id
and t.advisor_name='ADDM'
and l.status= 'COMPLETED');
The ALL parameter
generates additional
information about
the meaning of some
of the elements in
the report.
Findings report the
impact of the
identified problem
as a percentage of
DB time, which
correlates with the
expected benefit,
based on the
assumption that the
problem described by
the finding will be
solved if the
recommended action
is taken. For
example, here's a
finding that
identifies a
configuration issue
and recommends
adjusting the
sga_target value in
the parameter file:
FINDING 3: 5.2% impact (147 seconds)
---------------------------------------
The buffer cache was undersized causing significant additional read I/O.
RECOMMENDATION 1: DB Configuration, 5.2% benefit (147 seconds)
ACTION: Increase SGA target size by increasing the value of parameter "sga_target" by 24 M.
SYMPTOMS THAT LED TO THE FINDING:
Wait class "User I/O" was consuming significant database time. (5.3% impact [150 seconds])
...
As you see, DBAs
have the flexibility
to decide how to
resolve ADDM
findings. In this
case, the DBA might
decide to use the
new automatic SGA
sizing feature,
rather than
modifying the
parameter.
Conclusion
Oracle Database 10g
provides DBAs with
comprehensive,
quantifiable
diagnostic
information that can
be used to
proactively monitor
and prevent problems
as well as respond
to them when they
arise. There's an
enormous amount of
new statistical
information
available to help
keep the database
running optimally.
More important, the
automatic
performance-diagnostic
capabilities of AWR
and ADDM will not
only help you
resolve critical
performance problems
quickly but will
also provide you
with hard-and-fast,
quantifiable data
for making the
business case for
new hardware,
additional staff,
application
development
initiatives, or
better training for
employees. And
Oracle plans to
continually update
ADDM's problem
classification tree
and diagnostic rules
to keep pace with
new features and
technologies
introduced in the
database and
underlying operating
platform.
|