| Automatic
Workload Repository
Learn to
use the new feature that collects
database performance statistics and
metrics for analysis and tuning, shows
the exact time spent in the database,
and even saves session information
When you have
a database performance problem, what is
the first thing you do to address it?
One common approach is to see if a
pattern exists: Answering questions such
as "Is the same problem recurrent?",
"Does it occur during a specific time
period?", and "Is there a link between
two problems?" will almost always lead
to a better diagnosis.
As a DBA you
probably have invested in a third-party
or homegrown tool to collect elaborate
statistics during database operation and
derive performance metrics from them. In
a crisis, you access those metrics for
comparisons to the present. Replaying
these past events can shed light on
current problems, so continuously
capturing relevant statistics becomes
important for performance analysis.
For some time,
Oracle's solution in this area has been
its built-in tool, Statspack. While it
can prove invaluable in certain cases,
it often lacks the robustness required
by performance troubleshooting
exercises. Oracle Database 10g
offers a significant improvement: the
Automatic Workload Repository (AWR). The
AWR installs along with the database and
captures not only statistics, but the
derived metrics as well.
A Quick Test
Drive
AWR capability is
best explained quickly by the report it
produces from collected statistics and
metrics, by running the script
awrrpt.sql in the $ORACLE_HOME/rdbms/admin
directory. This script, in its look and
feel, resembles Statspack; it shows all
the AWR snapshots available and asks for
two specific ones as interval
boundaries. It produces two types of
output: text format, similar to that of
the Statspack report but from the AWR
repository, and the default HTML format,
complete with hyperlinks to sections and
subsections, providing quite a
user-friendly report. Run the script and
take a look at the report now to get an
idea about capabilities of the AWR.
Implementation
Now let's explore
how AWR is designed and structured.
Basically, AWR is an Oracle built-in
tool that collects performance related
statistics and derives performance
metrics from them to track a potential
problem. Unlike Statspack, snapshots are
collected automatically every hour by a
new background process called MMON and
its slave processes. To save space, the
collected data is automatically purged
after 7 days. Both the snapshot
frequency and retention time can be
modified by the user. To see the present
settings, you could use:
select snap_interval, retention
from dba_hist_wr_control;
SNAP_INTERVAL RETENTION
------------------- -------------------
+00000 01:00:00.0 +00007 00:00:00.0
This SQL shows
that the snapshots are taken every hour
and the collections are retained 7 seven
days. To change the settings--say,
for snapshot intervals of 20 minutes and
a retention period of two days--you
would issue the following. The
parameters are specified in minutes.
begin
dbms_workload_repository.modify_snapshot_settings (
interval => 20,
retention => 2*24*60
);
end;
AWR uses several
tables to store the collected
statistics, all stored under the SYS
schema in the new special tablespace
named SYSAUX, and named in the format
WRM$_* and WRH$_*. The
former type stores metadata information
such as the database being examined and
the snapshots taken, and the latter type
holds the actual collected statistics.
(As you might have guessed, H stands for
"historical" and M stands for
"metadata.") There are several views
with the prefix DBA_HIST_ built upon
these tables, which can be used to write
your own performance diagnosis tool. The
names of the views directly relate to
the table; for example, the view
DBA_HIST_SYSMETRIC_SUMMARY is built upon
the table WRH$_SYSMETRIC_SUMMARY.
The AWR
history tables capture a lot more
information than Statspack, including
tablespace usage, filesystem usage, even
operating system statistics. A complete
list of these tables can be seen from
the data dictionary through:
select view_name from user_views where view_name like 'DBA\_HIST\_%' escape '\';
The view
DBA_HIST_METRIC_NAME defines the
important metrics the AWR collects, the
groups to which they belong, and the
unit in which they are collected. For
example, here is one record (in vertical
format):
DBID : 4133493568
GROUP_ID : 2
GROUP_NAME : System Metrics Long Duration
METRIC_ID : 2075
METRIC_NAME : CPU Usage Per Sec
METRIC_UNIT : CentiSeconds Per Second
It shows that
a metric "CPU Usage Per Sec" is measured
in units of "CentiSeconds Per Second"
and belongs to a metric group "System
Metrics Long Duration." This record can
be joined with other tables such as
DBA_HIST_SYSMETRIC_SUMMARY to get the
activity, as in:
select begin_time, intsize, num_interval, minval, maxval, average, standard_deviation sd
from dba_hist_sysmetric_summary where metric_id = 2075;
BEGIN INTSIZE NUM_INTERVAL MINVAL MAXVAL AVERAGE SD
----- ---------- ------------ ------- ------- -------- ----------
11:39 179916 30 0 33 3 9.81553548
11:09 180023 30 21 35 28 5.91543912
... and so on ...
Here we see how
the CPU time was consumed in
centi-seconds. The standard deviation
adds to our analysis by helping
ascertain whether the average figure
reflects the actual workload. In the
first records, the average is 3
centi-seconds in CPU per second elapsed,
but the standard deviation is 9.81,
meaning the average of 3 is not
reflective of the workload. In the
second example, the value 28, with a
standard deviation of 5.9, is more
representative. This type of information
trends help understanding the effects of
several environmental parameters on
performance metrics.
Using the
Statistics
So far we have
seen what AWR collects; now let's see
what it does with the data.
Most performance
problems do not exist in isolation, but
rather leave tell-tale signs that will
lead to the eventual root cause of the
problem. Let's use a typical tuning
exercise: You notice that the system is
slow and decide to look into the waits.
Your examination reveals that the
"buffer busy wait" is very high. What
could be the problem? There are several
possibilities: there could be a
monotonically increasing index, a table
so packed that a single block is asked
to be loaded to memory very quickly, or
some other factors. In any case, first
you want identify the segment in
question. If it's an index segment, you
could decide to rebuild it; change it to
a reverse key index; or convert it to a
hash-partitioned index introduced in
Oracle Database 10g. If it's a
table, you could consider changing
storage parameters to make it less dense
or move it over to a tablespace with
automatic segment space management.
Your plan of
attack is generally methodical and
usually based your knowledge of various
events and your experience in dealing
with them. Now imagine if the same thing
were done by an engine - an engine that
captures metrics and deduces possible
plans based on pre-determined logic.
Wouldn't your job be easier?
That engine, now
available in Oracle Database 10g,
is known as Automatic Database
Diagnostic Monitor (ADDM). To arrive at
a decision, ADDM uses the data collected
by AWR. In the above discussion, ADDM
can see that the buffer busy waits are
occurring, pull the appropriate data to
see the segments on which it occurs,
evaluate its nature and composition, and
finally offer solutions to the DBA.
After each snapshot collection by AWR,
the ADDM is invoked to examine the
metrics and generate recommendations.
So, in effect you have a full-time
robotic DBA analyzing the data and
generating recommendations proactively,
freeing you to attend to more strategic
issues.
To see the ADDM
recommendations and the AWR repository
data, use the new Enterprise Manager 10g
console on the page named DB Home. To
see the AWR reports, you can navigate to
them from Administration, then Workload
Repository, and then Snapshots. We'll
examine ADDM in greater detail in a
future installment.
You can also
specify alerts to be generated based on
certain conditions. These alerts, known
as Server Generated Alerts, are pushed
to an Advanced Queue, from where they
can be consumed by any client listening
to it. One such client is Enterprise
Manager 10g, where the alerts
are displayed prominently.
Time Model
When you have a
performance problem, what comes to mind
first to reduce the response time?
Obviously, you want to eliminate (or
reduce) the root cause of the factor
that adds to the time. How do you know
where the time was spent--not
waiting, but actually doing the work?
Oracle Database
10g introduces time models for
identifying the time spent in various
places. The overall system time spent is
recorded in the view
V$SYS_TIME_MODEL. Here is the query
and its output.
STAT_NAME VALUE
------------------------------------- --------------
DB time 58211645
DB CPU 54500000
background cpu time 254490000
sequence load elapsed time 0
parse time elapsed 1867816
hard parse elapsed time 1758922
sql execute elapsed time 57632352
connection management call elapsed time 288819
failed parse elapsed time 50794
hard parse (sharing criteria) elapsed time 220345
hard parse (bind mismatch) elapsed time 5040
PL/SQL execution elapsed time 197792
inbound PL/SQL rpc elapsed time 0
PL/SQL compilation elapsed time 593992
Java execution elapsed time 0
bind/define call elapsed time 0
Note the
statistic named DB Time, which
represents the time spent in the
database since the instance startup. Run
the sample workload and select the
statistic value from the view again. The
difference should represent the time
spent in the database for that workload.
After another round of tuning, perform
the same analysis and that difference
will show the change in DB Time after
the tuning, which can be compared to
first change to examine the effect of
the tuning exercise on the database
time.
In addition to
the database time, the
V$SYS_TIME_MODEL view shows a whole
lot of other statistics, such as time
spent in different types of parsing and
even PL/SQL compilation.
This view shows
the overall system times as well;
however, you may be interested in a more
granular view: the session level times.
The timing stats are captured at the
session level as well, as shown in the
view V$SESS_TIME_MODEL, where
all the stats of the current connected
sessions, both active and inactive, are
visible. The additional column SID
specifies the SID of the sessions for
which the stats are shown.
In previous
releases, this type of analysis was
impossible to get and the user was
forced to guess or derive from a variety
of sources. In Oracle Database 10g,
getting this information is a snap.
Active Session
History
The view
V$SESSION in Oracle Database 10g
has been improved; the most valuable
improvement of them all is the inclusion
of wait events and their duration,
eliminating the need to see the view
V$SESSION_WAIT. However, since this view
merely reflects the values in real time,
some of the important information is
lost when it is viewed later. For
instance, if you select from this view
to check if any session is waiting for
any non-idle event, and if so, the event
in question, you may not find anything
because the wait must have been over by
the time you select it.
Enter the new
feature Active Session History (ASH),
which, like AWR, stores the session
performance statistics in a buffer for
analysis later. However, unlike AWR, the
storage is not persistent in a table but
in memory, and is shown in the view
V$ACTIVE_SESSION_HISTORY. The data
is polled every second and only the
active sessions are polled. As time
progresses, the old entries are removed
to accommodate new ones in a circular
buffer and shown in the view. To find
out how many sessions waited for some
event, you would use
select session_id||','||session_serial# SID, n.name, wait_time, time_waited
from v$active_session_history a, v$event_name n
where n.event# = a.event#
This command
tells you the name of the event and how
much time was spent in waiting. If you
want to drill down to a specific wait
event, additional columns of ASH help
you with that as well. For instance, if
one of the events the sessions waited on
is buffer busy wait, proper diagnosis
must identify the segments on which the
wait event occurred. You get that from
the ASH view column CURRENT_OBJ#, which
can then be joined with DBA_OBJECTS to
get the segments in question.
ASH also records
parallel query server sessions, useful
to diagnose the parallel query wait
events. If the record is for a parallel
query slave process, the SID of the
coordinator server session is identified
by QC_SESSION_ID column. The column
SQL_ID records the ID of the SQL
statement that produced the wait event,
which can be joined with the V$SQL view
to get the offending SQL statement. To
facilitate the identification of the
clients in a shared user environment
like a web application, the CLIENT_ID
column is also shown, which can be set
by DBMS_SESSION.SET_IDENTIFIER.
Since ASH
information is so valuable, wouldn't it
be nice if it were stored in a
persistent manner similar to AWR?
Fortunately, it is; the information is
flushed to the disk by the MMON slave to
the AWR table, visible through the view
DBA_HIST_ACTIVE_SESS_HISTORY.
Manual
Collection
Snapshots are
collected automatically by default, but
you can also collect them on demand. All
AWR functionality has been implemented
in the package
DBMS_WORKLOAD_REPOSITORY. To take a
snapshot, simply issue:
execute dbms_workload_repository.create_snapshot
It immediately
takes a snapshot, recorded in the table
WRM$_SNAPSHOT. The metrics collected are
for the TYPICAL level. If you
want to collect more detailed
statistics, you can set the parameter
FLUSH_LEVEL to ALL in
the above procedure. The stats are
deleted automatically but can also be
deleted manually by calling the
procedure drop_snapshot_range().
Baseline
A typical
performance tuning exercise starts with
a capturing a baseline set of metrics,
making changes, and then taking another
baseline set. These two sets can be
compared to examine the effect of the
changes made. In AWR, the same kind of
analogy can be implemented for existing
snapshots taken. Suppose a particularly
resource intensive process named
apply_interest ran between 1:00 and
3:00PM, corresponding to snapshot IDs 56
through 59. We could define a baseline
named apply_interest_1 for these
snapshots:
exec dbms_workload_repository.create_baseline (56,59,'apply_interest_1')
This action marks
the snapshots 56 through 59 as part of a
baseline named above. Checking for
existing baselines:
select * from dba_hist_baseline;
DBID BASELINE_ID BASELINE_NAME START_SNAP_ID END_SNAP_ID
---------- ----------- -------------------- ------------- -----------
4133493568 1 apply_interest_1 56 59
After a few
tuning steps, we can create another
baseline--called, say
apply_interest_2--and compare
the metrics for only those snapshots
related to these two baselines.
Isolating snapshots to only a few sets
like this helps in studying the effects
of tuning on performance metrics. You
can drop the baselines after the
analysis using the procedure
drop_baseline(); the snapshots will
be preserved. Also, when the purge
routine kicks in to delete the old
snapshots, the ones related to baselines
are not purged, allowing for further
analysis.
Conclusion
This installment
was intended to be merely an
introduction to the very rudimentary
aspects of the AWR. For a more complete
coverage, see Oracle Database 10g
documentation. Furthermore, an
excellent treatise on AWR and ADDM can
be found in the technical whitepaper
The Self-Managing Database:
Automatic Performance Diagnosis.
In Week 15, you will learn more about
ADDM and using it to solve real-life
problems. |