|
Data movemement gets
a big lift with
Oracle Database 10g
utilities
Until now, the
export/import
toolset has been the
utility of choice
for transferring
data across multiple
platforms with
minimal effort,
despite common
complaints about its
lack of speed.
Import merely reads
each record from the
export dump file and
inserts it into the
target table using
the usual INSERT
INTO command,
so it's no surprise
that import can be a
slow process.
Enter
Oracle Data Pump,
the newer and faster
sibling of the
export/import
toolkit in Oracle
Database 10g,
designed to speed up
the process many
times over.
Data Pump reflects a
complete overhaul of
the export/import
process. Instead of
using the usual SQL
commands, it
provides proprietary
APIs to load and
unload data
significantly
faster. In my tests,
I have seen
performance
increases of 10-15
times over export in
direct mode and
5-times-over
performance
increases in the
import process. In
addition, unlike
with the export
utility, it is
possible to extract
only specific types
of objects such as
procedures.
Data Pump Export
The new utility is
known as expdp to
differentiate it
from exp, the
original export. In
this example, we
will use Data Pump
to export a large
table, CASES, about
3GB in size. Data
Pump uses file
manipulation on the
server side to
create and read
files; hence,
directories are used
as locations. In
this case, we are
going to use the
filesystem
/u02/dpdata1 to hold
the dump files.
create directory dpdata1 as '/u02/dpdata1';
grant read, write on directory dpdata1 to ananda;
Next, we will export
the data:
expdp ananda/abc123 tables=CASES directory=DPDATA1
dumpfile=expCASES.dmp job_name=CASES_EXPORT
Let's analyze
various parts of
this command. The
userid/password
combination, tables,
and dumpfile
parameters are
self-explanatory.
Unlike the original
export, the file is
created on the
server (not the
client). The
location is
specified by the
directory parameter
value DPDATA1,
which points to
/u02/dpdata1 as
created earlier. The
process also creates
a log file, again on
the server, in the
location specified
by the directory
parameter. By
default, a directory
named DPUMP_DIR is
used by this
process; so it can
be created instead
of the DPDATA1.
Note the parameter
job_name above, a
special one not
found in the
original export. All
Data Pump work is
done though jobs.
Data Pump jobs,
unlike DBMS jobs,
are merely server
processes that
process the data on
behalf of the main
process. The main
process, known as a
master control
process, coordinates
this effort via
Advanced Queuing; it
does so through a
special table
created at runtime
known as a master
table. In our
example, if you
check the schema of
the user ANANDA
while expdp is
running you will
notice the existence
of a table
CASES_EXPORT,
corresponding to the
parameter
job_name. This
table is dropped
when expdp
finishes.
Export Monitoring
While Data Pump
Export (DPE) is
running, press
Control-C; it will
stop the display of
the messages on the
screen, but not the
export process
itself. Instead, it
will display the DPE
prompt as shown
below. The process
is now said to be in
"interactive" mode:
Export>
This approach allows
several commands to
be entered on that
DPE job. To find a
summary, use the
STATUS command
at the prompt:
Export> status
Job: CASES_EXPORT
Operation: EXPORT
Mode: TABLE
State: EXECUTING
Degree: 1
Job Error Count: 0
Dump file: /u02/dpdata1/expCASES.dmp
bytes written = 2048
Worker 1 Status:
State: EXECUTING
Object Schema: DWOWNER
Object Name: CASES
Object Type: TABLE_EXPORT/TBL_TABLE_DATA/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 1
Completed Rows: 4687818
Remember, this is
merely the status
display. The export
is working in the
background. To
continue to see the
messages on the
screen, use the
command
CONTINUE_CLIENT
from the Export>
prompt.
Parallel Operation
You can accelerate
jobs significantly
using more than one
thread for the
export, through the
PARALLEL parameter.
Each thread creates
a separate dumpfile,
so the parameter
dumpfile should have
as many entries as
the degree of
parallelism. Instead
of entering each one
explicitly, you can
specify wildcard
characters as
filenames such as:
expdp ananda/abc123 tables=CASES directory=DPDATA1
dumpfile=expCASES_%U.dmp parallel=4 job_name=Cases_Export
Note how the
dumpfile parameter
has a wild card
%U, which
indicates the files
will be created as
needed and the
format will be
expCASES_nn.dmp,
where nn starts at
01 and goes up as
needed.
In parallel mode,
the status screen
will show four
worker processes.
(In default mode,
only one process
will be visible.)
All worker processes
extract data
simultaneously and
show their progress
on the status
screen.
It's important to
separate the I/O
channels for access
to the database
files and the
dumpfile directory
filesystems.
Otherwise, the
overhead associated
with maintaining the
Data Pump jobs may
outweigh the
benefits of parallel
threads and hence
degrade performance.
Parallelism will be
in effect only if
the number of tables
is higher than the
parallel value and
the tables are big.
Database Monitoring
You can get more
information on the
Data Pump jobs
running from the
database views, too.
The main view to
monitor the jobs is
DBA_DATAPUMP_JOBS,
which tells you how
many worker
processes (column
DEGREE) are working
on the job. The
other view that is
important is
DBA_DATAPUMP_SESSIONS,
which when joined
with the previous
view and V$SESSION
gives the SID of the
session of the main
foreground process.
select sid, serial#
from v$session s, dba_datapump_sessions d
where s.saddr = d.saddr;
This instruction
shows the session of
the foreground
process. More useful
information is
obtained from the
alert log. When the
process starts up,
the MCP and the
worker processes are
shown in the alert
log as follows:
kupprdp: master process DM00 started with pid=23, OS id=20530 to execute -
SYS.KUPM$MCP.MAIN('CASES_EXPORT', 'ANANDA');
kupprdp: worker
process DW01 started
with worker id=1,
pid=24, OS id=20532
to execute -
SYS.KUPW$WORKER.MAIN('CASES_EXPORT',
'ANANDA');
kupprdp: worker
process DW03 started
with worker id=2,
pid=25, OS id=20534
to execute -
SYS.KUPW$WORKER.MAIN('CASES_EXPORT',
'ANANDA');
It shows the PID of
the sessions started
for the data pump
operation. You can
find the actual SIDs
using this query:
select sid, program from v$session where paddr in
(select addr from v$process where pid in (23,24,25));
The PROGRAM column
will show the
process DM (for
master process) or
DW (the worker
proceses),
corresponding to the
names in the alert
log file. If a
parallel query is
used by a worker
process, say for SID
23, you can see it
in the view
V$PX_SESSION to find
it out. It will show
you all the parallel
query sessions
running from the
worker process
represented by SID
23:
select sid from v$px_session where qcsid = 23;
Additional useful
information can be
obtained from the
view
V$SESSION_LONGOPS to
predict the time it
will take to
complete the job.
select sid, serial#, sofar, totalwork
from v$session_longops
where opname = 'CASES_EXPORT'
and sofar != totalwork;
The column totalwork
shows the total
amount of work, of
which the sofar
amount has been done
up until now--which
you can then use to
estimate how much
longer it will take.
Data Pump Import
Data import
performance is where
Data Pump really
stands out, however.
To import the data
exported earlier, we
will use
impdp ananda/abc123 directory=dpdata1 dumpfile=expCASES.dmp job_name=cases_import
The default behavior
of the import
process is to create
the table and all
associated objects,
and to produce an
error when the table
exists. Should you
want to append the
data to the existing
table, you could use
TABLE_EXISTS_ACTION=APPEND
in the above command
line.
As with Data Pump
Export, pressing
Control-C on the
process brings up
the interactive mode
of Date Pump Import
(DPI); again, the
prompt is
Import>.
Operating on
Specific Objects
Ever had a need to
export only certain
procedures from one
user to be recreated
in a different
database or user?
Unlike the
traditional export
utility, Data Pump
allows you to export
only a particular
type of object. For
instance, the
following command
lets you export only
procedures, and
nothing else--no
tables, views, or
even functions:
expdp ananda/iclaim directory=DPDATA1 dumpfile=expprocs.dmp include=PROCEDURE
To export only a few
specific objects--say,
function FUNC1
and procedure
PROC1--you
could use
expdp ananda/iclaim directory=DPDATA1 dumpfile=expprocs.dmp
include=PROCEDURE:\"=\'PROC1\'\",FUNCTION:\"=\'FUNC1\'\"
This dumpfile serves
as a backup of the
sources. You can
even use it to
create DDL scripts
to be used later. A
special parameter
called SQLFILE
allows the creation
of the DDL script
file.
impdp ananda/iclaim directory=DPDATA1 dumpfile=expprocs.dmp sqlfile=procs.sql
This instruction
creates a file named
procs.sql in the
directory specified
by DPDATA1,
containing the
scripts of the
objects inside the
export dumpfile.
This approach helps
you create the
sources quickly in
another schema.
Using the parameter
INCLUDE
allows you to define
objects to be
included or excluded
from the dumpfile.
You can use the
clause INCLUDE=TABLE:"LIKE
'TAB%'" to
export only those
tables whose name
start with TAB.
Similarly, you could
use the construct
INCLUDE=TABLE:"NOT
LIKE 'TAB%'" to
exclude all tables
starting with TAB.
Alternatively you
can use the
EXCLUDE
parameter to exclude
specific objects.
Data Pump can also
be used to transport
tablespaces using
external tables;
it's sufficiently
powerful to redefine
parallelism on the
fly, attach more
tables to an
existing process,
and so on (which are
beyond the scope of
this article; see
Oracle Database
Utilities 10g
Release 1 10.1
for more details).
The following
command generates
the list of all
available parameters
in the Data Pump
export utility:
expdp help=y
Similarly, impdp
help=y will
show all the
parameters in DPI.
While Data Pump jobs
are running, you can
pause them by
issuing STOP_JOB
on the DPE or DPI
prompts and then
restart them with
START_JOB.
This functionality
comes in handy when
you run out of space
and want to make
corrections before
continuing.
|