ORACLE MAGAZINE
By Kimberly Floss
Use Oracle Database 10g Data
Pump for fast, flexible export and import.
With databases approaching the
double-digit terabyte range increasingly common these days,
Oracle DBAs can spend a fair amount of time shuttling vast
amounts of data as part of their daily operations. If we're not
trying to keep a complete physical backup current for disaster
recovery purposes, we're busy feeding developers their weekly or
daily diet of refreshed production data to use in development
and test environments or populating a read-only data warehouse
at some satellite location. As DBAs, we've developed numerous
scripts over the years to facilitate all these processes and
we've tuned our scripts to perform as well as they can. But
Oracle Export (and Oracle Import, especially) still takes too
long.
Fortunately, Oracle Data Pump—Oracle
Database 10g's new infrastructure for exporting and
importing database data and metadata—speeds up movement of all
of this data. This new server-side infrastructure is exposed
through a new PL/SQL package,
DBMS_DATAPUMP,
which you use in one of three basic ways: through two new
command-line client tools (expdp, impdp); through Oracle
Enterprise Manager; or through your own procedure code, with
calls to the application programming interface (API).
At first glance, the command-line clients
expdp and impdp may remind you of the exp and imp utilities, but
any similarity is superficial: These are both simply new client
tools for working with the completely new Oracle server
processes and related infrastructure. Let's start by taking a
quick look at some of the high points of that infrastructure,
before turning to a couple of usage scenarios.
Enabling Restart and
Remote Access
The Oracle Data Pump Export and Import
facility leverages key Oracle technology, including Advanced
Queuing (AQ), the metadata API (DBMS_METADATA),
and direct path, for example (see Figure 1). Here are a few of
the key components of the Oracle Data Pump facility:

Figure 1: Oracle Data Pump architecture
-
A
master control process
that manages the export and import process as jobs you
define. Each job has its own master control process that
gets instantiated when you start the job. One of the first
things a master control process does is to spawn several
other important facilities, including worker processes, and
populate a master table.
-
The
worker processes
unload or load metadata and data, as required by the
user-specified job parameters. Worker processes use the
Metadata API for all metadata loading and unloading. The
number of worker processes the master control process spawns
depends on the degree of parallelism you've selected (for
Oracle Database 10g Enterprise Edition—parallelism
isn't provided in Oracle Database 10g Standard
Edition).
-
A
master table maintains job
state, description, restart, dumpfile, and other information
as a job is under way—the master control process and all
worker processes feed this data into the master table as
operations occur.
The master table matches the job name you
pass as a parameter (if you gave it a name when you started the
job) or uses a system-generated name.
As part of job completion, the master
table's contents are written to the dumpfile. A pointer to the
master file data is thus at the head of the dumpfile and is used
to reconstitute the data structures upon import. Oracle Data
Pump infrastructure deletes the master table upon successful
completion of the Data Pump job.
If there's a system failure of some
kind—a disk fills to capacity during writing, for example—or if
you decide to stop the job so that it doesn't interfere with a
critical production job, then the table (and, more important,
all the job status information detail the table contains)
remains, enabling you to resume the job later (see Listing 1 for
the master table structure).
Code Listing 1:
Master table structure
SQL> desc kimberly.dev_refresh_job
Name Null? Type
-------------------------------------- ------- --------------------------
PROCESS_ORDER NUMBER
DUPLICATE NUMBER
DUMP_FILEID NUMBER
DUMP_POSITION NUMBER
DUMP_LENGTH NUMBER
DUMP_ALLOCATION NUMBER
COMPLETED_ROWS NUMBER
ERROR_COUNT NUMBER
ELAPSED_TIME NUMBER
OBJECT_TYPE_PATH VARCHAR2(200)
OBJECT_PATH_SEQNO NUMBER
OBJECT_TYPE VARCHAR2(30)
IN_PROGRESS CHAR(1)
OBJECT_NAME VARCHAR2(500)
OBJECT_LONG_NAME VARCHAR2(4000)
OBJECT_SCHEMA VARCHAR2(30)
ORIGINAL_OBJECT_SCHEMA VARCHAR2(30)
PARTITION_NAME VARCHAR2(30)
SUBPARTITION_NAME VARCHAR2(30)
FLAGS NUMBER
PROPERTY NUMBER
COMPLETION_TIME DATE
OBJECT_TABLESPACE VARCHAR2(30)
SIZE_ESTIMATE NUMBER
OBJECT_ROW NUMBER
PROCESSING_STATE CHAR(1)
PROCESSING_STATUS CHAR(1)
BASE_PROCESS_ORDER NUMBER
BASE_OBJECT_TYPE VARCHAR2(30)
BASE_OBJECT_NAME VARCHAR2(30)
BASE_OBJECT_SCHEMA VARCHAR2(30)
ANCESTOR_PROCESS_ORDER NUMBER
DOMAIN_PROCESS_ORDER NUMBER
PARALLELIZATION NUMBER
UNLOAD_METHOD NUMBER
GRANULES NUMBER
SCN NUMBER
GRANTOR VARCHAR2(30)
XML_CLOB CLOB
NAME VARCHAR2(30)
VALUE_T VARCHAR2(4000)
VALUE_N NUMBER
IS_DEFAULT NUMBER
FILE_TYPE NUMBER
USER_DIRECTORY VARCHAR2(4000)
USER_FILE_NAME VARCHAR2(4000)
FILE_NAME VARCHAR2(4000)
EXTEND_SIZE NUMBER
FILE_MAX_SIZE NUMBER
PROCESS_NAME VARCHAR2(30)
LAST_UPDATE DATE
WORK_ITEM VARCHAR2(30)
OBJECT_NUMBER NUMBER
COMPLETED_BYTES NUMBER
TOTAL_BYTES NUMBER
METADATA_IO NUMBER
DATA_IO NUMBER
CUMULATIVE_TIME NUMBER
PACKET_NUMBER NUMBER
OLD_VALUE VARCHAR2(4000)
SEED NUMBER
LAST_FILE NUMBER
USER_NAME VARCHAR2(30)
OPERATION VARCHAR2(30)
JOB_MODE VARCHAR2(30)
CONTROL_QUEUE VARCHAR2(30)
STATUS_QUEUE VARCHAR2(30)
REMOTE_LINK VARCHAR2(4000)
VERSION NUMBER
DB_VERSION VARCHAR2(30)
TIMEZONE VARCHAR2(64)
STATE VARCHAR2(30)
PHASE NUMBER
GUID RAW(16)
START_TIME DATE
BLOCK_SIZE NUMBER
METADATA_BUFFER_SIZE NUMBER
DATA_BUFFER_SIZE NUMBER
DEGREE NUMBER
PLATFORM VARCHAR2(100)
ABORT_STEP NUMBER
INSTANCE VARCHAR2(60)
The ability to restart an Oracle Data
Pump Export or Import job—and to do so remotely—is just one of
the many significant capabilities of Oracle Data Pump that the
legacy Oracle Export and Oracle Import don't provide. Other
capabilities include support for parallelism during the import
and export processes and the ability to change the degree of
parallelism while a job is running.
Fine-Grained Object
Selection
Oracle Data Pump provides more
functionality than simply bulk export and import of data,
however. It lets you define to a very granular degree the
schemas, tablespaces, and other objects you want to export or
import by using the
INCLUDE, EXCLUDE, and
QUERY
parameters at various stages of the process to filter or
transform the data.
Using the original Oracle Export and
Oracle Import utilities, DBAs have to set up convoluted schemes
to confine a specific export process to specific database
objects. For example, you might set access policies (using
role-based access control) on tables differently to allow a
batch export process to obtain just the tables it needs. But
using policies in this way is not the type of application for
which the Oracle fine-grained access control (also known as
role-based security) was intended.
Workarounds such as this aren't needed
with Oracle Data Pump, which provides a fine-grained object
selection feature for very granular selection without any
performance impact on the source system, because the selection
happens only at export or import time, depending on how you've
set it up. (See Jonathan Gennick's "Supercharging
the Pump" in the March/April 2004 issue of
Oracle Magazine
for a great overview of using Oracle Data Pump via command-line
clients to export, import, check status, and restart a job, as
well as filter data in both directions with the
EXCLUDE, INCLUDE,
and QUERY
capabilities.)
Export and Import Over
the Network
Typically, Oracle Data Pump Export
results in a binary format dumpfile that is then loaded (using
Oracle Data Pump Import) into another database. However, in
addition to this file-based Oracle Data Pump Export and Import,
Oracle Data Pump provides a network-based approach for direct
loading to or unloading from one database to another.
Most DBAs are probably familiar with
using the legacy Oracle Export and Import utilities over a
network, using named pipes as the mechanism through which to
pass the data. One of the problems with that approach can be
that named pipes aren't supported on every platform.
Instead of using named pipes, Oracle Data
Pump uses database links for exporting and importing data
between source and target over the network. And, as with the
named pipe mechanism, the network_link feature entirely bypasses
dumpfile creation, transmitting data from one database to
another across the network.
This feature should be of keen interest
to any DBAs spending lots of time doing production-to-test
refreshes for development or extracting data from a read-only
data warehouse. (For a read-only database, you must use Oracle
Data Pump Export's network_link mechanism, initiating the
process on the target database, because Oracle Data Pump won't
be able to create the necessary master table on the read-only
database.)
Because network bandwidth is the limiting
factor, the network_link capability is best used in conjunction
with the filtering capability, as a means of siphoning off a
subset of data (rather than an entire multigigabyte or -terabyte
database) from one database to another.
Here's how to set up an Oracle Data Pump
Import, using a network-based approach (see Figure 2):

Figure 2: Oracle Data Pump uses database
links. To export tables from a read-only database, it creates a
master table on the read-write database.
- Set up a remote
link from a satellite database (database 2) to the master
database (database 1).
- Create the
import as a SQL script to run weekly from the satellite
database to import selected tables from the source system,
filtering and transforming as needed (see Listing 2).
Code Listing 2: Sample script using
DBMS_DATAPUMP
API for import
DECLARE
ind NUMBER; — Loop index number
jobhandle NUMBER; — Data Pump job handle
percent_done NUMBER; — Percentage of job complete
job_state VARCHAR2(30); — Keeps track of job state
le ku$_LogEntry; — work-in-progress and error messages
js ku$_JobStatus; — Job status from get_status
jd ku$_JobDesc; — Job description from get_status
sts ku$_Status; — Status object returned by get_status
BEGIN
— Create a (user-named) Data Pump job to do a "table-level" import
— using a network link
jobhandle := DBMS_DATAPUMP.OPEN('IMPORT','TABLE', 'RM_TECHLAB', 'DEV_REFRESH');
— Set parallelism to 1— network import/export always uses 1 anyway, may not
— be necessary to explicitly set...
DBMS_DATAPUMP.SET_PARALLEL(jobhandle, 1);
— Remap schema objects from source named BATCH_JOBS to target KFLOSS.
DBMS_DATAPUMP.METADATA_REMAP(jobhandle,'REMAP_SCHEMA','BATCH_JOBS','KFLOSS');
— If a table already exists in the destination schema, don't override (this
— is default behavior.)
DBMS_DATAPUMP.SET_PARAMETER(jobhandle,'TABLE_EXISTS_ACTION','SKIP');
— Use statistics (rather than blocks) to estimate time.
DBMS_DATAPUMP.SET_PARAMETER(jobhandle,'ESTIMATE','STATISTICS');
— Start the job. An exception is returned if something is not set up properly.
DBMS_DATAPUMP.START_JOB(jobhandle);
— The import job should now be running. This loop monitors the job until
— it completes, displaying progress information if there are problems
percent_done := 0;
job_state := 'UNDEFINED';
while (job_state != 'COMPLETED') and (job_state != 'STOPPED') loop
DBMS_DATAPUMP.get_status(jobhandle,
DBMS_DATAPUMP.ku$_status_job_error +
DBMS_DATAPUMP.ku$_status_job_status +
DBMS_DATAPUMP.ku$_status_wip,-1,job_state,sts);
js := sts.job_status;
— As the percentage-complete changes in this loop, the new value displays.
if js.percent_done != percent_done
then
DBMS_OUTPUT.PUT_LINE('*** Job percent done = ' || to_char(js.percent_done));
percent_done := js.percent_done;
end if;
— Displays any work-in-progress or error messages received for the job.
if (bitand(sts.mask,dbms_datapump.ku$_status_wip) != 0)
then
le := sts.wip;
else
if (bitand(sts.mask,dbms_datapump.ku$_status_job_error) != 0)
then
le := sts.error;
else
le := null;
end if;
end if;
if le is not null
then
ind := le.FIRST;
while ind is not null loop
DBMS_OUTPUT.PUT_LINE(le(ind).LogText);
ind := le.NEXT(ind);
end loop;
end if;
end loop;
— When the job finishes, display status before detaching from job.
DBMS_OUTPUT.PUT_LINE('Job has completed');
DBMS_OUTPUT.PUT_LINE('Final job state = ' || job_state);
DBMS_DATAPUMP.DETACH(jobhandle);
END;
/
When you run the script on the satellite
system, it connects to the source system and brings the data
over directly to the target. The master table is created in the
schema of the user on the initiating system and not in the
database source. This approach works well for read-only
databases and also as a means of refreshing development
databases with extracts from a production system.
Data Pump Is Fast and
Flexible
Oracle Data Pump imports are anywhere
from 15 to 40 times as fast as with traditional import, even
with parallelism set to 1, in part because of Oracle Data Pump's
use of direct path (when possible).
Speed improvements are always welcome,
but so are improvements such as the ability to restart a job and
get the status of export and import jobs from anywhere. And the
rich API and its use of
DBMS_METADATA
lets DBAs develop comprehensive automated utilities for a
variety of data movement scenarios.
Kimberly Floss
(kimberly_floss@ioug.org).
president of the
International Oracle Users Group,
specializes in Oracle tuning techniques. She is the author of
Oracle SQL Tuning and CBO
Internals (Rampant
TechPress).
|