|
Arup
Nanda is the author of
Oracle Privacy Security Auditing by
Rampant TechPress.
Data Pump: Not Just for Data Moves
Data Pump – introduced in Oracle Database
10g Release 1 – has been described in many different ways: from
“Export/Import from Steroids” to the “Ultimate Data Mover”.
Perhaps the use of superlatives creates a perception that this
is only for data movements. Numerous presentations, articles,
books and even Oracle manuals permeate that perception.
However, Data Pump is much more than just
moving data. While that was probably the original and primary
intent and it still remains the principal usage, the other
subtle but significant capabilities of Data Pump remains under
represented if not undiscovered. In this article you will learn
about those lesser known capabilities and how to use them to
achieve your strategic and tactical objectives. Put into good
use Data Pump can ease a lot of daily and longer term non-data
movement related tasks and provide a viable platform for
alternate use.
This article assumes that the attendee is
familiar with the Data Pump tool. This is not a introduction to
the tool itself.
A Data Pump Primer
Even though the audience is assumed to be
fully conversant with Data Pump, a quick primer will cement that
understanding and pave way for a better understanding of the
material to come. If you are familiar with the basics of the
tool, you may want to skip this section in its entirety.
Data Pump is an evolution of the original
export and import tools; and in fact, it’s also named Data Pump
Export and Import. The purpose is similar – to export data out
of the Oracle database and import back. The exported data is put
in a dumpfile, which is portable across Operating Systems and
even across Oracle versions (10g Release 1 and above). This
export works by selecting the data from the tables using the SQL
interface, similar to the original export; but at a much faster
rate. My tests have shown Data Pump Export to be about 10 times
faster than original export when conditions are right – with
high degrees of parallelism, multiple spindles and so on.
Similarly Data Pump Import is also faster than the original
import, albeit at a lower scale.
The dumpfile produced can only be ready by
Data Pump Import, not the original import. The same holds true
for the reverse process as well, i.e. the dumpfile produced by
the original export is not readable by DP Import. A major
difference in the two tools is the location where the dumpfiles
are produced. In the original export, the dumpfiles are produced
in the same server the tool is called; so if you issue the exp
command at a client machine, such as your laptop, the dumpfile
is produced there. In contrast, the DP Export always produces
the dumpfile on the database server regardless where it is
called. You can specify a location by creating a directory
object and specifying that in the command line.
Finally, for the sake of completeness, the
commands are “expdp” for DP Export and “impdp” for DP Import.
Now that the introduction part is complete,
let’s jump to the main topic – the usefulness of this tool
beyond the data movement.
Regulatory Compliance
Would you want to store the source codes of
the database codes such as stored procedures and DDLs of the
objects? It would have been nice; but probably not strictly
needed, just a fe years ago. But thanks to the myriads of laws
and regulations covering the corporate America today, this has
now become a necessity; not just a nice to have. For instance,
laws like Sarbanes-Oxley (SOX) actually require that you have to
have a system where you can track the changes to object
definitions – commonly referred to as “metadata” in Oracle
technical parlance. Please note, the word metadata has many
different meanings in technical discussions. For instance, the
original meaning of the term is “data about the data”, i.e. the
meaning of the data elements such as columns and tables. For
example, the metadata on the column STATUS in the table EMP
could be: it refers to the status of the employee in the company
at this time and it can have values such as ACTIVE, INACTIVE,
SUSPENDED, etc. This description is commonly referred to as
“metadata”. However, in Oracle-speak, we usually refer to the
definition of the objects, i.e. the DDL statement to create the
table, the source code of the procedures and packages as
metadata. I have used the latter definition of the term.
Most regulations and internal controls in
organizations require that the source code be stores in a
repository which can be controlled and tracked. If the
definitions changed, the system should be able to track it and
report changes. The historical records of the metadata should be
persevered. How do you accomplish this? One way to achieve this
is constructing the sources and DDL from the data dictionary in
regular intervals. For tables, you would select from
DBA_TAB_COLUMNS (for columns), DBA_CONSTRAINTS, and
DBA_CONS_COLUMNS (for constraints), DBA_TAB_PRIVS (for
privileges granted to the table) and many others. For source
code, you will need to select from DBA_SOURCES; te view
DBA_TRIGGERS for triggers, DBA_VIEWS for view definitions and so
on. The other option is extracting the DDL from the dictionary
using DBMS_METADATA.GET_DDL packaged function. The solutions are
technically possible; but may seem infeasible considering the
effort. So, to address that deficiency, several tool
manufactures have jumped into the fray. Needless to say, the
tools do their job; but at a cost.
Enter Data Pump, to accomplish the same
objective. Let’s see how.
Metadata Management
One of the least used parameters of Data
Pump Export is CONTENT, which controls what is exported. It
accepts three parameters:
§
DATA_ONLY
– this option exports the data from tables.
§
METADATA_ONLY – this option does not export the data;
merely exports the metadata of the database objects.
§
ALL
–this is the default; both the metadata and the data (where
applicable) are exported.
This parameter is the one you can use to
create a baseline repository of the database sources and DDLs
very quickly. All you have to do is to use:
expdp directory=dpump_dir dumpfile=metadata.dmp
content=metadata_only
This creates the file
metadata.dmp in the
directory specified by the directlory object dpump_dir
containing all the DDLs and source codes of the database
objects. The file does not contain any data. This is your
baseline repository of the source. If you ever need to reinstate
the source, simply drop it from the database and import back the
source from this baseline:
impdp directory=dpump_dir dumpfile=metadata.dmp
This will import the objects that are not
present in the database now and skip the ones that are there.
In many cases, your database as a whole may
not be subject to the baseline repository building; you may want
to store objects of specific schemas in the repository;
not all. For instance you may want to include the ACCMAIN and
HRMAIN schemas but not the SYSTEM schema. The above command
would be slightly modified in that case:
expdp directory=dpump_dir dumpfile=metadata.dmp
content=metadata_only schemas=ACCMAIN, HRMAIN
By default, all the database objects in the
schema are exported – tables, views, triggers, procedures, and
so on. What if you want to store each object separately in its
individual files? For instance, you may want to store the stored
code such as triggers, procedures, etc. on a file named
metadata_code.dmp and
tables, views, etc. on a file named
metadata_tabs.dmp. In this
case, you can use the INCLUDE parameter to include only certain
objects or types of objects in the export file; not all. The
above command will then turn to:
expdp directory=dpump_dir dumpfile=metadata_code.dmp
content=metadata_only schemas=ACCMAIN, HRMAIN include=PROCEDURE,FUNCTION,PACKAGE,TRIGGER
Here the dumpfile has only the code related
metadata, not all.
You can also specify a specific procedure
or finction to export. Suppose you want to keep a very crucial
function – GET_RATE and a table – RATES, in a separate file –
metadata_imp.dmp, you can
use:
expdp directory=dump_dir dumpfile=metadata_imp.dmp
schemas=ACCMAIN include=FUNCTION:\"=\'GET_RATE\'\",TABLE:\"=\'RATES\'\"
Note, I have added the backslash (“\”)
before the double quotes (“) and the single quotes (‘), just so
that the unix shell will not interpret the characters as
commands. If you use Windows, you will need to remove the back
slashes. Alternatively, a better way is to use a parameter file
where you can list all the parameter, one per line and call the
parameter using the PARFILE parameter. In this case, just create
a file called exp.par with
the following as the contents:
directory=dump_dir
dumpfile=metadata_imp.dmp
schemas=accman
include=FUNCTION:"='GET_RATE'"
include=TABLE:"='RATES'"
Then you can call the Data Pump Export as
$ expdp parfile=exp.par
In the parfile, you can see that I have
placed two lines with the INCLUDE parameter; that’s allowed. I
can place as many INCLUDE parameters as I need. The include
parameter takes the values as
<ObjectType>:"<ObjectList>"
For instance, to include the functions FN1
and FN2, you would use:
include=function:"in ('FN1','FN2')"
or
include=function:"like 'FN%'"
Of course, you can also use this to get all
but some:
include=function:"not in ('FN1','FN2')"
However, a more convenient alternative may
be to use the EXCLUDE option. This is the inverse of INCLUDE and
instructs the expdp command to exclude the objects. For example,
you want to export the meta data for all objects excluding the
functions FN1, FN2 and procedures PROC1 and PROC2. You will use:
exclude=function:"in ('FN1','FN2')"
exclude=procedure:"in ('PROC1','PROC2')"
To include all objects except all the
tables in the export dump:
exclude=table
The syntax is the same as the include
option. Note a very important difference in the usage of the
parameters. Include is used to include a specific objects of a
type; exclude is used to exclude a specific object of a type.
They are not necessarily interchangeable. Consider these two
options:
exclude=function:"in ('FN1','FN2')"
and
include=function:"not in ('FN1','FN2')"
They are not the same. The first
option (the exclude one) exports all the objects –
tables, views, triggers, etc. and leaves out just two functions
– FN1 and FN2. The second option (the include one), on the other
hand, only exports all the functions except FN1 and FN2.
Only the functions are exported, not tables, views, etc. They
are fundamentally different in their behavior. You have to
consider this carefully when you construct your parameter file.
Creating an SQL File
Earlier you learned how to create a
baseline and also selective exports of the database objects for
safe keeping. However, the dumpfiles are not human readable,
i.e. you can’t just glean over the files and get the metadata.
You need another step – to extract the SQL out of the dumpfile.
You can do that using the parameter SQLFILE in the Data Pump
Import. Here is an example:
$ impdp directory=dump_dir dumpfile=metadata_full.dmp
sqlfile=metadata_full.sql
This operation does not actually import the
objects into the database; but creates an SQL script called
metadata_full.sql
containing all the objects in the dumpfile. Here is an excerpt
from the file:
-- CONNECT ARUP
-- new object type path is:
DATABASE_EXPORT/TABLESPACE
CREATE UNDO TABLESPACE "UNDOTBS1"
DATAFILE
'+NDSDATA7/undotbs101.dbf' SIZE
17179869184,
'+NDSDATA8/undotbs102.dbf' SIZE
17179869184,
'+NDSDATA9/undotbs103.dbf' SIZE
17179869184
BLOCKSIZE 8192
EXTENT MANAGEMENT LOCAL AUTOALLOCATE;
CREATE TABLESPACE "SYSAUX" DATAFILE
'+NDSDATA1/sysaux01.dbf' SIZE
4294967296,
'+NDSDATA1/sysaux02.dbf' SIZE
4194304000
NOLOGGING ONLINE PERMANENT BLOCKSIZE
8192
EXTENT MANAGEMENT LOCAL AUTOALLOCATE
SEGMENT SPACE MANAGEMENT AUTO;
CREATE TEMPORARY TABLESPACE "TEMP1"
TEMPFILE
'+NDSDATA1/temp1_01.dbf' SIZE
1073741824
......... and so on ........
You can also apply the filters to extract
the SQL statements for specific objects as well. For instance,
you can extract only the procedures, functions and packages;
nothing else by using the INCLUDE parameter
include=procedure, package
You can include specific objects as well;
not just specific types.
include=procedure:"='PROC1'"
Here is the file that is eventually
produced:
-- CONNECT SYS
-- new object type path is: SCHEMA_EXPORT/PROCEDURE/PROCEDURE
-- CONNECT ARUP
CREATE PROCEDURE "ARUP"."PROC1"
as
begin
dbms_output.put_line ('Some text');
end;
/
-- new object type path is: SCHEMA_EXPORT/PROCEDURE/ALTER_PROCEDURE
ALTER PROCEDURE "ARUP"."PROC1"
COMPILE
PLSQL_OPTIMIZE_LEVEL= 2
PLSQL_CODE_TYPE= INTERPRETED
PLSQL_DEBUG= FALSE
REUSE SETTINGS TIMESTAMP '2006-08-11
13:27:55'
As with the export, you can exclude some
objects as well:
exclude=procedure:"='PROC1'"
Building a Repository
Now that you learned how to extract the
database objects into SQL files, you can build a system or a
rudimentary tool to make the process resemble more like, well, a
process. This can be effected somehow using the a shell script
and creating the dumpfile and the SQL file named based on the
date of creation. Here is how the shell script would look like:
DATEVAR=`date +"%m%d%y"`
DMPFILE=metadata_${DATEVAR}.dmp
SQLFILE=metadata_${DATEVAR}.sql
echo "arup" | expdp arup directory=tmp_dir
dumpfile=${DMPFILE}
echo "arup" | impdp arup directory=tmp_dir
dumpfile=${DMPFILE} sqlfile=${SQLFILE}
This script will create the file in the
format metadata_<mmddyy>.dmp where <mmddyy> is the month, day
and year string. After the creation of the SQL file, you can
remove the dumpfile. I, however, prefer to keep it since it
cannot be created from the SQL file. You can selectively extract
anything from the dumpfile, apply filters; but not so with the
SQL file.
As a part of the maintenance, you may want
to move the created files periodically. Here is a simple command
that moves the files to a name <OriginalName>.old after 30 days.
find . -name "*.dmp" -ctime +30 -exec mv
{} {}.old\;
Or, you can just remove it
find . -name "*.dmp" -ctime +30 -exec rm
{}\;
Using this, you can build a full version
control and repository system in just hours; not months and
without using any tool from outside of what you already has.
This is one of the most powerful yet underutilized capabilities
of the Data Pump.
Create a User Like …
Here is a request you may have received
several times – create a user just like JOE. The requestor
doesn’t provide any other details – no tablespace quotas, no
grants and so on. All the requestor knows that JOE userid works;
so another one like JOE will work as well. How do you take care
of the request?
Of course, there are tools that can do this
quickly; but you may not have them handy. Your only option at
that point is to painstakingly create an SQL file caturing the
user’s properties for data dictionary, all the objects and so
on. While it’s possible, it’s tedious, prone to errors and
sunstantilly time consuming.
Enter Data Pump to the rescue. You can
create the user in just two commands:
$ expdp schemas=JOE content=metadata_only
$ impdp remap_schema=JOE:NEWUSER
The first command (export) is necessary
only if you don’t have a full export dump or baseline now. If
you have been using DP to take your metadata baselines, you
don’t need to perform that step. The second command (the import)
imports the metadata about the user JOE, but transforms it so
that instead of JOE, the user NEWUSER is created. All grants,
privileges, quotas, objects, etc. of JOE now goes to NEWUSER. In
just a matter of minutes you created a user like another
accurately.
If you have been using the repository as
several versions with timestamps as shown in the previous
section, you can even use that to create a user like JOE as of a
certain date. For instance, JOE has DBA privileges now; but
didn’t have them yesterday. Could we have NEWUSER sans DBA role?
Sure; just use the dumpfile as of that day.
Create Tablespaces in Test as in Production
This is another of the challenges you may
face – you have been asked to create a small test database, of
the fraction of the size of the main database; but the
structural elements must be identical. i.e. the tablespaces
should be same, laid out in the same manner, across the same
mount points and so on. How do you quickly do it?
Sure, you could write a PL/SQL loop to get
all the datafile properties from the data dictionary
(dba_data_files and dba_tablespaces) and construct the SQL
script to create the tablespaces. It will work; but will take a
significant amount of time, and, needless to say, is highly
susceptible to errors.
Again, Data Pump comes to rescue. All you
have to do is import the dumpfile and use the INCLUDE option.
$ expdp content=metadata_only
$ impdp include=tablespace
Like the previous case, the first command
(expdp) is not necessary if you already have a dump. Instead of
creating the tablespace directly, you may also want to see the
script by using the SQLFILE option:
$ impdp include=tablespace sqlfile=c.sql
This will create a file called c.sql, which
will have the tablespace creation scripts. You ca edit them, if
needed and run them from SQL*Plus.
A slight variation of the above may be more
practical. In many cases, the test database may not have or need
the size of the production database tablespaces. While creating
the tablespaces, you may want to reduce the size of the
datafiles by a certain percentage, say 90%. One option is to
create a SQL file, edit it and manually change the sizes of all
datafiles to 10% of the production size. While it’s doable, it’s
not elegant as it adds time and risk of error. You can achieve
that in a simple option – TRANSFORM.
$ impdp include=tablespace sqlfile=c.sql
transform=pctspace:10
The parameter
transform=pctspace:10
instructs the IMPDP job to change the size of the datafiles to
10% of the original size. Here is the excerpt from the SQL file
created without this parameter:
CREATE UNDO TABLESPACE "UNDOTBS1"
DATAFILE
'/u01/undotbs101.dbf' SIZE 17179869184,
And here is the excerpt when the parameter
was applied:
CREATE UNDO TABLESPACE "UNDOTBS1"
DATAFILE
'/u01/undotbs101.dbf' SIZE 1717986918,
As you can see, the size with the parameter
(1,717,986,918) is about 10% of the original size
(17,179,869,184). You will learn more about the TRANSFORM
parameter later in this article.
Data File Name Change
You are moving some tables from another
database and the tablespace in which these tables exist is also
new in the target database. Importing those tables will create
the tablespaces in the target database too; but here is a small
problem – the filesystems are different. For example, the
datafile for tablespace was “/u01/data1.dbf” but the target
database does not have a mount point called /u01; instead it has
“/u02”. Ordinarily you would have to create the tablespace
manually and then import the table into the tablespace.
Data Pump eliminates the extra step in one
elegant motion. All you have to do is use the REMAP_DATAFILE
option as follows:
$ impdp remap_datafile=‘/u01/data1.dbf':'/u02/data1.dbf'
This creates the same datafile as
/u02/data1.dbf. you can use this option to recreate the test
data even across different platforms. For instance, the
following option in the option converts datafiles from a unix to
Windows format.
remap_datafile='/u01/data1.dbf':'c:\oradata\data1.dbf'
Segment Transforms
This is also a common problem in creating a
test database from production. The test database may be very
small compared to the production volume. Therefore, when you
export and import the tables, they may fail if the initial
extents are defined too large to fit in the test database.
Instead of creating the table manually prior to importing, you
will find Data Pump doing that work for you. You can remove the
storage clauses of the tables, MVs etc. using a simple parameter
TRANSFORM to impdp. The parameter has the following format:
transform=segment_attributes:[N/Y]:<ObjectType>
<ObjectType> could be “table”. For
instance, to include the storage clause for tables, use it like
transform=segment_attributes:n:table
When put it in impdp, the table creation
does not include physical attributes, storage attributes,
tablespaces and logging features. Here is a small example to
illustrate the concept. First do a normal import:
$ impdp arup/arup directory=tmp_dir
dumpfile=metadata_full.dmp sqlfile=trans.sql
include=table:\"=\'TRANS\'\"
Here is the DDL as shown in the SQL file
trans.sql:
CREATE TABLE "ARUP"."TRANS"
( "TRANS_ID" NUMBER,
"TRANS_DT" DATE,
"PRODUCT_CODE" NUMBER,
"STORE_ID" NUMBER,
"TRANS_AMOUNT" NUMBER(12,2)
) PCTFREE 10 PCTUSED 40 INITRANS 1
MAXTRANS 255
STORAGE(
BUFFER_POOL DEFAULT)
TABLESPACE "USERS"
PARTITION BY RANGE ("TRANS_DT")
(PARTITION "Y05Q1" VALUES LESS THAN
(TO_DATE(' 2005-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
PCTFREE 10 PCTUSED 40 INITRANS 1
MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576
MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST
GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "USERS" NOCOMPRESS ,
PARTITION "Y05Q2" VALUES LESS THAN
(TO_DATE(' 2005-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
PCTFREE 10 PCTUSED 40 INITRANS 1
MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576
MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST
GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "USERS" NOCOMPRESS ,
PARTITION "Y05Q3" VALUES LESS THAN
(TO_DATE(' 2005-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
PCTFREE 10 PCTUSED 40 INITRANS 1
MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576
MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST
GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "USERS" NOCOMPRESS ,
PARTITION "Y05Q4" VALUES LESS THAN
(TO_DATE(' 2006-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
PCTFREE 10 PCTUSED 40 INITRANS 1
MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576
MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST
GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "USERS" NOCOMPRESS ,
PARTITION "PMAX" VALUES LESS THAN (MAXVALUE)
Now, use the same command using the
transform option:
$ impdp arup/arup directory=tmp_dir
dumpfile=metadata_full.dmp sqlfile=metadata_full.sql
include=table:\"=\'TRANS\'\" transform=segment_attributes:n:table
If you see the SQL file:
CREATE TABLE "ARUP"."TRANS"
( "TRANS_ID" NUMBER,
"TRANS_DT" DATE,
"PRODUCT_CODE" NUMBER,
"STORE_ID" NUMBER,
"TRANS_AMOUNT" NUMBER(12,2)
)
PARTITION BY RANGE ("TRANS_DT")
(PARTITION "Y05Q1" VALUES LESS THAN
(TO_DATE(' 2005-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS',
'NLS_CALENDAR=GREGORIAN'))
NOCOMPRESS ,
PARTITION "Y05Q2" VALUES LESS THAN
(TO_DATE(' 2005-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
N
OCOMPRESS ,
PARTITION "Y05Q3" VALUES LESS THAN
(TO_DATE(' 2005-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
N
OCOMPRESS ,
PARTITION "Y05Q4" VALUES LESS THAN
(TO_DATE(' 2006-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS',
'NLS_CALENDAR=GREGORIAN')) N
OCOMPRESS ,
PARTITION "PMAX" VALUES LESS THAN (MAXVALUE)
NOCOMPRESS ) ;
Note how the storage options such as
initial/next/max extents, buffer pool, etc. have been removed,
even from the partitions. This way there is no conflict with the
target database’s storage parameters and the table can be
created easily.
If you want to remove storage parameters
from all objects, not just tables, leave the third part of the
value in the TRANSFORM option:
transform=segment_attributes:n
Reducing Size
On the same line as the above, suppose you
want to still keep the initial extent clause in the CREATE TABLE
script but want to make it much smaller than it’s in production
database. Ordinarily you would have done it using the more
laborious manual approach, i.e. creating the SQL file,
calculating the desired initial extent and editing the SQL file
to plug in those numbers. This may be technically possible but
may be infeasible.
A special modifier “PCTSPACE” in the
TRANSFORM parameter shown above does the trick. Specifying a
PCTSPACE puts the initial extent size as that much percentage of
the production database. Here is how you can specify a 10% size
of the initial extent:
$ expdp transform=pctspace:30
tables=accounts
In the above case, if the original INITIAL
extent of the table ACCOUNTS was 100M, the import creates the
table with INITIAL as 10M, or 10% of the original.
Sub-setting a Table
This is also a very common request – create
a sub-set of a table in production in the test database so that
the tests can be done against it. There could be several reasons
why the full table couldn’t be used – the most common being lack
of space in test database. So, how can you create a smaller
portion of the table, say, 10% of the original rows?
There are two options while creating a
sub-set of the table:
1.
Take a random sample of 10% of the rows
2.
Take the specific rows that satisfy some criteria
Random Sample
To export a random sample of 10% of the
rows of all the tables, use the SAMPLE parameter in expdp
command as shown below:
$ expdp SAMPLE=10
This takes a sample from all the tables.
Suppose you want to take a sample from the table ACCOUNTS in
ARUP schema only:
$ expdp SAMPLE=ARUP.ACCOUNTS:10
Rows Satisfying Some Condition
Suppose you want to create a subset of the
table based on some criteria, e.g. “SALARY>10000”, you would
issue
$ expdp query=employees:"where
salary>10000" tables=employees
This can also take the ORDER BY clause to
create the dumpfile in a sorted order. Suppose you want to dump
the EMPLOYEES table order by SALARY, here is how the complete
command looks like (with the unix required escape characters –
backslahes):
$ expdp arup/arup directory=demo_dir
dumpfile=employees.dmp query=employees:\"where salary\>10000\
order by salary" tables=employees
If you already have a full dump, you may
want to import from that instead of creating a subset of the
table in export. You can also use the QUERY parameter in impdp
as well.
$ impdp QUERY=CUSTOMERS:“WHERE
TOTAL_SPENT > 10"
This can also take the ORDER BY clause.
This approach can be used to quickly load parts of the
production table to test database. Please note that this does
not take care of referential integrity constraints, i.e. the
loaded rows do not necessarily satisfy all the foreign key
relationships. For instance, if you import 10% of the tables
DEPARTMENTS and EMPLOYEES, there is no guarantee that all the
department number of the employees in the 10% of the rows of the
EMPLOYEES table will be in the 10% of the DEPARTMENTS table. So,
the import will fail. In such a case, you can either disable the
constraint (not a very useful option) or use the QUERY parameter
to select specific rows.
Refresh a Table Definition
You maintain two different databases, say,
Development and Integration. From time to time you want to make
sure the definitions of the tables are in sync between the two
databases. And suddenly you realize that the databases may be
out of sync in the structure of the objects. How can you refresh
the table in INT so it matches the copy in DEV? Of course you
can drop and recreate all the objects; but that’s a task easier
said than done. You need to create the SQL script of all the
objects – tables, views, procedures, triggers, grants made and
received and so on and so forth. You could just drop everything
and recreate; but then that means dropping something that may be
in INT only and that will not be recreated.
As always, you can use Data Pump to achieve
this in just one command. Just take an export of the metadata
only. In this example, we assume that you want to refresh only
the table ACCOUNTS. After the export, just import the table with
the option table_exists_action set to “replace”, which drops the
table and recreate it from the metadata contained in the
dumpfile.
$ expdp tables=ACCOUNTS content=metadata_only
$ impdp full=y table_exists_action=replace
You can also use this technique to
reinstate the copy of the table from the repository as of a
certain date. While importing you may also want to modify the
storage parameters of the table by using the TRANSFORM parameter
as the storage parameters may not be adequate for the target
database.
Changing Table’s Owner
This is not a everyday case but it does
happen. Someone created a table in the wrong schema – SCOTT,
instead of ACCMAIN. Subsequent operations have placed dependent
objects on it – triggers, constraints, indexes, etc.; and SCOTT
has granted privileges to others. When the mistake was realized,
it was too late to make a quick change. What can you do to
quickly change the owner.
Unfortunately, in Oracle there is no such
thing called changing the owner of the table. The only option is
a series of steps:
1.
create the SQL to create table;
2.
create the table in the new schema;
3.
create all grants, triggers, constraints and so on;
4.
export the table,
5.
import into the old schema
6.
drop the table in the old schema
Although feasible, this series of steps is
very laborious and error-prone. Of course there are tools to do
this; but again, we assume that we don’t have access to these
tools. Wedo have access to, however, the Data Pump tool, which
accomplishes the task in just one line, using the parameter “remap_schema”.
$ impdp remap_schema=“OLDU:NEWU”
network_link=maindb directory=…
Here you changed the owner from OLDU to
NEWU. Here is an important point to note here – the use of the
parameter “network_link”. It’s a database link created pointing
to the same database. The use of this parameter saves us from
running an expdp to create a dumpfile. The parameter lets impdp
get the metadata across the database link and bypass the
dumpfile creation step.
External Tables
External tables are not new in the
database; they were first introduced in Oracle 9i. In essence,
they are text files outside the database, but are visible to the
database users as simple tables. When users select from the
table, the actual text file is read and the rows returned as if
they are rows of a regular table. The table can only be queried;
not updated.
You can also create an ASCII file from a
table or set of tables and create an external table on that file
to make it visible to the applications. This comes handy while
sending some data form one system to another. Since the external
tables do not need to be loaded in the database to be visible,
this technique makes the data immediately available to the end
users and relives stress on the database. The real challenge is
how to make this file that will be eventually be used as an
external table. You can use the simple SQL*Plus to select all
the columns and spool to a file. This approach is the simplest
but also the slowest. The other option is to use OCI or Pro*C
program to create these files. This makes the file creation
quicker; but also adds a lot of time to the process.
Many people are not aware of the fact that
you can use Data Pump to create the external table file. This
file is not ASCII; it’s binary; but it’s readable across
platforms. Let’s see an example: Here we are creating a file
trans_ext.dmp to be
used as an en external table from the table TRANS.
create table trans_ext (
trans_id,
trans_dt,
product_code,
store_id,
trans_amount
)
organization external
(
type oracle_datapump
default directory tmp_dir
location ('trans_ext.dmp')
)
as
select * from trans
order by trans_id;
Once the table is created, you can use the
file as an external table – trans_external – as:
create table trans_external (
trans_id number,
trans_dt date,
product_code number,
store_id number,
trans_amount number(12,2)
)
organization external
(
type oracle_datapump
default directory tmp_dir
location ('trans_ext.dmp')
);
And that’s it; the file is now available as
an external table. You would select from it as you would select
from a regular table:
select *
from trans_external;
So, how does all this make it any better?
The user can always select from the original table TRANS,
instead of TRANS_EXTERNAL; so there does not seem to be any
advantage in using this approach.
The advantage comes when the user does not
have direct access to the database. In order to send the data to
a remote user, you have to send it using Data Pump Export. At
the other end, the user must import it into the database before
he can start using it, which requires some technical expertise.
On the other hand, the external table approach takes no
technical expertise; the user can simply make the file available
to the same server as the database runs on, using ftp, copying
the file or something as trivial as mounting the CD on the
CD-ROM. As soon as the server sees the file, the external table
is immediately visible.
Uses of External Tables
Offline Receiver
– in the cases where the receiver of the data is not online with
the database, this is the best way to send data without strong
technical expertise. The common example is a publication
industry, which sends out period updates to the data it sells.
Instead of importing the new data, the publisher merely creates
an external table file, burns a CD with the file an sends it to
the readers. The application on the database reads the external
table. When the reader pops in the new CD, the application
immediately sees the new data; no data manipulation need occur.
Since the data in the external table can’t be written to, the
system provides a read only mechanism to the process.
Ease of Creation
– You don’t have to write any code to create the files.
Ordering and Grouping
– While creating the external table files, you can use WHERE,
ORDER BY and GROUP BY clauses, thereby sorting and grouping
data. This approach can be used in ETL processes where the data
can be loaded without an intermediate sorting/grouping step. One
of the best uses of this technique is the creation of Index
Organized Tables, described in a sub-section below.
Faster Loading –
The regular Data Pump Import still goes through the SQL layer,
making it somewhat slower. The fastest way to load a table is
still the APPEND method. So, you could use Data Pump export to
create the external table file, create external table and load
into the main table using the APPEND hint:
insert /*+ append */ into trans
select * from trans_external;
This allows the Direct Path Insert,
bypassing the SQL processing layer and minimizing undo and redo
and significantly improving the processing time.
Creating IOTs –
An Index Organized Table is a table built on a primary key
index. In a normal PK index, only the PK columns and the pointer
to the corresponding row in the table are stored. In the IOT,
the entire row is stored along with the PK index columns. When a
user queries a normal table using the PK columns in the WHERE
condition, Oracle first gets the rowid from the PK index and
gets the row from the table – a two step operation. In an IOT,
since the entire row is stored along with the PK, the second
step is eliminated.
When you create an IOT, the rows can be
loaded in any manner; but if the rows are already in a sorted
format, the sorting from the IOT is considerably faster. This
can be easily done in Data Pump using the external table
approach. While creating the external table files, just use the
ORDER BY clause on the PK columns and then load them; the rows
will be in sorted order.
General Tips ‘n Tricks
Parallelizing
To make DP Exports run faster, you can use
the parallel option to execute the job in parallel. However, if
you do use the parallel option, make sure you have that many
files as parallel degree; otherwise the parallelism will not
take effect.
$ expdp tables=CASES directory=DPDATA1
parallel=4 dumpfile=\(expCASES1.dmp,epCASES2.dmp,expCASES3.dmp,expCASES4.dmp\)
If you do not wan to name the files
individually, you can use the %U wildcard.
$ expdp tables=CASES directory=DPDATA1
dumpfile=expCASES_%U.dmp parallel=4
Monitoring
One of the often asked questions is how to
monitor the performance of the DP jobs that are running now.
There are a few ways you can do it. One is identifying and
monitoring the Oracle sessions of the DP jobs. The information
about the Data Pump jobs are found in the view
DBA_DATAPUMP_SESSIONS.
select sid, serial#
from v$session s, dba_datapump_sessions d
where s.saddr = d.saddr;
Once you identify the session, you can see
the parallel query slave sessions spawned off by the main DP
session by querying the V$PX_SESSION view.
select sid from v$px_session
where qcsid = <DP Session ID>;
When DP jobs run, they post important
runtime information in the view V$SESSION_LONGOPS. You can view
all that by issuing:
select sid, serial#, sofar, totalwork
from v$session_longops
where opname = '<DP Job Name>'
and sofar != totalwork;
Troubleshooting
Like anything else, DP sometimes does not
work the way it is supposed to, or perceived to –leading to the
next section – Troubleshooting. Many problems can be easily
diagnosed using the TRACE on the tool’s innards. There are two
different types of trace on Data Pump. The trace built into the
tool itself; and the SQL trace on the sessions of DP. Let’s see
how each one works.
Data Pump TRACE
The trace is given as a parameter in the
command line of the tool in the following format:
TRACE=<CompID>0300
The <CompID> is the component to trace and
accepts the following values:
1FF –
Full Tracing
048 –
Standard Tracing
In many cases the Standard tracing provides
all the information you want for a detailed troubleshooting; but
sometimes, and usually at the direction of the Oracle Support
Services, you may want to enable the Full tracing. Here is an
example of the standard tracing.
$ impdp trace=0480300 schema=... and
so on ...
This produces a trace file in the
user_dump_dest directory, which shows in detail different
commands and where they are failing, if at all they are failing.
SQL Trace
This is the usual SQL Tracing you may have
been familiar with; however there is a small challenge. Since DP
jobs kick off the sessions themselves, you don’t have an
opportunity to set the trace in the session by specifying “alter
session set sql_trace = true”. Instead, you must enable an event
in the session from outside using the dbms_system.set_ev
procedure.
First, get the SID and Serial# of the Data
Pump session:
select sid,serial#,username,program
from v$session
where upper(program) like '%(DW%)'
or upper(program) like '%(DM%)';
Then trace the session (place the SID and
Serial# below)
SQL> exec dbms_system.set_ev(<SID>,
<Serial#>, 10046, 12, '')
This will produce a trace file in the
user_dump_dest directory of the database server.
Conclusion
Data Pump, although originally designed to
be a tool to move a lot of data quickly and across multiple
database versions and platforms, has some quite useful
functionalities much beyond that. In this article you learned a
few of those non-traditional uses of the tool that prove
extremely useful in the long and short term objectives, without
the use of any other expensive tool, whether from Oracle or
another third party. The capabilities of the tool are not
limited to those listed in the article; but only to your
imagination
|