by
Dave Moore
Here's
a step-by-step
example of creating
an external table
and querying the
data source from
within Oracle along
with a discussion of
practical
applications for
external tables,
performance and
management issues.
ORACLE
has many new
features, and one of
my favorites is the
ability to create
external tables. An
external table is a
table whose
structure is defined
inside the database
even though its data
resides externally
as one or more files
in the operating
system (see Figure
1). External tables
are very similar to
regular tables in
Oracle, except the
data isn't stored in
Oracle datafiles and
isn't managed by the
database.

Example
This
example begins with
product information
listed in a
Microsoft Excel
spreadsheet (see
Figure 2). The data
is saved in
comma-separated
values (CSV) format
to D:\products\products.csv.
The spreadsheet
contains three
columns: Product
Number, Description,
and Price. This file
contains the data
that we'll query
from Oracle.

Figure
2: Product data in
Excel.
After
saving the file from
Excel, the next task
is to create a
DIRECTORY object in
Oracle that points
to the physical
operating system
directory that
contains the file.
This DIRECTORY is
required in order to
create the external
table.
SQL>
CREATE DIRECTORY
PRODUCT_DIR AS
'd:\products';
Directory created.
Now
the external table
is created by using
the CREATE TABLE
command:
create table products (
product_no number,
description varchar2(100),
price varchar2(20)
)
organization EXTERNAL (
type oracle_loader
default directory PRODUCT_DIR
access parameters
( records delimited by newline
badfile 'products.bad'
logfile 'products.log'
fields terminated by ','
)
location ('products.csv')
)
reject limit unlimited
/
The
first part of the
CREATE TABLE
statement holds no
surprises. Notice,
however, that the
next part of the
statement specifies
ORGANIZATION
EXTERNAL, which
indicates that this
table is an external
table. This part of
the statement also
specifies a type of
oracle_loader-the
only one currently
supported by Oracle.
Oracle_loader is
actually an oracle
TYPE object defined
in the database to
handle the
processing. Also
notice that the
directory object is
part of the CREATE
TABLE statement; it
tells Oracle where
to find the files.
The
next part of the
statement specifies
the access
parameters, which
should look familiar
to anyone who's
experienced with
SQL*Loader:
- records
delimited
by specifies the
characters that
will be used to
separate rows.
- badfile
specifies the
file that Oracle
will use to
store the
rejected rows.
- logfile
specifies the
file that Oracle
will use to
store log
information.
Documentation of
any errors will
be provided in
this file.
- fields
terminated by
specifies the
field separator
that will
distinguish one
column from
another during
the load.
Finally,
the location and
reject limit are
specified:
- location
provides the
name of the
actual file to
access. If
Oracle needs to
access multiple
files, they can
be specified as
follows:
location
('file1.dat',
'file2.dat')
- reject
limit
specifies the
number of rows
that can be
rejected before
the command
returns an
error. If this
threshold is
reached, the
following error
appears when
trying to access
the table:
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEFETCH callout
ORA-30653: reject limit reached
ORA-06512: at "SYS.ORACLE_LOADER", line 14
ORA-06512: at line 1
The
DDL for creating the
statement will run
even if the file
doesn't exist in the
system, which can
produce mixed
results. On the one
hand, you won't know
whether the table
was successfully
created until a
statement is
executed against the
table, which in a
data-warehousing
environment might be
at 3:00 a.m.
Conversely, the file
doesn't have to
exist at the time
the table is
created. In fact,
the file can come
and go as needed,
which is quite
customary in OLAP
environments.
The
external table is
now created.
However, if another
user tries to access
the table at this
point, that user
will receive an
error:
SQL> select count(*) from dave.products;
select count(*) from dave.products
*
ERROR at line 1:
ORA-06564: object PRODUCT_DIR does not exist
To
prevent this error,
you must grant read
and write access on
the directory for
any user who wants
to select data from
the table. Granting
SELECT on the table
itself will allow
the object to be
seen, but you must
also grant access to
the underlying
directory object.
grant
read, write on
directory
products_dir to alex;
Listing
1: Querying the
table.
SQL> select product_no, substr(description,1,40) "Desc", Price from products;
PRODUCT_NO Desc PRICE
---------- -------------------------------------- ---------------
12300 Robin Yount Autographed Baseball $29.99
12301 George Brett Autographed Baseball $19.99
12302 Dale Murphy Autographed Baseball $19.99
12303 Paul Molitor Autographed Baseball $19.99
12304 Nolan Ryan Autographed Baseball $19.99
12305 Craig Biggio Autographed Baseball $19.99
12306 Jeff Bagwell Autographed Baseball $19.99
12307 Barry Bonds Autographed Baseball $19.99
12308 Mark McGuire Autographed Baseball $19.99
12309 Sammy Sosa Autographed Baseball $19.99
12310 Jeff Kent Autographed Baseball $19.99
12311 Roger Clemens Autographed Baseball $19.99
12312 Goose Gossage Autographed Baseball $19.99
12313 Derek Jeter Autographed Baseball $19.99
Read/write
access means that
Oracle will be
allowed to write to
that directory when
it needs to update
the logfile or
badfile. As an OS
user, you don't have
access to those
files in the
operating system
unless your ID has
proper privileges;
as a result,
security isn't
compromised.
After
creating the
external table and
granting privileges,
the table can be
queried like any
other table (see
Listing 1).
The
external table can
be used as a
substitute for
SQL*Loader and a
regular table can be
used to hold its
data:
INSERT
INTO PROD.PRODUCTS
AS SELECT * from
DAVE.PRODUCTS;
The
data that was in
Excel is loaded into
Oracle, which allows
it to be backed up
and to perform
better than an
external table.
Limitations
External
tables in Oracle9i
have the following
limitations:
- They're
read-only, so no
data
manipulation
language (DML)
operations (such
as Insert,
Update, or
Delete) can be
performed
against them.
Also, no indexes
can be defined
on the table.
Oracle does plan
to support
writing to these
tables in a
future release.
- They
don't support
files larger
than 2GB. If you
attempt to
access a file
larger than 2GB,
Oracle fails
with the
following error:
KUP - 04039: unexpected error while trying to find file
<file name> in director
<directory name>
- Certain
commands against
the table, such
as ANALYZE, will
fail.
SQL> analyze table products compute statistics;
analyze table products compute statistics
*
ERROR at line 1:
ORA-30657: operation not supported on external organized
Table
This
limitation is
important because
most DBAs have
scripts that
regularly refresh
object statistics
based on a schema.
If you try to
generate statistics
on an external
table, the command
will fail.
- The
data in external
tables isn't
backed up as
part of regular
Oracle backup
routines because
it's outside the
scope of the
database.
Performance
One
expects the Oracle
kernel to incur more
overhead when
processing external
tables. An Oracle
TYPE and TYPE BODY
named
SYS.ORACLE_LOADER
exist in the
database and process
all statements
accessing external
tables. This process
increases the
overhead to access
the data, and when
compared to a
regular table is
many times slower.
Oracle must fetch
and perform tasks
that it normally
doesn't perform
(such as
conversions,
handling rejections,
and logging) and is
therefore
significantly
slower. I
experimented with
the performance of
external tables by
creating an internal
table with the exact
data as the external
one:
SQL> create table products_internal as select * from
products;
Table created.
The
table contained
5,292 rows, with the
same data as in the
spreadsheet. The
internal table
didn't have any
indexes or primary
keys defined. Based
on the script shown
in Listing 2, the
internal table was
consistently 8-10
times faster to
access than the
external one.
Optimally, external
tables should be
used as a means to
load data into
internal tables and
shouldn't be queried
as an external data
source.
Listing
2: Access to the
internal table is
significantly faster
than to the external
table.
set term off
col a new_value start
select dbms_utility.get_time() a from dual;
select count(*) from products_internal where product_no = 12313;
col b new_value stop
select dbms_utility.get_time() b from dual;
col c new_value answer
select (&stop - &start) c from dual;
col d new_value start_ext
select dbms_utility.get_time() d from dual;
select count(*) from products where product_no = 12313;
col e new_value stop_ext
select dbms_utility.get_time() e from dual;
col f new_value answer_ext
select (&stop_ext - &start_ext) f from dual;
col ans form 999
col ans_ext form 999
set term on
prompt
prompt
select 'Internal Table Execution Time in ms ', &answer ans
from dual;
select 'External Table Execution Time in ms ', &answer_ext ans_ext
from dual;
By
taking the following
actions, you can
minimize the
overhead used when
processing an
external table:
- Use
the PARALLEL
clause when you
create the
table. This
value indicates
the number of
access drivers
that will be
started to
process the
datafiles and
will divide the
files into
portions that
can be processed
separately.
- Use
datatypes in
Oracle that will
match the
physical data
attributes,
which will
eliminate costly
data conversion.
- Use
fixed values
when possible,
including:
- Fixed-width
character
sets
- Fixed-length
fields
- Fixed-length
records
The
RECORDS FIXED clause
is listed under
access parameters
and requires the
definition of
fields. In the
following example,
the data line is 40
bytes long, plus one
byte for the new
line. The field
names must be the
same as the column
names to which they
correspond.
RECORDS FIXED 41
FIELDS
(
emp_first_name char(20)
emp_last_name char(20)
)
- Use
single-character
delimiters, and
use the same
character sets
as used in the
database.
- Minimize
rejections since
Oracle performs
more I/O for
each one.
Practical
Applications
External
tables have many
different practical
applications, which
I'll place into two
categories: business
processing and
database
administration.
From
the
business-processing
standpoint, external
tables serve a vital
need in a
data-warehousing
environment, in
which Extract,
Transform, and Load
processes are
common. External
tables make it
unnecessary for
users to create
temporary tables
during these
processes, thereby
reducing required
space and the risk
of failed jobs.
External tables can
be used instead of
temporary tables and
utilities like
SQL*Loader. They
also provide an easy
way for companies to
load different
information sources
into Oracle-whether
in Excel, ACT!, or
Access, information
can be loaded and
processed.
From
the database
administration view,
I'm most interested
in features that
help me do my job. I
want to monitor
those files that I
look at frequently-alert.log
and init.ora-without
leaving a SQL>
prompt. Then I can
use SQL commands to
query the file and
specify WHERE
clauses for more
sophisticated
processing. An
example of creating
an external table to
point to the alert
log is as follows:
create directory BDUMP AS 'd:\oracle9i\admin\PROD\bdump';
create table alert_log (text varchar2(200))
organization EXTERNAL (
type oracle_loader
default directory BDUMP
access parameters
( records delimited by newline
badfile 'dave.bad'
logfile 'dave.log'
fields terminated by ' '
)
location ('PRODALRT.LOG')
)
reject limit unlimited;
Listing
3: The
DBA_EXTERNAL_LOCATIONS
view.
SQL> desc dba_external_locations;
Name Null? Type
----------------------------------------- -------- ----------------
OWNER NOT NULL VARCHAR2(30)
TABLE_NAME NOT NULL VARCHAR2(30)
LOCATION VARCHAR2(4000)
DIRECTORY_OWNER CHAR(3)
DIRECTORY_NAME VARCHAR2(30)
Database
Administration
It's
important to know
what views in Oracle
contain the
information
pertaining to
external tables. The
view DBA_TABLES
shows external
tables and has a
value of 0 for
PCT_FREE, PCT_USED,
INI_TRANS, and
MAX_TRANS. All other
storage columns in
the view are null.
Scripts that use
this view to
determine problems
should be updated to
access
DBA_EXTERNAL_TABLES.
This view contains
all of the
parameters that you
specified when you
created the external
table.
Another
useful view is
DBA_EXTERNAL_
LOCATIONS, which
provides a quick way
to see which files
are accessed from
the database (see
Listing 3).
---
Dave
Moore Dave
Moore is a product
architect at BMC
Software in
Austin, TX. He's
also a Java and
PL/SQL developer,
Oracle DBA and
author of Oracle
Utilities by
Rampant Tech Press.
 |
For more details on Oracle utilities, see the book "Advanced
Oracle Utilities" by Bert Scalzo, Donald K. Burleson, and Steve Callan.
You can buy it direct from the publisher for 30% off directly from
Rampant TechPress. |
|