What is Encryption?
Simply put, encryption is the art of disguising data.
Suppose the combination to my wall safe is 37529. Because I
am not very good at remembering numbers, I keep on
forgetting it. So I come up an idea — I write it down on a
piece of paper and tape it on the safe! There it is — it’s
there when I need it, and I don’t have to worry about
forgetting it. What a great idea!
But, there is a slight problem. If a burglar happens to
see it, there is no need for him to try to crack the code —
it’s all there in plain sight! So, I change the strategy a
little bit — I decided to disguise the combination or alter
it in such a way that only I know how it was altered. For
instance, I may decide to reverse the digits; the number
becomes 92573, which is on the piece of paper. The burglar
will see the number 92573, which is not the safe
combination. He will not be able to guess the correct one
since he does not know how it was altered in the first
place.
Yet over time, this method is rendered useless. After a
few burglars visit my house, they all learn that I simply
reverse the digits. Alas! I have to figure out a new
encryption scheme. The modified scheme I come up with is to
add a number to the actual value — say, 9. So my safe
combination 37529 becomes 37529 + 9 = 37538, which goes to
the piece of paper. To decipher the true meaning, the
burglar has to know to two things:
- The logic used to perform the encryption
(i.e., adding a number to the actual value)
- The number that is added (i.e., 9, in this
case)
If I suspect that the burglars have somehow learned the
first, I can simply change the number 9 to something else.
It’s important to understand the differences between the
means to achieve the encryption and the value used to
scramble original data. The first concept, the logic used to
scramble is known as an algorithm,
and the value used to scramble the actual data is known as
the encryption key. We can change
one, keep the other constant, and get a different encrypted
value. For instance, I can keep the algorithm the same, but
change the key to arrive at a different encrypted value.
In the case of my safe combination, the algorithm is very
simple — I merely add the key to the original data to come
up with the encrypted value. However, this simplistic
algorithm has no place in reality; encryption algorithms are
actually far more complex. It’s not the purpose of this
article to go into the details of encryption algorithms;
that would fall into the realm of academia. As users, we do
not need to delve into the details, but to learn the basics
to use in practice. Because the roots of this algorithm are
common knowledge, it’s not practical to hide the algorithm
itself. But by using a secret key, the encrypted value can
remain safe — decipherable only by using the right key.
Since the “key” is literally the key to decipher
the data, my encrypted value is safe as the key itself. For
instance, in this case, the burglar knows the algorithm
(i.e., that I used a single-digit number to arrive at the
encrypted value). But what is the number? Well, he
doesn’t have to work hard — there are only ten single digit
numbers — from zero to nine. He has to make up to ten
guesses to get the right number, something he can definitely
accomplish in the short time span. But what if I used a
two-digit number? In that case, the burglar would have to
make 100 guesses — a more difficult feat. I could frustrate
the burglar more with a three-digit key, which forces him to
make up to 1,000 guesses. The point is, while keeping the
algorithm the same, encryption can be made more secure by
increasing the number of digits in the key. For real-life
encryption, the same concept holds true — increasing the
size of the key will force the intruder to make more
guesses, and make the code more difficult to crack. In
digital encryption, since computers are used to encrypt and
decrypt, making 1000 guesses, even 1 million guesses is
trivial and can be done in matter of minutes; hence, keys
must be very long to offer a real barrier to guessing.
In summary, you need the following for encryption
Types of Encryption
In the encryption case we’ve been talking about, note a
very important behavior: the key used to decrypt is the
same as the one used to encrypt. This type of
encryption is known as symmetric
key encryption, since the same key is used in both cases.
Symmetric key encryption makes the algorithms easier to
implement; however, there is a fundamental security risk
inherent in this model. Since the same key must be used to
decrypt the data that is encrypted, the key must be
protected along with the data. What if the intruder gets the
key? He will be able to decrypt every encrypted value,
seriously compromising the security of the system. In
addition to this, another inherent problem is key storage
itself. And if the key is lost, the encrypted data can never
be decrypted.
To address these issues, another kind of encryption
exists, in which two different keys are used to encrypt and
decrypt data. Since the keys are different, this scheme is
known as asymmetric encryption. In
this method, a set of two keys are generated simultaneously.
One key, known as the “public key” is given out to
the others. This key is used to encrypt the data. While
decrypting, the user uses the other key, known as “private
key” to decrypt the value. The public key is made well known
to the world and is not made a secret. The private key can
be stored securely since it is used only during the
decryption process. Since the private key is not used during
encryption, this not required to be given out to the sender
of the data. This obviously increases the security of the
key.
However, note that the public and private keys, although
not the same are mathematically connected – they have to be.
This makes the guessing of the private key a little bit
easier for someone who knows the public key. To reduce the
risk, the key must be longer, usually 1024 bits, unlike the
typical 128-bit key sizes in symmetric encryption schemes. A
larger key also means higher computational processing
requirements on the server, which translates to longer
elapsed time. This makes asymmetric encryption schemes a
little undesirable.
When to Use What Type
For data-in-motion-type encryptions, in which the data is
encrypted to be sent across wire to reduce chances of
eavesdropping, the asymmetric encryption is usually used,
due to two primary reasons. First, the single key use in
symmetric key encryption is not practical; and second, the
length of time for which the encryption key must be in use
is very short, so much smaller encryption keys can be used.
Even if the intruder cracks the key, the data is already
transmitted, reducing chances of compromise. In case of
encryption of data-at-rest, the key is held for a much
longer duration, so a longer key is necessary. To balance
the need to reduce key length and reduce risk of exposure,
asymmetric encryption schemes are used in data-at-rest.
Block and Streaming Encryption
Now that we have seen how encryption schemes are commonly
used, an interesting question is raised: How does an
algorithm work in data input? For the simple example of the
safe combination, I have used an algorithm that adds a
number to the entire input data. Note that I stress
the word, “entire,” which means the entire data must be
available to the algorithm before it can be encrypted. In
case of streaming data, such as one flowing between two
servers over the network or from the Web server to the
browser, the data is not available in its entirety. To
encrypt the data that is completely available, the algorithm
breaks it into chunks (of typically eight bytes) and then
applies encryption; but streaming data cannot be broken into
chunks. Hence, a different strategy is employed for
streaming data. The other type, in which the entire data is
available, is known as block
encryption.
Encryption Algorithms
Now that you understand how encryption works, let’s focus
on one important part of encryption: the algorithm.
One of the first algorithms to be used was the
Digital Encryption Standard (DES), which breaks up
data into blocks of 64 bits and encrypts them using a 64-bit
key. Of the 64 bits, only 56 are used. A smaller key length
makes this a fast encryption approach; however, with modern
computers, an intruder can easily break the 56-bit key.
To address the security risks, a modified form of DES was
introduced — Triple DES or DES3. This makes
three passes of the input data through the DES algorithm, so
it is reasonably secure for most applications. Another form
of DES3 uses three keys to further secure the encryption.
Three-key DES3 and two-key DES3 use keys of 156 and 112 bits
respectively.
DES3 was considered adequate for most applications for a
significant amount of time, but advancement in computing
resources diminished the safety of the algorithm.
Additionally, the compute-intensive algorithm proved to be
too strenuous on the server housing the database. This led
to the development of Advanced Encryption Standard (AES).
AES uses one of the three key lengths — 128, 192, and
256, depending on application. The more the key length, the
longer the computation cycle, but also, the lesser the risk
of an intruder breaking it.
Another type of encryption is RC4, which
is a streaming encryption algorithm.
Padding
Let’s revisit a point that was made in the previous
discussion about block encryption: an algorithm breaks the
input data into blocks of eight bytes (typically) and
encrypts one block at a time. What if the length of the data
is not divisible by eight? Then the last block of
bytes will be less than eight bytes, but the algorithm will
not be able to operate on this block because it is smaller
in size.
To address this, a value must be added to the data to
make the length exactly divisible by eight; this process is
known as padding. The value is
removed when the data is decrypted. In most cases, a simple
padding with a known value such as zero is acceptable; this
process is known as zero-padding. However, padding by zero
also makes the data somewhat prone to discovery by an
intruder, since he can now guess what zero looks like in
encrypted format. Therefore, padding by zero is not
considered cryptographically secure. Instead, a value that
is less vulnerable to break-ins is added by a standard known
as Public Key Cryptography Standard #5, or PKCS#5.
Of course, if you know that the data is in lengths of
multiples of eight, then there is no need to pad.
Chaining
Continuing along the previous line of discussion, input
data is broken into chunks and then encrypted. Since data is
separated, wouldn’t it be nice to somehow de-link the
encrypted blocks so that even if a single block is cracked,
the rest will be intact?
This is when chaining comes in:
it specifies how these chunks of data are linked (or
de-linked) during encryption. In the most common format, a
block is XOR-ed with the encrypted value of the previous
block. The resultant value is then passed to the encryption
algorithm. The same is repeated for all the blocks. This
scheme is known as Cyclic Block Chaining.
Since the first block has no previous block, a value known
as an initialization vector (IV) is used to XOR.
In another form of chaining, each block is independently
encrypted, known as Electronic Code Book
chaining method. Other methods are Cipher Feedback
and Output Feedback.
Hashing
So far, we have talked about a concept in which the input
data is masked to protect it from prying eyes. However, this
does not protect the data from being modified by an
intruder. For instance, assume that we are transmitting the
salary of the CIO and a disgruntled employee decided the
change the salary from 10 million to 10 thousand by chopping
off a few zeroes at the end. How does the receiving end know
that this data has been altered? Or, how can we know that
the data has not been altered?
To address the issue, a concept called checksum
is used. It has been used extensively in the past
to ensure filesystem integrity, and it has been in use for
some time. The idea is to calculate a checksum value by
applying some algorithm to an input value. Then, upon
receiving the input value, the receiver also applies the
same algorithm and gets a checksum. He compares the checksum
calculated with what came with the data; if they match, the
data has not been tampered with. If they differ, then the
data has been compromised. This checksum is often the hash
value of the input value.
Consider the implications of the checksum concept — the
input value itself is not masked, because the intention is
not to prevent people from seeing it. The intent is to
prevent people from modifying it. Of course, data can be
encrypted as well; but the purpose of hashing is not to
prevent exposure.
The second important concept to remember is that this
hash value is calculated from the input value, but the input
value cannot be calculated from the hash value.
This is important to understand as it has use in other
applications. For instance, password management functions
can store hash values in tables. When a user wants to
validate passwords, the function can hash the user’s entry
and compare the hash to the hash, instead of comparing plain
text. This way, the actual password is never displayed and
is never compromised. Oracle password management functions
the same way — it uses a one-way hash value in the password
column of the USER$ table.
MAC
Hashing provides an important function — validating the
integrity of a piece of data. However, consider this
scenario: An intruder might come to know the hashing
algorithm used and then use it to calculate the hash value
of the input data after changing it. The receiver gets the
tampered input data as well as the changed hash value. Since
the values match, the receiver will not be able to determine
that the data has been compromised.
To prevent this situation, the concept known as
Message Authentication Code (MAC) was developed,
which uses a “key” to calculate the hash value (known as
MAC value in this instance). The same key
is used by the receiver in calculating MAC value. Since key
is not likely to be known by the intruder, a changed MAC
value will be different and data compromise will be
unearthed.
Implementing Encryption
With the conceptual foundation in place, let’s delve into
the actual task of implementing an encryption system.
In order to implement an encryption system, you do not
have to reinvent the wheel and create the algorithms; they
are already available in supplied packages. Oracle provides
a package called dbms_obfuscation_toolkit (DOTK)
that contains several procedures and functions to allow you
to achieve encryption. In Oracle 10g, a new package
named dbms_crypto offers similar but
improved functionality. Even though the older package is
still available in Oracle 10g, it’s recommended
that the dbms_crypto be used for all encryption related
matters for the following reasons:
- Crypto provided AES encryption methods; DOTK
does not. AES is now the de facto
standard in many circumstances and is preferred,
because of its increased security and speed.
- Crypto provides automatic padding. In DOTK,
you have to provide padding of data smaller than
the block length.
- Crypto provides PKCS#5 padding, which is
cryptographically more secure than user-supplied
padding (such as padding with zeroes).
And of course, DOTK may be deprecated soon.
Please bear in mind that Crypto (and DOTK, as well) offer
only symmetric encryption only (i.e., the same key
is used to encrypt and decrypt). Asymmetric encryption, in
which a public-private key pair is used, is not available in
either package. To implement such functionality, you will
need to use Oracle Advanced Security, an extra-cost option
for the database. For most common applications, symmetric
key is adequate. We will focus on Crypto for the remainder
of this article; OAS will be covered in a future article.
Encryption
From our earlier discussion on defining encryption, you
know that to encrypt data, you need the following:
- An encryption algorithm
- An encryption key
- A padding method
- A chaining method
What you choose to use for the key depends on your choice
of algorithm. Once you have identified the algorithm to use
(say, AES128) you have to use a key that is 128 bits long.
Crypto provides constants in the package to indicate
different types of algorithms. Table 1 shows these constants
with the algorithms and the key lengths they need.
| Constant Name |
Description |
Effective Key Length |
ENCRYPT_DES
|
Data Encryption Standard. Faster but not secure. |
56 |
ENCRYPT_3DES_2KEY
|
Modified Triple Data Encryption Standard.
Operates on a block 3 times with 2 keys. |
112 |
ENCRYPT_3DES
|
Triple Data Encryption Standard. Operates on a
block 3 times but with one key. |
156 |
ENCRYPT_AES128
|
Advanced Encryption Standard |
128 |
ENCRYPT_AES192
|
Advanced Encryption Standard |
192 |
ENCRYPT_AES256
|
Advanced Encryption Standard. |
256 |
ENCRYPT_RC4
|
This is the only stream cipher. |
|
Table 1: Encryption algorithms.
Once you choose your algorithm, you have to decide on
what key to use. For example, if you have chosen AES128 as
the algorithm, you have to choose a 128-bit key, and this
key has to be of a RAW data type.
Suppose we decide to use the AES scheme with 128-bit keys
along with CBC chaining and PKCS#5 padding. The following
piece of code performs the encryption of a data value
“ConfidentialData” with the key as “1234567890123456.”
1 declare
2 l_key varchar2(2000) := '1234567890123456';
3 l_in_val varchar2(2000) := 'ConfidentialData';
4 l_mod number := dbms_crypto.ENCRYPT_AES128
5 + dbms_crypto.CHAIN_CBC
6 + dbms_crypto.PAD_PKCS5;
7 l_enc raw (2000);
8 begin
9 l_enc := dbms_crypto.encrypt
10 (
11 UTL_I18N.STRING_TO_RAW (l_in_val, 'AL32UTF8'),
12 l_mod,
13 UTL_I18N.STRING_TO_RAW (l_key, 'AL32UTF8')
14 );
15 dbms_output.put_line ('Encrypted='||l_enc);
16* end;
SQL> /
Encrypted=C0777257DFBF8BA9A4C1F724F921C43C70D0C0A94E2950BBB6BA2FE78695A6FC
PL/SQL procedure successfully completed.
SQL>
Let’s analyze the above piece of code line by line.
| Line |
Description |
| 2 |
The key is defined here. As you can see the key
is exactly 16 characters, which it must be for AES
to work. |
| 3 |
The input value, which needs to encrypted. |
| 4 — 6 |
These lines need some more explanation; hence it
has been provided at the end of this table. |
| 7 |
We define a variable to hold the encrypted
value. |
| 9 |
The function encrypt is called. |
| 11 |
The function expects the input value to be in
RAW, not varchar2 as is the reality. In this line,
we have converted it to raw using the function
strings_to_raw in the supplied package utl_i18n. |
| 13 |
As with the input value, the function also
expects the key to be RAW as well. Hence we convert
it here. |
| 15 |
Finally, we display the encrypted value.
However, in reality you will not display the value
as it is meaningless; you will probably use it for
something else, such as store it. |
The encrypted value, also in RAW, is displayed as a
hexadecimal string. This is the basic workings of the
encrypt function. Now, let’s take a look at the lines we
omitted from the previous discussion.
Remember, to encrypt you need the following components:
- An input value
- A key
- An algorithm
- A padding scheme
- A chaining scheme
In the previously cited piece of code, the input value
and the key were provided in lines three and two
respectively — but where are the rest of the necessary
components?
The rest are all crowded into lines four through six.
These options are specified in the package dbms_crypto as
named constants. For instance, the AES 128-bit algorithm is
specified by the constant ENCRYPT_AES128. Table 1 lists
these encryption constants. Similarly, the CBC chaining
method is specified by CHAIN_CBC and the PKCS#5 padding is
specified by PAD_PKCS5. All these three constants have been
specified as having been added together in the input
parameter MOD in the function encrypt(). This is exactly how
dbms_crypto expects to get the specifics of algorithm —
padding and chaining. The combination of these three is
known as a modifier to the function.
The following is a list of padding modifiers available:
| Constant |
Description |
PAD_NONE
|
No padding is done. If length of the data is
exactly same as the block size or a multiple
thereof, this option is used. |
PAD_ZERO
|
The data is padded with zeroes to make it a
multiple of block size. |
PAD_PKCS5
|
PKCS#5 based padding is done. This is the safest
and the recommended approach. |
The following options are available for chaining:
| Constant |
Description |
| CHAIN_ECB |
Electronic Code Book standard |
| CHAIN_CBC |
Cyclic Buffer Chaining method |
| CHAIN_CFB |
Cypher Feedback method |
| CHAIN_OFB |
Output Feedback |
If you wanted to use no padding and Electronic Code Book
chaining, you could have called the ENCRYPT function as:
4 l_mod number := dbms_crypto.ENCRYPT_AES128
5 + dbms_crypto.CHAIN_ECB
6 + dbms_crypto.PAD_NONE;
Similarly you can specify any combination of algorithm,
padding and chaining to suit your needs.
Coming next in part
2: Decryption, key management, key
storage, and more.
--
Arup Nanda
is the Lead DBA at Starwood Hotels & Resorts. He has been an
Oracle DBA for more than 11 years, touching all aspects of
database management — from modeling to performance tuning to
disaster recovery. He is the co-author of the book
Oracle Privacy Security Auditing (2003,
Rampant Tech Press).