Becoming an access path guru requires time learning how optimization parameters, statistics and hints affect SQL access paths and statement performance. This article will provide you with a few hints and tips to help you begin your scientific analysis of the Oracle optimization process.
In my experience most good tuners share a common set of traits. They are inquisitive by nature, understand that there is no substitute for experience and dedicate lots of time performing scientific analysis on SQL statements and database performance.
The objective we are trying to accomplish is to identify the effects that startup parameters, statistics and hints have on access paths, and ultimately, SQL performance. You also need to spend some time changing a statement’s access path and noting the impact that different access paths have on statement performance.
A well known tuning guru, Jonathan Lewis, uses terms like “scientific analysis” in his discussions on SQL and database tuning. It’s really a very appropriate description of his activities. After reading many of his works, I would describe him as a database performance scientist. He identifies something he wants to learn more about, creates a test environment, executes a series of tests, analyzes the data and provides documented results to reinforce his theories.
That’s what we all MUST DO to expand our knowledge on the Oracle optimization process. We need to become database performance scientists. We do that by creating a test environment, running and documenting performance baselines, changing the environment to influence the statement’s access path and documenting the results.
Setting up a Test
Running test cases to identify the effects that different database environmental settings have on a statement’s access path is not as monumental as it may seem.
Most production databases have a test counterpart. Select an environment that is actively used by your application developers. Don’t worry: if one of our tests causes a statement to “run longer than anticipated”, we can use the trusty ALTER command to kill our session. We can also run our workloads during lull times. Lastly, the majority of changes we will make to influence the statement’s access path will be done in our own session. We won’t be making changes that affect the entire database environment.
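For reference, terminating a runaway test session looks something like this (the sid and serial# values come from v$session; the ones shown here are placeholders):

-- Identify our session (run from a privileged account)
select sid, serial# from v$session where username = '&username';
-- Terminate it using the sid,serial# pair returned above
alter system kill session '123,45678';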
I prefer to use an active test environment because it allows me to easily select SQL statements to use as test cases, the data is usually more representative of what will be found in production, and the developers will most likely have a firm understanding of the active workload being executed.
You need to talk to the developers who are responsible for running workloads on that test environment to ensure that the changes being made to the test data don’t skew your results from one test execution to the next. You can’t run a test, have a load insert another 100 thousand rows in the table, run another test and expect to have a good comparison. You want your test bed to be active but not so active that it complicates your testing process or causes your tests to generate incorrect results. You’ll need to use common sense when selecting the test environment as well as determining the optimal time to run your test cases.
We also want to choose a test environment that is providing adequate performance for the workloads being executed upon it. We really don’t want to use an environment that isn’t performing well to begin with.
There’s a myriad of options available to you. If you don’t want to impact any of your test environments, create a test environment of your own. Clone one of your test databases to a sand-box environment if you can.
You’ll also want to make sure that statistics are up to date on the database you will be using as a test bed. If you are running 10G/11G databases, the database will run statistics jobs for you automatically (isn’t 10G/11G great?). If you don’t have statistics run automatically, it will be up to you to analyze the data objects to ensure that the statistics optimally represent what is stored in the data structures.
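If you do need to gather statistics manually, a DBMS_STATS call along these lines is the standard approach (the schema and table names below are placeholders):

begin
  dbms_stats.gather_table_stats(
    ownname          => 'SCOTT',   -- placeholder schema
    tabname          => 'EMP',     -- placeholder table
    estimate_percent => dbms_stats.auto_sample_size,
    cascade          => true);     -- gather index statistics as well
end;
/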
Parameters That Affect the Optimization Process
The next step is to document the environment. There are a couple of dozen parameters that affect optimization and SQL statement performance. To begin, we are going to choose the basic parameters that are easy to change and have the biggest impact on optimization. These are not all of the parameters that can influence the optimization process, just the ones that are easy to change and provide the best chance of successfully achieving an access path change.
It is important that we read the documentation beforehand for these parameters for the specific Oracle release that we are using as our test environment. We know that each Oracle release may contain enhancements to these parameters that change the effect they have on the optimization process and how we alter them to different values.
We’ll want to document the following parameters to begin our scientific analysis:
optimizer_features_enable - This parameter allows you to make the optimizer behave as it would for a specific release of the Oracle database. You set the value to a release identifier (the listing of the optimizer releases that you can set this parameter to is provided in the Reference Manual) and the optimizer will act as it would if the database were at that release.
cursor_sharing – For our first set of initial tests, we’ll hardcode values in our selection criteria to ensure that our statements aren’t affected by cursor sharing. We will review cursor sharing and the impact it has on statements using bind variables, in-depth in upcoming blogs.
optimizer_index_caching – Administrators will often set this parameter in conjunction with optimizer_index_cost_adj to influence the optimizer to use more indexes and Nested Loop joins. Setting this parameter makes Nested Loop joins look less expensive when compared to Hash or Sort-Merge joins.
optimizer_index_cost_adj - Allows the administrator to make index access more, or less, costly to the optimizer. The default value of 100 tells the optimizer to evaluate the index using the regular cost. The lower you set this value, the less costly the indexes will look to the optimizer. Conversely, the higher you set this parameter, the more expensive the indexes will look.
db_file_multiblock_read_count - The number of blocks that Oracle will read in one I/O when performing a scan of data (e.g. a full table scan). We need to be aware that in 10G/11G Oracle will choose the optimal value of this parameter for us and could adjust it based on database workload.
optimizer_mode - Sets the approach the optimizer will use when analyzing queries. Since there have been a few changes made between Oracle 9i and Oracle10g, I’ll provide information on both sets.
optimizer_use_pending_statistics - New parameter (and feature) to Oracle Database 11g. In previous releases, when you gathered optimizer statistics, the statistics were automatically published once the gather was completed. 11G provides administrators with the option of separating statistics gathering from statistics publishing. This feature allows us to test the newly gathered statistics before they are published.
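As a sketch of how the pending statistics workflow fits together in 11G (SCOTT and EMP are placeholder names):

-- Hold newly gathered statistics for this table as pending instead of publishing them
exec dbms_stats.set_table_prefs('SCOTT', 'EMP', 'PUBLISH', 'FALSE');
-- Gather statistics; they are stored as pending
exec dbms_stats.gather_table_stats('SCOTT', 'EMP');
-- Test the pending statistics in our session only
alter session set optimizer_use_pending_statistics = true;
-- Publish them once we are satisfied with the results
exec dbms_stats.publish_pending_stats('SCOTT', 'EMP');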
optimizer_capture_sql_plan_baselines and optimizer_use_sql_plan_baselines – Another new 11G feature that we will be testing in the near future. When these two parameters are set, the 11G optimizer will look for a SQL plan (access path) baseline for the SQL statement being optimized. If one is found in SQL Management Base, then the optimizer will review the data access costs of the plans and pick the one with the lowest cost.
Oracle 9i provides the following settings:
Choose - If optimizer_mode is set to choose, the optimizer is able to switch between rule and cost-based optimizations. When optimizer_mode is set to CHOOSE, the optimizer uses the all_rows cost-based approach for a SQL statement if there are statistics in the dictionary for at least one table accessed in the statement. If you generate statistics on one table, every query that accesses that table will use the cost-based optimizer. What happens if other tables in the query do not have statistics collected? The optimizer will make an educated guess on the statistics for those tables. The problem is that Oracle isn’t always a good statistics guesser and the end-result is a “less than optimal” access path.
first_rows - Influences the optimizer to choose an access path that minimizes response time. Most often used for online transaction processing systems that return small result sets. The optimizer favors Nested Loop joins and index access.
Rule - The optimizer will use the rule-based approach during analysis. The rule-based optimizer does not use data statistics as input when generating the access path. Each access path is assigned a numerical ranking. The rule-based optimizer chooses the access path that has the most favorable numerical ranking. The rule-based optimizer has been superseded by the cost-based approach. There are a few cases where I have seen the rule-based optimizer choose a better access path than the cost-based method – but not many.
all_rows - Influences the optimizer to choose an access path that minimizes total execution time. Most often used for decision support and data warehouse environments. The optimizer tends to favor full table scans, Hash and Merge-Scan joins.
Oracle10g/11g settings for optimizer_mode:
first_rows - The same influence on the optimizer as it did in Oracle9i.
first_rows_n - Where “n” = 1, 10, 100, 1000. Influences the optimizer to optimize queries to provide the fastest response when returning the “n” number of rows. Acts as a throttle, which allows you to better balance the optimization process.
all_rows – The same influence on the optimizer as it did in Oracle9i.
We will use the ALTER SESSION SQL statement to alter these parameters during our scientific analysis on the effects they have on the Oracle optimization process.
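Here are a few examples of the kind of session-level changes we will be making (the values shown are illustrations, not recommendations):

-- Make index access look half as expensive to the optimizer
alter session set optimizer_index_cost_adj = 50;
-- Make the optimizer behave as it did in release 9.2.0.8
alter session set optimizer_features_enable = '9.2.0.8';
-- Optimize for returning the first 10 rows quickly
alter session set optimizer_mode = first_rows_10;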
Documenting Our Test
After we document some of the parameters that affect optimization, let’s turn our attention to documenting the tables we will be accessing. In my next blog, I’ll provide you with a few hints and tips on how to select or create SQL statements to use in your test cases but let’s continue our discussion on documentation.
The following information will provide you with a good base of information on the data objects our statements will be accessing. Since we are just beginning our scientific analysis, we’ll use basic storage objects (tables and b-tree indexes). We’ll discuss some of the more complex objects (bitmap indexes, partitioning, etc.) in later blogs.
Row counts for all tables that are accessed by our test queries can be found in the num_rows column in dba_tables if we have statistics generated for our tables.
Number of blocks the table is using can be found in the blocks column in dba_tables if we have statistics generated for our tables.
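If statistics are in place, both figures can be pulled in a single query against dba_tables (the last_analyzed column is a handy extra check that the statistics are current):

select table_name, num_rows, blocks, last_analyzed
from dba_tables
where owner = '&table_owner'
and table_name = '&table_name';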
Index listing for all indexes on our tables. The query below will provide you with a listing of indexes for a given table:
select b.index_name, b.column_name, b.column_position
from sys.dba_indexes a, sys.dba_ind_columns b
where a.owner = b.index_owner
and a.index_name = b.index_name
and a.table_owner = '&table_owner'
and a.table_name = '&table_name'
order by b.index_name, b.column_position;
Once we find all of the indexes and columns in those indexes, let’s check them for both selectivity and data skew. Selectivity is the number of distinct values in a column relative to the total number of rows; the more distinct values a column contains, the more selective it is. Skew means that some values occur only a few times in a column while other values occur many, many times. Since data skew will affect optimization, that information will also be important to us.
We can easily find the selectivity for a single or multi-column index by accessing the distinct_keys column in our dba_indexes table if we have statistics generated. For multi-column indexes, we will want to check the individual selectivity for each column in our multi-column index. We can do this with the following query:
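When statistics are in place, the overall selectivity figure can be read straight from dba_indexes:

select index_name, distinct_keys, num_rows
from dba_indexes
where table_owner = '&table_owner'
and table_name = '&table_name';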
select count(distinct index_colname) from owner.table_name;
Where index_colname is one of the columns in our multi-column index, owner is the table owner and table_name is the name of our table. We’ll need to do this for all columns in our multi-column indexes.
We’ll use the following queries to identify data skew and find out some information on histograms:
select index_colname, count(*)
from owner.table_name
group by index_colname;
Where index_colname is the column name, owner is the table owner and table_name is the name of our table. We’ll need to do this for all columns in our indexes. We’ll do this for both single column and multi-column indexes.
select * from dba_tab_histograms where owner='&owner' and table_name='&table_name';
Where owner is the table owner and table_name is the name of our tables. We’ll need to do this for all tables that our queries will be accessing.