OEM Grid Control Architecture for Very Large Databases (VLDB)

Excerpt by Rampant Author Porus Homi Havewala, author of Oracle Enterprise Manager Grid Control by Rampant TechPress.

In this article I offer an overview of the architecture that gives Grid Control its high scalability. This information will be useful to customers who are contemplating Grid Control but need guidance on architecting their solutions properly.

Suppose a DBA team, or its management, decides to implement Grid Control for a VLDB system. The normal tendency would be to install the product on a test or development server, be it on a flavor of Unix, Linux, or Windows. This means all OEM Grid Control components (the current release at the time of writing being Release 4) are placed on a single server: the repository database, the Oracle Management Service (OMS), and the EM agent.

Then, OEM Agents would be installed by either the push or pull method, on a few other development and test database servers. After the DBA team experiments with the functionality of Grid Control, it would likely tentatively decide to install an agent on a production server for the first time.

Let's say eventually management decides to move the whole shebang of Grid Control to production, but it now makes the mistake of assuming that what works for a few development servers would also work for production. It authorizes the DBA team to install OEM Grid Control on a production server, again a single server. The team installs all the components again on a single server, perhaps sharing the Grid Control install with a production or test database. This is followed by OEM agents being installed on all the production and test database servers pointing back to the Grid Control server.

As the Grid Control workload gradually increases, as more and more databases are managed by more DBAs, as more and more monitoring is performed, as Grid Control is used more and more for RMAN backups, Data Guard setup and monitoring, cloning of databases and homes and so on, the Grid Control system grinds to a halt.

To see why, we need to understand the Grid Control internals. The main working component of Grid Control, the engine as it were, is the OMS.

The OMS is a J2EE application deployed on Oracle Application Server 10g; the member components are the Oracle HTTP Server, Oracle Application Server Containers for J2EE (OC4J), and OracleAS Web Cache. In effect, the Grid Control middle tier is a reduced version of Oracle Application Server itself.

At the Unix server level, we see a Unix process that is the actual OC4J_EM process. This is also seen when the opmnctl command is executed:

./opmnctl status
Processes in Instance: EnterpriseManager0.GridMgt001.in.mycompany.com
-------------------+--------------------+-------+---------
ias-component      | process-type       |   pid | status
-------------------+--------------------+-------+---------
WebCache           | WebCacheAdmin      |  2071 | Alive
WebCache           | WebCache           |  2099 | Alive
OC4J               | OC4J_EM            | 27705 | Alive
OC4J               | home               |   N/A | Down
dcm-daemon         | dcm-daemon         |   N/A | Down
LogLoader          | logloaderd         |   N/A | Down
HTTP_Server        | HTTP_Server        |  2072 | Alive
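
The status listing above is easy to check from a monitoring script. The following is a minimal sketch, assuming the output keeps the pipe-delimited layout shown above. The function names `parse_opmn_status` and `oms_is_alive` are hypothetical helpers for illustration and are not part of any Oracle tool.

```python
def parse_opmn_status(text):
    """Parse pipe-delimited `opmnctl status` output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Data rows have exactly four fields; the title line and the
        # dashed rulers contain no "|" and the header row is skipped by name.
        if len(parts) != 4 or parts[0] == "ias-component":
            continue
        component, process_type, pid, status = parts
        rows.append({
            "component": component,
            "process_type": process_type,
            "pid": None if pid == "N/A" else int(pid),
            "status": status,
        })
    return rows


def oms_is_alive(rows):
    """Return True if the OC4J_EM process is reported as Alive."""
    return any(
        r["process_type"] == "OC4J_EM" and r["status"] == "Alive"
        for r in rows
    )


# Sample text copied from the listing in this article.
SAMPLE = """\
Processes in Instance: EnterpriseManager0.GridMgt001.in.mycompany.com
-------------------+--------------------+-------+---------
ias-component | process-type | pid | status
-------------------+--------------------+-------+---------
WebCache | WebCacheAdmin | 2071 | Alive
WebCache | WebCache | 2099 | Alive
OC4J | OC4J_EM | 27705 | Alive
OC4J | home | N/A | Down
dcm-daemon | dcm-daemon | N/A | Down
LogLoader | logloaderd | N/A | Down
HTTP_Server | HTTP_Server | 2072 | Alive
"""

if __name__ == "__main__":
    for row in parse_opmn_status(SAMPLE):
        print(row["process_type"], row["pid"], row["status"])
    print("OMS alive:", oms_is_alive(parse_opmn_status(SAMPLE)))
```

In practice the text would come from running opmnctl on the management server rather than from a constant.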

A small digression at this stage: since the OMS runs on Oracle Application Server, you can control it as you would any Application Server instance. Use the EM Application Server Control, or at the command line use opmnctl (the control utility for Oracle Process Manager and Notification Server) or dcmctl (Distributed Configuration Management control).

Thus, OC4J_EM is only a single Unix process with its own PID. The memory used by this process is also limited; the limit is set in the file $ORACLE_HOME/opmn/conf/opmn.xml. You could perhaps increase the memory used by the process, but it remains just a single process. Imagine this one process being used to manage numerous databases and servers, and to perform various tasks such as Data Guard setups, cloning, and so on, and it becomes clear why such a setup will simply not scale.

Obviously, if the database itself were to run as a single process, with the work of the database writer, the log writer, the archiver, and numerous other background processes all performed by that one process, the database would become far less efficient and scalable. This is the primary reason why only limited scalability is achieved when all Grid Control components are placed on a single server: you are limited to one OC4J_EM process with its own limits of memory and processor speed. If the OC4J_EM process were to reach its memory limit under heavy load and slow down or stop responding, other DBAs would not be able to log in to the Grid Control Console for their own database management work.

Placing all Grid Control components on a single server is not recommended in production, nor is sharing that server with a production or test database. In a properly architected solution, Grid Control needs its own set of servers. It is recommended that some time be spent planning the Grid Control site being contemplated for production.

Senior management should be convinced of the need for this initial study and should approve the budget for the solution. The work should then be scoped out and performed as a professional project, since Grid Control is an enterprise solution and not a minor tool to deploy on a DBA workstation.

OEM Internals

OEM Grid Control is drastically different from previous incarnations of Enterprise Manager. In the past, Enterprise Manager was not so scalable, simply because it was not N-tiered. The oldest avatar was Server Manager, which was a PC executable utility.

When Grid Control was created, the internal architecture was drastically altered to the N-tier model. Oracle's vision is broadly N-tier, which is in line with and also sets the direction for modern IT thought. Grid Control became the three components mentioned previously, and because the main engine, the OMS, now runs on the application server as an OC4J application, it instantly became scalable.

Herein lies the secret of the immense scalability of Grid Control. The boundaries were broken, and horizontal scaling was opened to the EM world. The more OMS servers you add to the EM site, the more targets you can manage.

The Right Architecture for very large OEM

Our real-life large site implementation example will illustrate this concept more clearly. At the foundation of the implementation, industry-standard and open architecture can be utilized, such as Linux servers.

There is no need to deploy powerful expensive servers (beefy beasts that typically have 24 or more CPUs and 32GB or more memory). Smaller 4 CPU machines with 8 GB memory are being used, since the intention is to scale horizontally and not vertically.

The "Free Space" mentioned in the specification table is for the Oracle software, such as the Oracle Database Home, the Oracle Management Service Home, and the Agent Home. It does not include the database, which will be placed on either a SAN or a NAS (NetApp filer). The database space requirement for the EM Repository would be approximately 60 to 70GB, with an equal amount of space reserved for the Flash Recovery Area, where all archive logs and RMAN backups will be stored. Oracle recommends database backups to disk (the Flash Recovery Area), so that fast disk-based recovery is possible.

Even with a large number of targets being monitored and managed, the database size rarely goes above 60 to 70GB with out-of-the-box functionality. A new feature of Grid Control is that the EM repository database (10g) manages itself so far as space is concerned, in the sense that it performs rollups of metric data at predetermined intervals. Hence the metric data that is collected continuously from the targets does not drastically increase the database size. On the other hand, it is possible to manually create extra metrics for monitoring, and this may increase the database size beyond this example figure.
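
To see why rollups keep the repository size in check, consider the following minimal sketch. It is not how the EM repository implements its rollups, which this article does not describe; it only illustrates the general idea of collapsing many raw samples into one summary row per interval. The function name `rollup_hourly`, the one-hour interval, and the summary fields are all assumptions made for illustration.

```python
from collections import defaultdict


def rollup_hourly(samples):
    """Collapse (timestamp_seconds, value) samples into one row per hour.

    Returns a dict keyed by the start of each hour (in seconds), holding
    the count, minimum, maximum and average of the samples in that hour.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        hour_start = ts - ts % 3600
        buckets[hour_start].append(value)
    return {
        hour: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
        for hour, values in sorted(buckets.items())
    }


if __name__ == "__main__":
    # Hypothetical data: one sample every five minutes for two hours.
    raw = [(i * 300, float(i)) for i in range(24)]
    rolled = rollup_hourly(raw)
    print(len(raw), "raw rows ->", len(rolled), "rolled-up rows")
```

Once the raw rows for an interval have been summarized they can be purged, so storage grows with the number of intervals retained rather than with the collection frequency.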

During the installation phase, the full Grid Control software is installed first on one of the servers, using the Grid Control installation CDs. This is done by selecting the "Enterprise Manager 10g Grid Control Using a New Database" installation type. This server becomes the repository server, since the repository database is created on this machine. Being a full install, it also places an OMS and an EM Agent on the same repository server. (You can ignore this OMS at this point: more on this later.)

Next, an additional OMS is installed on each of the other servers. This is done using the same Grid Control installation CDs but selecting the "Additional Management Service" installation type. During the installation of the additional service, you are asked to point at an existing repository, so point to the repository database on the first server. The repository database must be up and running at this stage, with the repository successfully installed in the SYSMAN schema.

With the Additional Management Service installation type, only the management service (OMS) and the EM agent are installed. This is completed on three or more additional servers, and these servers become the management server pool.

The repository database server can be complemented with a standby database server using Oracle Data Guard, or optionally with an Oracle RAC cluster on multiple nodes if the repository database performance must scale horizontally. A noteworthy point, however, is that in Grid Control the performance requirement lies not so much on the database side as on the management server side. The greatest scalability gain is achieved on the management servers, since OC4J_EM is where the bulk of the Grid Control work is performed. This is why the architecture for a large Grid Control setup should include three or more load-balanced management servers.

Load balancing the pool of management servers forms an integral part of this architecture. A hardware load balancer, such as a Big IP Application Switch Load Balancer from F5 Networks, can be used for this purpose. (This company's flagship product is the BIG-IP network appliance. The network appliance was originally a network load balancer, but now also offers more functionality such as access control and application security.)

The load balancer is set up with its own IP address and domain name, for example gridcentral.in.mycompany.com. The load balancer in turn points to the IP addresses of the three management servers. A service request arrives at the IP address or domain name of the load balancer, on a particular port that can be set up at the balancer level. The balancer then distributes the incoming request to any one of the three simultaneously active management servers in its pool, at the port specified.

Grid Control uses various ports for different purposes. For example, one port is used for Console logons and a different port is used for Agent uploads of target metric data. The Big IP must be set up for all these ports so that load balancing occurs for Grid Control Console logons as well as for Agent uploads of target metric data.

An additional benefit is excellent redundancy for the Grid Control system. If one of the management servers stops functioning for any reason, as can happen under heavy load, its OC4J_EM process may need to be restarted using opmnctl. That management server can be taken out of service while the other active management servers continue to service requests as distributed by the Big IP load balancer.

The load balancer automatically ignores a non-reachable IP (discovered to be so by its own monitors, which check the pool members on an ongoing basis, at predetermined intervals). So, failure of any of the existing management server instances simply results in the load balancer directing all subsequent service requests to the surviving active instances. When the Big IP monitor detects that the node is back online, the node or service is automatically added back into the pool.
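
The behaviour described above can be sketched in a few lines. This is a toy model, not Big IP code: the class name `MonitoredPool` and its methods are hypothetical, and round-robin distribution is an assumption, since the article does not say which balancing method was configured.

```python
class MonitoredPool:
    """Toy model of a load-balancer pool whose members are health-checked."""

    def __init__(self, members):
        self.members = list(members)
        self.up = {m: True for m in self.members}
        self._next = 0

    def record_check(self, member, reachable):
        """Store the result of the latest monitor check for one member."""
        self.up[member] = reachable

    def pick(self):
        """Return the next reachable member, skipping any marked down."""
        for _ in range(len(self.members)):
            member = self.members[self._next]
            self._next = (self._next + 1) % len(self.members)
            if self.up[member]:
                return member
        raise RuntimeError("no active pool members")


if __name__ == "__main__":
    pool = MonitoredPool(["GridMgt001", "GridMgt002", "GridMgt003"])
    print([pool.pick() for _ in range(3)])

    # The monitor finds GridMgt002 unreachable: requests go to the survivors.
    pool.record_check("GridMgt002", False)
    print([pool.pick() for _ in range(4)])

    # The node comes back and is returned to rotation automatically.
    pool.record_check("GridMgt002", True)
    print([pool.pick() for _ in range(3)])
```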

Software load balancing could be used instead of hardware load balancing. This is a simpler solution that uses software mechanisms, such as network domain names, to route requests to the three management servers. The hardware solution is more expensive, but it is recommended because it is more powerful: a hardware load balancer that handles both load balancing and failover should form an integral part of the total architecture, making the solution much more robust and flexible.

To manage the Big IP load balancers, internal IPs must be assigned to both the primary and the standby load balancers, and a floating IP address must be assigned which points to either the primary or the standby load balancer, depending on which balancer is active. You would then manage the load balancer via the floating IP, using the URL listed in the table below. This is the Big IP management utility, or Web console. Log in to this console using the Admin password or the Support password. (New users can be created in the Big IP web console with read-only rights if required.)

The Big IP root password is used for logging in at the Linux level using SSH. The balancer runs Linux, but with a shell that offers a reduced command set. This is the command line interface (CLI) of Big IP. Commands are slightly different from normal Linux; for example, in the CLI the command "bigtop" is used to monitor the load balancer.

The internal IPs and floating IP are illustrated in the following table (each IP address is shown as nnn.nnn.nnn.nn but is implicitly unique):


Hostname    IP Address      Description          Big IP Management URL
GridBal001  nnn.nnn.nnn.nn  Unit 1 IP Address    https:///bigipgui/bigconf.cgi
GridBal002  nnn.nnn.nnn.nn  Unit 2 IP Address    https:///bigipgui/bigconf.cgi
GridBal003  nnn.nnn.nnn.nn  Floating IP Address  https:///bigipgui/bigconf.cgi

Of the two load balancer units, GridBal001 and GridBal002, either one could be active (actually handling the load balancing). Typically the two units have three addresses associated with them: the Unit 1 IP, the Unit 2 IP, and the Floating IP. The Floating IP is a shared IP address and will only "exist" on the unit that is active at that time.

The other servers in the Grid Control configuration are illustrated by the following table:


Hostname    IP Address      Description
GridMgt001  nnn.nnn.nnn.nn  Management Server One (OMS 1)
GridMgt002  nnn.nnn.nnn.nn  Management Server Two (OMS 2)
GridMgt003  nnn.nnn.nnn.nn  Management Server Three (OMS 3)
GridMgt100  nnn.nnn.nnn.nn  Virtual Management Server (Virtual OMS)
GridDb001   nnn.nnn.nnn.nn  Database Server One (DBS 1) (Primary or RAC node)
GridDb002   nnn.nnn.nnn.nn  Database Server Two (DBS 2) (Standby or RAC node)

For the purposes of load balancing, Big IP uses the concepts of virtual servers, pools, associated nodes (members), and rules. A virtual OMS server is set up at the Big IP level with its own IP address, and this in turn points to a pool of Oracle management servers with their own IP addresses. The outside world therefore has merely to point to the virtual OMS server's IP address or domain name, for both Grid Console logons and Agent uploads from multiple targets. The pool of Oracle management servers is set up using IP address:port combinations, which means you can have one pool set up for Grid Console logons and another pool set up for Agent uploads to the OMS.

Two new pools were created, EMAgentUploads and EMConsoles. Each pool has the three OMS nodes (the three active ones; however, you could add a node that is still being set up and keep it as "forced down" in Big IP so that it won't be monitored). The difference between the pools is at the port level. The pool EMAgentUploads uses port 4889 for Agent uploads, and the pool EMConsoles uses port 7777 for console access (7777 is the default port for Oracle Web Cache).

At the pool level, Big IP also allows you to define persistence (stickiness), that is, whether subsequent service requests from the same client should be routed to the same pool member. While Grid Console logons do not require stickiness (we do not care if the console uses a different OMS each time the DBA connects), it was decided that agent uploads could benefit from it. The pools were modified accordingly: "simple persistence" was set up for the agent uploads pool, but none for the console logons pool.
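
The difference between a sticky and a non-sticky pool can be illustrated with a small sketch. It assumes that persistence is keyed on the client's source address and it leaves out timeouts; the class name `StickyPool` and the client names are hypothetical, and none of this is Big IP code.

```python
import itertools


class StickyPool:
    """Toy pool that optionally pins each client to its first member."""

    def __init__(self, members, persistent):
        self._cycle = itertools.cycle(members)
        self.persistent = persistent
        self._table = {}  # client address -> pinned pool member

    def route(self, client_addr):
        """Return the pool member that should serve this request."""
        if self.persistent and client_addr in self._table:
            return self._table[client_addr]
        member = next(self._cycle)
        if self.persistent:
            self._table[client_addr] = member
        return member


if __name__ == "__main__":
    members = ["GridMgt001", "GridMgt002", "GridMgt003"]

    # Agent uploads: each agent keeps talking to the same OMS.
    uploads = StickyPool(members, persistent=True)
    print([uploads.route("agent-host-a") for _ in range(3)])

    # Console logons: every request may land on a different OMS.
    consoles = StickyPool(members, persistent=False)
    print([consoles.route("dba-workstation") for _ in range(3)])
```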

Two new Virtual OMS servers were created, the first using port 4889 for agent uploads using the EMAgentUploads pool, and the second using port 7777 for the Web Cache EM Console using the EMConsoles pool. Both virtual servers are using the same reserved IP address (but the ports are different).
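
The resulting configuration can be summarized as a small data model. The sketch below is only a way of writing down the relationships described above, using the hostnames from the server table in place of IP addresses. The class and function names (`PoolConfig`, `VirtualServer`, `targets_for`) are hypothetical and do not correspond to Big IP configuration syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolConfig:
    name: str
    port: int
    members: tuple
    persistence: str  # "simple" or "none"


@dataclass(frozen=True)
class VirtualServer:
    host: str
    port: int
    pool: PoolConfig


OMS_NODES = ("GridMgt001", "GridMgt002", "GridMgt003")

EM_AGENT_UPLOADS = PoolConfig("EMAgentUploads", 4889, OMS_NODES, "simple")
EM_CONSOLES = PoolConfig("EMConsoles", 7777, OMS_NODES, "none")

# Both virtual servers share the reserved address; only the port differs.
VIRTUAL_SERVERS = (
    VirtualServer("GridMgt100", 4889, EM_AGENT_UPLOADS),
    VirtualServer("GridMgt100", 7777, EM_CONSOLES),
)


def targets_for(host, port):
    """List the member:port targets behind a virtual server address."""
    for vs in VIRTUAL_SERVERS:
        if (vs.host, vs.port) == (host, port):
            return [f"{member}:{vs.pool.port}" for member in vs.pool.members]
    raise LookupError(f"no virtual server at {host}:{port}")


if __name__ == "__main__":
    print(targets_for("GridMgt100", 4889))
    print(targets_for("GridMgt100", 7777))
```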

Big IP monitors that continuously inspect the status of pool members can also be set up. One such monitor, EMMon, was set up with the send string "GET /em/upload" and the receive rule "Http XML File receiver", as per the Enterprise Manager Advanced Configuration Guide.
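
The same check can be reproduced by hand when troubleshooting a pool member. The following is a minimal sketch, assuming the upload URL is reachable over plain HTTP on the agent upload port (a secured setup would differ). The function names `monitor_passes` and `probe` are hypothetical; only the send path and the receive string come from the monitor definition above.

```python
import urllib.request

SEND_PATH = "/em/upload"                  # from the monitor's send string
RECEIVE_RULE = "Http XML File receiver"   # from the monitor's receive rule


def monitor_passes(body, receive_rule=RECEIVE_RULE):
    """Return True if the response body contains the expected text."""
    return receive_rule in body


def probe(host, port, timeout=5.0):
    """Fetch the upload URL from one OMS and apply the receive rule.

    Assumes plain HTTP; any connection or HTTP error counts as a failure.
    """
    url = f"http://{host}:{port}{SEND_PATH}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        return False
    return monitor_passes(body)


if __name__ == "__main__":
    # Hostnames are those of the management servers in this article.
    for host in ("GridMgt001", "GridMgt002", "GridMgt003"):
        print(host, "up" if probe(host, 4889) else "down")
```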

Now, when the corporate network alias "gridcentral.in.mycompany.com" is switched to point to the virtual OMS server GridMgt100, the Big IP load balancer starts being used by production.

A point to note is that the initial changes, although shown as successful at the Big IP management console, were not effective at the URL level (the URLs didn't work) until the Big IP was failed over to its standby and back again. Any configuration changes performed on the active load balancer should be propagated to the standby load balancer. To do this in the Big IP configuration utility, go to Redundant Properties and click Synchronize Configuration. This makes the standby balancer's configuration the same as the active one's, including all pools, virtual servers, and rules, so the standby is ready to take over the load balancing in the event of a failover.

Another notable point is that when changing the admin password, because the admin user is configured as the configsync user, you must change the password to match on the peer controller in order for configsync to work.

It is also possible to fail over manually. Before any failover to the standby Big IP, it is recommended to mirror all connections; be aware, however, that this setting carries a CPU performance hit. It is selected under the properties of the virtual server (Mirror Connections).

As noted earlier, a management server was installed on the Grid Control repository server during the initial install. Since the management server function has been separated from the repository function in this architecture, it is not recommended to use the extra management server that was installed on the repository server. Simply dedicate that server to the repository. For this reason, only the three stand-alone management servers were placed in the Big IP load balancer pools.

The extra management server is a Java process that runs on the repository server and takes up memory and processing power, so it may be a good idea to use opmnctl on this server to shut down the management server (OC4J_EM). Or, if Unix reboot scripts are being written that start up the OMS, Agent, and Database on the servers whenever there is a reboot, simply leave out starting the OMS in the case of the repository server. Just start the Listener, the Database, and then the Agent. On the other management servers, start the OMS and the Agent.
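
The role-dependent startup order can be captured in a small lookup that a reboot script consults. This is a minimal sketch: the role names, component labels, and the function `startup_sequence` are hypothetical, and the actual start commands for each component are deliberately left out.

```python
# Components to start after a reboot, in order, for each server role.
STARTUP_ORDER = {
    # Repository server: the OMS installed there is intentionally not started.
    "repository": ["listener", "database", "agent"],
    # Stand-alone management servers in the load balancer pools.
    "management": ["oms", "agent"],
}


def startup_sequence(role):
    """Return the ordered list of components to start for a server role."""
    try:
        return list(STARTUP_ORDER[role])
    except KeyError:
        raise ValueError(f"unknown server role: {role!r}") from None


if __name__ == "__main__":
    for role in ("repository", "management"):
        print(role, "->", ", ".join(startup_sequence(role)))
```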

 

   

 Copyright © 1996 -2011 by Burleson Enterprises. All rights reserved.

