Introduction
Oracle Real Application Clusters (RAC) Multi-node Cluster is a database architecture that runs a single Oracle database across multiple servers (nodes), enhancing availability, scalability, and performance by distributing the workload and ensuring continuous operation even if one node fails.
Key Features
- High Availability (HA): Continuous operation by eliminating single points of failure; nodes continue to operate if one fails.
- Scalability: Add more nodes to handle increased users and transactions.
- Load Balancing: Even distribution of workloads across nodes, optimizing performance.
- Fault Tolerance: Redundancy ensures database availability despite node failures.
- Improved Performance: Multiple nodes handle more transactions and queries.
Components of Oracle RAC Multi-Node Cluster
- Cluster Nodes: Servers running an instance of the Oracle Database.
- Oracle Clusterware: Software for clustering services, including crsctl and srvctl.
- Shared Storage: Nodes share access to storage via SAN, NAS, or Oracle ASM.
- Interconnect: Private network for internode communication.
- Oracle ASM: Simplifies storage management, providing striping and mirroring.
- Global Resource Directory (GRD): Tracks data blocks and resources across instances.
Oracle Data Guard in RAC
Oracle Data Guard enhances RAC’s high availability, data protection, and disaster recovery by maintaining synchronized standby databases.
Key Features
- Disaster Recovery: Standby databases in different locations for site-level recovery.
- Data Protection: Continuous application of redo logs ensures data consistency.
- High Availability: Handles node-level and site-level failures for robust availability.
Key Benefits
- Unified Device Discovery: Provides a comprehensive view of all elements in an Oracle RAC Database with Data Guard Multi-Node Cluster, including their relationships.
- Proactive Device Monitoring: Collects metric values over time and sends alerts to the appropriate team when thresholds are breached or unexpected behavior occurs, ensuring minimal or zero downtime.
- Job Scheduling Metrics: Offers detailed metrics on job scheduling times and statuses.
- Concern Alerts: Generates alerts for each metric to notify administrators of any resource issues promptly.
Supported Target Versions
The application is validated on Oracle Database 19c Enterprise Edition Release 19.0.0.0.0.
Prerequisites
OpsRamp Classic Gateway (Linux) 15.0.0 and above.
OpsRamp Nextgen Gateway 15.0.0 and above.
Note: OpsRamp recommends using the latest Gateway version for full coverage of recent bug fixes and enhancements.
For Bash CLI cmdlets the following are prerequisites:
- The SSH user (preferably the oracle user) should be able to execute bash commands and listener-related commands (such as crsctl and srvctl).
Users should be able to establish a database connection from the gateway to the Oracle scan name and to the local listeners as well.
Oracle authorization permissions:
Some metrics are monitored over JDBC. JDBC connections support database authentication.
The application uses CLI commands such as crsctl, srvctl, and olsnodes for monitoring and discovery. It does not use .oraenv to set the Oracle environment; instead, configure the Oracle environment variables in .bashrc.
The following screenshot shows the Oracle environment configuration in the .bashrc file.
Check the following points from the gateway:
- ping <scan name>: use the scan hostname or scan IP address, based on what is provided in the configuration. (If you are using the scan hostname, ensure that it resolves by checking that DNS is properly configured on the gateway.)
- telnet <scan name> 1521
- Connect to gcli using the “gcli” command.
- Execute:
db oracledb <scan_name> <username> <password> <db_port> <db_name>:servicename 15000 10000 insecure Yes "SELECT INST_ID, INSTANCE_NUMBER, INSTANCE_NAME, HOST_NAME FROM gv$instance"
Note: A connection established on the scan hostname or IP address is internally redirected to the local listeners. Ensure that the end devices (all RAC nodes) accept inbound connections on all of these IP addresses.
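The telnet check above only verifies that a TCP port accepts connections. As a rough equivalent for gateways where telnet is not installed, the following Python sketch attempts a TCP connection with a timeout. The function name `can_connect` and the host name in the usage comment are illustrative assumptions, not part of the integration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Usage (placeholder host; substitute the scan name or IP from the configuration):
#   can_connect("rac-scan.example.com", 1521)   # database listener port
#   can_connect("rac-scan.example.com", 22)     # SSH port
```

A `True` result only shows that the port is reachable; it does not validate credentials or listener redirection, which the gcli query covers.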
Privileges - The provided database user should have the SELECT ANY TABLE privilege.
Roles - The provided database user should have the CONNECT and SELECT_CATALOG_ROLE roles.
Hierarchy of Oracle RAC resource
• Oracle RAC
- Oracle Node
- Oracle DB Instance
- Oracle Disk Group
- Oracle Disk
Supported Metrics
Metric Name | Display Name | Metric Category | Unit | Application Version | Description |
---|---|---|---|---|---|
oracle_cluster_NodeState | Oracle Cluster Node State | Availability | | 1.0.0 | State of each node in the cluster. Possible values are Active(0) and INACTIVE(1). |
oracle_cluster_online_NodeCount | Oracle Cluster Online Nodes Count | Availability | count | 1.0.0 | Count of nodes which are in Online state |
oracle_cluster_InstanceCount | Oracle Cluster Instances Count | Availability | count | 1.0.0 | Count of Total Database Instances |
oracle_cluster_active_InstanceCount | Oracle Cluster Active Instances Count | Availability | count | 1.0.0 | Count of active Oracle database instances |
oracle_cluster_check_db_Alive | Oracle Cluster Check DB Alive | Availability | | 1.0.0 | Aliveness status of the Oracle Database Instance |
oracle_cluster_ServicesStatus | Oracle Cluster Services Status | Availability | | 1.0.0 | Status of the available Oracle RAC Cluster services. Possible values are ONLINE(0), OFFLINE(1), INTERMEDIATE(2), UNKNOWN(3) |
oracle_cluster_sessions_Utilization | Oracle Cluster Sessions Utilization | Usage | % | 1.0.0 | Monitors database session utilization |
oracle_cluster_executions_PerTxn | Oracle Cluster Executions Per Transaction | Performance | | 1.0.0 | The average number of executions per transaction |
oracle_cluster_executions_PerSec | Oracle Cluster Executions Per Sec | Performance | | 1.0.0 | The average number of executions per second |
oracle_cluster_cpu_UsagePerSec | Oracle Cluster CPU Usage Per Sec | Usage | | 1.0.0 | Represents the CPU usage per second by the database processes, measured in hundredths of a second. |
oracle_cluster_cpu_UsagePerTxn | Oracle Cluster CPU Usage Per Transaction | Usage | | 1.0.0 | The amount of CPU usage per transaction for the specific task or session. |
oracle_cluster_database_cpu_time_Ratio | Oracle Cluster Database CPU Time Ratio | Usage | | 1.0.0 | The Database CPU Time Ratio is of limited value as a tuning tool. It is computed by dividing the amount of CPU used in the database by the amount of total database time. Total database time is the time spent by the database on user-level calls. |
oracle_cluster_blocking_SessionCount | Oracle Cluster Blocking Session Count | Usage | count | 1.0.0 | To monitor the count of blocking sessions |
oracle_cluster_session_limit_Usage | Oracle Cluster Session Limit Usage | Usage | % | 1.0.0 | To monitor the session limit usage |
oracle_cluster_inactive_Sessions | Oracle Cluster Inactive Sessions | Availability | count | 1.0.0 | Monitors the inactive sessions |
oracle_cluster_active_Sessions | Oracle Cluster Active Sessions | Availability | count | 1.0.0 | Monitors the active sessions |
oracle_cluster_system_waits_PerClass | Oracle Cluster System Waits PerClass | Performance | s | 1.0.0 | Monitors Oracle system class waits (system-level waits represent a high-level summary of all session-level waits). This metric is evaluated using the formula: avg of waits = sum(time_waited)/sum(total_waits) |
oracle_cluster_long_running_Queries | Oracle Cluster Long Running Queries | Performance | count | 1.0.0 | Number of long-running queries on a particular database |
oracle_cluster_bufferCacheHitRatio_Pct | Oracle Cluster BufferCacheHitRatio Percentage | Usage | % | 1.0.0 | Monitors the buffer cache hit ratio as a percentage |
oracle_cluster_sequence_Utilization | Oracle Cluster Sequence Utilization | Usage | % | 1.0.0 | Monitors database sequence usage as a percentage |
oracle_cluster_temp_tableSpace_Utilization | Oracle Cluster Temp Tablespace Utilization | Usage | % | 1.0.0 | To monitor Temp tableSpace space usage in Pct |
oracle_cluster_database_cdb_pdb_tableSpace_Utilization | Oracle Cluster Database CDB PDB Tablespace Utilization | Usage | % | 1.0.0 | To monitor the tableSpace utilization of the Oracle CDB & PDB. |
oracle_cluster_database_cdb_pdb_tableSpace_SizeUsed | Oracle Cluster Database CDB and PDB Tablespace Size Used | Usage | MB | 1.0.0 | To monitor the tableSpace used size of the Oracle CDB and PDB |
oracle_cluster_database_cdb_pdb_tableSpace_SizeFree | Oracle Cluster Database CDB and PDB Tablespace Size Free | Capacity | MB | 1.0.0 | To monitor the tableSpace free size of the Oracle CDB and PDB |
oracle_cluster_process_Utilization | Oracle Cluster Processes Used pct | Usage | % | 1.0.0 | The percentage of elapsed time that the processor spends executing a non-idle thread (this does not include CPU steal time) |
oracle_cluster_dataguard_Status | Oracle Cluster Dataguard Status | Availability | | 1.0.0 | Indicates the status of Oracle Data Guard. Possible values are: ALL(0) - all users other than SYS are prevented from making changes to any data in the database; STANDBY(1) - all users other than SYS are prevented from making changes to any database object being maintained by logical standby; NONE(2) - normal security for all data in the database. |
oracle_cluster_dataguard_BrokerStatus | Oracle Cluster Dataguard Broker Status | Availability | | 1.0.0 | Indicates the status of the Oracle Data Guard Broker. Possible values are: ENABLED(0) - the database is part of a broker configuration and broker management of the database is enabled; DISABLED(1) - the database is part of a broker configuration and broker management of the database is disabled. DISABLED is displayed if the user disabled broker management of the database or configuration, or if broker management was disabled due to a role change (for example, the old primary was disabled after a failover operation). |
oracle_cluster_dataguard_fs_FailoverMode | Oracle Cluster Dataguard Fast-Start Failover Mode | Availability | | 1.0.0 | Indicates the current fast-start failover mode. Possible values are: DISABLED(0) - fast-start failover is disabled; OBSERVE-ONLY(1) - fast-start failover is enabled in test drive mode; ZERO DATA LOSS(2) - fast-start failover is enabled and a fast-start failover cannot incur any data loss; POTENTIAL DATA LOSS(3) - fast-start failover is enabled and a fast-start failover can incur data loss within FastStartFailoverLagLimit seconds. |
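Two of the descriptions in the table above state how a value is derived: the system waits metric uses avg of waits = sum(time_waited)/sum(total_waits), and the Database CPU Time Ratio divides database CPU time by total database time. The following Python sketch works through both calculations on made-up numbers. The function names and sample values are hypothetical, and the sketch does not show how the integration queries or scales these values.

```python
def average_wait(rows):
    """Average wait for one wait class.

    rows is a list of (time_waited, total_waits) pairs, one per wait event.
    Implements: sum(time_waited) / sum(total_waits).
    """
    total_time = sum(time_waited for time_waited, _ in rows)
    total_waits = sum(waits for _, waits in rows)
    # Avoid division by zero when no waits were recorded.
    return total_time / total_waits if total_waits else 0.0


def cpu_time_ratio(db_cpu_time, db_time):
    """Database CPU time divided by total database time."""
    return db_cpu_time / db_time if db_time else 0.0


# Sample (invented) data: two events in one wait class.
sample = [(10.0, 4), (30.0, 6)]
print(average_wait(sample))         # 40.0 time units over 10 waits
print(cpu_time_ratio(30.0, 120.0))  # CPU accounts for a quarter of database time
```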
Metric Name | Display Name | Metric Category | Unit | Application Version | Description |
---|---|---|---|---|---|
oracle_node_Uptime | Oracle Node Uptime | Availability | m | 1.0.0 | Time lapsed since last reboot in minutes |
oracle_node_cpu_Load | Oracle Node CPU Load | Usage | | 1.0.0 | Monitors the system's load over the last 1, 5, and 15 minutes. It reports the load average per CPU core. |
oracle_node_cpu_Utilization | Oracle Node CPU Utilization | Usage | % | 1.0.0 | The percentage of elapsed time that the processor spends executing a non-idle thread (this does not include CPU steal time) |
oracle_node_memory_UsedSpace | Oracle Node Memory Used Space | Usage | GB | 1.0.0 | Physical and virtual memory usage in GB |
oracle_node_memory_Utilization | Oracle Node Memory Utilization | Usage | % | 1.0.0 | Physical and virtual memory usage as a percentage |
oracle_node_disk_usage_UsedSpace | Oracle Node Disk Usage UsedSpace | Usage | GB | 1.0.0 | Monitors disk used space in GB |
oracle_node_disk_usage_Utilization | Oracle Node Disk Utilization | Usage | % | 1.0.0 | To monitor node disk utilization |
oracle_node_disk_inode_Utilization | Oracle Node Disk Inode Utilization | Usage | % | 1.0.0 | This monitor is to collect DISK Inode metrics for all physical disks in a server. |
Metric Name | Display Name | Metric Category | Unit | Application Version | Description |
---|---|---|---|---|---|
oracle_dbInstance_Status | Oracle DB Instance Status | Availability | | 1.0.0 | Status of the Oracle Cluster Database Instance. Possible values are STARTED(0), MOUNTED(1), OPEN(2), OPEN MIGRATE(3) |
oracle_dbInstance_Uptime | Oracle DB Instance Uptime | Availability | Days | 1.0.0 | Uptime (In Days) of the Oracle Cluster Database Instance |
Metric Name | Display Name | Metric Category | Unit | Application Version | Description |
---|---|---|---|---|---|
oracle_diskGroup_State | Oracle ASM DiskGroup State | Availability | | 1.0.0 | Monitors the state of each ASM disk group. Possible values are CONNECTED(0), BROKEN(1), DISMOUNTED(2), MOUNTED(3), QUIESCING(4), RESTRICTED(5), UNKNOWN(6) |
oracle_diskGroup_UsableFileMB | Oracle ASM Disk Group Usable File Size In MB | Usage | MB | 1.0.0 | To monitor the amount of free space that can be safely utilized. |
oracle_diskGroup_RequiredMirrorFreeMB | Oracle ASM Disk Group Required Mirror Free Size In MB | Usage | MB | 1.0.0 | To monitor the amount of space that is required to be available in a given disk group in order to restore redundancy after one or more disk failures |
oracle_diskGroup_Utilization | Oracle ASM DiskGroup Space Utilization | Usage | % | 1.0.0 | To monitor ASM DiskGroup Space Utilization |
Metric Name | Display Name | Metric Category | Unit | Application Version | Description |
---|---|---|---|---|---|
oracle_disk_ModeStatus | Oracle ASM Disk Mode Status | Availability | | 1.0.0 | Monitors the ASM DATA diskgroup status. Possible values are ONLINE(0), OFFLINE(1), SYNCING(2) |
oracle_disk_State | Oracle ASM Disk State | Availability | | 1.0.0 | Monitors the state of each ASM disk. Possible values are NORMAL(0), ADDING(1), DROPPING(2), HUNG(3), FORCING(4), UNKNOWN(5) |
oracle_disk_Utilization | Oracle ASM Disk Utilization | Performance | % | 1.0.0 | To monitor the utilization of the each ASM Disk |
oracle_disk_Reads | Oracle ASM Disk Reads | Usage | count | 1.0.0 | To monitor the Total number of I/O read requests for the disk |
oracle_disk_Writes | Oracle ASM Disk Writes | Usage | count | 1.0.0 | To monitor the Total number of I/O write requests for the disk |
oracle_disk_ReadErrors | Oracle ASM Disk Read Errors | Usage | count | 1.0.0 | To monitor the Total number of failed I/O read requests for the disk |
oracle_disk_WriteErrors | Oracle ASM Disk Write Errors | Usage | count | 1.0.0 | To monitor the Total number of failed I/O write requests for the disk |
oracle_disk_ReadTime | Oracle ASM Disk Read Time | Usage | s | 1.0.0 | To monitor the Total I/O time (in seconds) for read requests for the disk if the TIMED_STATISTICS initialization parameter is set to true (0 if set to false) |
oracle_disk_WriteTime | Oracle ASM Disk Write Time | Usage | s | 1.0.0 | To monitor the Total I/O time (in seconds) for write requests for the disk if the TIMED_STATISTICS initialization parameter is set to true (0 if set to false) |
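Several availability metrics in the tables above report a numeric code for a named state. When consuming these values outside the product (for example in a report script), a small lookup table can translate the codes back to names. The sketch below copies the code lists from the metric descriptions for four metrics; the `STATE_CODES` and `decode_state` names are hypothetical helpers, not part of the integration.

```python
# Index in each list is the numeric value listed in the metric description.
STATE_CODES = {
    "oracle_cluster_ServicesStatus": ["ONLINE", "OFFLINE", "INTERMEDIATE", "UNKNOWN"],
    "oracle_dbInstance_Status": ["STARTED", "MOUNTED", "OPEN", "OPEN MIGRATE"],
    "oracle_disk_ModeStatus": ["ONLINE", "OFFLINE", "SYNCING"],
    "oracle_disk_State": ["NORMAL", "ADDING", "DROPPING", "HUNG", "FORCING", "UNKNOWN"],
}


def decode_state(metric: str, value: int) -> str:
    """Return the state name for a metric value, or a marker for unmapped input."""
    names = STATE_CODES.get(metric)
    if names is None or not 0 <= value < len(names):
        return f"UNMAPPED({value})"
    return names[value]


print(decode_state("oracle_dbInstance_Status", 2))  # OPEN
print(decode_state("oracle_disk_ModeStatus", 1))    # OFFLINE
```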
Default Monitoring Configurations
Oracle RAC Multi Node Cluster application has default Global Device Management Policies, Global Templates, Global Monitors and Global Metrics in OpsRamp. You can customize these default monitoring configurations as per your business requirement by cloning respective Global Templates and Global Device Management Policies. It is recommended to clone them before installing the application to avoid noise alerts and data.
Default Global Device Management Policies
You can find the Device Management Policy for each Native Type at Setup > Resources > Device Management Policies. Search with suggested names in global scope:
{appName nativeType - version}
Ex: oracle-cluster Oracle RAC - 1 (i.e., appName = oracle-cluster, nativeType = Oracle RAC, version = 1)
Default Global Templates
You can find the Global Templates for each Native Type at Setup > Monitoring > Templates. Search with suggested names in global scope. Each template adheres to the following naming convention:
{appName nativeType 'Template' - version}
Ex: oracle-cluster Oracle RAC Template - 1 (i.e., appName = oracle-cluster, nativeType = Oracle RAC, version = 1)
Default Global Monitors
You can find the Global Monitors for each Native Type at Setup > Monitoring > Monitors. Search with suggested names in global scope. Each monitor adheres to the following naming convention:
{monitorKey appName nativeType - version}
Ex: Oracle RAC Monitor oracle-cluster Oracle RAC 1 (i.e., monitorKey = Oracle RAC Monitor, appName = oracle-cluster, nativeType = Oracle RAC, version = 1)
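When searching for the default policies and templates of several native types, it can help to generate the expected names from the conventions above. The following Python sketch builds the Device Management Policy and Template names exactly as the two conventions and their examples show them. The function names are hypothetical, and the list of native types is taken from the resource hierarchy in this document.

```python
def policy_name(app_name: str, native_type: str, version: int) -> str:
    """{appName nativeType - version}"""
    return f"{app_name} {native_type} - {version}"


def template_name(app_name: str, native_type: str, version: int) -> str:
    """{appName nativeType 'Template' - version}"""
    return f"{app_name} {native_type} Template - {version}"


# Native types listed in the resource hierarchy of this document.
NATIVE_TYPES = [
    "Oracle RAC",
    "Oracle Node",
    "Oracle DB Instance",
    "Oracle Disk Group",
    "Oracle Disk",
]

for native_type in NATIVE_TYPES:
    print(policy_name("oracle-cluster", native_type, 1))
    print(template_name("oracle-cluster", native_type, 1))
```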
Configure and Install the Oracle Cluster Integration
- To select your client, navigate to All Clients, and click the Client/Partner dropdown menu.
Note: You may either type your client’s name in the search bar or select your client from the list.
- Navigate to Setup > Account. The Account Details screen is displayed.
- Click Integrations. The Installed Integrations screen is displayed with all the installed applications.
Note: If you do not have any installed applications, you will be navigated to the Available Integrations and Apps page with all the available applications along with the newly created application with the version.
- Click + ADD on the Installed Integrations page.
Note: Search for the integration either by entering the name of the integration in the search bar or by selecting the category of the integration from the All Categories dropdown list.
- Click ADD in the Oracle Cluster application.
- In the Configuration screen, click + ADD. The Add Configuration screen appears.
- Enter the following BASIC INFORMATION:
Field Name | Description | Field Type |
---|---|---|
Name | Enter the name for the configuration. | String |
Oracle RAC Scan Hostname/ IP Address | Enter the Oracle RAC Scan Hostname/ IP Address of the Oracle Cluster. It should be accessible from Gateway. | String |
SSH Port | Enter the SSH port. Note: The default SSH port value is 22. | Integer |
Oracle RAC SSH Credentials | Select the credential associated with your Oracle Cluster account. If you want to use the existing credentials, select them from the Select Credentials dropdown. Else, click + Add to create credentials. The ADD CREDENTIAL window is displayed. Enter the following information. | Dropdown |
Database Port | Enter the database port. Note: The default database port value is 1521. | Integer |
Oracle RAC Database Credentials | Select the credential associated with your Oracle Cluster account. If you want to use the existing credentials, select them from the Select Credentials dropdown. Else, click + Add to create credentials. The ADD CREDENTIAL window is displayed. Enter the following information. | Dropdown |
Database Name | Enter the database name. | String |
App Failure Notifications | When selected, you will be notified in case of an application failure such as Connectivity Exception, Authentication Exception. | Checkbox |
- CUSTOM ATTRIBUTES: Custom attributes are the user-defined data fields or properties that can be added to the preexisting attributes to configure the integration.
Field Name | Description | Field Type |
---|---|---|
Custom Attribute | Select the custom attribute from the dropdown. You can add attributes by clicking the Add icon (+). | Dropdown |
Value | Select the value from the dropdown. | Dropdown |
Note: The custom attribute that you add here will be assigned to all the resources that are created by the integration. You can add a maximum of five custom attributes (key and value pair).
- In the RESOURCE TYPE section, select:
- ALL: All the existing and future resources will be discovered.
- SELECT: You can select one or multiple resources to be discovered.
- In the DISCOVERY SCHEDULE section, select one of the following recurrence patterns:
- Minutes
- Hourly
- Daily
- Weekly
- Monthly
- Click ADD.
The configuration is saved and displayed on the configurations page.
Note: From the same page, you may Edit and Remove the created configuration.
- Under ADVANCED SETTINGS, select the Bypass Resource Reconciliation option if you wish to bypass resource reconciliation when the same resources are discovered by multiple applications.
Note: If two different applications provide identical discovery attributes, two separate resources will be generated with those respective attributes from the individual discoveries.
- Click NEXT.
- (Optional) Click +ADD to create a new collector. You can either use the pre-populated name or give your collector a name.
- Select an existing registered profile.
- Click FINISH.
The integration is installed and displayed on the INSTALLED INTEGRATION page. Use the search field to find the installed integration.
Modify Oracle Cluster Integration
See the Modify an Installed Integration or Application article.
Note: Select Oracle Cluster.
Discover Resources in Oracle Cluster Integration
- Navigate to Infrastructure > Search > DATABASES > Oracle Cluster. The Oracle Cluster page is displayed.
- Select the application on the Oracle Cluster page.
- The RESOURCE page appears from the right.
- Click the ellipsis (…) on the top right and select View Details.
- Navigate to the Attributes tab to view the discovery details.
View resource metrics
To confirm Oracle Cluster monitoring, review the following:
- Metric graphs: A graph is plotted for each metric that is enabled in the configuration.
- Alerts: Alerts are generated for metrics according to the thresholds configured for the integration.
- Click the Metrics tab to view the metric details for Oracle Cluster.
Supported Alert Custom Macros
Customize the alert subject and description with the following macros so that it can generate alerts accordingly.
Supported macro keys:
${resource.name} | ${resource.ip} | ${resource.mac} |
${resource.aliasname} | ${resource.os} | ${resource.type} |
${resource.dnsname} | ${resource.alternateip} | ${resource.make} |
${resource.model} | ${resource.serialnumber} | ${resource.systemId} |
${Custom Attributes in the resource} | ${parent.resource.name} |
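To preview how an alert subject will read before saving it, the macro placeholders can be substituted locally. The following Python sketch replaces `${...}` keys from a dictionary and leaves unknown keys untouched. It illustrates the placeholder syntax only; the function name, sample subject, and sample values are assumptions, and the platform's own expansion logic is not described in this document.

```python
import re

# Matches ${key} where key is any run of characters other than '}'.
_MACRO = re.compile(r"\$\{([^}]+)\}")


def expand_macros(template: str, values: dict) -> str:
    """Replace each ${key} with values[key]; leave unknown keys as written."""
    return _MACRO.sub(lambda m: str(values.get(m.group(1), m.group(0))), template)


subject = "Oracle RAC alert on ${resource.name} (${resource.ip})"
sample_values = {"resource.name": "rac-node1", "resource.ip": "10.0.0.11"}  # invented values
print(expand_macros(subject, sample_values))
```

Leaving unknown keys unchanged makes a misspelled macro visible in the preview instead of silently producing an empty string.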
Resource Filter Input keys
Oracle Cluster application resources are filtered and discovered based on the keys below.
Note: You can filter the resources with the discoverable keys only.
The following table lists the supported input keys for each Resource Type of Oracle Cluster:
Resource Type | Keys |
---|---|
All Types | resourceName |
 | hostName |
 | aliasName |
 | dnsName |
 | ipAddress |
 | macAddress |
 | os |
 | make |
 | model |
 | serialNumber |
Oracle RAC | Path |
 | Total Disks |
 | Total Nodes |
 | Version |
Oracle Node | Architecture |
 | Icon name |
 | Kernel |
 | Machine ID |
Oracle DB Instance | DB Type |
 | Version |
Oracle Disk Group | Compatibility |
 | Database Compatibility |
 | Disk Group Number |
 | Offline Disks |
 | Type |
Oracle Disk | Path |
Risks, Limitations & Assumptions
- The integration can manage critical/recovery failure alerts for the following two scenarios when the user activates App Failure Notifications in the settings:
- Connectivity Exception
- Authentication Exception
- Oracle Cluster sends duplicate/repeat failure alert notifications every 6 hours.
- Metrics can be used to monitor Oracle resources and can generate alerts based on the threshold values.
- You can provide either the cluster IP address or the hostname in the configuration, but the hostname option works only if hostname resolution works.
- Oracle Cluster supports only Classic Gateway and NextGen Gateway. It is not supported with Cluster Gateway.
- Showing activity logs is not supported.
- The Template Applied Time will only be displayed if the collector profile (Classic and NextGen Gateway) is version 18.1.0 or higher.
- Component level thresholds can be configured on each resource level.
- The latest snapshot metric is supported from Gateway 14.0.0.
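The 6-hour repeat interval for failure alerts mentioned in the list above can be reasoned about as a simple time comparison. The following Python sketch shows one way such a check could look; it is an illustration under the assumption that a repeat is allowed once the interval has elapsed since the last notification, and it is not the integration's actual implementation. The names `REPEAT_INTERVAL` and `should_notify` are hypothetical.

```python
from datetime import datetime, timedelta

REPEAT_INTERVAL = timedelta(hours=6)  # interval stated in this document


def should_notify(last_sent, now, interval=REPEAT_INTERVAL):
    """Return True for the first failure, or when the interval has elapsed."""
    return last_sent is None or now - last_sent >= interval


first = datetime(2024, 1, 1, 0, 0)
print(should_notify(None, first))                          # first failure
print(should_notify(first, first + timedelta(hours=2)))    # still inside the interval
print(should_notify(first, first + timedelta(hours=6)))    # interval elapsed
```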
Troubleshooting
Before troubleshooting, ensure all prerequisites are met.
If Oracle Cluster integrations fail to discover or monitor, troubleshoot using the following steps:
- Check if any alerts have been generated on the cluster or in vprobe.
- If there is an error or alert related to end device connectivity or authentication, check the reachability of the end device from the gateway with the following commands:
- To ping the scan hostname provided in the configuration:
ping <scan name>
- To try telnet:
telnet <scan name> <Port>
- To try SSH to the end device (each node):
ssh <username>@<node IP Address>
- To connect to gcli and run a test query:
gcli
db oracledb <scan_name> <username> <password> <db_port> <db_name>:servicename 15000 10000 insecure Yes "SELECT INST_ID, INSTANCE_NUMBER, INSTANCE_NAME, HOST_NAME FROM gv$instance"
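Because a scan hostname works only when it resolves on the gateway, and connections are redirected to node listeners on other addresses, it can be useful to list every address the name resolves to before testing each one. The following Python sketch does this with the standard library resolver. The function name `resolve_addresses` and the host name in the usage comment are illustrative assumptions.

```python
import socket


def resolve_addresses(hostname: str) -> list:
    """Return the sorted, de-duplicated IP addresses that hostname resolves to.

    Returns an empty list when the name cannot be resolved.
    """
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# Usage (placeholder name; substitute the scan hostname from the configuration):
#   resolve_addresses("rac-scan.example.com")
```

An empty result points to a DNS problem on the gateway rather than a firewall or listener issue.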
Version History
Application Version | Bug fixes / Enhancements |
---|---|
1.0.2 | Enhancements related to the latest snapshot, Activity Log and DebugHandler changes. |
1.0.1 | Changes related to resource discovery. |
1.0.0 | Initial support for Oracle Cluster application. |