Core Concepts

Agents and Gateways

OpsRamp deploys distributed platform components to discover, monitor, and manage infrastructure resources.

Agents: Executable applications that run on managed resources, both on-premises and in the cloud, to monitor servers and applications.
Gateways: Virtual appliances that facilitate secure communication, monitor non-server devices (network, storage), and provide temporary data storage during connectivity failures.

Discovery

Discovery is the process of identifying and cataloging enterprise resources before they can be monitored and managed.

Automatically detects deployed resources.
Builds a dynamic model to interpret and present the state of the environment.
Enables monitoring and metric collection for effective resource management.

Monitoring

Monitoring assesses resource availability and performance by collecting, storing, and evaluating key metrics.

Measures uptime, response times, and utilization of servers, network devices, applications, and cloud services.
Supports both agent-based and agentless monitoring approaches.

Performance

Performance monitoring verifies that resources operate within user-defined parameters.

Identifies fault conditions through threshold breaches.
Provides insights into system health to prevent service disruptions.

Availability

Availability indicates whether a resource is up or down, which determines whether it can provide services.

Availability is determined either by evaluating metrics or by simple acknowledgments.
Ensures high service uptime and performance efficiency.

Metric Thresholds

OpsRamp supports two types of thresholds to assess resource health:

Static Threshold: A fixed value; a metric that exceeds it indicates a fault condition.
Change-based Threshold: A computed value that detects unexpected variations, which is especially useful in dynamic environments.
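
The difference between the two threshold types can be sketched in a few lines of Python. This is an illustration only: the source does not say how OpsRamp computes a change-based threshold, so the sketch assumes a simple rule (deviation from the recent mean by more than `k` standard deviations). The function names, the `k` parameter, and the sample data are all hypothetical.

```python
from statistics import mean, stdev


def breaches_static(value: float, limit: float) -> bool:
    """Static threshold: a fault when the value exceeds a fixed limit."""
    return value > limit


def breaches_change(history: list[float], value: float, k: float = 3.0) -> bool:
    """Change-based threshold (assumed rule): a fault when the value deviates
    from the mean of recent samples by more than k standard deviations."""
    if len(history) < 2:
        return False  # not enough data to compute a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu  # flat baseline: any change is unexpected
    return abs(value - mu) > k * sigma


cpu_history = [40.0, 42.0, 41.0, 39.0, 40.0]  # hypothetical CPU utilization (%)

print(breaches_static(85.0, limit=90.0))   # below the fixed limit
print(breaches_change(cpu_history, 80.0))  # far outside the recent baseline
```

A reading of 80% would pass a static limit of 90% but still stand out against a baseline near 40%, which is the kind of variation a change-based threshold is meant to catch.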

Dashboards

Dashboards consolidate and visualize collected metrics using customizable widgets.

Partner-scoped dashboards: Visible only to users defined for a partner.
Client-scoped dashboards: Visible only to users within a specific client organization.

Service Maps

Service maps organize resources into a hierarchical structure, associating resource health with user and business impact.

Helps analyze dependencies and understand infrastructure relationships.
Provides a logical view of resource groups and services.
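
As a rough sketch of how a hierarchy can tie resource health to service impact, the Python below rolls leaf health up a tree. The `ServiceNode` class, the roll-up rule (a group is healthy only if every child is healthy), and the sample names are assumptions made for illustration; the source does not describe OpsRamp's actual roll-up logic.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceNode:
    """A node in a hypothetical service map: a service group or a leaf resource."""
    name: str
    healthy: bool = True  # consulted only for leaf resources
    children: list["ServiceNode"] = field(default_factory=list)

    def is_healthy(self) -> bool:
        # Assumed roll-up rule: a group is healthy only if every child is healthy.
        if not self.children:
            return self.healthy
        return all(child.is_healthy() for child in self.children)


def unhealthy_paths(node: ServiceNode, prefix: str = "") -> list[str]:
    """Return the path of every unhealthy node, from the top of the map down."""
    path = f"{prefix}/{node.name}" if prefix else node.name
    if node.is_healthy():
        return []
    paths = [path]
    for child in node.children:
        paths.extend(unhealthy_paths(child, path))
    return paths


storefront = ServiceNode("Storefront", children=[
    ServiceNode("Checkout", children=[
        ServiceNode("web-01"),
        ServiceNode("db-01", healthy=False),
    ]),
    ServiceNode("Search", children=[ServiceNode("search-01")]),
])

print(unhealthy_paths(storefront))
```

Here a single failed database surfaces as an unhealthy Checkout service and an unhealthy Storefront, while Search stays healthy.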

Topology Maps

Topology maps visualize relationships determined during discovery.

Nodes represent managed resources, while edges depict connections.
Useful for impact analysis and infrastructure exploration.
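
The node-and-edge model lends itself to a simple impact-analysis sketch: given a failing node, find everything connected to it. The Python below assumes undirected connections and uses hypothetical resource names; a real impact analysis would likely also account for the direction and type of each dependency.

```python
from collections import defaultdict, deque


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected adjacency map from (node, node) connections."""
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_resources(graph: dict[str, set[str]], start: str) -> set[str]:
    """Breadth-first search: every resource reachable from the start node."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}


# Hypothetical connections, as they might be determined during discovery.
edges = [
    ("core-switch", "web-01"),
    ("core-switch", "db-01"),
    ("web-01", "app-lb"),
    ("branch-switch", "printer"),
]
topology = build_graph(edges)

print(sorted(connected_resources(topology, "core-switch")))
```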

Access Controls

OpsRamp provides mechanisms for authenticating and authorizing user access.

Authentication Methods:
- Native user management and authentication
- Single Sign-On (SSO)
- Two-factor authentication
Role-Based Access Control (RBAC): Grants permissions based on user roles.
- Controls resource access (e.g., managing only network resources).
- Restricts credential access (e.g., non-administrator credentials on servers).
- Defines allowed actions (e.g., remote console access restrictions).
- Limits access by location, domain, or organizational policies.
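
A role-based check of the kind listed above might look like the following minimal sketch. The `Role` structure, the resource types, and the action names are hypothetical and are not taken from the OpsRamp API; the sketch only shows the idea of a role bounding both which resources a user can reach and which actions are allowed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """A hypothetical role: which resource types it covers and which actions it allows."""
    name: str
    resource_types: frozenset[str]
    actions: frozenset[str]


def is_allowed(roles: list[Role], resource_type: str, action: str) -> bool:
    """Permit the request if any assigned role covers both the resource type and the action."""
    return any(
        resource_type in role.resource_types and action in role.actions
        for role in roles
    )


network_operator = Role(
    name="network-operator",
    resource_types=frozenset({"network"}),
    actions=frozenset({"view", "manage"}),
)

print(is_allowed([network_operator], "network", "manage"))
print(is_allowed([network_operator], "server", "manage"))
```

A user holding only the `network-operator` role can manage network resources but is denied the same action on servers, and is denied any action the role does not list, such as remote console access.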

Tenancy Model

OpsRamp follows a multi-tenant architecture to logically segregate resources and operations.

Partner (Master Tenant): Associated with an account; manages multiple clients.
Client (Sub-Tenant): Managed independently with specific policies and user access.
User Privileges:
- User: An individual account within a tenant.
- User Group: A collection of users with shared access levels.
- Permission: Defines actions users can perform.
- Role: A combination of users, groups, and permissions applied to resources.

Automation

OpsRamp provides automation to streamline remediation and improve operational efficiency.

Discrete Task Automation: Executes a single task on multiple servers.
Process Automation Workflow: Sequences multiple tasks across resources for complex workflows.
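
The two automation styles differ mainly in shape: one task fanned out over many servers versus an ordered sequence of tasks. The Python below is a sketch of that distinction only. The task functions are stand-ins that return strings rather than doing real work, and none of the names correspond to an OpsRamp API.

```python
from typing import Callable

Task = Callable[[str], str]  # a task takes a server name and returns a result


def run_discrete_task(task: Task, servers: list[str]) -> dict[str, str]:
    """Discrete task automation: run one task on each of several servers."""
    return {server: task(server) for server in servers}


def run_workflow(steps: list[tuple[Task, list[str]]]) -> list[dict[str, str]]:
    """Process automation workflow: run several tasks in order, each on its own targets."""
    return [run_discrete_task(task, targets) for task, targets in steps]


# Stand-in tasks for illustration; real tasks would act on the servers.
def restart_service(server: str) -> str:
    return f"restarted service on {server}"


def check_health(server: str) -> str:
    return f"{server} healthy"


print(run_discrete_task(restart_service, ["web-01", "web-02"]))
print(run_workflow([
    (restart_service, ["web-01"]),
    (check_health, ["web-01", "db-01"]),
]))
```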

Event Management

Events represent significant operational activities occurring on monitored resources. Common events include:

Hardware failures
CPU utilization threshold breaches
Application crashes
Configuration changes

Event Detection Methods:

Native instrumentation
Self-diagnostics
Third-party integrations

Event Management Lifecycle:

Ingestion
Interpretation
Correlation
First Response
Escalation

Correlation and Alerting

OpsRamp correlates related alerts to reduce noise and improve incident response.

Deduplication: Merges repeated alerts from unresolved issues (e.g., SNMP traps continuously sent by network devices).
Inferencing: Identifies different alerts with a common root cause across IT resources.
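
Deduplication can be illustrated with a small Python sketch that folds repeated alerts into one open alert with a repeat count. The choice of key (resource plus metric), the `Alert` fields, and the sample traps are assumptions for illustration; the source does not specify which attributes OpsRamp uses to match duplicates.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    resource: str
    metric: str
    message: str
    repeat_count: int = 1


def deduplicate(raw_alerts: list[tuple[str, str, str]]) -> list[Alert]:
    """Merge alerts that share a (resource, metric) key into a single open alert."""
    open_alerts: dict[tuple[str, str], Alert] = {}
    for resource, metric, message in raw_alerts:
        key = (resource, metric)
        if key in open_alerts:
            # Same unresolved issue: count the repeat and keep the latest message.
            open_alerts[key].repeat_count += 1
            open_alerts[key].message = message
        else:
            open_alerts[key] = Alert(resource, metric, message)
    return list(open_alerts.values())


# Hypothetical SNMP traps: one switch keeps re-sending the same trap.
traps = [
    ("switch-01", "linkDown", "port 3 down"),
    ("switch-01", "linkDown", "port 3 down"),
    ("switch-02", "linkDown", "port 1 down"),
    ("switch-01", "linkDown", "port 3 down"),
]

for alert in deduplicate(traps):
    print(alert.resource, alert.metric, alert.repeat_count)
```

Four incoming traps collapse into two open alerts, so an operator sees one entry per unresolved issue instead of one per trap.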

First Response and Suppression

Uses AI-derived seasonal patterns and historical trends to suppress non-critical alerts.
Applies machine learning to filter alerts based on recurring patterns.
