Agents and Gateways

OpsRamp deploys distributed platform components to discover, monitor, and manage infrastructure resources.

  • Agents: Executable applications that run on managed resources, both on-premises and in the cloud, to monitor servers and applications.
  • Gateways: Virtual appliances that facilitate secure communication, monitor non-server devices (network, storage), and provide temporary data storage during connectivity failures.
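
The temporary storage that a gateway provides during connectivity failures can be pictured as a store-and-forward buffer. The following is a minimal sketch of that idea, not OpsRamp's implementation: the class name BufferingForwarder, the send callable, and the buffer size are all hypothetical, and the sketch assumes the uplink signals an outage by raising ConnectionError.

```python
from collections import deque


class BufferingForwarder:
    """Hypothetical store-and-forward buffer; not an OpsRamp API."""

    def __init__(self, send, max_buffered=1000):
        self._send = send  # assumed to raise ConnectionError while offline
        self._buffer = deque(maxlen=max_buffered)  # oldest samples drop when full

    def submit(self, sample):
        """Queue a sample, then try to deliver everything that is pending."""
        self._buffer.append(sample)
        self.flush()

    def flush(self):
        """Deliver samples in order; stop at the first connectivity failure."""
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except ConnectionError:
                return
            self._buffer.popleft()

    @property
    def pending(self):
        return len(self._buffer)


# Usage: simulate an outage followed by recovery.
link = {"online": False}
delivered = []


def send(sample):
    if not link["online"]:
        raise ConnectionError("uplink unavailable")
    delivered.append(sample)


forwarder = BufferingForwarder(send)
forwarder.submit({"cpu": 41})
forwarder.submit({"cpu": 57})
pending_during_outage = forwarder.pending

link["online"] = True
forwarder.flush()
```

Samples are removed from the buffer only after a successful send, so nothing is lost if the link drops partway through a flush.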

Discovery

Discovery is the process of identifying and cataloging enterprise resources before they can be monitored and managed.

  • Automatically detects deployed resources.
  • Builds a dynamic model to interpret and present the state of the environment.
  • Enables monitoring and metric collection for effective resource management.

Monitoring

Monitoring assesses resource availability and performance by collecting, storing, and evaluating key metrics.

  • Measures uptime, response times, and utilization of servers, network devices, applications, and cloud services.
  • Supports both agent-based and agentless monitoring approaches.

Performance

Performance monitoring checks that resources operate within user-defined parameters.

  • Identifies fault conditions through threshold breaches.
  • Provides insights into system health to prevent service disruptions.

Availability

Availability indicates whether a resource is up or down, which determines whether it can provide services.

  • Determined by evaluating metrics or by a simple acknowledgment from the resource.
  • Supports high service uptime and efficient performance.

Metric Thresholds

OpsRamp supports two types of thresholds to assess resource health:

  • Static Threshold: A fixed value; a metric that exceeds it indicates a fault condition.
  • Change-based Threshold: A computed value that detects unexpected variations, which is especially useful in dynamic environments.
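
The difference between the two threshold types can be illustrated with a short sketch. The function names and the rule used for the change-based case (deviation from the recent mean by more than k standard deviations) are assumptions chosen for illustration; the source does not state how OpsRamp computes change-based thresholds.

```python
from statistics import mean, stdev


def breaches_static(value: float, limit: float) -> bool:
    """Static threshold: fault when the value exceeds a fixed limit."""
    return value > limit


def breaches_change(history: list[float], value: float, k: float = 3.0) -> bool:
    """Change-based threshold (illustrative rule, an assumption):
    fault when the value is more than k standard deviations
    away from the mean of recent samples."""
    if len(history) < 2:
        return False  # not enough data to compute a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return value != baseline
    return abs(value - baseline) > k * spread


# Usage: CPU utilization samples in percent.
recent = [50.0, 52.0, 48.0, 51.0, 49.0]
static_fault = breaches_static(95.0, limit=90.0)
change_fault = breaches_change(recent, 90.0)
change_normal = breaches_change(recent, 51.0)
```

A reading of 90% would not breach a static limit of 90, but it is far outside the recent range around 50%, so the change-based check flags it.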

Dashboards

Dashboards consolidate and visualize collected metrics using customizable widgets.

  • Partner-scoped dashboards: Visible only to users defined for a partner.
  • Client-scoped dashboards: Visible only to users within a specific client organization.

Service Maps

Service maps organize resources into a hierarchical structure, associating resource health with user and business impact.

  • Help analyze dependencies and understand infrastructure relationships.
  • Provide a logical view of resource groups and services.
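
One way to picture how a hierarchy ties resource health to service impact is a tree in which each service takes the worst health state found beneath it. This is a sketch under that assumption; the ServiceNode class, the three health states, and the worst-state roll-up rule are hypothetical and are not taken from OpsRamp.

```python
from dataclasses import dataclass, field

# Hypothetical health states, ordered from least to most severe.
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


@dataclass
class ServiceNode:
    name: str
    health: str = "ok"  # own state; meaningful mainly for leaf resources
    children: list["ServiceNode"] = field(default_factory=list)

    def rolled_up_health(self) -> str:
        """Return the most severe state among this node and its descendants."""
        states = [self.health] + [c.rolled_up_health() for c in self.children]
        return max(states, key=SEVERITY.__getitem__)


# Usage: a business service built from a front end and a database.
web = ServiceNode("web-01")
db = ServiceNode("db-01", health="critical")
frontend = ServiceNode("frontend", children=[web])
shop = ServiceNode("online-shop", children=[frontend, db])
```

Here the shop service reports critical because of the database, while the front-end group stays ok.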

Topology Maps

Topology maps visualize relationships determined during discovery.

  • Nodes represent managed resources, while edges depict connections.
  • Useful for impact analysis and infrastructure exploration.
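
Impact analysis on a node-and-edge model amounts to a graph traversal. The sketch below assumes edges are directed from an upstream resource to the resources that connect through it; that direction, the function name, and the sample resource names are illustrative assumptions.

```python
from collections import deque


def downstream_of(edges: list[tuple[str, str]], failed: str) -> set[str]:
    """Return every node reachable from `failed` by following directed edges."""
    graph: dict[str, list[str]] = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)

    seen = {failed}
    queue = deque([failed])
    while queue:  # breadth-first traversal
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(failed)  # report only the affected nodes, not the failure itself
    return seen


# Usage: edges point from upstream device to downstream device.
edges = [
    ("router-1", "switch-1"),
    ("switch-1", "web-01"),
    ("switch-1", "db-01"),
    ("router-1", "switch-2"),
]
impact = downstream_of(edges, "switch-1")
```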

Access Controls

OpsRamp provides mechanisms for authenticating and authorizing user access.

  • Authentication Methods:
    • Native user management and authentication
    • Single Sign-On (SSO)
    • Two-factor authentication
  • Role-Based Access Control (RBAC): Grants permissions based on user roles.
    • Controls resource access (e.g., managing only network resources).
    • Restricts credential access (e.g., non-administrator credentials on servers).
    • Defines allowed actions (e.g., remote console access restrictions).
    • Limits access by location, domain, or organizational policies.
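
A role-based check of this kind can be sketched as a lookup over the roles a user holds. The Role fields, action names, and is_allowed function below are hypothetical and cover only resource access and allowed actions; they are not OpsRamp's permission model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """Hypothetical role: which resource types it covers and which actions it allows."""
    name: str
    resource_types: frozenset[str]
    actions: frozenset[str]


def is_allowed(roles: list[Role], resource_type: str, action: str) -> bool:
    """Allow the action if any held role covers both the resource type and the action."""
    return any(
        resource_type in role.resource_types and action in role.actions
        for role in roles
    )


# Usage: a role limited to managing network resources, without remote console access.
network_admin = Role(
    name="network-admin",
    resource_types=frozenset({"network"}),
    actions=frozenset({"view", "manage"}),
)
held_roles = [network_admin]
```

Access is denied by default: a user with no matching role gets no permission.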

Tenancy Model

OpsRamp follows a multi-tenant architecture to logically segregate resources and operations.

  • Partner (Master Tenant): Associated with an account and manages multiple clients.
  • Client (Sub-Tenant): Managed independently with specific policies and user access.
  • User Privileges:
    • User: An individual account within a tenant.
    • User Group: A collection of users with shared access levels.
    • Permission: Defines actions users can perform.
    • Role: A combination of users, groups, and permissions applied to resources.
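
The partner and client relationship can be modeled as a simple two-level structure. This sketch assumes that a partner-level user can see every client under the partner while a client-level user sees only their own client; that visibility rule and all class and method names are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """Sub-tenant with its own users."""
    name: str
    users: set[str] = field(default_factory=set)


@dataclass
class Partner:
    """Master tenant that manages multiple clients."""
    name: str
    users: set[str] = field(default_factory=set)
    clients: list[Client] = field(default_factory=list)

    def visible_clients(self, user: str) -> list[str]:
        """Assumed rule: partner users see all clients, client users see only their own."""
        if user in self.users:
            return [c.name for c in self.clients]
        return [c.name for c in self.clients if user in c.users]


# Usage
partner = Partner(
    name="msp",
    users={"pat"},
    clients=[Client("acme", {"ann"}), Client("globex", {"gus"})],
)
```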

Automation

OpsRamp uses automation to streamline remediation and improve operational efficiency.

  • Discrete Task Automation: Executes a single task on multiple servers.
  • Process Automation Workflow: Sequences multiple tasks across resources for complex workflows.
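
The two automation styles differ mainly in shape: one task fanned out across servers versus an ordered sequence of such tasks. The sketch below uses plain functions as stand-ins for real tasks; run_discrete, run_workflow, and the sample tasks are hypothetical names, not OpsRamp features.

```python
from typing import Callable

Task = Callable[[str], str]  # takes a server name, returns a result message


def run_discrete(task: Task, servers: list[str]) -> dict[str, str]:
    """Discrete task automation: run one task on each server."""
    return {server: task(server) for server in servers}


def run_workflow(steps: list[tuple[Task, list[str]]]) -> list[dict[str, str]]:
    """Process automation workflow: run several tasks in order,
    each against its own set of resources."""
    return [run_discrete(task, servers) for task, servers in steps]


# Stand-in tasks that only report what they would do.
def stop_service(server: str) -> str:
    return f"stopped service on {server}"


def restart_server(server: str) -> str:
    return f"restarted {server}"


single = run_discrete(restart_server, ["web-01", "web-02"])
sequence = run_workflow([
    (stop_service, ["web-01"]),
    (restart_server, ["web-01", "db-01"]),
])
```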

Event Management

Events represent significant operational activities occurring on monitored resources. Common events include:

  • Hardware failures
  • CPU utilization threshold breaches
  • Application crashes
  • Configuration changes

Event Detection Methods:

  • Native instrumentation
  • Self-diagnostics
  • Third-party integrations

Event Management Lifecycle:

  1. Ingestion
  2. Interpretation
  3. Correlation
  4. First Response
  5. Escalation

Correlation and Alerting

OpsRamp correlates related alerts to reduce noise and improve incident response.

  • Deduplication: Merges repeated alerts from unresolved issues (e.g., SNMP traps continuously sent by network devices).
  • Inferencing: Identifies different alerts with a common root cause across IT resources.
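
Deduplication can be sketched as merging alerts that share a key and counting the repeats. The choice of key (resource plus metric), the Alert fields, and the function name are assumptions made for illustration; inferencing is not covered by this sketch.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    resource: str
    metric: str
    message: str
    count: int = 1  # how many raw alerts this entry represents


def deduplicate(alerts: list[Alert]) -> list[Alert]:
    """Merge alerts with the same (resource, metric) key, keeping the latest message."""
    merged: dict[tuple[str, str], Alert] = {}
    for alert in alerts:
        key = (alert.resource, alert.metric)
        if key in merged:
            merged[key].count += 1
            merged[key].message = alert.message
        else:
            merged[key] = Alert(alert.resource, alert.metric, alert.message)
    return list(merged.values())


# Usage: a network device repeating the same trap, plus one unrelated alert.
incoming = [
    Alert("switch-1", "linkDown", "port 7 down"),
    Alert("switch-1", "linkDown", "port 7 still down"),
    Alert("web-01", "cpu", "utilization above threshold"),
    Alert("switch-1", "linkDown", "port 7 still down"),
]
unique = deduplicate(incoming)
```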

First Response and Suppression

  • Uses AI-driven analysis of seasonal patterns and historical trends to suppress non-critical alerts.
  • Applies machine learning to filter alerts based on recurring patterns.
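
As a rough intuition for seasonal suppression, the sketch below replaces machine learning with a simple counting heuristic: an alert is suppressed if it is non-critical and has historically recurred in the same weekday-and-hour slot. The heuristic, the threshold of three occurrences, and the function names are all assumptions, not OpsRamp's algorithm.

```python
from collections import Counter
from datetime import datetime

Slot = tuple[int, int]  # (weekday, hour); Monday is 0


def seasonal_slots(history: list[datetime], min_occurrences: int = 3) -> set[Slot]:
    """Find weekday/hour slots in which the alert has recurred often enough."""
    counts = Counter((t.weekday(), t.hour) for t in history)
    return {slot for slot, n in counts.items() if n >= min_occurrences}


def should_suppress(when: datetime, severity: str, slots: set[Slot]) -> bool:
    """Suppress only non-critical alerts that fall in a known recurring slot."""
    return severity != "critical" and (when.weekday(), when.hour) in slots


# Usage: a backup job raises the same alert every Sunday around 02:00.
history = [
    datetime(2024, 1, 7, 2, 5),
    datetime(2024, 1, 14, 2, 10),
    datetime(2024, 1, 21, 2, 0),
]
slots = seasonal_slots(history)
```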