Agents and Gateways
OpsRamp deploys distributed platform components to discover, monitor, and manage infrastructure resources.
- Agents: Executable applications that run on managed resources, both on-premises and in the cloud, to monitor servers and applications.
- Gateways: Virtual appliances that facilitate secure communication, monitor non-server devices (network, storage), and provide temporary data storage during connectivity failures.
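The temporary storage role of a gateway can be pictured as a store-and-forward buffer. The sketch below is illustrative only and is not OpsRamp's implementation: the class name `StoreAndForwardBuffer`, the `send` callback, and the `max_samples` limit are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict

Sample = Dict[str, object]


class StoreAndForwardBuffer:
    """Hypothetical buffer: holds samples while the uplink is down, then flushes them in order."""

    def __init__(self, send: Callable[[Sample], bool], max_samples: int = 1000) -> None:
        # `send` is an assumed callback that returns True when delivery succeeds.
        self._send = send
        # A deque with maxlen silently drops the oldest sample once the buffer is full.
        self._pending: Deque[Sample] = deque(maxlen=max_samples)

    def submit(self, sample: Sample) -> None:
        """Queue a sample and try to deliver everything that is waiting."""
        self._pending.append(sample)
        self.flush()

    def flush(self) -> int:
        """Send queued samples oldest first; stop at the first failed delivery."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # uplink still down; keep the sample for a later attempt
            self._pending.popleft()
            sent += 1
        return sent

    def pending(self) -> int:
        """Number of samples still waiting for delivery."""
        return len(self._pending)
```

Bounding the buffer is a design choice in this sketch: it trades the oldest data for a fixed memory footprint during long outages.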
Discovery
Discovery is the process of identifying and cataloging enterprise resources before they can be monitored and managed.
- Automatically detects deployed resources.
- Builds a dynamic model to interpret and present the state of the environment.
- Enables monitoring and metric collection for effective resource management.
Monitoring
Monitoring assesses resource availability and performance by collecting, storing, and evaluating key metrics.
- Measures uptime, response times, and utilization of servers, network devices, applications, and cloud services.
- Supports both agent-based and agentless monitoring approaches.
Performance
Performance monitoring verifies that resources operate within user-defined parameters.
- Identifies fault conditions through threshold breaches.
- Provides insights into system health to prevent service disruptions.
Availability
Availability indicates whether a resource is up or down, which determines whether it can provide services.
- Determined either by evaluating metrics or by a simple acknowledgment from the resource.
- Helps maintain service uptime and performance.
Metric Thresholds
OpsRamp supports two types of thresholds to assess resource health:
- Static Threshold: A fixed value representing fault conditions when exceeded.
- Change-based Threshold: A computed value that detects unexpected variations, especially for dynamic environments.
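To make the difference between the two threshold types concrete, here is a minimal sketch. It assumes the change-based check compares a new value against the mean and standard deviation of recent history. That is only one possible way to compute such a threshold, and the function names and the `max_sigma` parameter are hypothetical, not OpsRamp's.

```python
from statistics import mean, pstdev
from typing import Sequence


def breaches_static(value: float, limit: float) -> bool:
    """Static threshold: a fault whenever the value exceeds a fixed limit."""
    return value > limit


def breaches_change(value: float, history: Sequence[float], max_sigma: float = 3.0) -> bool:
    """Change-based threshold (assumed z-score form): a fault when the value
    deviates from the recent baseline by more than `max_sigma` standard deviations."""
    if len(history) < 2:
        return False  # not enough data to compute a baseline
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is unexpected.
        return value != baseline
    return abs(value - baseline) / spread > max_sigma
```

With a history of `[50, 52, 48, 51, 49]`, a reading of 80 stays under a static limit of 90 but is flagged by the change-based check, which is why computed thresholds suit environments where "normal" shifts over time.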
Dashboards
Dashboards consolidate and visualize collected metrics using customizable widgets.
- Partner-scoped dashboards: Visible only to users defined for a partner.
- Client-scoped dashboards: Visible only to users within a specific client organization.
Service Maps
Service maps organize resources into a hierarchical structure, associating resource health with user and business impact.
- Helps analyze dependencies and understand infrastructure relationships.
- Provides a logical view of resource groups and services.
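One way to picture how a hierarchy ties resource health to business impact is a worst-state rollup, where each service reports the most severe state found anywhere beneath it. This is a sketch under that assumption; the `ServiceNode` class, the three state names, and the rollup rule are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed severity ordering; higher numbers are worse.
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


@dataclass
class ServiceNode:
    name: str
    own_state: str = "ok"
    children: List["ServiceNode"] = field(default_factory=list)

    def rolled_up_state(self) -> str:
        """Return the worst state among this node and all of its descendants."""
        states = [self.own_state] + [child.rolled_up_state() for child in self.children]
        return max(states, key=SEVERITY.__getitem__)
```

For example, a `checkout` service with a healthy `web` child and a critical `db` child rolls up to `critical`, while `web` on its own still reports `ok`.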
Topology Maps
Topology maps visualize relationships determined during discovery.
- Nodes represent managed resources, while edges depict connections.
- Useful for impact analysis and infrastructure exploration.
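Impact analysis over a node-and-edge model can be sketched as a graph traversal. The example assumes edges point from a resource to the resources that depend on it and uses a breadth-first search; the adjacency-dictionary format and the function name `impacted_by` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def impacted_by(edges: Dict[str, List[str]], failed: str) -> Set[str]:
    """Return every node reachable from `failed` by following edges."""
    seen: Set[str] = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for neighbor in edges.get(node, []):
            # Skip nodes already visited and the failed node itself (handles cycles).
            if neighbor not in seen and neighbor != failed:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen
```

Given `{"switch-1": ["host-a", "host-b"], "host-a": ["vm-1"]}`, a failure of `switch-1` reaches `host-a`, `host-b`, and `vm-1`.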
Access Controls
OpsRamp provides mechanisms for authenticating and authorizing user access.
- Authentication Methods:
  - Native user management and authentication
  - Single Sign-On (SSO)
  - Two-factor authentication
- Role-Based Access Control (RBAC) to grant permissions based on user roles:
  - Controls resource access (e.g., managing only network resources).
  - Restricts credential access (e.g., non-administrator credentials on servers).
  - Defines allowed actions (e.g., remote console access restrictions).
  - Limits access by location, domain, or organizational policies.
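The role-based checks above can be illustrated with a small sketch. It models only two of the listed dimensions, resource type and allowed action. The `Role` class, its fields, and the `is_allowed` function are hypothetical and do not reflect OpsRamp's data model.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Role:
    name: str
    resource_types: FrozenSet[str]  # which kinds of resources the role may touch
    actions: FrozenSet[str]         # which actions the role may perform


def is_allowed(role: Role, resource_type: str, action: str) -> bool:
    """Grant access only when both the resource type and the action are in the role."""
    return resource_type in role.resource_types and action in role.actions


# Example role: may view and edit network resources, but has no remote console access.
network_operator = Role(
    name="network-operator",
    resource_types=frozenset({"network"}),
    actions=frozenset({"view", "edit"}),
)
```

Here `is_allowed(network_operator, "network", "edit")` is granted, while a request against a server, or a `remote-console` action on a network device, is denied.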
Tenancy Model
OpsRamp follows a multi-tenant architecture to logically segregate resources and operations.
- Partner (Master Tenant): Associated with an account and manages multiple clients.
- Client (Sub-Tenant): Managed independently with specific policies and user access.
- User Privileges:
  - User: An individual account within a tenant.
  - User Group: A collection of users with shared access levels.
  - Permission: Defines actions users can perform.
  - Role: A combination of users, groups, and permissions applied to resources.
Automation
OpsRamp enables automation to streamline remediation and operational efficiency.
- Discrete Task Automation: Executes a single task on multiple servers.
- Process Automation Workflow: Sequences multiple tasks across resources for complex workflows.
Event Management
Events represent significant operational activities occurring on monitored resources. Common events include:
- Hardware failures
- CPU utilization threshold breaches
- Application crashes
- Configuration changes
Event Detection Methods:
- Native instrumentation
- Self-diagnostics
- Third-party integrations
Event Management Lifecycle:
- Ingestion
- Interpretation
- Correlation
- First Response
- Escalation
Correlation and Alerting
OpsRamp correlates related alerts to reduce noise and improve incident response.
- Deduplication: Merges repeated alerts from unresolved issues (e.g., SNMP traps continuously sent by network devices).
- Inferencing: Identifies different alerts with a common root cause across IT resources.
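Deduplication can be sketched as grouping alerts by a key and keeping a repeat count. The example assumes that two alerts are duplicates when they share the same resource and metric; the `Alert` fields and that matching rule are hypothetical, and a real system would likely use richer criteria.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Alert:
    resource: str
    metric: str
    message: str
    count: int = 1  # how many raw alerts this entry represents


def deduplicate(alerts: Iterable[Alert]) -> List[Alert]:
    """Merge alerts that share (resource, metric), summing their counts."""
    merged: Dict[Tuple[str, str], Alert] = {}
    for alert in alerts:
        key = (alert.resource, alert.metric)
        if key in merged:
            merged[key].count += alert.count
        else:
            # Copy the alert so the caller's objects are not mutated.
            merged[key] = Alert(alert.resource, alert.metric, alert.message, alert.count)
    return list(merged.values())
```

Two "link down" traps from the same router interface collapse into one entry with `count == 2`, while an alert for a different resource stays separate.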
First Response and Suppression
- Uses AI to learn seasonal patterns and historical trends, then suppresses non-critical alerts.
- Applies machine learning to filter alerts based on recurring patterns.