ONES Rule Engine
Overview
In data center operations, a rule engine that raises alerts on key metrics is essential for proactive monitoring and management of critical components and services. Rule engine alerts are needed for the following metrics in a data center environment:
CPU and Memory Utilization
Fan and PSU LED status
SSD Memory Utilization, Health and Temperature Status
Traffic Bandwidth
ASIC Routes
Health Services
Device Down alerts
Interface Flap Alerts
Traffic Errors and Discard Counters
PFC Counters
Device Queue Counters
Rule engine alerts support efficient resource utilization, timely troubleshooting, early detection of potential issues, and overall operational stability within the data center environment.
Notification
ONES-App can send alerts for breached threshold values to:
Slack Channel
Zendesk Support ticket
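As an illustration of what a breached-threshold notification might carry, the sketch below builds a message body in the JSON shape that Slack incoming webhooks accept (a `text` field). It does not describe how ONES-App itself is implemented: the function names, the message wording, and the parameters are all hypothetical, and the webhook URL would be whatever your own Slack workspace provides.

```python
import json
import urllib.request


def build_slack_payload(device, metric, measure, observed, threshold, unit=""):
    """Return a JSON body for a Slack incoming-webhook message.

    All parameter names and the message wording are illustrative
    assumptions, not the format ONES-App actually sends.
    """
    text = (
        f"Threshold breached on {device}: {measure} {metric} is "
        f"{observed}{unit} (threshold {threshold}{unit})"
    )
    return json.dumps({"text": text})


def send_to_slack(webhook_url, payload):
    """POST the payload to a Slack incoming-webhook URL (supplied by the caller)."""
    request = urllib.request.Request(
        webhook_url,
        data=payload.encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Keeping payload construction separate from delivery means the message format can be checked without network access, and the same text could be reused for a different target such as a support ticket.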
Rules are categorized based on the metric hierarchy:
Device Level
Interface Level
The following table lists all metrics supported by the Rule Engine, with the unit, measure, and value a user can specify for each.
Hierarchy | Metrics | Unit | Measure | Value |
Device | CPU Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
Device | Memory Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
Device | Failed Fans | Count | MIN/MAX | Count |
Device | Failed PSU | Count | MIN/MAX | Count |
Device | CPU Core Temperature | Celsius (°C) | AVG/MIN/MAX | Celsius |
Device | PSU Temperature | Celsius (°C) | AVG/MIN/MAX | Celsius |
Device | FAN Speed | Percentage (%) | AVG/MIN/MAX | 0-100 |
Device | ASIC IPv4 Routes Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
Device | ASIC IPv6 Routes Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
Device | BGP Nbrs Operationally Down | Count | AVG/MIN/MAX | Count of Nbrs |
Device | FRR Container CPU Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
Device | Syncd Container CPU Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
Device | Device Down | NA | NA | NA |
Device | Queue Counter | Count | AVG/MIN/MAX | Count |
Device | SSD Health | Percentage (%) | Percentage (%) | 0-100 |
Device | SSD Temperature | Celsius (°C) | AVG/MIN/MAX | Celsius |
Device | SSD Memory | Percentage (%) | Percentage (%) | 0-100 |
Interface | Int Flap | NA | NA | NA |
Interface | PFC Counters | Count | AVG/MIN/MAX | Count |
Interface | Queue Counters | Count | AVG/MIN/MAX | Count |
Interface | TX Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
Interface | RX Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
Interface | In Errors | Count | AVG/MIN/MAX | User defined |
Interface | Out Errors | Count | AVG/MIN/MAX | User defined |
Interface | In Discards | Count | AVG/MIN/MAX | User defined |
Interface | Out Discards | Count | AVG/MIN/MAX | User defined |
Interface | Tranx TX Power | dBm | AVG/MIN/MAX | User defined |
Interface | Tranx RX Power | dBm | AVG/MIN/MAX | User defined |
Interface | Tranx Temperature | Celsius (°C) | AVG/MIN/MAX | User defined |
Interface | Tranx Voltage | Volts (V) | AVG/MIN/MAX | User defined |
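To make the relationship between a metric, its measure (AVG/MIN/MAX), and its threshold value concrete, here is a minimal sketch of how such a rule could be evaluated over a window of samples. The `Rule` class, the `is_breached` function, and the choice of a "greater than" comparison are assumptions made for illustration; the source does not state which comparison operators the Rule Engine supports or how it windows samples.

```python
from dataclasses import dataclass
from statistics import mean

# Map each measure name from the table to an aggregation function.
MEASURES = {"AVG": mean, "MIN": min, "MAX": max}


@dataclass
class Rule:
    hierarchy: str    # "Device" or "Interface"
    metric: str       # e.g. "CPU Utilization"
    measure: str      # "AVG", "MIN" or "MAX"
    threshold: float  # value in the metric's unit


def is_breached(rule, samples):
    """Return True when the aggregated samples exceed the threshold.

    Assumption: a breach means "aggregate > threshold"; an empty
    sample window is treated as not breached.
    """
    if not samples:
        return False
    aggregate = MEASURES[rule.measure](samples)
    return aggregate > rule.threshold


# Usage: average CPU utilization over three samples against an 80% threshold.
cpu_rule = Rule("Device", "CPU Utilization", "AVG", 80.0)
print(is_breached(cpu_rule, [70.0, 85.0, 95.0]))  # average is about 83.3 -> True
```

With the same samples, a MIN rule at 80% would not fire because the lowest sample is 70, which shows why the choice of measure matters as much as the threshold itself.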