ONES Rule Engine
⚡ Overview
In data center operations, maintaining reliability and uptime requires more than just monitoring — it demands proactive detection and rapid response.
A Rule Engine plays a critical role by continuously tracking key performance metrics and triggering alerts when thresholds are breached. This ensures that operators can:
✅ Identify anomalies early, before they escalate.
✅ Respond quickly to potential risks or outages.
✅ Safeguard critical components and services from failure.
✅ Automate escalation via integrations (Slack, Zendesk, ServiceNow).
💡 In essence, Rule Engine alerts act as the first line of defense, keeping the data center environment stable, resilient, and secure.
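As an illustration of the threshold check described above, here is a minimal sketch in Python. It is not the ONES implementation: the `ThresholdRule` class, the comparator strings, the device names, and the message format are all hypothetical and chosen only to show the idea of comparing a reading against a configured threshold.

```python
from dataclasses import dataclass
import operator
from typing import Optional

# Hypothetical comparison operators; the actual ONES rule syntax may differ.
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class ThresholdRule:
    name: str          # human-readable rule name
    metric: str        # metric being tracked, e.g. "CPU Utilization"
    comparator: str    # one of the keys in COMPARATORS
    threshold: float   # value that must not be crossed


def check(rule: ThresholdRule, device: str, value: float) -> Optional[str]:
    """Return an alert message if the reading breaches the rule, else None."""
    if COMPARATORS[rule.comparator](value, rule.threshold):
        return (
            f"[{rule.name}] {device}: {rule.metric}={value} "
            f"breached {rule.comparator} {rule.threshold}"
        )
    return None


# Example readings (made-up device names and values).
rule = ThresholdRule("High CPU", "CPU Utilization", ">", 90.0)
readings = {"leaf-01": 42.5, "leaf-02": 97.0}
alerts = [msg for dev, val in readings.items() if (msg := check(rule, dev, val))]
```

With these sample readings, only `leaf-02` produces an alert, because its value exceeds the threshold while `leaf-01` stays below it.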
Device Based Rules
CPU and Memory Utilization
CPU Core Temperature Alerts
Fan and PSU LED status
SSD Memory Utilization, Health and Temperature Status
Traffic Bandwidth
ASIC Routes (IPv4 and IPv6)
Device and Docker Down alerts
BGP Neighbour Down Alert
Component failure
Interface Flap Alerts
Traffic Errors and Discard Counters
PFC Counters
Device Queue Counters
Config Change alert
Docker CPU and Memory Utilization (per service)
IP Change Alerts
Failed FANs and PSUs
FAN Speed
MCLAG (Member/Peer/Session) Down Alerts
NTP Drift
PSU Temp Alerts
Unhealthy Devices Alerts
Interface Based Rules
Broadcast/Multicast Storm
PFC Rx/Tx Counters
Port Flap
Queue Transmit Counters
Traffic InDiscards
Traffic InErrors
Traffic OutDiscards
Traffic OutErrors
Traffic Rx/Tx Utilization
Transceiver Rx/Tx Power
Transceiver Temperature
Transceiver Voltage
Server/GPU Based Rules
CPU Core Temperature
CPU Utilization
DISK Health
DISK Temperature
DISK Used Memory Percent
Device Down
Docker Down
GPU Memory Utilization
GPU PSU 1 Power Draw
GPU PSU 2 Power Draw
GPU Temperature
GPU Utilization
Memory Utilization
Rule Engine alerts support efficient resource utilization, timely troubleshooting, early detection of potential issues, and overall operational stability within the data center environment.


Notification
When a threshold is breached, ONES-App can send the alert to:
Slack Channel
Zendesk Support
ServiceNow
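The fan-out of one alert to several notification channels can be sketched as follows. This is only an assumption about how such routing might look, not the ONES-App code: the `dispatch` function, the channel keys, and the stub senders are hypothetical. The one external fact used is that Slack incoming webhooks accept a JSON body with a `text` field. The senders below record messages in a list instead of making network calls.

```python
import json
from typing import Callable, Dict, List

Sender = Callable[[str], None]


def build_slack_body(message: str) -> str:
    """Build the JSON body expected by a Slack incoming webhook."""
    return json.dumps({"text": message})


def dispatch(message: str, channels: List[str], senders: Dict[str, Sender]) -> List[str]:
    """Send the message to each requested channel that has a sender configured.

    Returns the list of channels that were actually notified.
    """
    delivered = []
    for channel in channels:
        sender = senders.get(channel)
        if sender is None:
            continue  # channel requested but not configured; skip it
        sender(message)
        delivered.append(channel)
    return delivered


# Stub senders: a real integration would POST to the service instead.
outbox: List[str] = []
senders: Dict[str, Sender] = {
    "slack": lambda msg: outbox.append(build_slack_body(msg)),
    "servicenow": lambda msg: outbox.append(f"servicenow:{msg}"),
}

delivered = dispatch(
    "CPU Utilization above 90% on leaf-02",
    ["slack", "zendesk", "servicenow"],
    senders,
)
```

In this example no Zendesk sender is configured, so the alert is delivered to Slack and ServiceNow only. Keeping senders behind a common callable signature makes it straightforward to add or remove channels without touching the routing logic.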
Rules are categorized by metric hierarchy:
Device Level
Interface Level
The Rule Engine supports the following metrics. For each metric, the table lists the unit, the measures, and the values a user can configure.

| Hierarchy | Metric | Unit | Measure | Value |
| --- | --- | --- | --- | --- |
| Device | CPU Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device | Memory Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device | Failed Fans | Count | MIN/MAX | Count |
| Device | Failed PSU | Count | MIN/MAX | Count |
| Device | CPU Core Temperature | Celsius (°C) | AVG/MIN/MAX | Celsius |
| Device | PSU Temperature | Celsius (°C) | AVG/MIN/MAX | Celsius |
| Device | FAN Speed | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device | ASIC IPv4 Routes Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device | ASIC IPv6 Routes Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device | BGP Nbrs Operationally Down | Count | AVG/MIN/MAX | Count of Nbrs |
| Device | FRR Container CPU Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device | Syncd Container CPU Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device | Device Down | NA | NA | NA |
| Device | Queue Counter | Count | AVG/MIN/MAX | Count |
| Device | SSD Health | Percentage (%) | Percentage (%) | 0-100 |
| Device | SSD Temperature | Celsius (°C) | AVG/MIN/MAX | Celsius |
| Device | SSD Memory | Percentage (%) | Percentage (%) | 0-100 |
| Interface | Int Flap | NA | NA | NA |
| Interface | PFC Counters | Count | AVG/MIN/MAX | Count |
| Interface | Queue Counters | Count | AVG/MIN/MAX | Count |
| Interface | TX Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Interface | RX Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Interface | In Errors | Count | AVG/MIN/MAX | User defined |
| Interface | Out Errors | Count | AVG/MIN/MAX | User defined |
| Interface | In Discards | Count | AVG/MIN/MAX | User defined |
| Interface | Out Discards | Count | AVG/MIN/MAX | User defined |
| Interface | Tranx TX Power | dBm | AVG/MIN/MAX | User defined |
| Interface | Tranx RX Power | dBm | AVG/MIN/MAX | User defined |
| Interface | Tranx Temperature | Celsius (°C) | AVG/MIN/MAX | User defined |
| Interface | Tranx Voltage | Volts (V) | AVG/MIN/MAX | User defined |
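To make the Measure column concrete, the sketch below shows one way AVG, MIN, and MAX could be applied to a window of samples before comparing the result with the configured value. It is an assumption for illustration only: the `SUPPORTED` mapping copies a few rows from the table above, while the function names, the sample window, and the "greater than" comparison are hypothetical and not taken from ONES.

```python
from statistics import mean
from typing import Dict, Sequence, Set

# Aggregation functions for the three measures named in the table.
MEASURES = {"AVG": mean, "MIN": min, "MAX": max}

# A few rows from the table: which measures each metric supports.
SUPPORTED: Dict[str, Set[str]] = {
    "CPU Utilization": {"AVG", "MIN", "MAX"},
    "Failed Fans": {"MIN", "MAX"},
    "PFC Counters": {"AVG", "MIN", "MAX"},
}


def aggregate(metric: str, measure: str, samples: Sequence[float]) -> float:
    """Reduce a window of samples to one value using the chosen measure."""
    if measure not in SUPPORTED.get(metric, set()):
        raise ValueError(f"{measure} is not supported for {metric}")
    if not samples:
        raise ValueError("no samples collected in the window")
    return float(MEASURES[measure](samples))


def breached(metric: str, measure: str, samples: Sequence[float], value: float) -> bool:
    """Assumed semantics: the rule fires when the aggregate exceeds the value."""
    return aggregate(metric, measure, samples) > value


# Made-up CPU samples (percent) collected over one evaluation window.
cpu_samples = [70.0, 80.0, 90.0]
avg_breach = breached("CPU Utilization", "AVG", cpu_samples, 85.0)
max_breach = breached("CPU Utilization", "MAX", cpu_samples, 85.0)
```

The choice of measure changes the outcome: with these samples the average (80.0) stays under a value of 85, while the maximum (90.0) exceeds it. MAX therefore reacts to short spikes, whereas AVG only reacts to sustained load.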