ONES Rule Engine
⚡ Overview
In data center operations, maintaining reliability and uptime requires more than monitoring: it demands proactive detection and rapid response.
A Rule Engine plays a critical role by continuously tracking key performance metrics and triggering alerts when thresholds are breached. This ensures that operators can:
✅ Identify anomalies early, before they escalate.
✅ Respond quickly to potential risks or outages.
✅ Safeguard critical components and services from failure.
✅ Automate escalation via integrations (Slack, Zendesk, ServiceNow).
💡 In essence, Rule Engine alerts act as the first line of defense, keeping the data center environment stable, resilient, and secure.
Device Based Rules
CPU and Memory Utilization
CPU Core Temperature Alerts
Fan & PSU LED status
SSD Memory Utilization, Health & Temperature Status
Traffic Bandwidth
ASIC Routes (IPv4 & IPv6)
Device & Docker Down Alerts
Docker Per Process Down Alert
BGP Neighbour Down Alert
Component failure
Interface Flap Alerts
Traffic Errors & Discard Counters
PFC Counters
Device Queue Counters
Config Change Alert
Docker CPU & Memory Utilization (per service)
IP Change Alerts
Failed Fans & PSUs
FAN Speed
MCLAG (Member/Peer/Session) Down Alerts
NTP Drift
PSU Temp Alerts
Unhealthy Devices Alerts
Interface Based Rules
Broadcast/Multicast Storm
PFC Rx/Tx Counters
Port Flap
Queue Transmit Counters
Traffic InDiscards
Traffic InErrors
Traffic OutDiscards
Traffic OutErrors
Traffic Rx/Tx Utilization
Transceiver Rx/Tx Power
Transceiver Temperature
Transceiver Voltage
GPU/ONES Server Based Rules
CPU Core Temperature
CPU Utilization
DISK Health
DISK Temperature
DISK Used Memory Percent
Device Down
Docker Down
GPU Memory Utilization
GPU PSU 1 Power Draw
GPU PSU 2 Power Draw
GPU Temperature
GPU Utilization
Memory Utilization
Rule Engine alerts support efficient resource utilization, timely troubleshooting, early detection of potential issues, and overall operational stability within the data center environment.

Alert Trigger on threshold value breach
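The idea of triggering an alert on a threshold breach can be sketched as follows. This is a minimal illustration, not the ONES implementation: the function name `is_breached`, the sample values, and the use of a strict "greater than" comparison are all assumptions. The measure names AVG, MIN, and MAX are the ones listed in the metrics table of this document.

```python
from statistics import mean

# Hypothetical mapping from the documented measure names to aggregation functions.
MEASURES = {"AVG": mean, "MIN": min, "MAX": max}


def is_breached(samples, measure, threshold):
    """Return True when the aggregated samples exceed the threshold.

    Assumption: a breach means "aggregate > threshold"; the product may
    support other comparison operators.
    """
    if not samples:
        return False  # no data in the window, so nothing to evaluate
    return MEASURES[measure](samples) > threshold


# Illustrative CPU utilization samples (percent) from one evaluation window.
cpu_samples = [72.0, 88.5, 91.0, 79.5]
print(is_breached(cpu_samples, "AVG", 85))  # False (average is 82.75)
print(is_breached(cpu_samples, "MAX", 90))  # True (maximum is 91.0)
```

The same samples can breach one rule and not another, which is why the choice of measure matters as much as the threshold value.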

Notification
ONES-App can send notifications for breached thresholds to:
Slack Channel
Zendesk Support
ServiceNow
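As an illustration of what a notification for one of these targets might carry, the sketch below formats a breach as the JSON body that a Slack incoming webhook accepts (an object with a `text` field). The function name, the message wording, and the device name `leaf-01` are hypothetical; how ONES-App actually formats and delivers its notifications is not described here.

```python
import json


def build_slack_payload(device, metric, measure, value, threshold):
    """Format a breach as a JSON body for a Slack incoming webhook.

    Assumption: the message layout is illustrative only.
    """
    text = (
        f"[ALERT] {device}: {metric} {measure} is {value}, "
        f"threshold {threshold}"
    )
    return json.dumps({"text": text})


# Hypothetical breach on a hypothetical device.
payload = build_slack_payload("leaf-01", "CPU Utilization", "AVG", 92.5, 85)
print(payload)
```

The payload is only built, not sent, so the sketch runs without a webhook URL or network access.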
Rules are categorized based on the metric hierarchy
Device Level
Interface Level
Server Level
Metrics supported by the Rule Engine, with the units, measures, and values a user can specify
| Hierarchy | Metric | Unit | Measure | Value |
|---|---|---|---|---|
| Device/Server | CPU Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device/Server | Memory Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device | Failed Fans | Count | MIN/MAX | Count |
| Device | Failed PSU | Count | MIN/MAX | Count |
| Device/Server | CPU Core Temperature | Celsius (°C) | AVG/MIN/MAX | Celsius |
| Device | PSU Temperature | Celsius (°C) | AVG/MIN/MAX | Celsius |
| Device | FAN Speed | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device | ASIC IPv4 Routes Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device | ASIC IPv6 Routes Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device | BGP Nbrs Operationally Down | Count | AVG/MIN/MAX | Count of Nbrs |
| Device | FRR Container CPU Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device | Syncd Container CPU Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device/Server | Device Down | NA | NA | NA |
| Device/Server | Docker Down | NA | NA | NA |
| Device | Queue Counter | Count | AVG/MIN/MAX | Count |
| Device/Server | SSD Health | Percentage (%) | Percentage (%) | 0-100 |
| Device/Server | SSD Temperature | Celsius (°C) | AVG/MIN/MAX | Celsius |
| Device/Server | SSD Memory | Percentage (%) | Percentage (%) | 0-100 |
| Interface | Int Flap | NA | NA | NA |
| Interface | PFC Counters | Count | AVG/MIN/MAX | Count |
| Interface | Queue Counters | Count | AVG/MIN/MAX | Count |
| Interface | TX Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Interface | RX Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Interface | In Errors | Count | AVG/MIN/MAX | User defined |
| Interface | Out Errors | Count | AVG/MIN/MAX | User defined |
| Interface | In Discards | Count | AVG/MIN/MAX | User defined |
| Interface | Out Discards | Count | AVG/MIN/MAX | User defined |
| Interface | Tranx TX Power | dBm | AVG/MIN/MAX | User defined |
| Interface | Tranx RX Power | dBm | AVG/MIN/MAX | User defined |
| Interface | Tranx Temperature | Celsius (°C) | AVG/MIN/MAX | User defined |
| Interface | Tranx Voltage | Volts (V) | AVG/MIN/MAX | User defined |
| Server | GPU Mem Util | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Server | GPU PSU 1 Power Draw | Count | AVG/MIN/MAX | User defined |
| Server | GPU PSU 2 Power Draw | Count | AVG/MIN/MAX | User defined |
| Server | GPU Temperature | Celsius (°C) | AVG/MIN/MAX | User defined |
| Server | GPU Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
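To show how the hierarchy, measure, and value columns constrain a rule, here is a minimal validation sketch. The `Rule` class, the `CATALOG` dictionary, and the `validate` function are hypothetical names introduced for illustration; the catalog holds only a few rows copied from the list above and is not the product's actual schema.

```python
from dataclasses import dataclass

# A few rows from the metrics list: (hierarchy, metric) -> (measures, value range).
# A range of None means the value is a count or user defined (no fixed bounds).
CATALOG = {
    ("Device", "CPU Utilization"): ({"AVG", "MIN", "MAX"}, (0, 100)),
    ("Device", "Failed Fans"): ({"MIN", "MAX"}, None),
    ("Interface", "In Errors"): ({"AVG", "MIN", "MAX"}, None),
    ("Server", "GPU Utilization"): ({"AVG", "MIN", "MAX"}, (0, 100)),
}


@dataclass
class Rule:
    hierarchy: str
    metric: str
    measure: str
    threshold: float


def validate(rule):
    """Return a list of problems; an empty list means the rule fits the catalog."""
    key = (rule.hierarchy, rule.metric)
    if key not in CATALOG:
        return [f"unknown metric {rule.metric!r} at {rule.hierarchy} level"]
    measures, value_range = CATALOG[key]
    problems = []
    if rule.measure not in measures:
        problems.append(f"measure {rule.measure} not supported")
    if value_range is not None:
        low, high = value_range
        if not low <= rule.threshold <= high:
            problems.append(f"threshold {rule.threshold} outside {low}-{high}")
    return problems


print(validate(Rule("Device", "CPU Utilization", "AVG", 85)))   # []
print(validate(Rule("Device", "CPU Utilization", "AVG", 120)))  # ['threshold 120 outside 0-100']
print(validate(Rule("Device", "Failed Fans", "AVG", 1)))        # ['measure AVG not supported']
```

Checking a rule against the catalog before saving it catches cases such as a percentage threshold above 100 or an AVG measure on a metric that only lists MIN/MAX.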