ONES Rule Engine
In data center operations, a rule engine that raises alerts on key metrics is essential for proactive monitoring and management of critical components and services. The ONES rule engine supports alerts on the following metrics in a data center environment:
ASIC IPv4 Routes
ASIC IPv6 Routes
BGP Neighbours Down
CPU Core Temperature
CPU Utilization
DISK Health
DISK Temperature
DISK Used Memory Percent
Device Down
Device Queue Transmit Counter
Docker CPU Utilization
Docker Down
Docker MEM Utilization
Dynamic IP Change
Dynamic IP Change With Only Conflicts
FAN Speed
Failed FANs
Failed PSUs
Memory Utilization
PSU Temperature
Unhealthy Devices
Rule engine alerts support efficient resource utilization, early detection of potential issues, timely troubleshooting, and overall operational stability within the data center.
When a threshold is breached, ONES-App can send the alert to the following channels:
Slack Channel
Zendesk Support
ServiceNow
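As an illustration of what a threshold-breach notification might carry, the sketch below builds a JSON body in the shape that Slack incoming webhooks accept (a single `text` field). The function name, the message wording, and the example device name are assumptions made for this example; the actual message format used by ONES-App is not described here.

```python
import json


def build_slack_payload(device: str, metric: str, measure: str,
                        value: float, threshold: float, unit: str) -> str:
    """Build a Slack incoming-webhook JSON body for a breached threshold.

    Hypothetical helper: the wording below is illustrative and is not
    the message that ONES-App itself sends.
    """
    text = (
        f"[ONES alert] {device}: {measure} {metric} is {value:g}{unit}, "
        f"above threshold {threshold:g}{unit}"
    )
    # Slack incoming webhooks accept a JSON object with a "text" field.
    return json.dumps({"text": text})


# "leaf-01" is a made-up device name used only for this example.
body = build_slack_payload("leaf-01", "CPU Utilization", "AVG", 82.7, 80.0, "%")
print(body)
```

The string returned by `build_slack_payload` would be sent as the HTTP POST body to a webhook URL configured for the target channel.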
Rules are grouped into three types:
Device Based
Interface Based
GPU Server Based
| Hierarchy | Metric | Unit | Measure | Value |
|---|---|---|---|---|
| Device | CPU Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device | Memory Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device | Failed Fans | Count | MIN/MAX | Count |
| Device | Failed PSU | Count | MIN/MAX | Count |
| Device | CPU Core Temperature | Celsius (°C) | AVG/MIN/MAX | Celsius |
| Device | PSU Temperature | Celsius (°C) | AVG/MIN/MAX | Celsius |
| Device | FAN Speed | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device | ASIC IPv4 Routes Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device | ASIC IPv6 Routes Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device | BGP Nbrs Operationally Down | Count | AVG/MIN/MAX | Count of Nbrs |
| Device | FRR Container CPU Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device | Syncd Container CPU Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Device | Device Down | NA | NA | NA |
| Device | Queue Counter | Count | AVG/MIN/MAX | Count |
| Device | DISK Health | Percentage (%) | Percentage (%) | 0-100 |
| Device | DISK Temperature | Celsius (°C) | AVG/MIN/MAX | Celsius |
| Device | DISK Memory | Percentage (%) | Percentage (%) | 0-100 |
| Device | Docker CPU Utilization | Percentage (%) | Percentage (%) | 0-100 |
| Device | Docker Memory Utilization | Percentage (%) | Percentage (%) | 0-100 |
| Device | Docker Down | NA | NA | NA |
| Device | Device IP Change | NA | NA | NA |
| Device | Device IP Change with Conflict | NA | NA | NA |
| Device | Unhealthy Device | NA | NA | NA |
| Interface | Int Flap | NA | NA | NA |
| Interface | PFC Counters | Count | AVG/MIN/MAX | Count |
| Interface | Queue Transmit Counters | Count | AVG/MIN/MAX | Count |
| Interface | TX Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Interface | RX Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Interface | In Errors | Count | AVG/MIN/MAX | User defined |
| Interface | Out Errors | Count | AVG/MIN/MAX | User defined |
| Interface | In Discards | Count | AVG/MIN/MAX | User defined |
| Interface | Out Discards | Count | AVG/MIN/MAX | User defined |
| Interface | Transceiver TX Power | dBm | AVG/MIN/MAX | User defined |
| Interface | Transceiver RX Power | dBm | AVG/MIN/MAX | User defined |
| Interface | Transceiver Temperature | Celsius (°C) | AVG/MIN/MAX | User defined |
| Interface | Transceiver Voltage | Volts (V) | AVG/MIN/MAX | User defined |
| Server | CPU Core Temperature | Celsius (°C) | AVG/MIN/MAX | User defined |
| Server | CPU Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Server | DISK Health | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Server | DISK Temperature | Celsius (°C) | AVG/MIN/MAX | User defined |
| Server | DISK Used Memory % | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Server | Device Down | NA | NA | NA |
| Server | Docker Down | NA | NA | NA |
| Server | GPU Memory Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Server | GPU PSU-1 Power Draw | Celsius (°C) | AVG/MIN/MAX | User defined |
| Server | GPU PSU-2 Power Draw | Celsius (°C) | AVG/MIN/MAX | User defined |
| Server | GPU Temperature | Celsius (°C) | AVG/MIN/MAX | User defined |
| Server | GPU Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
| Server | Memory Utilization | Percentage (%) | AVG/MIN/MAX | 0-100 |
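
To make the Measure and Value columns concrete, here is a minimal sketch of how a rule could be evaluated: the samples collected for a metric are reduced with AVG, MIN or MAX, and the result is compared with the configured threshold. The `Rule` class, the `is_breached` function, and the "greater than" comparison are assumptions for illustration only; they are not the ONES rule schema or its evaluation logic.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, Sequence

# Aggregations corresponding to the Measure column (AVG/MIN/MAX).
MEASURES: Dict[str, Callable[[Sequence[float]], float]] = {
    "AVG": mean,
    "MIN": min,
    "MAX": max,
}


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule shape, not the ONES schema."""
    hierarchy: str    # "Device", "Interface" or "Server"
    metric: str       # e.g. "CPU Utilization"
    measure: str      # "AVG", "MIN" or "MAX"
    threshold: float  # expressed in the metric's unit


def is_breached(rule: Rule, samples: Sequence[float]) -> bool:
    """Return True when the aggregated samples exceed the threshold.

    Assumption: a breach means "aggregate > threshold"; a real rule
    engine may offer other comparisons.
    """
    if not samples:
        return False  # no samples collected, nothing to compare
    aggregate = MEASURES[rule.measure](samples)
    return aggregate > rule.threshold


# Example: average CPU utilization over three samples against an 80% threshold.
rule = Rule("Device", "CPU Utilization", "AVG", 80.0)
window = [72.0, 85.0, 91.0]
print(is_breached(rule, window))
```

With the same samples, a MIN rule at 80% would not be breached because the lowest sample is 72%, which shows how the choice of measure changes alert sensitivity.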