ONES Rule Engine

Overview

In data center operations, a rule engine that raises alerts on key metrics is essential for proactive monitoring and management of critical components and services. In a data center environment, rule engine alerts are needed for the following metrics and events:

  1. CPU and Memory Utilization

  2. Fan and PSU LED status

  3. SSD Memory Utilization, Health and Temperature Status

  4. Traffic Bandwidth

  5. ASIC Routes

  6. Health Services

  7. Device Down Alerts

  8. BGP Neighbor Down Alerts

  9. Component failure

  10. Interface Flap Alerts

  11. Traffic Errors and Discard Counters

  12. PFC Counters

  13. Device Queue Counters

Rule engine alerts enable early detection of potential issues, timely troubleshooting, efficient resource utilization, and overall operational stability in the data center.

Notification

When a metric breaches its configured threshold, ONES-App can send a notification to:

  • Slack Channel

  • Zendesk Support

  • ServiceNow
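This page does not describe how ONES-App delivers these notifications. As an illustration only, the following sketch formats a threshold breach as a message and posts it to a Slack incoming webhook, which accepts an HTTP POST with a JSON body containing a `text` field. The function names `build_slack_payload` and `post_to_slack`, the message wording, and the device name used in the usage example are hypothetical, not ONES-App configuration.

```python
import json
import urllib.request


def build_slack_payload(device: str, metric: str, measure: str,
                        value: float, threshold: float) -> dict:
    """Format a threshold breach as a Slack incoming-webhook message body."""
    text = (f"{device}: {metric} {measure} = {value:g} "
            f"breached threshold {threshold:g}")
    return {"text": text}


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload as JSON to a Slack incoming webhook; return the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

For example, `build_slack_payload("leaf-01", "CPU Utilization", "AVG", 93.5, 90)` produces the text `leaf-01: CPU Utilization AVG = 93.5 breached threshold 90`, which `post_to_slack` would send to the webhook URL configured for the target channel. Keeping formatting separate from delivery makes it easy to reuse the same message for other targets such as a ticketing system.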

Rules are categorized by metric hierarchy:

  1. Device Level

  2. Interface Level
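One way to model this split is to give every rule a hierarchy field and to require an interface name only for interface-level rules. The following is a minimal sketch under that assumption; `Rule`, `Hierarchy`, and all field names are hypothetical and do not reflect the actual ONES-App data model. Event-style metrics that have no measure (Device Down, Int Flap) are outside its scope.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Hierarchy(Enum):
    DEVICE = "Device"
    INTERFACE = "Interface"


@dataclass(frozen=True)
class Rule:
    """Hypothetical threshold rule; field names are illustrative only."""
    hierarchy: Hierarchy
    metric: str                      # e.g. "CPU Utilization", "In Errors"
    measure: str                     # "AVG", "MIN" or "MAX"
    threshold: float
    interface: Optional[str] = None  # set only for interface-level rules

    def __post_init__(self) -> None:
        # Validate the rule once, at construction time.
        if self.measure not in ("AVG", "MIN", "MAX"):
            raise ValueError(f"unsupported measure: {self.measure}")
        if self.hierarchy is Hierarchy.INTERFACE and not self.interface:
            raise ValueError("interface-level rules need an interface name")
        if self.hierarchy is Hierarchy.DEVICE and self.interface:
            raise ValueError("device-level rules must not name an interface")
```

A device-level rule is then `Rule(Hierarchy.DEVICE, "CPU Utilization", "AVG", 90.0)`, while an interface-level rule names its port, for example `Rule(Hierarchy.INTERFACE, "In Errors", "MAX", 100.0, interface="Ethernet0")` (the interface name is a hypothetical example).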

The following table lists all metrics supported by the Rule Engine, with their unit, the supported measures, and the values a user can set:

Hierarchy  Metric                           Unit            Measure         Value
---------  -------------------------------  --------------  --------------  -------------
Device     CPU Utilization                  Percentage (%)  AVG/MIN/MAX     0-100
Device     Memory Utilization               Percentage (%)  AVG/MIN/MAX     0-100
Device     Failed Fans                      Count           MIN/MAX         Count
Device     Failed PSU                       Count           MIN/MAX         Count
Device     CPU Core Temperature             Celsius (°C)    AVG/MIN/MAX     Celsius
Device     PSU Temperature                  Celsius (°C)    AVG/MIN/MAX     Celsius
Device     FAN Speed                        Percentage (%)  AVG/MIN/MAX     0-100
Device     ASIC IPv4 Routes Utilization     Percentage (%)  AVG/MIN/MAX     0-100
Device     ASIC IPv6 Routes Utilization     Percentage (%)  AVG/MIN/MAX     0-100
Device     BGP Nbrs Operationally Down      Count           AVG/MIN/MAX     Count of Nbrs
Device     FRR Container CPU Utilization    Percentage (%)  AVG/MIN/MAX     0-100
Device     Syncd Container CPU Utilization  Percentage (%)  AVG/MIN/MAX     0-100
Device     Device Down                      NA              NA              NA
Device     Queue Counter                    Count           AVG/MIN/MAX     Count
Device     SSD Health                       Percentage (%)  Percentage (%)  0-100
Device     SSD Temperature                  Celsius (°C)    AVG/MIN/MAX     Celsius
Device     SSD Memory                       Percentage (%)  Percentage (%)  0-100
Interface  Int Flap                         NA              NA              NA
Interface  PFC Counters                     Count           AVG/MIN/MAX     Count
Interface  Queue Counters                   Count           AVG/MIN/MAX     Count
Interface  TX Utilization                   Percentage (%)  AVG/MIN/MAX     0-100
Interface  RX Utilization                   Percentage (%)  AVG/MIN/MAX     0-100
Interface  In Errors                        Count           AVG/MIN/MAX     User defined
Interface  Out Errors                       Count           AVG/MIN/MAX     User defined
Interface  In Discards                      Count           AVG/MIN/MAX     User defined
Interface  Out Discards                     Count           AVG/MIN/MAX     User defined
Interface  Tranx TX Power                   dBm             AVG/MIN/MAX     User defined
Interface  Tranx RX Power                   dBm             AVG/MIN/MAX     User defined
Interface  Tranx Temperature                Celsius (°C)    AVG/MIN/MAX     User defined
Interface  Tranx Voltage                    Volts (V)       AVG/MIN/MAX     User defined
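The Measure column suggests that a rule aggregates the samples collected over an evaluation window before comparing the result with its threshold. The sketch below assumes exactly that: a fixed window of samples, a configurable direction (alert above or below the threshold), and, for Count-type interface metrics such as In Errors or In Discards, cumulative counters that must first be converted into per-interval increments. This page does not specify any of these details, and the names `evaluate`, `counter_deltas`, and `AGGREGATORS` are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical mapping of the table's Measure column to aggregation functions.
AGGREGATORS: dict[str, Callable[[Sequence[float]], float]] = {
    "AVG": mean,
    "MIN": min,
    "MAX": max,
}


def counter_deltas(readings: Sequence[int]) -> list[int]:
    """Turn cumulative counter readings into per-interval increments.

    Assumes counters only grow; a reading lower than the previous one is
    treated as a counter reset, so the new reading itself is the increment.
    """
    deltas = []
    for prev, cur in zip(readings, readings[1:]):
        deltas.append(cur - prev if cur >= prev else cur)
    return deltas


def evaluate(samples: Sequence[float], measure: str, threshold: float,
             above: bool = True) -> tuple[float, bool]:
    """Aggregate one evaluation window and check it against a threshold.

    above=True alerts when the aggregate is greater than the threshold
    (e.g. CPU Utilization); above=False alerts when it is lower
    (e.g. Tranx RX Power in dBm).
    """
    if measure not in AGGREGATORS:
        raise ValueError(f"unsupported measure: {measure}")
    if not samples:
        raise ValueError("no samples in the evaluation window")
    value = float(AGGREGATORS[measure](samples))
    breached = value > threshold if above else value < threshold
    return value, breached
```

For example, with CPU samples of 70, 85, 95, and 90 percent, an AVG rule with a threshold of 90 does not fire (average 85), while a MAX rule with the same threshold does (maximum 95). MAX catches short spikes; AVG smooths them out, which is the main trade-off when choosing a measure.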


Copyright © Aviz Networks, Inc.