In data centre operations, a rule engine that raises alerts on key metrics is essential for proactive monitoring and management of critical components and services. The following types of rule engine alerts are available for specific metrics in a data centre environment:
CPU and Memory Alerts
Fan and Power Supply Unit Alerts
Traffic Bandwidth
ASIC IPv4 & IPv6 Routes
BGP Neighbour Alerts
Health Services
Device Down Alerts
SSD Health, temperature and memory usage alert
Device Queue counters
PFC counters
Traffic Errors and Discard Counters
frr and syncd services CPU utilization status
Server Agent based metrics
CPU Temperature and Utilization
Down status
FAN Speed
Memory Utilization
GPU
Memory Utilization
PSU Power Draw
Temperature
Utilization
When a device breaches the threshold value configured in a rule, Rule Engine pushes the configured rule notification to:
Slack channel
Zendesk Support ticket
ServiceNow ticket
To use the Rule Engine alert feature, the user must first set up a Slack channel integration, a Zendesk Support integration, or a ServiceNow integration
1. Create a Channel for ONES-App push notification
Log in to api.slack.com and choose Your apps
Create an App
Choose From scratch
Provide an App Name, choose the workspace that should receive the push notifications, and click Create App
Choose Incoming Webhooks, activate Incoming Webhooks, and click Add New Webhook to Workspace
Select the configured Channel & Allow
Copy the newly created webhook link
Open ONES-App and select Integration >> Messaging
Add Channel & Paste the Webhook URL
After saving, the channel is available for use when creating any rule with the Rule Engine feature
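Before pasting the webhook URL into ONES-App, it can help to confirm that the webhook itself works. The following is a minimal sketch, not part of ONES-App: it relies only on the fact that a Slack incoming webhook accepts a JSON body with a "text" field. The function names and the URL in the comment are placeholders chosen for illustration.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_message(webhook_url: str, text: str) -> int:
    """Send the message and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Example (placeholder URL; replace with the webhook link copied from Slack):
# send_test_message("https://hooks.slack.com/services/T000/B000/XXXX", "ONES test")
```

If the test message appears in the selected channel, the same URL should be usable under Integration >> Messaging.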
Log in to the Zendesk Support Admin panel and follow these steps:
Click >> Apps & Integration
Choose >> Zendesk API
Enable Token Access
Give API Token Description (Optional)
Copy the API Token
Save the Settings
Open ONES-App and select Integration >> Ticketing
Add Channel & Paste the required details
After saving, the channel is available for use when creating any rule with the Rule Engine feature
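The copied API token can be checked outside ONES-App before it is pasted in. The sketch below only illustrates how Zendesk API token authentication is formed (HTTP Basic auth with the user name "email/token" and the token as the password). The helper name and the sample values are hypothetical, and which fields ONES-App asks for under "required details" is not covered here.

```python
import base64


def zendesk_auth_header(email: str, api_token: str) -> dict:
    """Return the Basic auth header used with a Zendesk API token."""
    # Zendesk token auth: "<email>/token" as the user, the API token as the password.
    credentials = f"{email}/token:{api_token}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": "Basic " + encoded}


# Example with placeholder values:
# headers = zendesk_auth_header("admin@example.com", "PASTE_TOKEN_HERE")
```

The resulting header can be attached to any Zendesk REST API request to confirm that the token is accepted.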
Integrations >> Ticketing >> ServiceNow
Add Channel
Inputs required to successfully integrate ServiceNow
Instance URL (from the ServiceNow developer account)
Credentials (from the ServiceNow developer account)
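To verify the Instance URL and credentials before adding the channel, a read-only call to the ServiceNow Table API can be used. This is a sketch under assumptions: it assumes "Credentials" means a user name and password used with HTTP Basic auth, the function names are hypothetical, and the instance URL in the comment is a placeholder. It does not describe how ONES-App itself talks to ServiceNow.

```python
import base64
import urllib.request


def build_servicenow_check(instance_url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request that reads one incident record via the Table API."""
    url = instance_url.rstrip("/") + "/api/now/table/incident?sysparm_limit=1"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        headers={"Authorization": "Basic " + token, "Accept": "application/json"},
        method="GET",
    )


def check_servicenow(instance_url: str, username: str, password: str) -> int:
    """Send the request and return the HTTP status code."""
    request = build_servicenow_check(instance_url, username, password)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Example with placeholder values:
# check_servicenow("https://dev00000.service-now.com", "admin", "PASSWORD")
```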
A user can configure two types of rule:
Entity Based
Allows a user to create rules per device
Allows the user to include or exclude devices from the rule
Entity by Property
Allows a user to create rules by HwSKU, Role, or OS Version across all managed devices
Rule Name: The user can choose any related name
For: The user can choose one of two options, Device or Interface
Metrics: Metrics depend on the above (For: Device/Interface) condition
Measure: Metrics are measured in three different ways
MIN
AVG
MAX
Period: The time window over which the measured metric is evaluated
5 min
10 min
15 min
30 min
1 hour
When Measured Value is: The condition that the measured value must match against the threshold
EQ: Equal to
NEQ: Not Equal to
GE: Greater than or equal to
LE: Less than or equal to
GT: Greater than
LT: Less than
Critical Threshold: The value at which a Critical push notification is triggered
Warning Threshold: The value at which a Warning push notification is triggered
Notify: The user can choose the integrated Slack channel
Create Ticket: Zendesk users can choose this to raise a Zendesk Support ticket
Create Ticket: ServiceNow users can choose this to raise a ServiceNow ticket
Weekly Digest: Slack users can choose this to send a weekly digest to the Slack channel
Do not notify if the same alert triggers again within: 30 min, 1 hour, 2 hours, 10 hours, 24 hours
Stop notifying after: The user can choose a number of occurrences, after which the same alert is not triggered again for the next 24 hours
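How Measure, Period, the condition, and the two thresholds fit together can be illustrated with a short sketch. This is not the ONES implementation: the function name, the order in which the thresholds are checked (Critical first), and the sample values are assumptions made for illustration only.

```python
import operator
from statistics import mean

# Measure options and comparison conditions named in the rule form.
MEASURES = {"MIN": min, "AVG": mean, "MAX": max}
CONDITIONS = {
    "EQ": operator.eq,
    "NEQ": operator.ne,
    "GE": operator.ge,
    "LE": operator.le,
    "GT": operator.gt,
    "LT": operator.lt,
}


def evaluate_rule(samples, measure, condition, warning, critical):
    """Return "critical", "warning", or None for the samples of one Period."""
    if not samples:
        return None  # nothing was collected in this period
    value = MEASURES[measure](samples)
    compare = CONDITIONS[condition]
    # Assumption: the Critical threshold is checked before the Warning threshold.
    if compare(value, critical):
        return "critical"
    if compare(value, warning):
        return "warning"
    return None


# CPU utilization samples (percent) collected during one 5 min period.
cpu_samples = [70, 85, 95]
print(evaluate_rule(cpu_samples, "AVG", "GT", warning=80, critical=90))  # warning
print(evaluate_rule(cpu_samples, "MAX", "GT", warning=80, critical=90))  # critical
```

The same samples give different results depending on the Measure: the average (about 83.3) only crosses the Warning threshold, while the maximum (95) crosses the Critical one.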
Rule Name: The user can choose any related name
Filter: The user can filter the rule across all managed devices by
HWSKU
ROLE
OS Version
For: The user can choose one of two options, Device or Interface
Select: The available values depend on the chosen Filter category:
Select HWSKU
Select ROLE
Select OS VERSION
Metrics: Metrics depend on the above (For: Device/Interface) condition
Measure: Metrics are measured in three different ways
MIN
AVG
MAX
Period: The time window over which the measured metric is evaluated
5 min
10 min
15 min
30 min
1 hour
When Measured Value is: The condition that the measured value must match against the threshold
EQ: Equal to
NEQ: Not Equal to
GE: Greater than or equal to
LE: Less than or equal to
GT: Greater than
LT: Less than
Critical Threshold: The value at which a Critical push notification is triggered
Warning Threshold: The value at which a Warning push notification is triggered
Notify: The user can choose the integrated Slack channel
Create Ticket: Zendesk users can choose this to raise a Zendesk Support ticket
Create Ticket: ServiceNow users can choose this to raise a ServiceNow ticket
Weekly Digest: Slack users can choose this to send a weekly digest to the Slack channel
Do not notify if the same alert triggers again within: 30 min, 1 hour, 2 hours, 10 hours, 24 hours
Stop notifying after: The user can choose a number of occurrences, after which the same alert is not triggered again for the next 24 hours
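The difference from an Entity Based rule is how the target devices are chosen: by a shared property rather than by name. A minimal sketch of that selection step follows. The Device class, its field names, and the sample inventory are hypothetical and only illustrate the idea of filtering by HWSKU, ROLE, or OS Version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    hwsku: str
    role: str
    os_version: str


# Maps the Filter category shown in the form to a (hypothetical) device attribute.
FILTER_FIELDS = {"HWSKU": "hwsku", "ROLE": "role", "OS Version": "os_version"}


def select_devices(devices, filter_by, selected_values):
    """Return the devices whose filtered property is one of the selected values."""
    attribute = FILTER_FIELDS[filter_by]
    wanted = set(selected_values)
    return [device for device in devices if getattr(device, attribute) in wanted]


# Hypothetical inventory used only for this example.
inventory = [
    Device("leaf-1", "sku-a", "leaf", "4.0"),
    Device("leaf-2", "sku-b", "leaf", "4.1"),
    Device("spine-1", "sku-a", "spine", "4.1"),
]
leaf_names = [device.name for device in select_devices(inventory, "ROLE", ["leaf"])]
print(leaf_names)  # ['leaf-1', 'leaf-2']
```

A rule created this way applies to every managed device that matches the selected value, which in this sketch also covers devices added to the inventory later.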
Navigation >> Watcher >> Rules
Create New & Add the required inputs
Preview & Create
Once a user creates the rule, it will be available in the rule list
Once the device SSD Memory Utilization goes above the threshold value, ONES starts pushing notifications to Slack, raising Zendesk Support tickets, and showing the alerts on the ONES Alert Page
Navigation >> Watcher >> Rules
Create New & Add the required inputs
Preview & Create
Once a user creates the rule, it will be available in the rule list
Once the device CPU Utilization goes above the threshold value, ONES starts pushing notifications to Slack, raising Zendesk Support tickets, and showing the alerts on the ONES Alert Page
When a user creates a rule, and the threshold value is exceeded, alerts will be generated. These alerts will also be displayed on this page.
Notifications from the Alerts Page are always sent to:
ServiceNow Support: Integrated ServiceNow Support will get all the alerts triggered by ONES.
Zendesk Support: Integrated Zendesk Support systems will receive all push notifications.
Slack Channel: If integrated, notifications will also be sent to the configured Slack channel.
Alert Page: Alerts are always displayed on the ONES Alert page.
Only one ticketing integration can be active at a time: while Zendesk Support is in use, the user will not be able to use ServiceNow ticket support.
Alert Management
Count of alerts related to feature
Alert Name
First seen of the alert
Last seen of the alert
Option to delete the alerts
Expand Option is used to check the payload and total alerts
Time Scale Alert Updates
Users can choose a time range to view more alerts
The Alert Page allows a user to download a report in CSV format for a selected time range