Alerts

Overview

When a user creates a rule and its threshold value is exceeded, an alert is generated. These alerts are displayed on the Alerts page.

Alerts

Notifications for alerts shown on the Alerts page are sent to the integrated channels:

  • ServiceNow Support: An integrated ServiceNow instance receives all alerts triggered by ONES.

  • Zendesk Support: An integrated Zendesk Support system receives all push notifications.

  • Slack Channel: If integrated, notifications are also sent to the configured Slack channel.

Note: Only one support ticketing integration can be active at a time. While Zendesk Support is in use, ServiceNow ticket support is not available.

Alert Management

  1. Count of alerts related to the feature

  2. Alert name

  3. First seen: when the alert was first observed

  4. Last seen: when the alert was most recently observed

Use the Expand option to view the alert payload and the total number of alerts.
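The summary fields above (count, alert name, first seen, last seen) can be pictured as an aggregation over individual alert events. The following is a minimal sketch, assuming a hypothetical event shape of (name, timestamp). The class and function names are illustrative and are not part of ONES.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AlertSummary:
    """One row of a hypothetical alert table."""
    name: str
    count: int
    first_seen: datetime
    last_seen: datetime


def summarize_alerts(events):
    """Group (name, timestamp) events into one summary row per alert name."""
    rows = {}
    for name, seen_at in events:
        row = rows.get(name)
        if row is None:
            # First occurrence: first seen and last seen are the same moment.
            rows[name] = AlertSummary(name, 1, seen_at, seen_at)
        else:
            row.count += 1
            row.first_seen = min(row.first_seen, seen_at)
            row.last_seen = max(row.last_seen, seen_at)
    return rows
```

Events do not need to arrive in time order, because first seen and last seen are tracked with min and max.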

Time Scale Alert Updates

Users can choose a time range to view more alerts.

The Alerts page lets a user download a report in CSV format for a selected time range.

Alerts are always displayed on the ONES Alerts page.

Alerts can also be deleted from this page.
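The CSV download for a time range can be thought of as a filter followed by a CSV write. The sketch below is illustrative only: the column names and the dictionary shape of an alert are assumptions, since the actual report layout is not specified here.

```python
import csv
import io
from datetime import datetime


def alerts_to_csv(alerts, start, end):
    """Return CSV text for alerts whose last_seen falls within [start, end]."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Assumed column layout, not the documented ONES report format.
    writer.writerow(["name", "count", "first_seen", "last_seen"])
    for alert in alerts:
        if start <= alert["last_seen"] <= end:
            writer.writerow([
                alert["name"],
                alert["count"],
                alert["first_seen"].isoformat(),
                alert["last_seen"].isoformat(),
            ])
    return buffer.getvalue()
```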

Slack Channel Integration

1. Create a channel for ONES-App push notifications

2. Generate API for the channel

Log in to api.slack.com and choose Your apps

  1. Create an App

  2. Choose From scratch

ServiceNow Integration

Note: The user needs to obtain an instance URL from a ServiceNow developer account.

  1. Integrations >> Ticketing >> ServiceNow

  2. Add Channel

    Inputs to successfully integrate ServiceNow

    1. Instance URL (from the ServiceNow developer account)

    2. Credentials (from the ServiceNow developer account)
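To show how an instance URL and credentials are typically combined, here is a minimal sketch that builds (but does not send) an incident-creation request against the ServiceNow Table API. How ONES itself calls ServiceNow is not described in this page, so the choice of the incident table, the field values, and the function name are assumptions for illustration.

```python
import base64
import json
import urllib.request


def build_servicenow_request(instance_url, username, password, summary, details):
    """Build (but do not send) a request that would create an incident."""
    url = instance_url.rstrip("/") + "/api/now/table/incident"
    # Basic authentication with the developer-account credentials.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    body = json.dumps({"short_description": summary, "description": details})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would submit it. Building the request separately makes the URL and headers easy to inspect first.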

Zendesk Support Integration

  1. Log in to the Zendesk Support Admin panel and follow these steps

    1. Click >> Apps & Integration

    2. Choose >> Zendesk API

    3. Enable Token Access

    4. Give the API token a description (optional)

    5. Copy the API token

    6. Save the settings

  2. Open ONES-App and select Integration >> Ticketing

  3. Add Channel and paste the required details

  4. After saving, the channel is available when creating any rule with the Rule Engine feature
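The API token copied in the steps above is used with Zendesk's token authentication, where the basic-auth user is the agent email followed by `/token`. The sketch below builds (but does not send) a ticket-creation request. The subdomain, email, and function name are placeholders, and the way ONES itself formats tickets is an assumption.

```python
import base64
import json
import urllib.request


def build_zendesk_ticket_request(subdomain, email, api_token, subject, body):
    """Build (but do not send) a request that would create a Zendesk ticket."""
    url = f"https://{subdomain}.zendesk.com/api/v2/tickets.json"
    # Token auth uses "<email>/token:<api_token>" as the basic-auth pair.
    credentials = base64.b64encode(f"{email}/token:{api_token}".encode()).decode()
    payload = {"ticket": {"subject": subject, "comment": {"body": body}}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json",
        },
    )
```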

Add Rules: Entity by Properties

Add Rules

Navigation >> Watcher >> Rules

Create New & Add the required inputs

Preview & Create

Once a user creates the rule, it is available in the rule list.

Once the device CPU Utilization goes above the threshold value, notifications are pushed to Slack, Zendesk Support tickets, and the ONES App Alerts page.

Rule Types

A user can configure two types of rule

  1. Entity Based

    1. Allows a user to create rules per device

      1. Allows the user to include or exclude devices from the rule

  2. Entity by Property

    1. Allows a user to create rules by HwSKU, Role, or OS Version across all managed devices
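The difference between the two rule types is how the set of target devices is chosen. The sketch below models that choice under stated assumptions: the class names and fields are hypothetical, and an Entity Based rule is assumed to cover a device when it is in the include set and not in the exclude set.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    hwsku: str
    role: str
    os_version: str


@dataclass
class EntityRule:
    """Entity Based: explicit per-device include and exclude sets (assumed shape)."""
    include: set = field(default_factory=set)
    exclude: set = field(default_factory=set)

    def applies_to(self, device):
        return device.name in self.include and device.name not in self.exclude


@dataclass
class PropertyRule:
    """Entity by Property: match one property across all managed devices."""
    prop: str   # "hwsku", "role", or "os_version"
    value: str

    def applies_to(self, device):
        return getattr(device, self.prop) == self.value
```

With this shape, a property rule automatically covers devices added later that share the property, while an entity rule only covers the devices named in it.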

1. Entity Based explained

Possible Values & Description

  1. Rule Name: The user can choose any descriptive name

  2. For: The user can choose one of the following options

Device: Once the user selects Device, the following metrics are available
  • ASIC IPv4 Routes

  • ASIC IPv6 Routes

Interface: Once the user selects Interface, the following metrics are available
  • Interface Flap

  • Interface PFC Receive Counters

Server: The user can receive alerts for Intel Gaudi servers
  • CPU Core Temperature

  • CPU Utilization

  3. Metrics: The available metrics depend on the For selection above (Device, Interface, or Server)

  4. Measure: Metrics can be measured in three different ways

    1. MIN

Conditions

  1. When Measured Value is: Lets the user choose the comparison that the measured value must satisfy

    1. EQ: Equal to

    2. NEQ: Not equal to

Notification

  1. Notify: The user can choose the integrated Slack channel

  2. Create Ticket: Zendesk users can choose this to raise a Zendesk support ticket

  3. Create Ticket: ServiceNow users can choose this to raise a ServiceNow support ticket

2. Entity by Property

Possible Values & Description

  1. Rule Name: The user can choose any descriptive name

  2. Filter: The user can filter the rule across all managed devices by

    1. HWSKU

Device: Once the user selects Device, the following metrics are available
  • ASIC IPv4 Routes

  • ASIC IPv6 Routes

Interface: Once the user selects Interface, the following metrics are available
  • Interface Flap

  • Interface PFC Counters

  1. Select: This option depends on the Filter category. Possible values are

    1. Select HWSKU

    2. Select ROLE

Conditions

  1. When Measured Value is: Lets the user choose the comparison that the measured value must satisfy

    1. EQ: Equal to

    2. NEQ: Not equal to

Notification

  1. Notify: The user can choose the integrated Slack channel

  2. Create Ticket: Zendesk users can choose this to raise a Zendesk support ticket

  3. Create Ticket: ServiceNow users can choose this to raise a ServiceNow support ticket

Device metrics (continued):

  • BGP Neighbours Down

  • Device CPU Core Temperature

  • Device CPU Utilization

  • Device Down

  • Device Memory Utilization

  • Device Queue Transmit Counter

  • FAN Speed

  • Failed FANs

  • Failed PSUs

  • PSU Temperature

  • SSD Health

  • SSD Temperature

  • SSD Used Memory Percent

  • frr CPU Utilization

  • syncd CPU Utilization

Interface metrics (continued):

  • Interface PFC Transmit Counters

  • Interface Queue Transmit Counters

  • Traffic InDiscards

  • Traffic InErrors

  • Traffic OutDiscards

  • Traffic OutErrors

  • Traffic Rx Utilization

  • Traffic Tx Utilization

  • Transceiver Rx Power

  • Transceiver Temperature

  • Transceiver Tx Power

  • Transceiver Voltage

Server metrics (continued):

  • Device Down

  • FAN Speed

  • GPU Memory Utilization

  • GPU PSU 1 Power Draw

  • GPU PSU 2 Power Draw

  • GPU Temperature

  • GPU Utilization

  • Memory Utilization

Measure options (continued):

  • AVG

  • MAX

  • Period: The time window over which the measured metric is evaluated

    1. 5 min

    2. 10 min

    3. 15 min

    4. 30 min

    5. 1 hour

Condition operators (continued):

  • GE: Greater than or equal to

  • LE: Less than or equal to

  • GT: Greater than

  • LT: Less than

  • Critical Threshold: The user can set a critical value at which a push notification is triggered

  • Warning Threshold: The user can set a warning value at which a push notification is triggered

Notification options (continued):

  • Weekly Digest: Slack users can choose this to receive a weekly digest in the Slack channel

  • Do not notify if the same alert triggers within: 30 min, 1 hour, 2 hours, 10 hours, 24 hours

  • Stop notifying after: The user can choose a number of occurrences, after which the same alert is not triggered again for the next 24 hours

Filter options (continued):

  • ROLE

  • OS Version

  • For: The user can choose one of two options (Device or Interface)

Device metrics (continued):

  • BGP Neighbours Down

  • Device CPU Core Temperature

  • Device CPU Utilization

  • Device Down

  • Device Memory Utilization

  • Device Queue Counter

  • FAN Speed

  • Failed FANs

  • Failed PSUs

  • PSU Temperature

  • SSD Health

  • SSD Temperature

  • SSD Used Memory Percent

  • frr CPU Utilization

  • syncd CPU Utilization

Interface metrics (continued):

  • Interface Queue Counters

  • Traffic InDiscards

  • Traffic InErrors

  • Traffic OutDiscards

  • Traffic OutErrors

  • Traffic Rx Utilization

  • Traffic Tx Utilization

  • Transceiver Rx Power

  • Transceiver Temperature

  • Transceiver Tx Power

  • Transceiver Voltage

  • Select OS VERSION

  • Metrics: The available metrics depend on the For selection above (Device or Interface)

  • Measure: Metrics can be measured in three different ways

    1. MIN

    2. AVG

    3. MAX

  • Period: The time window over which the measured metric is evaluated

    1. 5 min

    2. 10 min

    3. 15 min

    4. 30 min

    5. 1 hour
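The Measure and Period options can be read together as "aggregate the samples from the last period with MIN, AVG, or MAX". The following is a minimal sketch of that reading. The sample format and function name are assumptions, and the way ONES actually collects and windows samples is not specified here.

```python
from datetime import datetime, timedelta
from statistics import mean

# Map each Measure option to the aggregation it names.
MEASURES = {"MIN": min, "AVG": mean, "MAX": max}


def measure(samples, how, period, now):
    """Aggregate the (timestamp, value) samples that fall inside the last `period`."""
    window = [value for ts, value in samples if now - period <= ts <= now]
    if not window:
        return None  # nothing to evaluate in this period
    return MEASURES[how](window)
```

For example, with a 5 minute period only the samples from the last five minutes contribute, so an older spike affects MAX only when a longer period is selected.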

Condition operators (continued):

  • GE: Greater than or equal to

  • LE: Less than or equal to

  • GT: Greater than

  • LT: Less than

  • Critical Threshold: The user can set a critical value at which a push notification is triggered

  • Warning Threshold: The user can set a warning value at which a push notification is triggered
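The condition operators and the two thresholds combine into a simple check: compare the measured value against each threshold using the chosen operator. The sketch below assumes that the critical threshold is checked first, and that the result is reported as a severity label. Both are illustrative choices, not documented ONES behaviour.

```python
import operator

# Condition codes from the rule form mapped to Python comparisons.
OPERATORS = {
    "EQ": operator.eq,
    "NEQ": operator.ne,
    "GE": operator.ge,
    "LE": operator.le,
    "GT": operator.gt,
    "LT": operator.lt,
}


def evaluate(measured, condition, warning, critical):
    """Return "critical", "warning", or None for a measured value."""
    compare = OPERATORS[condition]
    if compare(measured, critical):  # check the more severe level first
        return "critical"
    if compare(measured, warning):
        return "warning"
    return None
```

The same function handles "lower is worse" metrics by choosing LT or LE and setting the critical threshold below the warning threshold.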

Notification options (continued):

  • Weekly Digest: Slack users can choose this to receive a weekly digest in the Slack channel

  • Do not notify if the same alert triggers within: 30 min, 1 hour, 2 hours, 10 hours, 24 hours

  • Stop notifying after: The user can choose a number of occurrences, after which the same alert is not triggered again for the next 24 hours
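The two suppression options can be sketched as a small per-alert throttle: a quiet window between repeated notifications, plus a 24 hour pause once a chosen number of notifications has been sent. This is one possible interpretation of the options above. The class name, the reset behaviour after the pause, and the counting rule are assumptions.

```python
from datetime import datetime, timedelta


class NotificationThrottle:
    """Tracks one alert and decides whether a new trigger should notify."""

    def __init__(self, quiet_window, stop_after, pause=timedelta(hours=24)):
        self.quiet_window = quiet_window  # "Do not notify if the same alert triggers within"
        self.stop_after = stop_after      # "Stop notifying after" this many notifications
        self.pause = pause
        self.last_sent = None
        self.sent_count = 0
        self.paused_until = None

    def should_notify(self, now):
        if self.paused_until is not None:
            if now < self.paused_until:
                return False
            # Pause is over: start counting occurrences again.
            self.paused_until = None
            self.sent_count = 0
        if self.last_sent is not None and now - self.last_sent < self.quiet_window:
            return False
        self.last_sent = now
        self.sent_count += 1
        if self.sent_count >= self.stop_after:
            self.paused_until = now + self.pause
        return True
```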

Slack Channel Integration steps (continued):

  • Provide an App Name, choose the workspace where push notifications should be delivered, and select Create App

  • Choose Incoming Webhooks, activate Incoming Webhooks, and select Add New Webhook to Workspace

  • Select the configured channel and choose Allow

  • Copy the newly created webhook URL

  • Open ONES-App and select Integration >> Messaging

  • Add Channel and paste the webhook URL

  • After saving, the channel is available when creating any rule with the Rule Engine feature
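Before pasting the webhook URL into ONES-App, it can be useful to confirm that the webhook delivers to the channel. A Slack Incoming Webhook accepts an HTTP POST with a JSON body containing a `text` field. The sketch below uses only the Python standard library. The function names and the message text are illustrative, and the webhook URL must be the one copied from the Slack app.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text):
    """Build a POST for a Slack Incoming Webhook with a plain-text message."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send_slack_alert(webhook_url, text):
    """Send the message and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling `send_slack_alert(webhook_url, "ONES test notification")` with the copied URL should post the message to the configured channel.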

  • Enable the Service

    Rule Engine

    Overview

    In data centre operations, a rule engine that raises alerts on key metrics is essential for proactive monitoring and management of critical components and services. The Rule Engine supports the following types of alerts for specific metrics in a data centre environment:

    1. CPU and Memory Alerts

    2. Fan and Power Supply Unit Alerts

    3. Traffic Bandwidth

    4. ASIC IPv4 & IPv6 Routes

    5. BGP Neighbour Alerts

    6. Health Services

    7. Device Down Alerts

    8. SSD Health, temperature and memory usage alert

    9. Device Queue counters

    10. PFC counters

    11. Traffic Errors and Discard Counters

    12. frr and syncd services CPU utilization status

    13. Server Agent based metrics

      1. CPU Temperature and Utilization

      2. Down status

      3. FAN Speed

    Push Notification

    When a device breaches the threshold value configured in a rule, the Rule Engine pushes the rule's notification to

    1. Slack channel

    2. Zendesk Support ticket

    3. ServiceNow ticket

    To use the Rule Engine alert feature, the user must first set up a Slack channel integration, a Zendesk Support integration, or a ServiceNow integration.

Server Agent based metrics (continued):

  • Memory Utilization

  • GPU

    1. Memory Utilization

    2. PSU Power Draw

    3. Temperature

    4. Utilization

  • Add Rules: Entity

    Add Rules

    Navigation >> Watcher >> Rules

    Create New & Add the required inputs

    Preview & Create

    Once a user creates the rule, it is available in the rule list.

    Once the device SSD Memory Utilization goes above the threshold value, notifications are pushed to Slack, Zendesk Support tickets, and the ONES Alerts page.