Monitor

Overview

The monitor widget in ONES:

  • Shows the complete topology view of the fabric

  • Users can switch the topology view between:

    • Datacenter

    • AI-ML

  • The Topology view can be categorised by:

    • Region

    • Platform

    • ASICs

    • Statuses

      • Not streaming, Faulty Fans, Faulty PSUs, Links Down

    • Metrics

      • Bandwidth RX & TX

      • Memory, CPU & ASIC Utilization

  • Shows all the links between devices and the information associated with them

  • Right-clicking a device lets a user go to specific feature details

    • Traffic, Health, Capacity, Protocols

    • Connect to the device via SSH or console access

    • Retrieve SYSLOGs

  • Traffic View

    • PFC Enabled Device view

    • Input/output packets in millions per second

    • Errors and Discard packets per interface

    • And more related metrics

    • AI-Fabric metrics

  • Health of the devices

    • CPU & Memory Utilization

    • CPU & PSU Temperature

    • PSU Voltage & Fan Speed

    • SSD Temperature, Health and Memory utilization

    • Server GPU Metrics

  • Capacity of the devices

    • IPv4 & IPv6 Routes

      • ASIC/Software/Kernel

      • ASIC ACL capacity

  • Links Page

    • All the connected devices

    • Transceivers info

  • Protocols status

    • BGP status

    • VXLAN

    • MCLAG

    • LACP

    • QOS

Topology

  • Navigate to Monitor >> Topology

    • Topology Type: DATACENTER

    • Topology Type: AI-ML

  • This shows the complete topology view, including how the devices are connected

  • Topology can be filtered by Underlay/Overlay/RoCE

  • Filters can be applied to get a customized view of the topology

  • Count of devices

    • All devices onboarded

    • Not streaming

    • Faulty Fans & PSUs

    • Links Down

  • The Links Down filter shows the parts of the topology that have links in the shutdown state

  • Hovering the cursor over any device and right-clicking gives a few more controls

    • Device details/Ports

    • Direct Navigation per device

      • Traffic/Health/Capacity/Protocols

    • Console connect

    • SYSLOG

  • The view can also be filtered by:

    • Statuses

      • Not streaming, Faulty Fans, Faulty PSUs, Links Down

    • Metrics

      • Bandwidth RX & TX

      • Memory, CPU & ASIC Utilization

Traffic

  • This widget shows the input and output errors across all the devices

  • It also shows the input and output packets per device

  • Navigate to Monitor >> Traffic

  • This page shows the following information:

    • Device Name & IP

    • Roles & Region

    • Device details

    • Interface speed and ports

    • Errors & Utilization of the links

  • Filter Ribbon can be used to get a customized view

    • PFC Enabled Devices

    • Operator up/down

    • Admin down

    • PFC Enabled interfaces

  • Clicking any device gives more information about its interface traffic

    • Errors per interface

    • Bandwidth utilisation per interface

  • Clicking a particular interface shows a time series of its input and output packets, errors and discards, and the other metrics in detail

  • This page shows the traffic drop rate per interface, which is useful when troubleshooting a traffic drop

  • With these details a user can drill down further to fix the cause of dropped or discarded packets
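
The packet rate and drop rate shown per interface can be thought of as the change in cumulative counters between two readings. The sketch below is only an illustration of that arithmetic, not how ONES computes it; the `CounterSample` shape and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One reading of cumulative interface counters (hypothetical shape)."""
    timestamp_s: float  # time of the reading, in seconds
    in_packets: int     # cumulative packets received
    in_discards: int    # cumulative inbound packets discarded


def rates(prev: CounterSample, curr: CounterSample) -> tuple[float, float]:
    """Return (input rate in million packets/sec, discards per second)."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    mpps = (curr.in_packets - prev.in_packets) / elapsed / 1_000_000
    discards_per_s = (curr.in_discards - prev.in_discards) / elapsed
    return mpps, discards_per_s
```

A non-zero discard rate on an interface whose packet rate is close to line rate usually points at congestion rather than a faulty link.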

Health

This page shows the latest utilization of all the devices

  • CPU & Memory utilization

  • Temperature & Voltage of PSU

  • Fan speed in % & RPM

  • SSD Temperature, Health and Memory Usage

  • Navigate to Monitor >> Health

Health Status

Health Status is reported for the following components

  • Roles

  • SKU/ASIC

  • Ports/Max Speed

  • CPU Utilization (%)

  • Memory Utilization (%)

  • CPU Temperature (℃)

  • PSU Temperature (℃)

  • PSU Voltage (V)

  • Fan Speed (RPM)

  • SSD

    • Temperature(℃)

    • Health(%)

    • Memory(%)

HOST / IP

  • Device Name

  • Device IP

Roles/Region

  • Device Role

  • Device Region

SKU/ASIC

  • SKU (Stock Keeping Unit)

  • ASIC

Port/Max Speed

  • Total number of ports available

  • Speed of ports

CPU Utilization (%)

CPU Utilization is reported in 4 states:

  • Normal

  • Acceptable

  • Critical - Action needed

  • Not Streaming - Agent is not up

Click on any device to get the view/status of all the components related to that device

Memory Utilization (%)

Memory Utilization is reported in 4 states:

  • Normal

  • Acceptable

  • Critical - Action needed

  • Not Streaming - Agent is not up

Click on any device to get the view/status of all the components related to that device
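
The four states used for CPU and memory utilization can be sketched as a simple threshold check. The threshold values below (70% and 90%) are assumptions chosen for illustration; in ONES the acceptable and critical values are configured, and a missing reading is treated here as the agent not streaming.

```python
from typing import Optional


def utilization_state(
    value_pct: Optional[float],
    acceptable_pct: float = 70.0,  # hypothetical threshold
    critical_pct: float = 90.0,    # hypothetical threshold
) -> str:
    """Map a utilization reading to one of the four reported states."""
    if value_pct is None:
        # No reading at all: the agent is not up.
        return "Not Streaming"
    if value_pct >= critical_pct:
        return "Critical"
    if value_pct >= acceptable_pct:
        return "Acceptable"
    return "Normal"
```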

Average CPU Temperature (C)

CPU temperature across all the devices, in degrees Celsius

  • Any device that breaches the configured acceptable or critical value will be shown here

  • Click on any device to get the view/status of all the components related to that device

Average PSU Temperature (C)

Power Supply Temperature in degrees Celsius

  • Any device that breaches the configured acceptable or critical value will be shown here

  • Click on any device to get the view/status of all the components related to that device

PSU (Voltage)

Power Supply Voltage readings in volts

  • Any device that breaches the configured acceptable or critical value will be shown here

  • Click on any device to get the view/status of all the components related to that device

Average Fan Speed (%)

Fan Speed in % of maximum supported RPM

  • Any device that breaches the configured acceptable or critical value will be shown here

  • Click on any device to get the view/status of all the components related to that device
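
Fan speed as a percentage of the maximum supported RPM is a straightforward ratio. A minimal sketch, assuming the current and maximum RPM are both known; the function name is hypothetical.

```python
def fan_speed_percent(current_rpm: float, max_rpm: float) -> float:
    """Express the current fan speed as a percentage of the maximum supported RPM."""
    if max_rpm <= 0:
        raise ValueError("max_rpm must be positive")
    return 100.0 * current_rpm / max_rpm
```

For example, a fan running at 9000 RPM with a maximum of 18000 RPM is at 50%.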

SSD

SSD status is shown here

  • SSD Temperature: lets a user track the temperature of the SSD

  • SSD Health: lets a user check the SSD health as a percentage

  • SSD Memory: shows the utilization of the SSD

Customized View

  • The health view can be customized to show only a subset of devices

  • We can filter the devices by:

    • Roles

    • Region

Role-based Customization

  • We can choose a role using the available Role-based option

  • 4 Roles are available:

    • Super Spine

    • Spine

    • Leaf

    • ToR

  • For example, select the Leaf filter

  • After selecting Leaf, the view shows only the devices that belong to the Leaf role
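
The effect of the Role and Region filters can be sketched as a filter over a device list. The device records and field names below are hypothetical sample data, not the ONES data model.

```python
# Hypothetical inventory used only for illustration.
DEVICES = [
    {"name": "spine-01", "role": "Spine", "region": "us-west"},
    {"name": "leaf-01", "role": "Leaf", "region": "us-west"},
    {"name": "leaf-02", "role": "Leaf", "region": "us-east"},
    {"name": "tor-01", "role": "ToR", "region": "us-east"},
]


def filter_devices(devices, role=None, region=None):
    """Keep devices matching every filter that is set; None means no filter."""
    return [
        d for d in devices
        if (role is None or d["role"] == role)
        and (region is None or d["region"] == region)
    ]
```

Calling `filter_devices(DEVICES, role="Leaf")` keeps only `leaf-01` and `leaf-02`; adding `region="us-east"` narrows it to `leaf-02`.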

Per Device Status

  • This Platform widget also gives an extended view of each device

  • Apart from the monitoring view, it also shows extended metrics such as:

    • PSU Current (A)

    • PSU Power (W)

    • Services Running

    • Services CPU/Memory Consumption (%)

  • To view the per-device status with all available widgets, click on any of the devices in the list

  • Selecting a specific device opens its detailed view

Device Info Ribbon

Feature and use, by number:

1. Time Frame: check utilization trends over a selected time range. The application can store up to 2 weeks of data

2. Refresh Component Status

3. Alerts: show all the alerts triggered by rules

4. Raise a Ticket for Technical Support

5. Documentation

6. API Explorer

7. Device Details

  • Platform

  • Number of Ports and Speed

  • Agent Version

  • Uptime

  • CPU Utilization

  • Memory Utilization

  • CPU Temperature

  • Services running on the device

CPU Usage (%)

  • This widget shows the CPU usage over the selected time range, from the start of the range to the end

  • To check the value at a specific time, hover the cursor over that point on the graph

Memory Usage (%)

  • This widget shows the memory usage of the selected device

  • To check the memory utilization at a specific time, hover the cursor over that point on the graph

Services CPU Consumption (%)

  • This widget shows the CPU consumption percentage for all services or per service

  • The view can be limited to the CPU consumption of selected services

  • To check the value at a specific time, hover the cursor over that point on the graph

Services Memory Consumption (%)

  • This widget shows the memory consumption percentage for all services or per service

  • The view can be limited to the memory consumption of selected services

  • To check the value at a specific time, hover the cursor over that point on the graph

Services Running

  • This widget tracks the services running on the device

  • It shows the total count of services running on the platform

  • A red bar on the graph marks the time at which one of the services went down
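
The red bars can be understood as the points where the running-service count falls compared with the previous reading. The sketch below illustrates that idea on a hypothetical list of `(timestamp, running_count)` samples; it is not the widget's actual logic.

```python
def service_drop_times(samples):
    """Return the timestamps at which the running-service count decreased.

    samples: list of (timestamp, running_count) tuples in time order.
    """
    drops = []
    for (_, prev_count), (ts, count) in zip(samples, samples[1:]):
        if count < prev_count:
            drops.append(ts)
    return drops
```

The timestamps returned are a good starting point for checking the Services CPU and Memory Consumption widgets at the same moment.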

CPU Temperature (C)

  • This widget shows the CPU temperature in degrees Celsius

  • It covers all the CPUs and cores running on the device

Components

This page outlines the key metrics for accurately monitoring the performance of various components, specifically focusing on Temperature, Current, Fans, and Power.

SSD

Device Resource Utilization Page

This page allows users to monitor the resource usage of their devices, providing a proactive view of how resources are allocated per switch and the trends in service usage. If you notice any resource usage spiking, you can easily navigate to another utilization page to identify the specific processes contributing to the increased demand. This feature is designed to help users manage their resources more effectively and prevent potential issues before they impact performance.

These metrics can be effectively analyzed through a time series graph.

Capacity

This page shows device capacity along with a few related details

  • This widget shows:

    • Roles/Region per device

    • SKU and ASIC details per device

    • ASIC ACL Capacity utilization

    • IPv4 Routes (ASIC, Software, Kernel)

    • IPv6 Routes (ASIC, Software, Kernel)

This Capacity widget can also show the output per Role and Region

  • For example, choose the Leaf role to get a customized view

  • In the same way, the view can be customized by Region & SKUs

The extended view shows the device capacity for all the IPv4 and IPv6 ASIC, software, and kernel routes, along with ACL utilization

Using this page a user can troubleshoot protocol problems or other misbehaviour on the devices caused by route capacity issues
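
Capacity utilization is the ratio of entries in use to the table size. The sketch below shows how tables close to exhaustion could be picked out; the table names, sizes, and the 80% threshold are all hypothetical.

```python
def utilization_pct(used: int, capacity: int) -> float:
    """Percentage of a table (routes or ACL entries) that is in use."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used / capacity


def tables_near_capacity(usage, threshold_pct=80.0):
    """Return the names of tables at or above the threshold.

    usage: dict mapping table name -> (used, capacity).
    """
    return [
        name for name, (used, capacity) in usage.items()
        if utilization_pct(used, capacity) >= threshold_pct
    ]
```

A table that stays near its limit is a likely cause of routes failing to install, which is the kind of misbehaviour this page helps to track down.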

Per Device Status

This widget shows an extended view of the Routes & ACL usage over a range of time

Click on any of the devices to get the extended view

Moving the cursor over a metric shows its usage view:

  • IPv4 routes

    • ASIC

    • Kernel

    • Software

  • IPv6 routes

    • ASIC

    • Kernel

    • Software

  • ACL

    • ASIC

Links

This page shows the user all the connected links between devices, with a few more capabilities

  • Navigate to Monitor >> Links

Feature and use:

  • Hostname: hostname of the managed device

  • Role: role of the device

  • Port/Interface: interface details

  • Port Speed: link speed of connected devices

  • Transceiver: SFP/QSFP optics status

  • Manufacturer: device manufacturer

  • Manufactured Date: date of manufacturing

  • Admin and Operator status: local and remote status of the link

  • This page gives a clear view of the number of connections between devices, with their speed and manufacturer details

  • It shows the interface name, interface speed, transceivers, and admin & operator status

Users can also check the transceiver details over time using the timescale database

Protocols

This Protocols page shows the metrics of the features below

  • BGP (numbered/ unnumbered)

  • VXLAN

  • MCLAG

  • LACP

  • QOS

BGP

This BGP section gives a user accurate counts along with the following details

  • BGP 2 byte and 4 byte AS

  • BGP numbered

  • BGP unnumbered

  • Total number of neighbors configured

  • How many neighbors are up and down

  • Total number of prefixes and how many are being advertised

  • BGP neighbor details
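
The difference between 2-byte and 4-byte AS numbers comes down to the numeric range: 2-byte values fit in 0 to 65535, and 4-byte values extend to 4294967295. The sketch below classifies an AS number and prints it in the dotted "asdot" notation; the function names are hypothetical and this is not part of ONES.

```python
def asn_width(asn: int) -> str:
    """Classify an AS number as 2-byte or 4-byte by its numeric range."""
    if not 0 <= asn <= 4_294_967_295:
        raise ValueError("AS number must fit in 32 bits")
    return "2-byte" if asn <= 65_535 else "4-byte"


def asdot(asn: int) -> str:
    """Format an AS number in asdot notation (high.low for 4-byte values)."""
    high, low = divmod(asn, 65_536)
    return str(low) if high == 0 else f"{high}.{low}"
```

For example, AS 65001 is a 2-byte value and prints unchanged, while AS 65546 is a 4-byte value and prints as `1.10`.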

The BGP table has the following columns:

  • Device name and device IP

  • Roles and Region

  • SKU and ASIC

  • Count of total BGP neighbours

  • Neighbour status

    • How many BGP neighbours are up and running

    • How many BGP neighbours are in the down state

  • Total prefixes present in BGP

  • Total number of prefixes advertised by the router to other BGP neighbours

  • Local BGP AS number

  • A control to check more details on neighbours

This page gives the details of the BGP neighbours connected to the devices, and the metrics/values a user can use to troubleshoot a BGP neighbour
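
The up and down counts can be sketched as a tally over neighbour session states, treating only the BGP `Established` state as up. The neighbour records and field names below are hypothetical.

```python
def neighbor_summary(neighbors):
    """Count total, up, and down BGP neighbours.

    neighbors: list of dicts with a "state" key holding the BGP session state.
    Only "Established" counts as up; every other state counts as down.
    """
    up = sum(1 for n in neighbors if n["state"] == "Established")
    return {"total": len(neighbors), "up": up, "down": len(neighbors) - up}
```

A neighbour stuck in `Idle` or `Active` is counted as down here and is the one to open in the neighbour details.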

Neighbor View

This shows the neighbour details: the total number of neighbours, received routes, neighbour RID, BGP AS number, and more

The neighbour details and the status of routes can be checked here

Click on neighbours to get more details about all the connected neighbours

The neighbour table has the following columns:

  • Details of connected neighbours

    • Neighbour device name

    • Neighbour IP

  • Neighbour BGP AS number

  • Neighbour uptime: how long the neighbour has been connected

  • Last neighbour reset timer

  • Count of established and dropped connections per neighbour

  • Keepalive messages

    • Tx: how many keepalives have been transmitted

    • Rx: how many keepalives have been received

  • Route-Refresh messages

    • Tx: how many Route-Refresh messages have been transmitted

    • Rx: how many Route-Refresh messages have been received

  • Updates

    • Tx: how many times updates have been transmitted

    • Rx: how many times updates have been received

Per-Device Status (Neighbours & Announcements)

The user can get per-device status by choosing a particular Device

  • Click on the device name to get the status

  • The new page shows which BGP neighbours are up and which are down

  • On the right side it shows the BGP announcements and the local prefixes present in the BGP table

VXLAN

This section helps find the devices that have VXLAN enabled, along with a few more details

  • L2 & L3 VXLAN metrics

  • Local VTEP

  • Remote VTEP and details on how many are up and down

  • VLAN to VNI Mapping

  • VRF to VNI mapping

This view lists all the devices with their VXLAN details

VTEP Details

Clicking a Local VTEP ID shows the details of all the remote VTEPs connected to that device

Users can click on the operation status to get a time series graph of the up and down status, showing when the VTEP was up or down

VLAN to VNI Mapping

Users can get the details on VLAN to VNI mapping using the option on this page
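
A VLAN to VNI mapping can be represented as a dictionary and sanity-checked. The sketch below assumes the mapping is expected to be one-to-one on a device, that VLAN IDs lie in 1 to 4094, and that VNIs fit the 24-bit field (1 to 16777215, treating 0 as invalid, which is an assumption). The function name and sample values are hypothetical.

```python
def check_vlan_vni(vlan_to_vni):
    """Return a list of problems found in a VLAN -> VNI mapping."""
    problems = []
    seen = {}  # VNI -> first VLAN that used it
    for vlan, vni in sorted(vlan_to_vni.items()):
        if not 1 <= vlan <= 4094:
            problems.append(f"VLAN {vlan} is outside 1-4094")
        if not 1 <= vni <= 16_777_215:
            problems.append(f"VNI {vni} is outside 1-16777215")
        if vni in seen:
            problems.append(
                f"VNI {vni} is mapped to both VLAN {seen[vni]} and VLAN {vlan}"
            )
        else:
            seen[vni] = vlan
    return problems
```

An empty list means the mapping passed these checks; it says nothing about whether the same VNI is configured consistently on the remote VTEPs.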

VRF to VNI Mapping

Users can check all the VRF to VNI mappings using the option on this page

MCLAG

This feature enhances network management by allowing users to access a timescale graph. This graph shows the status of neighbour and peer links over time, indicating periods when they were down or active. Users can further examine the health page to determine whether downtime resulted from process issues or resource utilization problems.

Additionally, the feature provides tools for verifying:

  • MCLAG (Multi-Chassis Link Aggregation Group) Domain ID

  • The status of PortChannels associated with Peer Links

  • MCLAG-L3 (Multi-Chassis Link Aggregation Layer 3) status

  • MCLAG-L2 (Multi-Chassis Link Aggregation Layer 2) status

With these capabilities, users can effectively track and diagnose network performance and configuration issues.
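
The up and down periods shown on the time series graph can be derived from a sequence of status samples. The sketch below turns hypothetical `(timestamp, is_up)` samples into down intervals; it illustrates the idea only and is not the ONES implementation.

```python
def down_intervals(samples):
    """Return (start, end) pairs for each period the link was down.

    samples: list of (timestamp, is_up) tuples in time order.
    end is None when the link is still down at the last sample.
    """
    intervals = []
    down_since = None
    for ts, is_up in samples:
        if not is_up and down_since is None:
            down_since = ts          # link just went down
        elif is_up and down_since is not None:
            intervals.append((down_since, ts))  # link recovered
            down_since = None
    if down_since is not None:
        intervals.append((down_since, None))
    return intervals
```

Each interval can then be compared against the Health page for the same time window to see whether a process or resource problem coincided with the outage.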

Time Series Graph with up and down status

Navigate to Monitor >> Protocols >> <Choose Device> >> Click on Active/Passive

LACP

The metrics on this page show the EtherChannel status per device, including the member ports and their status, with a time series graph

Selecting a port channel on any device allows users to view its time series data. This feature enables the analysis of the device's status, ranging from the latest hour up to two weeks of metrics.
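
Since metrics are kept for up to two weeks, any requested time range effectively gets limited to that window. A minimal sketch of such clamping, assuming a fixed two-week retention; the `RETENTION` constant and function name are hypothetical.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(weeks=2)  # assumed retention window


def clamp_range(start: datetime, end: datetime, now: datetime):
    """Limit a requested time range to the data that is still retained."""
    earliest = now - RETENTION
    end = min(end, now)
    start = max(start, earliest)
    if start >= end:
        raise ValueError("requested range is empty or outside the retention window")
    return start, end
```

A request for the last month therefore comes back as the last two weeks, while a request for the latest hour is returned unchanged.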

QOS (Quality of Service)

Quality of Service (QoS) in networking is used to prioritize certain types of traffic, ensuring critical applications like video conferencing, VoIP, or online gaming get sufficient bandwidth. It helps reduce latency, minimize jitter, and maintain consistent performance, especially in congested networks, by managing and controlling traffic flows based on importance.

This page allows a user to check the active configuration related to:

  • QoS Active Queues on interfaces

  • 802.1p and DSCP to TC mapping

  • PFC enabled queues

  • TC to PG and Queue mapping

  • Scheduler and WRED status
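
The mappings listed above form a chain: a packet's DSCP value selects a traffic class (TC), and the TC selects both a queue and a priority group (PG). The sketch below walks that chain with made-up table contents; the values, the fallback to TC 0, and the names are all assumptions for illustration, not a real device configuration.

```python
# Hypothetical mapping tables; real values come from the device configuration.
DSCP_TO_TC = {0: 0, 26: 3, 46: 5}
TC_TO_QUEUE = {0: 0, 3: 3, 5: 5}
TC_TO_PG = {0: 0, 3: 3, 5: 0}
PFC_ENABLED_QUEUES = {3}


def classify(dscp: int) -> dict:
    """Follow DSCP -> TC -> queue/PG and report whether PFC applies."""
    tc = DSCP_TO_TC.get(dscp, 0)  # unmapped DSCP falls back to TC 0 in this sketch
    queue = TC_TO_QUEUE[tc]
    return {
        "tc": tc,
        "queue": queue,
        "pg": TC_TO_PG[tc],
        "pfc": queue in PFC_ENABLED_QUEUES,
    }
```

With these sample tables, DSCP 26 lands in queue 3 with PFC enabled, while an unmapped DSCP such as 7 falls back to queue 0 without PFC.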

Queues status per device

To verify the activated queues, the user can click on "Show Active". This page displays the status of each queue along with the features enabled for each interface.

All Profile

The "Show All" option allows the user to view all activated configurations per widget, providing a detailed overview of QoS-related settings for each device.

Copyright © Aviz Networks, Inc.