Monitor
Overview
The monitor widget in ONES:
Shows the complete topology view of the fabric
Users can switch the topology view between:
Datacenter
AI-ML
The Topology view can be categorised by:
Region
Platform
ASICs
Statuses
Not streaming, Faulty Fans, Faulty PSUs, Links Down
Metrics
Bandwidth RX & TX
Memory, CPU & ASIC Utilization
This page shows all the links and how the devices are connected to each other
Right-clicking a device lets the user go to the details of a specific feature
Traffic, Health, Capacity, Protocols
Lets the user connect to the device via SSH or console access
Can retrieve syslogs
Traffic View
Compare Interface RX/TX metrics
PFC Enabled Device view
Input/output packets in millions per second
Errors and Discard packets per interface
And more related metrics
AI-Fabric metrics
Health of the devices
CPU & Memory Utilization
CPU & PSU Temperature
PSU Voltage & Fan Speed
SSD Temperature, Health and Memory utilization
Server GPU Metrics
Capacity of the devices
IPv4 & IPv6 Routes
ASIC/Software/Kernel
ASIC ACL capacity
Links Page
All the connected devices
Transceivers info
Protocols status
BGP status
LACP
MCLAG
QOS
VLAN
VXLAN
VRRP
ONES Collector Resource Monitoring
Topology
Navigate to Monitor >> Topology
Topology Type: DATACENTER

Topology Type: AI-ML

This shows the complete topology view, including how the devices are connected
The topology can be filtered by Underlay/Overlay/RoCE/MCLAG
Filters can be applied to get a customized view of the topology
Count of devices
All devices onboarded
Not streaming
Faulty Fans & PSUs
Links Down
We can also select Down Links to see the parts of the topology that have links in the shutdown state
Hovering the cursor over any device and right-clicking gives a few more controls
Device Details/Ports
Direct Navigation per device
Traffic/Health/Capacity/Protocols
Console connect
SYSLOG

We can also filter the view by:
Statuses
Not streaming, Faulty Fans, Faulty PSUs, Links Down
Metrics
Bandwidth RX & TX
Memory, CPU & ASIC Utilization
Traffic
Using this widget we can check the input and output errors across all the devices
This widget also shows the input and output packets per device
Navigate to Monitor >> Traffic

This page shows the information:
Device Name & IP
Roles & Region
Device details
Interface speed and ports
Errors & Utilization of the links
Clicking any device gives more information about its interface traffic
Errors per interface
Bandwidth utilisation per interface
Clicking a particular interface shows a time series of input and output packets, errors, and discards, along with all other metrics in detail

This page shows the traffic drop rate per interface, which is very useful when troubleshooting traffic drops
Using these details, a user can drill down to find and fix the cause of dropped or discarded packets
This page can also show RoCE-related metrics
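As an illustration of what a per-interface drop rate means, the sketch below derives it from two samples of cumulative interface counters. This is not how ONES computes the value internally; the dictionary keys `in_packets` and `in_discards` are hypothetical names, and the sketch assumes the packet counter includes the discarded packets.

```python
def drop_rate_percent(prev, curr):
    """Return the share of received packets discarded between two samples.

    prev and curr are dicts holding cumulative counters for one interface.
    The keys 'in_packets' and 'in_discards' are hypothetical names.
    """
    packets = curr["in_packets"] - prev["in_packets"]
    discards = curr["in_discards"] - prev["in_discards"]
    # A non-positive delta means no traffic or a counter reset; report 0.
    if packets <= 0 or discards < 0:
        return 0.0
    return 100.0 * discards / packets


previous_sample = {"in_packets": 1000, "in_discards": 10}
current_sample = {"in_packets": 3000, "in_discards": 60}
print(drop_rate_percent(previous_sample, current_sample))
```

A rate that stays above zero across several consecutive samples points to the interface to inspect first.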

Compare
This feature allows a user to compare traffic between multiple interfaces, on the same device or across multiple devices

Select the devices and interface to see the traffic trends

Health
This page shows the latest utilization of all the devices (Chart View and Number View)
CPU & Memory utilization
Temperature & Voltage of PSU
Fan speed in % & RPM
SSD Temperature, Health and Memory Usage
Navigate to Monitor >> Health
Chart View

Number View

Health Status
Health Status is reported for the following components
Roles
SKU/ASIC
Ports/Max Speed
CPU Utilization (%)
Memory Utilization (%)
CPU Temperature (℃)
PSU Temperature (℃)
PSU Voltage (V)
Fan Speed (RPM)
SSD
Temperature(℃)
Health(%)
Memory(%)
HOST / IP
Device Name
Device IP
Roles/Region
Device Role
Device Region
Fabric
Shows the fabric related info
SKU/ASIC
SKU (Stock Keeping Unit)
ASIC
Port/Max Speed
Total number of ports available
Speed of ports
CPU Utilization (%)
CPU Utilization is reported in 4 states
Normal
Acceptable
Critical - Action needed
Not Streaming - Agent is not up
Click on any device to get the view/status of all the components related to that device
Memory Utilization (%)
Memory Utilization is reported in 4 states
Normal
Acceptable
Critical - Action needed
Not Streaming - Agent is not up
Click on any device to get the view/status of all the components related to that device
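The four states used for CPU and memory utilization can be pictured as a simple threshold check. The sketch below is only an illustration: the function name and the 70% and 90% threshold values are assumptions, since the actual acceptable and critical values are whatever has been configured in ONES.

```python
def classify_utilization(value_percent, acceptable=70.0, critical=90.0):
    """Map a utilization reading to one of the four reported states.

    The thresholds are illustrative assumptions, not ONES defaults.
    A reading of None stands for a device whose agent is not up.
    """
    if value_percent is None:
        return "Not Streaming"
    if value_percent >= critical:
        return "Critical"
    if value_percent >= acceptable:
        return "Acceptable"
    return "Normal"


for reading in (35.0, 75.0, 95.0, None):
    print(reading, "->", classify_utilization(reading))
```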
Average CPU Temperature (℃)
CPU temperature across all the devices, in degrees Celsius
Any device that breaches the configured acceptable or critical value will be shown here
Click on any device to get the view/status of all the components related to that device
Average PSU Temperature (℃)
Power supply temperature in degrees Celsius
Any device that breaches the configured acceptable or critical value will be shown here
Click on any device to get the view/status of all the components related to that device
PSU (Voltage)
Power Supply Voltage readings in volts
Any device that breaches the configured acceptable or critical value will be shown here
Click on any device to get the view/status of all the components related to that device
Average Fan Speed (%)
Fan Speed in % of maximum supported RPM
Any device that breaches the configured acceptable or critical value will be shown here
Click on any device to get the view/status of all the components related to that device
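Fan speed as a percentage of the maximum supported RPM is a plain ratio. The sketch below shows the arithmetic; the function name and the sample RPM values are illustrative assumptions.

```python
def fan_speed_percent(current_rpm, max_rpm):
    """Return fan speed as a percentage of the maximum supported RPM."""
    if max_rpm <= 0:
        raise ValueError("max_rpm must be positive")
    return round(100.0 * current_rpm / max_rpm, 1)


# Example values are made up for illustration.
print(fan_speed_percent(9000, 18000))
```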
SSD
SSD Status will be shown here
SSD Temperature: lets a user track the temperature of the SSD
SSD Health: lets a user check the health of the SSD as a percentage
SSD Memory: shows the utilization of the SSD
Customized View
The health view can be customized to show only a subset of the devices
We can filter the devices by:
Roles
Region
Role-based Customization
We can choose a role using the available Role-based option
4 Roles available
Super Spine
Spine
Leaf
ToR
Let’s check it with a Leaf filter
After selecting Leaf, the view shows only the devices that belong to the Leaf role

Per Device Status
This Platform Widget also gives the option to check the extended capability view of the device
Apart from this monitoring view, we can also verify/check extended feature sets like:
PSU Current (A)
PSU Power (W)
Services Running
Services CPU/Memory Consumption (%)
To view per device status with all possible widgets, click on any of the devices present on the list

When we choose a specific device we get an output like this

Sync All Charts
The Sync All Charts feature helps users analyze related metrics across multiple widgets simultaneously.
When this feature is enabled, all charts are synchronized by time. It will automatically align the other widgets to the same timestamp. Each widget will display the corresponding data trend for that exact time.
This allows users to easily correlate system performance metrics and identify patterns or anomalies across different resources.
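One way to picture time synchronization across widgets is a lookup of each widget's most recent sample at the chosen timestamp. The sketch below is a conceptual illustration under that assumption, not the ONES implementation; the widget names and sample data are hypothetical.

```python
from bisect import bisect_right


def values_at(series_by_widget, timestamp):
    """Return, per widget, the latest sample at or before the timestamp.

    series_by_widget maps a widget name to a list of (timestamp, value)
    pairs sorted by timestamp. Widgets with no sample yet map to None.
    """
    result = {}
    for name, samples in series_by_widget.items():
        times = [t for t, _ in samples]
        index = bisect_right(times, timestamp) - 1
        result[name] = samples[index][1] if index >= 0 else None
    return result


# Hypothetical data: timestamps in seconds, values in percent.
charts = {
    "cpu": [(0, 10), (60, 20), (120, 30)],
    "memory": [(0, 40), (90, 45)],
}
print(values_at(charts, 100))
```

Reading all widgets at the same instant is what makes it possible to see, for example, whether a CPU spike coincides with a rise in memory usage.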

CPU Usage (%)
This widget shows CPU usage over the selected time range, from start to end
To check the value at a specific time, hover the cursor over any point on the chart

Memory Usage (%)
This widget shows the memory usage of the selected device
To check memory utilization at a specific time, hover the cursor over any point on the chart

Services CPU Consumption (%) per Core/Average
This widget shows the CPU consumption percentage for all services or per service.
Here we have the option to choose the CPU consumption view

To check the value at a specific time, hover the cursor over any point on the chart

Services Memory Consumption (%)
This widget shows the memory consumption percentage for all services or per service.
Here we can also check the memory-only consumption view.
To check the value at a specific time, hover the cursor over any point on the chart

Services Running
This widget gives an at-a-glance view of service status
It shows the total number of services running on the platform
A red bar in the graph marks the time at which one of the services went down

CPU Temperature (C)
This widget shows the CPU temperature in degrees Celsius
It covers all the CPUs and cores running on the device

Top CPU consuming services
This widget shows the top 10 services by CPU consumption, along with their memory utilization
The widget also shows the Process ID, so users can quickly pinpoint a specific process and stop unwanted behavior
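The ranking behind a "top CPU consuming services" list can be sketched as a sort on CPU consumption. The field names (`name`, `pid`, `cpu_percent`, `memory_percent`) and the sample values below are hypothetical and only illustrate the idea.

```python
def top_cpu_services(services, limit=10):
    """Return up to `limit` services ordered by CPU consumption, highest first."""
    return sorted(services, key=lambda s: s["cpu_percent"], reverse=True)[:limit]


# Hypothetical samples; field names are assumptions for illustration.
samples = [
    {"name": "service-a", "pid": 101, "cpu_percent": 4.0, "memory_percent": 2.5},
    {"name": "service-b", "pid": 202, "cpu_percent": 37.5, "memory_percent": 8.0},
    {"name": "service-c", "pid": 303, "cpu_percent": 12.0, "memory_percent": 1.0},
]
for service in top_cpu_services(samples, limit=2):
    print(service["pid"], service["name"], service["cpu_percent"])
```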

Components
This page outlines the key metrics for accurately monitoring the performance of various components, specifically focusing on Temperature, Current, Fans, and Power.

SSD
Device Resource Utilization Page
Gain complete visibility into your device’s resource usage with this page—see exactly how resources are allocated per switch and uncover real-time trends in service demand.
Spot a sudden spike? Jump instantly to the detailed utilization page to pinpoint the exact processes behind it. Stay proactive, optimize resource allocation, and prevent performance issues before they even arise.

These metrics can be effectively analyzed through a time series graph.
Capacity
This page shows the view of Capacity and a few more details related to devices

This widget shows:
Roles/Region per device
SKU and ASIC details per device
ASIC ACL Capacity utilization
IPv4 Routes (ASIC, Software, Kernel)
IPv6 Routes (ASIC, Software, Kernel)
This Capacity widget also lets us view the output per Role and per Region
Let's choose Leaf Role to get the customized view

In the same way, we can customize the view by Region & SKUs
Per Device Status
This widget gives us the capability to check the extended view of the Routes & ACL usage with a range of time

Click on any of the devices to get the timeseries view

Links
This page gives the user a view of all the connected links between devices, with a few more capabilities
Navigate to Monitor >> Links

This page shows the number of connections between devices, with speed and manufacturer details
It shows the interface name, interface speed, transceivers, and admin & operational status
Users can also check historical transceiver details from the time series database
Protocols
The Protocols page reports metrics for the following features:
BGP (numbered/ unnumbered)
LACP
MCLAG
QoS
VLAN
VXLAN
VRRP
BGP
The BGP section gives the user accurate counts and the following details:
BGP 2 byte and 4 byte AS
BGP numbered
BGP unnumbered
Total number of neighbors configured
How many neighbors are up and down
Total number of prefixes and how many we are advertising
BGP neighbor details
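The neighbor counts listed above amount to grouping sessions by state. The sketch below assumes a hypothetical list of neighbor records with a `state` field and treats a session as up only when it is in the BGP `Established` state; the addresses are documentation examples.

```python
def summarize_neighbors(neighbors):
    """Count configured, up, and down BGP neighbors.

    A neighbor counts as up only in the Established state.
    """
    up = sum(1 for n in neighbors if n["state"] == "Established")
    return {"configured": len(neighbors), "up": up, "down": len(neighbors) - up}


# Hypothetical neighbor records for illustration.
peers = [
    {"peer": "192.0.2.1", "state": "Established"},
    {"peer": "192.0.2.2", "state": "Active"},
    {"peer": "192.0.2.3", "state": "Established"},
]
print(summarize_neighbors(peers))
```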

Neighbor View
This shows the neighbours' status and details: the total number of neighbours, received routes, neighbour RID, BGP AS number, and more
We have the option here to check the neighbour details and the status of routes
We can click on neighbours to get more details about all neighbours connected

Per device status (Neighbour's & Announcement)


The user can get per-device status by choosing a particular Device
Click on the device name to get the status

This new page shows the Up and Down status of the BGP neighbours
On the right side it shows the BGP announcements and the local prefixes present in the BGP table
LACP
The metrics on this page give the user details on EtherChannel status: per-device EtherChannel status with member ports, and the status over time as a time series graph

Selecting a port channel on any device allows users to view its time series data. This enables analysis of state transitions over a range from the latest hour up to two weeks


MCLAG
This feature enhances network management by allowing users to access a timescale graph. This graph shows the status of neighbour and peer links over time, indicating periods when they were down or active. Users can further examine the health page to determine whether downtime resulted from process issues or resource utilization problems.
Additionally, the feature provides tools for verifying:
MCLAG (Multi-Chassis Link Aggregation Group) Domain ID
The status of PortChannels associated with Peer Links
MCLAG-L3 (Multi-Chassis Link Aggregation Layer 3) status
MCLAG-L2 (Multi-Chassis Link Aggregation Layer 2) status
With these capabilities, users can effectively track and diagnose network performance and configuration issues.
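Reading an up/down time series comes down to finding the intervals during which a link was down. The sketch below illustrates that idea on a hypothetical list of `(timestamp, is_up)` samples; it does not reflect how ONES stores or queries this data.

```python
def down_intervals(samples):
    """Return (start, end) pairs for each period a link was down.

    samples is a list of (timestamp, is_up) tuples sorted by timestamp.
    An interval still open at the last sample has end set to None.
    """
    intervals = []
    down_since = None
    for timestamp, is_up in samples:
        if not is_up and down_since is None:
            down_since = timestamp
        elif is_up and down_since is not None:
            intervals.append((down_since, timestamp))
            down_since = None
    if down_since is not None:
        intervals.append((down_since, None))
    return intervals


# Hypothetical peer-link samples, timestamps in seconds.
peer_link = [(0, True), (60, False), (120, False), (180, True), (240, False)]
print(down_intervals(peer_link))
```

The resulting intervals can then be compared against the Health page for the same time window to see whether a process or resource issue coincided with the downtime.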

Time Series Graph with up and down status
Navigate to Monitor >> Protocols >> MCLAG <Choose Device> >> Click on Active/Passive


QOS (Quality of Service)
Quality of Service (QoS) in networking is used to prioritize certain types of traffic, ensuring critical applications like video conferencing, VoIP, or online gaming get sufficient bandwidth. It helps reduce latency, minimize jitter, and maintain consistent performance, especially in congested networks, by managing and controlling traffic flows based on importance.
This page allows a user to check the active configuration related to:
QoS Active Queues on interfaces
802.1p and DSCP to TC mapping
PFC enabled queues
TC to PG and Queue mapping
Scheduler and WRED status
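The mappings listed above chain together: a DSCP value maps to a traffic class (TC), and the TC maps to a queue, which may or may not have PFC enabled. The sketch below walks that chain with made-up values; the function name, the mapping contents, and the fallback to class 0 for unmapped values are illustrative assumptions, not a device's actual configuration.

```python
def resolve_queue(dscp, dscp_to_tc, tc_to_queue, pfc_queues):
    """Follow DSCP -> TC -> queue and report whether PFC applies.

    Unmapped values fall back to class/queue 0 (an assumption made
    for this sketch).
    """
    tc = dscp_to_tc.get(dscp, 0)
    queue = tc_to_queue.get(tc, 0)
    return {"tc": tc, "queue": queue, "pfc": queue in pfc_queues}


# Illustrative mappings only.
dscp_to_tc = {26: 3, 48: 6}
tc_to_queue = {3: 3, 6: 6}
pfc_queues = {3}
print(resolve_queue(26, dscp_to_tc, tc_to_queue, pfc_queues))
```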

Queues status per device
To verify the activated queues, the user can click on "Show Active". This page displays the status of each queue along with the features enabled for each interface.

Mapping Details
The "Show All" option allows the user to view all activated configurations per widget, providing a detailed overview of QoS-related settings for each device.

VLAN
This page provides a complete overview of all VLAN-related information across the entire fabric, including L2 VLANs, L3 VLANs, SVIs, assigned interfaces, and associated VRFs.


VXLAN
This section helps find the devices that have VXLAN enabled, along with a few more details:
L2 & L3 VXLAN metrics
Local VTEP
Remote VTEP and details on how many are up and down
VLAN to VNI Mapping
VRF to VNI mapping
The output below shows all the devices with VXLAN details

VTEP Details
Clicking a Local VTEP ID shows details of all the remote VTEPs connected to that device


Users can click on the operational status to get a time series graph of the up and down status, showing when the VTEP was up or down

VLAN to VNI Mapping
Users can get the details on VLAN to VNI mapping using the option on this page
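One common use of the VLAN to VNI view is checking that two devices map the same VLAN to the same VNI. The sketch below compares two hypothetical mappings, for example values copied by hand from this page; the function name and the numbers are assumptions for illustration.

```python
def mapping_mismatches(local, remote):
    """Return VLANs present on both devices whose VNIs differ.

    local and remote map VLAN ID -> VNI. The result maps each
    mismatched VLAN to a (local_vni, remote_vni) pair.
    """
    shared = sorted(local.keys() & remote.keys())
    return {v: (local[v], remote[v]) for v in shared if local[v] != remote[v]}


# Hypothetical mappings for two devices.
leaf1 = {10: 10010, 20: 10020, 30: 10030}
leaf2 = {10: 10010, 20: 20020}
print(mapping_mismatches(leaf1, leaf2))
```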

VRF to VNI Mapping
Using the option below, the user can check all the VRF to VNI mappings

VRRP
This section allows a user to check the VRRP-enabled devices, their active or passive mode, and more details.

To see more VRRP details, choose the device and select the interface IP


ONES Collector Resource Monitoring
Get complete visibility into your collector’s health and performance—all in one powerful dashboard.
The ONES Collector Resource Monitoring page delivers real-time insights into system health, resource utilization, and operational status. Track CPU and memory consumption, monitor Docker container status and resource usage, and keep a close eye on HDD utilization to ensure optimal performance at all times.
With intuitive visualizations and actionable metrics, you can quickly identify bottlenecks, troubleshoot issues, and optimize resource allocation. Need the data offline? Download detailed reports in CSV format for audits, analysis, or sharing with your team.
Key Highlights:
🔍 Real-time collector health overview
📊 CPU & memory utilization at both system and Docker levels
🐳 Docker status and per-container resource usage
💽 HDD utilization monitoring
📥 One-click CSV data export
Stay in control. Stay informed. ONES Collector Resource Monitoring empowers you to keep your infrastructure running at peak efficiency.
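Once a report has been downloaded as CSV, it can be analyzed offline with standard tooling. The sketch below uses Python's `csv` module to list containers above a CPU threshold. The column names (`container`, `cpu_percent`, `memory_percent`) and the sample rows are assumptions made for illustration; the actual export layout may differ.

```python
import csv
import io


def containers_over(csv_text, column, threshold):
    """Return the names of containers whose value in `column` exceeds threshold.

    Column names are hypothetical; adjust them to the real export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["container"] for row in reader if float(row[column]) > threshold]


# Made-up sample standing in for a downloaded report.
report = (
    "container,cpu_percent,memory_percent\n"
    "collector,12.5,40.0\n"
    "database,85.0,70.2\n"
)
print(containers_over(report, "cpu_percent", 80))
```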
