Open Networking Enterprise Suite (ONES) is a Network Orchestration, Visibility, and Assurance solution for multi-vendor, multi-NOS network infrastructure. ONES provides a one-stop solution, from deep visibility into your data center networks to 24x7 support for SONiC. It also hosts a powerful analytics engine that helps users identify and troubleshoot common network anomalies and disruptions.
ONES uses auto-discovery to onboard SONiC devices and a YAML- or CSV-based template to onboard non-SONiC devices. It continuously collects streaming telemetry data from these devices to provide insights on:
Data Center Inventory
Network State
Platform and System Health
ONES monitors various control and data plane metrics to provide these insights.
Rule Engine
The ONESv2.1 application can trigger Slack notifications when user-defined threshold values are breached.
In data centre operations, a rule engine with alerts for various metrics is essential for proactive monitoring and management of critical components and services.
When a device breaches the threshold value configured in a rule, the Rule Engine pushes the rule notification to the Slack channel and the Zendesk support page.
The rule engine covers the following types of metrics for specific entities and features in a data centre environment:
CPU and Memory Utilisation
Fan and PSU LED status
Traffic Bandwidth
ONES Orchestration
ONES orchestration lets network admins automate fabric configuration using configuration templates that provision physical interfaces and the layer 3 configuration for building an IP-CLOS fabric using:
BGP as a routing protocol including BGP-unnumbered
Symmetric/Asymmetric IRB
BGP Peering with PO
ONES orchestration not only configures the fabric but also makes sure the fabric is operational by verifying the configuration at every stage.
ONES provides northbound API access for configurations originating from external orchestration tools.
Data Lake
A data lake is a centralized repository that allows you to store vast amounts of structured, semi-structured, and unstructured data in its raw format. Unlike traditional data warehouses where data is stored in a structured manner, a data lake retains the data in its native format until it's needed for analysis or processing.
ONES can store the raw data of all metrics in the cloud, so users can reuse that raw data for any deployment or other use case.
Here are the key components and characteristics of ONE DL:
Storage of Diverse Data Types: A data lake can store various types of data, including structured data (like relational databases), semi-structured data (like JSON, XML), and unstructured data (like documents, images, videos). This flexibility allows organizations to ingest and store data from different sources without the need for extensive preprocessing.
Scalable and Cost-Effective Storage: Data lakes are typically built on scalable storage systems, such as cloud-based object storage (e.g., Amazon S3, Azure Data Lake Storage) or Splunk. These systems can efficiently handle large volumes of data and offer cost-effective storage solutions.
Schema-on-Read Approach: In contrast to traditional data warehouses that use a schema-on-write approach (where data must be structured and conform to a predefined schema before storage), data lakes adopt a schema-on-read approach. This means that data is stored in its original form, and the schema is applied at the time of data retrieval or analysis. This flexibility allows users to apply different schemas and interpretations to the same dataset based on their analytical needs.
In summary, ONE DL provides a flexible and scalable platform for storing, managing, and analyzing diverse data types at scale. By leveraging a schema-on-read approach and supporting various analytics tools, ONE DL facilitates advanced data analytics and enables organizations to derive valuable insights from their data assets. However, proper governance, security, and metadata management are crucial to ensure the usability, reliability, and integrity of data lakes.
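The schema-on-read idea can be illustrated with a minimal Python sketch: raw records are kept exactly as received, and a schema is applied only when they are read. The record fields (`device`, `metric`, `value`, `ts`) and the `read_with_schema` helper are hypothetical and do not describe the actual ONE DL storage format.

```python
import json
from typing import Any, Iterable, Iterator

# Raw records as they might land in the lake: stored unchanged, with fields
# that vary from record to record (no schema is enforced on write).
RAW_RECORDS = [
    '{"device": "leaf1", "metric": "cpu", "value": 42.5, "ts": 1700000000}',
    '{"device": "leaf2", "metric": "cpu", "value": "37", "ts": 1700000020, "extra": "x"}',
    '{"device": "spine1", "metric": "memory", "ts": 1700000040}',
]

# Schema applied at read time: field name -> type to coerce the raw value into.
CPU_SCHEMA = {"device": str, "value": float, "ts": int}


def read_with_schema(raw: Iterable[str], schema: dict[str, type],
                     metric: str) -> Iterator[dict[str, Any]]:
    """Parse raw JSON lines and project them onto `schema` at read time.

    Records for other metrics, or records missing a schema field, are skipped;
    fields not named in the schema are ignored.
    """
    for line in raw:
        record = json.loads(line)
        if record.get("metric") != metric:
            continue
        if not all(name in record for name in schema):
            continue
        yield {name: cast(record[name]) for name, cast in schema.items()}


rows = list(read_with_schema(RAW_RECORDS, CPU_SCHEMA, "cpu"))
print(rows)
```

A different analysis could read the same raw lines with another schema (for example, one for the `memory` metric) without rewriting anything already stored.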
Control and Data Plane resource Utilisation
Traffic Utilisation
Software Compliance
ASIC Routes
Health Services
Traffic Errors and Discard Counters
BGP Neighbours flapping notification
Device down status
Link flap status
Device SSD Memory Utilization, Health and Temperature
ROCE Counters
L2/L3 MC-LAG
EVPN MultiHoming
Layer2 Leaf-Spine (L2/L3 Mode)
Rack-to-Rack Deployment
BGP Peering over MC-LAG PeerLink
BGP Peering using separate Link between MC-LAG Peers
SFLOW
DHCP Relay
SAG / SVI
NTP, SNMP, SYSLOG
Incremental Config update for L2VNI/L3VNI
Enhanced backup and restore options via UI
Enhanced API support - Config Replace
Support for Big Data Processing and Analytics: Data lakes serve as a foundational component for big data analytics and processing. Users can perform various analytics tasks, including exploratory data analysis, data mining, machine learning, and real-time analytics, directly on the data lake. Tools like Apache Spark, Apache Hive, and Presto are commonly used for querying and processing data stored in data lakes.
Support for Data Discovery and Self-Service Analytics: Data lakes enable data discovery and self-service analytics, empowering users to explore and analyze data without extensive dependencies on IT teams. Data scientists, analysts, and business users can access relevant data directly from the data lake, speeding up insights generation and decision-making processes.
VXLAN-SAG
MCLAG
ONES Time Scale Metric Calculation
Overview
The ONES Agent pushes all telemetry to the ONES Collector every 20 seconds, so the database holds a value for every 20-second interval. Depending on the selected time series, ONES-UI then plots the graph using averaged values.
The averages calculated by ONES-UI for each time scale are:
Time Series | Average between 2 data points | Data Points
1Hour | 30sec | 120
2Hours | 40sec | 180
4Hours | 1Minute 20sec | 180
12Hours | 5Minutes | 180
1Day | 10Minutes | 180
1Week | 1Hour 10Minutes | 180
2Weeks | 2Hours 20Minutes | 180
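A minimal Python sketch of time-bucket averaging, assuming each raw sample is a (timestamp in seconds, value) pair collected every 20 seconds. It illustrates the general downsampling technique rather than the ONES-UI implementation; with 40-second buckets it yields the 180 points listed for the 2Hours row, and with 30-second buckets the 120 points listed for 1Hour.

```python
from collections import defaultdict
from statistics import mean

SAMPLE_INTERVAL_S = 20  # the ONES Agent pushes telemetry every 20 seconds


def downsample(samples: list[tuple[int, float]], bucket_s: int) -> list[tuple[int, float]]:
    """Average (timestamp, value) samples into fixed time buckets of bucket_s seconds.

    Returns (bucket_start, mean_value) pairs sorted by time; buckets that
    received no samples are omitted.
    """
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % bucket_s].append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]


# Two hours of raw samples: 7200 / 20 = 360 points.
raw = [(t, float(t // 20 % 4)) for t in range(0, 7200, SAMPLE_INTERVAL_S)]
points = downsample(raw, bucket_s=40)  # "2Hours" row: 40-second averages
print(len(raw), len(points))           # 360 180
```

Grouping by timestamp rather than by a fixed number of samples lets bucket widths such as 30 seconds, which are not a multiple of the 20-second push interval, still produce one point per bucket.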
ONES Telemetry Collector(s) and Visibility
Overview
ONES Telemetry Collector(s) and Analytics bring unparalleled visibility across all your switches running SONiC (both community and vendor distros), regardless of the underlying ASIC. The ONES front end (UI) enables network admins to:
Manage inventory of your network devices running SONiC on Broadcom, Cisco, Marvell, and Nvidia ASICs
View the topology of the entire fabric across multiple hardware platforms, and network operating systems
Monitor traffic, system health, bandwidth utilization, & more
Topology is the new default page of the ONES application
This page shows the Underlay, Overlay, and RoCE telemetry views, along with other advanced filtering views
From the same Topology page, a user can connect to a device over SSH or the console
The enhanced Traffic page shows the PFC-enabled interfaces
Track Switch CPU/memory consumption, bandwidth, link failures, traffic errors, and more in real-time
Proactively identify and resolve issues that may lead to network downtime
Instantly connect to individual devices for maintenance and troubleshooting
Inventory Page
Syslog extraction for device, Console access, Add/Remove Non-SONiC devices via YAML or CSV, export or download inventory
Firmware information has been added to the Device details section
Day 2 Operations (Incremental Config, etc.) with API
Config Generate, Validate, Apply & Diff
Integration & Alerting
Rule Engine with alerting on metrics
Integration with Slack, Zendesk, etc.
ONE-DL
Network Performance Monitoring
Packet Loss & Latency
Figure 1: Topology page
ONES Supportability
AVIZ Support Overview
Network Assurance helps the NetOps team run policy and security compliance checks before changing the network configuration. It is an intelligent set of proactive and predictive techniques that validates the network's readiness so changes proceed without errors, conflicts, or disruptions.
The Aviz Support team is located across four time zones and offers 24x7 support for SONiC and related products on multi-vendor switches and ASICs. Using our support portal, you can:
Collaborate with our SONiC experts to expedite your evaluations
Speed up your SONiC troubleshooting SLAs to as low as 15 minutes regardless of the underlying Switch/ASIC platform
Minimize operational delays by centralizing issues across multiple platforms
Users can reach customer support through the following options:
Integrated Chat
Submit a Ticket
Send an email to support@aviznetworks.com
Refer to the "" section of this document for more details
ONES Supportability
To contact customer support, users can choose a support option available in ONES-UI.
ONES Orchestration
Why do we need Network Orchestration?
Orchestration refers to tasks or actions required to achieve a set of objectives for your Network Infrastructure operations
A centralized application like ONES translates these objectives into a network configuration template, applies it, and monitors the result to validate operational efficiency and functionality
Automated tasks are performed on your Network Fabric in a purposeful order and each step is verified for success before moving to the next
ONES Orchestration - Overview
The ONES Orchestration function, referred to as Fabric Manager (FM), lets you compose, deploy, and validate network configurations across any SONiC distribution, be it a community version or a vendor distro.
As part of the initial release, ONES Orchestration supports the following:
Create and configure CLOS topology for ToR, Leaf, Spine, and Super-Spine layers
Apply and validate configurations pre- and post-deployment
Compare running configs against applied configs at any point
ONES Orchestration use cases are configured using a set of pre-defined YAML-based templates on ONES Web User Interface
FMCLI
Fabric Manager CLI
FMCLI is an industry-standard command line interface
Installing the Orchestrator Agent (Fabric Manager Agent) on a device enables FMCLI
FMCLI provides a user-friendly interface for configuring all the open standard protocols
To use FMCLI, run the fmcli command on the device to enter configuration mode, then configure the protocols or any other required feature
Example of BGP config using fmcli
Supported FMCLI Features
Zero Touch Provisioning
Image Management
NetOps API
The NetOps API can be used to integrate ONES into customer applications and to perform Day 1 and Day N configuration. Using the NetOps API, a user can perform full configuration as well as partial configuration.
Operations that can be performed using the NetOps API include:
Day-1 Operations: intent upload
SONiC NOS upgrade
Device Reboot
Difference between the golden (applied) configuration and the running configuration
For more Details on NetOps API check
ONES Rule Engine
Overview
In data centre operations, a rule engine with alerts for various metrics is essential for proactive monitoring and management of critical components and services. Rule engine alerts are needed for the following metrics in a data centre environment:
CPU and Memory Utilisation
Fan and PSU LED status
SSD Memory Utilization, Health and Temperature Status
Traffic Bandwidth
ASIC Routes
Health Services
Device Down alerts
Interface Flap Alerts
Traffic Errors and Discard Counters
PFC Counters
Device Queue Counters
Rule engine alerts ensure efficient resource utilization, timely troubleshooting, early detection of potential issues, and overall operational stability within the data centre environment.
Notification
ONES-App can send threshold-breach notifications to:
Slack Channel
Zendesk Support ticket
Rules are categorized based on the metric hierarchy:
Device Level
Interface Level
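The threshold check and Slack notification described in this section can be sketched in Python as follows. The `Rule` structure, metric keys, and message text are hypothetical; the only external convention assumed is that a Slack incoming webhook accepts a JSON body with a `text` field. Posting the payload to a webhook URL is left out.

```python
import json
import operator
from dataclasses import dataclass

# Comparison operators a (hypothetical) rule definition might use.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass
class Rule:
    name: str
    level: str        # "device" or "interface", per the rule hierarchy above
    metric: str       # hypothetical metric key, e.g. "cpu_utilization"
    op: str
    threshold: float


def breaches(rule: Rule, value: float) -> bool:
    """Return True if the measured value breaches the rule's threshold."""
    return OPS[rule.op](value, rule.threshold)


def slack_payload(rule: Rule, entity: str, value: float) -> str:
    """Build a Slack incoming-webhook body: a JSON object with a "text" field."""
    text = (f":warning: {rule.name}: {rule.metric} on {entity} is {value} "
            f"({rule.op} {rule.threshold})")
    return json.dumps({"text": text})


cpu_rule = Rule("High CPU", "device", "cpu_utilization", ">", 80.0)
for device, cpu in {"leaf1": 92.0, "leaf2": 41.5}.items():
    if breaches(cpu_rule, cpu):
        print(slack_payload(cpu_rule, device, cpu))
```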
List of all the Metrics Supported by Rule Engine with possible units and measured value a user can use
ONES Security
ONES is a support application for the SONiC stack. It is designed for customers' engineering teams, such as SREs and hardware and software engineering teams, for their daily network diagnosis and troubleshooting needs. In addition, ONES exposes an API to integrate with external tools or customers' homegrown applications.
This section describes how ONES authenticates users and secures communication.
Features
Getting Started
ONES Installation
Upgrade devices with a single click via ZTP or custom NOS images
Restore & Backup configuration feature
YAML-based configuration for VXLAN, MCLAG, BGP IP CLOS & EVPN (L2VPN), EVPN Multihoming, L3 EVPN Symmetric IRB, and L3 EVPN Symmetric IRB with MCLAG.
Automate Configuration of interfaces, layer 3 interfaces, BGP-unnumbered and Common Services like NTP, SNMP, SYSLOG etc.
Configuration Management
Interface Management
VLANs
Spanning Tree Protocol
VXLAN
L2 Forwarding Database
LLDP
LACP
DHCP Relay
IP Management
ARP
PING
Traceroute
Routing
BGP*
NTP
SYSLOG
Platform Details
SFLOW
NAT
Forward Error Correction
BFD
SNMP
VRF
AAA & TACACS
Drop Counters
ERSPAN
IP Based ACL
Prefix-list
EVPN Multihoming
Route-map
Backup Running Configuration
Replace Config
The Replace Config option can be used to modify the configuration only if the configuration was originally applied by ONES.
ONES provides RBAC support for creating dedicated user accounts. A superadmin account manages these user accounts and their permissions.
Secure Access to Application
ONES Application provides HTTPS over standard port 443 supporting both self-signed and CA-signed certificates
Secure Access to switches
Auto-discovery communication between the Agent and the collector uses a secure channel (SSL/TLS) with certificates (self-signed or CA-signed)
API Access
ONES Application provides HTTPS over standard port 443, supporting both self-signed and CA-signed certificates. API access uses time-bound authentication tokens.
RBAC: Role-Based Access Control
Secure Access to the Application
ONES application provides HTTPS over standard port 443, supporting both self-signed and CA-signed certificates.
HTTPS Support CA Signed
HTTPS Self Signed
Secure Access to the switch*
ONES utilizes gRPC infrastructure to communicate with switch agents. TLS (Transport Layer Security) is the primary security protocol used by gRPC to secure communication between the client and the server. TLS provides authentication, confidentiality, and integrity of data. Authentication is achieved using digital certificates, which verify the identity of the client and the server.
For an extra layer of security, ONESv2.1 supports certificate-based communication between switches and the ONES Controller, and all metrics are streamed using certificate-based encryption.
Agent Based Deployment with TLS certificate
Transport Layer Security (TLS) secures communication between the ONES Controller and the Agent. When the Agent registers with the ONES server and starts sending updates, it encapsulates all metrics and encrypts them using the certificate provided at installation time, so all communication between the ONES Agent and the ONES Controller is encrypted.
TLS relies on digital certificates issued by trusted Certificate Authorities (CAs) to authenticate servers and sometimes clients. These certificates validate the identity of the entities involved in the communication and establish trust in the encrypted connection.
A data lake is a centralized repository that allows you to store vast amounts of structured, semi-structured, and unstructured data in its raw format. Unlike traditional data warehouses where data is stored in a structured manner, a data lake retains the data in its native format until it's needed for analysis or processing.
ONES can store the raw data of all metrics in the cloud, so users can reuse that raw data for any deployment or other use case.
Here are the key components and characteristics of ONE DL 1.0.0:
Storage of Diverse Data Types: A data lake can store various types of data, including structured data (like relational databases), semi-structured data (like JSON, XML), and unstructured data (like documents, images, videos). This flexibility allows organizations to ingest and store data from different sources without the need for extensive preprocessing.
Scalable and Cost-Effective Storage: Data lakes are typically built on scalable storage systems, such as cloud-based object storage (e.g., Amazon S3, Azure Data Lake Storage) or Splunk. These systems can efficiently handle large volumes of data and offer cost-effective storage solutions.
Schema-on-Read Approach: In contrast to traditional data warehouses that use a schema-on-write approach (where data must be structured and conform to a predefined schema before storage), data lakes adopt a schema-on-read approach. This means that data is stored in its original form, and the schema is applied at the time of data retrieval or analysis. This flexibility allows users to apply different schemas and interpretations to the same dataset based on their analytical needs.
In summary, ONE DL provides a flexible and scalable platform for storing, managing, and analyzing diverse data types at scale. By leveraging a schema-on-read approach and supporting various analytics tools, ONE DL facilitates advanced data analytics and enables organizations to derive valuable insights from their data assets. However, proper governance, security, and metadata management are crucial to ensure the usability, reliability, and integrity of data lakes.
ONES Cloud Service Integration
ONES currently supports two platforms from which customers can get the raw data:
Splunk
Amazon S3
Users can control the behaviour of the Catalog
Users can tune how often metrics are streamed to the cloud platform, anywhere from every 1 minute to every 60 minutes.
Users can select or deselect network state metrics using the Catalog option.
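A minimal sketch of validating the Data Lake export settings described above, assuming a hypothetical `DataLakeExport` object. The platform identifiers and metric names are placeholders; only the 1 to 60 minute range and the two supported platforms come from this section.

```python
from dataclasses import dataclass, field

MIN_INTERVAL_MIN, MAX_INTERVAL_MIN = 1, 60  # streaming range stated in this section
PLATFORMS = {"splunk", "s3"}                # hypothetical ids for Splunk and Amazon S3


@dataclass
class DataLakeExport:
    """Hypothetical model of the Data Lake export settings."""
    platform: str
    interval_min: int
    catalog: set[str] = field(default_factory=set)  # selected network state metrics

    def __post_init__(self) -> None:
        # Reject settings outside the supported platforms and interval range.
        if self.platform not in PLATFORMS:
            raise ValueError(f"unsupported platform: {self.platform!r}")
        if not MIN_INTERVAL_MIN <= self.interval_min <= MAX_INTERVAL_MIN:
            raise ValueError("streaming interval must be 1 to 60 minutes")

    def select(self, metric: str) -> None:
        self.catalog.add(metric)

    def deselect(self, metric: str) -> None:
        self.catalog.discard(metric)


export = DataLakeExport("s3", interval_min=15)
export.select("interface_counters")  # hypothetical metric name
```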
What's new?
ONES Package
Installation: allows the user to add a Data Lake endpoint
Agent installation: allows the user to add one more controller IP without reinstalling the ONES-Agent
ONES UI
New Topology Page with more filters
Device SSD Details
Temperature
ONES Telemetry
SSD Resource Check
Temperature
Health
ONES Orchestration
Yaml Config Illustrator
Multivendor SONiC Support
Improved Session Management
Seamless Copy-Paste Functionality
ONES Rule-Engine
SSD Temperature
SSD Health
SSD Memory
Device Down
ONE DL 1.0.0
Integration of Splunk
Integration of Amazon S3
Supported Switch Platforms and NOS
SONiC Supported Broadcom Platforms:
Speed
Vendor (Models)
Subscription
ONES provides the following subscriptions to manage and monitor the devices.
ONES treats Cumulus Linux, Arista EOS, and Cisco NX-OS platforms as agent-less and supports the metrics available through the NVUE and EOS APIs
Vendor | NOS | Version
Arista | EOS | 4.x
Cisco | NXOS | 9.x
NVIDIA | Cumulus Linux | 5.9, 5.11
Agent-based vs Agent-less
SONiC-based switches require ONES Agents (Agent-based) to be installed on the switch being monitored, as a pre-requisite for ONES Telemetry and orchestrator-based functions to work.
ONES Telemetry Agent
ONES Orchestrator Agent
Proprietary NOSes such as Arista EOS, Cumulus, and Cisco NX-OS do not require an ONES Agent and instead leverage the OpenConfig (agent-less) feature. OpenConfig provides APIs that deliver network telemetry about the monitored resources to the ONES Application over the gNMI (gRPC Network Management Interface) protocol
NX-OS exposes its own metric collection mechanism using gRPC
ONES does not support Orchestrator-based functions on Proprietary NOS (non-SONiC).
Agent requirements
SSH access
SONiC versions 202012 or 202111 and later are supported
For a trial license, contact the Aviz Support team.
Subscription | Devices supported
8 | Support up to 8 devices
16 | Support up to 16 devices
32 | Support up to 32 devices
64 | Support up to 64 devices
Installation Pre-requisites
Installation Overview
ONES installation follows these steps, in order:
License Readiness
Preparing and Installing ONES Application machine
Installing ONES Agents on SONiC Switches for Orchestrator and Telemetry
Enabling OpenConfig on non-SONiC Switches for Telemetry
License Readiness
After installing the ONES application, contact Aviz Support to generate a trial license. A trial license covers 8 devices and can be used for up to 30 days.
After installation, users can find the ONES installation ID on the ONES-UI login page
System Hardware Requirements – ONES Application
In the current release, ONES can support managing up to 1024 devices. For ONES Application Installation, the system hardware requirements vary based on the number of devices to manage;
Devices
Processor and Cores
RAM
Storage
System Software Requirements - ONES Application
OS
Libraries
Task
Command
Validation
The ONES Application package takes care of these prerequisites at installation time: it first verifies that the dependencies are available, then runs the application scripts
Note: The script does not update Ubuntu to the latest version
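As an illustration of the dependency check described above, this Python sketch reports which prerequisite commands and Python modules are present. It is not the ONES installer's logic, and it checks only availability, not versions.

```python
import importlib.util
import shutil

# Commands and Python modules listed as prerequisites in this section.
REQUIRED_COMMANDS = ["docker", "docker-compose", "python3", "pip3"]
REQUIRED_MODULES = ["paramiko", "scp"]


def check_prerequisites() -> dict[str, bool]:
    """Report which prerequisite commands and Python modules are available."""
    # A command counts as present if it is found on PATH.
    status = {cmd: shutil.which(cmd) is not None for cmd in REQUIRED_COMMANDS}
    # A module counts as present if the current interpreter can locate it.
    for mod in REQUIRED_MODULES:
        status[mod] = importlib.util.find_spec(mod) is not None
    return status


if __name__ == "__main__":
    missing = [name for name, ok in check_prerequisites().items() if not ok]
    print("missing:", ", ".join(missing) if missing else "none")
```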
Customer Firewall Configuration (Ports to be opened)
Ports to be open from Agent(Source) to ONES controller(Destination)
These ports must be open on the ONES Controller
ONES Service
Port Numbers
Ports to be open from ONES controller(source) to Agent(Destination)
These ports must be open on the device (switch)
ONES Service
Port Numbers
Ports to be open on ONES Server for ONES Services
ONES Service
Port Numbers
Ports to be open for HTTPS Access
The HTTPS port must be open if a firewall sits between the user's machine and the ONES Controller
ONES Service
Port Numbers
These port numbers must be available for use, and all of them must be allowed through the firewall if the database server and the devices are in different DMZ zones
sudo iptables -L // lists the current firewall rules; use it to verify that the required ports are allowed
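`iptables -L` shows the local rules, but it does not prove that a port is reachable across a firewall. A minimal Python sketch of a TCP reachability test follows; the port numbers come from the tables in this section, while the helper names are hypothetical.

```python
import socket

# Example entries from this section's port tables. Check each port from the
# source side of its rule toward the corresponding destination.
PORTS = {"ONES Collector": 50053, "Switch Access over SSH": 22,
         "gNMI Gateway (Telemetry)": 9339, "ONES Web GUI": 443}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


def check_ports(host: str, ports: dict[str, int]) -> dict[str, bool]:
    """Test every named port on host and report reachability per service."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

For example, `check_ports("10.0.0.10", {"ONES Collector": 50053})` (hypothetical controller address) run from a switch reports whether the collector port is reachable. A successful connect only shows that something is listening and reachable; it does not verify the service itself.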
The installer file automatically detects & processes fresh installation or upgrade to the new version
Upgrading does not depend on files from the previous version
Once the upgrade is complete, the user must manually delete the previous version's files/packages from the device; the script does not touch old version files
By default, the installer does not provide any license; contact the Aviz Support team to obtain one.
ONESv2.1 supports SSL certificate integration
Choose YES to integrate your own SSL certificate
ONESv2.1 supports certificate-based authentication between the ONES App and devices for gNMI and Auto-discovery
For agent auto-discovery, the agent acts as the client and the collector as the server.
For normal gNMI communication, the agent acts as the server and the collector as the client. Certificates must be issued according to these roles.
Provide the certificate path by replacing the key name with the path of the certificate to be used
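The client/server roles above determine which side verifies which certificate. The following Python sketch models them with the standard `ssl` module, assuming mutual TLS (the server side also requires a client certificate). ONES itself uses gRPC, whose TLS credentials are configured differently, so this only illustrates the role split; the `ROLES` table and `tls_context` helper are hypothetical.

```python
import ssl
from typing import Optional

# Roles described in this section:
#   auto-discovery: agent = TLS client, collector = TLS server
#   gNMI streaming: agent = TLS server, collector = TLS client
ROLES = {
    ("auto-discovery", "agent"): "client",
    ("auto-discovery", "collector"): "server",
    ("gnmi", "agent"): "server",
    ("gnmi", "collector"): "client",
}


def tls_context(channel: str, component: str, cafile: Optional[str] = None,
                certfile: Optional[str] = None,
                keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS context for `component` on `channel` with peer verification.

    `cafile` should point at the CA that signed the peer's certificate;
    `certfile`/`keyfile` hold this side's own certificate and key.
    """
    role = ROLES[(channel, component)]
    if role == "client":
        # Client side: verify the server certificate and hostname.
        ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    else:
        # Server side: require and verify a client certificate (mutual TLS).
        ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=cafile)
        ctx.verify_mode = ssl.CERT_REQUIRED
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)
    return ctx
```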
ONES Application supports IP-based and FQDN-based access
Enter the ONES App URL: https:// #Replace the input with IP or FQDN
IP based
FQDN based
Installation begins
Access ONES Application Web GUI from a supported browser using https://<host-ip/FQDN>
Activation:
For a trial license, the user needs to reach out to Aviz Support
For Activation, the user can choose Activate License if the user has an activation key of any subscription
Users can activate the ONES Application for the first time right after installation (on first launch, the ONES application shows the license activation page)
After evaluating ONES application, the user will have the option to activate the license anytime from the License Page
1. Activate License
2. Activate Key
Use the default credentials below:
Username: superadmin
Password : Admin@123
The password must meet these requirements:
Minimum Password Length - 8 characters
Maximum Password Length - 24 characters
Character Support - Alpha Numeric
Login To ONES
After Resetting the password use new credentials to login
You will see the default Monitor Page with a Topology view
1. Upgrade License
After the trial period, users who wish to upgrade to a subscription-based license can navigate to the following page:
Account >> License >> Upgrade License
Click Upgrade License & Enter the subscription-based key
Installing ONES Agents 2.1
Overview
ONES requires the user to install the following agents on SONiC NOS to enable network orchestration and visibility:
ONES Orchestrator Agent for Network Orchestration
ONE-DL cloud Deployment
This section describes how to deploy the ONES-DL backend on AWS.
Provisioning an EC2 Instance
AWS EC2 Instance Sizing for Event Ingestion
ONES Orchestration Agent Installation
On the ONES Application server, go to ONES-2.1/ones_fm_agent
root@ones-application:~$ cd /ONES-2.1/ones_fm_agent
Installation (Agent Install on multiple switches at the same time)
Download ONES Package
Download the latest version of ONES from the Support Portal.
Please refer to the link for downloading the latest version of the ONES Application
NOTE: You must sign up on the Aviz Networks Support Portal to access the download page.
Work with Aviz Sales/Support contact to create an account on Aviz Networks Support Portal
ONES Telemetry Agent for Telemetry Data Streaming (Network Visibility)
The ONES Agent 2.1 version allows the user to add a new controller without performing the complete installation again. Previously, the user had to edit the Agent configuration file or reinstall the agent with the new controller IP
NOTE: For non-SONiC switches:
OpenConfig feature on its NOS needs to be enabled for Network Visibility (Telemetry Data Streaming)
Network Orchestration is not supported
SONiC NOS upgrade scenario - Impact on ONES Agents
A SONiC NOS upgrade can be performed either via:
ONES UI (Inventory-->Devices)
Instead of using FM - Orchestrator Agent
Orchestrator Agent takes a backup of FMCLI, ONES Agents and associated services to the /host folder.
After a successful upgrade, Orchestrator Agent restores these files
Traditional method (ZTP, sonic-installer CLI)
The user needs to reinstall the ONES Agents
VM Deployment
ONES Application can be integrated in the network as a Virtual Machine(VM) Package
VM Packages
QCOW2 Package: QCOW2 can be imported into any KVM hypervisor-based application
OVA/OVF Package: OVA can be imported into:
VMware workstation/Fusion
ESXI Server
Virtual-Box
VM Package Upgrade
QCOW2 & OVA, both packages are supported for an upgrade to latest version
Agent Less Telemetry
Network Device Configuration Interfaces
Cumulus (NVUE API)
Cumulus Networks offers the NVUE (Network Virtualization Utility Engine) API, providing an abstraction layer over traditional configuration mechanisms. This allows for a more intuitive and standardized approach to network configuration and management, echoing modern software development practices.
Arista EOS (OpenConfig)
Arista's EOS platform leverages OpenConfig, a collaborative effort among network operators to define vendor-neutral data models for configuring and managing networks. OpenConfig facilitates simplified, consistent interactions across different network devices.
Cisco NX-OS (gRPC)
Cisco's NX-OS supports gRPC, enabling efficient, scalable, and programmatic network device management. This interface allows for the streaming of telemetry data and the execution of configuration commands.
Syslog access, Console/SSH access for device
Email ID: (For Account creation)
128GB
6 TB or more
512 devices | x86/x64 based, 32-core CPU | 256GB RAM | 12 TB or more storage
1024 devices | x86/x64 based, 64-core CPU | 512GB RAM | 20 TB or more storage
sudo apt-get install docker-compose
docker-compose version
Install Python3 | sudo apt-get install python3 | python3 --version
Install Python3-pip | sudo apt-get install python3-pip | pip3 --version
Install Paramiko | sudo apt-get install python3-paramiko | pip show paramiko
Install SCP-Client | sudo pip3 install scp | pip show scp
8080
stream-processer | 8093
ksqldb-server | 8088
kafka-connect | 8083
schema-registry | 8081
broker | 29092, 9101, 9092
Zookeeper | 2181
ONES Collector | 50053
8/16/32/64 devices | x86/x64 based, 16-core CPU | 32GB RAM | 160GB/320GB/640GB/1.2 TB storage
128 devices | x86/x64 based, 16-core CPU | 64GB RAM | 3 TB or more storage
256
OS | Ubuntu 18.0 or later
Libraries | docker, docker-compose, python3, python3-pip, paramiko, scp
Ubuntu Server | Installer file (Version 18 or higher) | lsb_release -a
Update to latest packages | sudo apt-get update | NA
Install Docker | sudo apt-get install docker.io | docker ps
ONES Collector | 50053
Switch Access over SSH | 22
ONES Monitoring | 50052
gNMI Gateway (Telemetry) | 9339
ONES Telemetry Database | 5432
ONES Orchestrator | 8787
ONES Orchestrator Database | 2345
pty-server | 8885
ONES Web GUI | 443
x86/x64 based,
32-core CPU
Install Docker-compose
API-Server
Enter the device details (Management IP, Username, Password) in device_info.csv
root@ones-application/ONES-2.1/ones_fm_agent:~$ vi device_info.csv
Using this process, the script clears the base configuration, such as port-channel, IP, VXLAN, and other related configuration.
The ONES Agent configuration file allows the user to add a new collector (controller) after agent installation, if required
Overview
To redirect agent telemetry data to a different ONES collector without reinstalling the agent, simply re-run the script with the "Only controller IP addition" option enabled. This process automatically registers the device with the new ONES application and starts the telemetry data stream.
Note: The terms "collector" and "controller" are synonymous in this context. The auto-discovery feature supports at most two controllers.
Update controller IP without installing agent
Once the controller IP is updated, the device automatically registers with the new ONES application
ONES Telemetry Agent Installation
ONES Agent v2.1 supports Agent auto-discovery
ONESv2.1 Agent supports the auto-discovery feature
ONESv2.1 Agent supports sending telemetry to multiple controllers (max 2)
The Restrict IP feature can be enabled or disabled
Updating only the collector after deployment is now possible
Using the Restrict IP feature, the agent discovers the ONES Controller and updates its entry in the ONES App with all the feature metrics
A few inputs are required while installing the agent:
Controller IP (used to restrict telemetry streaming)
Device Credentials
Installation
On the Application machine, go to ONES-2.1/ones_t_agent folder
root@ones-application:~$ cd /ONES-2.1/ones_t_agent
Installation (Agent Install on multiple switches at the same time)
Enter the device details (Management IP, Username, and Password) in device_info.csv
root@ones-application/ONES-2.1/ones_t_agent:~$ vi device_info.csv
Add all the required details to the CSV file. The installer uses this CSV file to push the information into the agent.conf file (/etc/sonic/agent.conf) on every switch; the ones-agent on each switch then reads agent.conf and registers itself with the ONES controller using the given parameters
This lets a NetOps engineer supply a single CSV file containing all the details instead of adding devices to the controller one by one, which is time-consuming
The user must maintain the layer names exactly as specified above (case-sensitive). If the user inputs names that differ from these, they may encounter issues when using the ONES application.
Save the File
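The CSV can also be sanity-checked before the installer runs. The following is a minimal sketch, not part of the ONES tooling: the column names are an assumption modelled on the per-device keys the installer prints (ip, user, passwd, layer, region, azid, brickid, rackid), and ALLOWED_LAYERS is a hypothetical placeholder for the case-sensitive layer names your release expects.

```python
import csv
import io

# Assumed header, modelled on the per-device keys printed by the installer.
COLUMNS = ["ip", "user", "passwd", "layer", "region", "azid", "brickid", "rackid"]

# Hypothetical layer names; replace with the exact, case-sensitive names
# required by your ONES release.
ALLOWED_LAYERS = {"Access", "Leaf", "Spine", "Super Spine"}


def load_device_csv(text):
    """Parse CSV text that starts with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def validate_rows(rows):
    """Return human-readable problems found in the device rows (empty if none)."""
    problems = []
    seen_ips = set()
    for lineno, row in enumerate(rows, start=2):  # line 1 is the header
        missing = [col for col in COLUMNS if not (row.get(col) or "").strip()]
        if missing:
            problems.append(f"line {lineno}: missing {', '.join(missing)}")
        layer = (row.get("layer") or "").strip()
        if layer and layer not in ALLOWED_LAYERS:  # exact, case-sensitive match
            problems.append(f"line {lineno}: unknown layer {layer!r}")
        ip = (row.get("ip") or "").strip()
        if ip and ip in seen_ips:
            problems.append(f"line {lineno}: duplicate ip {ip}")
        seen_ips.add(ip)
    return problems
```

For example, `validate_rows(load_device_csv(open("device_info.csv").read()))` returns an empty list when every row is complete, has a unique IP, and uses an accepted layer name.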
Run the installation script to install the telemetry agent on one or more devices in the data centre.
The installer automatically detects whether to perform a fresh installation or an upgrade to the new version
During an upgrade, all previous files on the switch are deleted automatically
To use certificates for gNMI and auto-registration, place them in the gnmi-certs directory (for gNMI) and the auto-reg-certs directory (for agent auto-registration)
Users can run ONES-Agent either as an integrated service in SONiC OS or as an independent third-party container.
Users can choose this option to add one more controller IP without performing the complete agent installation.
The script prompts for the controller IP used by the auto-discovery feature
A maximum of two controller IPs can be added to restrict telemetry streaming
Users can choose to restrict telemetry so that it is sent only to the collector IP
Set the collector IP restriction to No if the network uses NAT to translate the ONES server's private IP to a public IP for access from the device.
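The installer performs its own prompting, but the collector list can be checked in advance. This is a hypothetical pre-check helper, not part of the ONES installer; it only enforces the rules stated above (comma-separated addresses, no more than two).

```python
import ipaddress

MAX_COLLECTORS = 2  # the installer prompt says "Do not enter more than 2"


def parse_collector_ips(answer):
    """Split a comma-separated collector list and validate each address.

    Raises ValueError on an empty list, an invalid address, a duplicate,
    or more than MAX_COLLECTORS entries.
    """
    items = [part.strip() for part in answer.split(",") if part.strip()]
    if not items:
        raise ValueError("at least one collector IP is required")
    if len(items) > MAX_COLLECTORS:
        raise ValueError(f"at most {MAX_COLLECTORS} collector IPs are allowed")
    # ipaddress.ip_address raises ValueError for malformed addresses.
    addresses = [ipaddress.ip_address(item) for item in items]
    if len(set(addresses)) != len(addresses):
        raise ValueError("duplicate collector IP")
    return [str(addr) for addr in addresses]
```

Both answer styles seen in the installer transcripts ("10.1.1.10,10.2.2.5" and "10.1.1.10, 10.2.2.5") normalize to the same list.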
Installation Begin
The agent now streams metrics only to the given controller and auto-registers with the ONES App
Make sure every device has a unique name; otherwise, the full topology view (Topology page) will not be plotted correctly.
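A quick way to catch duplicate names before onboarding is to count them in whatever inventory list you already keep. This is a hypothetical helper: the (hostname, management IP) pair format is an assumption, and names are compared exactly as given.

```python
from collections import Counter


def duplicate_names(devices):
    """Return the device names that occur more than once, sorted alphabetically.

    `devices` is an iterable of (hostname, management_ip) pairs, e.g. taken
    from your own inventory export. Names are compared exactly as given.
    """
    counts = Counter(name for name, _ip in devices)
    return sorted(name for name, count in counts.items() if count > 1)
```

An empty result means no two devices share a name.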
Arista EOS (OpenConfig)
Introduction
To enable Arista switches running EOS to stream telemetry data to the ONES controller, both gNMI and eAPI must be enabled
Enabling eAPI
Verification eAPI
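Besides the CLI check, eAPI can be verified remotely by sending a JSON-RPC `runCmds` request to `https://<switch>/command-api`, the standard Arista eAPI endpoint. The sketch below uses only the Python standard library; the helper names are illustrative, and `verify_tls=False` is meant only for lab switches with self-signed certificates.

```python
import base64
import json
import ssl
import urllib.request


def build_eapi_request(cmds, request_id="ones-check"):
    """Build the JSON-RPC body for Arista eAPI's runCmds method."""
    return {
        "jsonrpc": "2.0",
        "method": "runCmds",
        "params": {"version": 1, "cmds": list(cmds), "format": "json"},
        "id": request_id,
    }


def run_eapi(host, username, password, cmds, verify_tls=True):
    """POST commands to https://<host>/command-api and return the result list."""
    body = json.dumps(build_eapi_request(cmds)).encode()
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"https://{host}/command-api",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    context = ssl.create_default_context()
    if not verify_tls:  # lab use only: accept the switch's self-signed certificate
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=context, timeout=10) as resp:
        reply = json.load(resp)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]  # one entry per command
```

For example, `run_eapi("myswitch", "admin", "<password>", ["show management api http-commands"], verify_tls=False)[0]` should hold the JSON form of the same output shown in the CLI session.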
Cumulus (NVUE API)
Cumulus switches running version 4.4 or later stream telemetry to the ONES controller via the NVUE API
ONES does not use NCLU. The ONES application uses only the NVUE API, available from OS version 4.4; earlier versions, which rely on NCLU, are not enabled for ONES. [Versions 4.4 and 5.2 were tested for ONES 1.1.]
Cumulus 5.x does not fully support NCLU; it supports only NVUE.
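Because ONES relies on NVUE, a fleet check only needs to compare each switch's release against 4.4. The sketch below is illustrative: it assumes you already have the release string reported by each switch, and it compares versions numerically so that, for example, 4.10 ranks above 4.4.

```python
import re

MIN_NVUE_VERSION = (4, 4)  # ONES uses the NVUE API from Cumulus 4.4 onward


def parse_version(text):
    """Turn '5.2.1' (or '5.2.1-rc1') into (5, 2, 1); return () if unparseable."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def supports_ones_nvue(version_text):
    """True when the Cumulus release is new enough for ONES (4.4 or later)."""
    return parse_version(version_text) >= MIN_NVUE_VERSION
```

Tuple comparison handles differing lengths: (4, 4, 0) is at least (4, 4), while (4, 3, 2) is not.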
VMware ONES Deployment
Download OVA Package
Work with your Aviz Sales/Support contact to create an account on the Aviz Networks Support Portal
Log in to the Aviz Networks Support Portal with your account credentials
Open the Downloads section and, under ONES, download ONES Release 2.1
VM Compatibility
The ONES-2.1.0 OVA supports the following VMware versions:
Import OVA to ESXi Server
Log in to ESXi >> Create / Register VM
Choose Deploy a virtual machine from an OVF or OVA file >> NEXT
Enter a valid name >> Click to select files or drag and drop (upload from the download folder)
Choose the downloaded OVA package >> NEXT
Choose the preferred storage to run the ONES-2.1.0 VM >> NEXT
Choose a network adapter that provides a DHCP IP to the ONES App (Management interface/Eth0) >> NEXT
Verify all the inputs >> FINISH
After the OVA upload to ESXi completes and the status shows "Successful", the ONES VM can be used.
Power on the ONES VM; ONES is then ready to use
Credentials to access ONES OVA VM
After logging in to the server CLI with the credentials below, continue with the next step, ONES Agent Installation
Expand HDD
vCPU and RAM can be expanded without any dependency
Upgrade VM
Upgrade Process
To upgrade any VM (QCOW or OVA)
KVM ONES Deployment
QCOW Deployment
Download Qcow2 Package
ONES Web GUI Administration
ONES User Interface - Features
SONiC Devices use auto-discovery
Installing Open Networking Enterprise Suite (ONES)
..................................................
ONES is getting installed for the first time, choose appropriate options when prompted...
....................
Installing prerequisites for ONES application
....................
....................
....................
....................
Installing ONES application...
Do you want to install domain SSL certificate(if not, installation will proceed with a self signed certificate)? [y/n]: n
Using self signed certificates...
Do you want to enable ONE-DL feature? [y/n]: y
Since ONE-DL configuration has been chosen, please provide the information below...
Enter EC2 ONE-DL Backend Public DNS Endpoint: <Path>
Do you want to enable ONE-DL feature? [y/n]: n
Installing Open Networking Enterprise Suite (ONES)
..................................................
ONES is getting installed for the first time, choose appropriate options when prompted...
....................
Installing prerequisites for ONES application
....................
....................
....................
....................
Installing ONES application...
Do you want to install domain SSL certificate(if not, installation will proceed with a self signed certificate)? [y/n]: y
Enter the path to the private key file: ./certs/server.pem
Enter the path to the certificate file: ./certs/server.crt.pem
Local backup:
Do you want to enable DB backups? [y/n]y
Where do you want to store the backups? [local/remote]: local #"local" stores the backups on the ONES server itself
Enter the backup directory: ./backups #Directory on the ONES server where backups are stored
Enter the number of backups (between 1 and 3) to retain (Older backups will be deleted): 1 #Number of backups to keep
Enter the backup interval in seconds (3600 seconds or higher): 86400 #Time between backups in seconds (86400 = 24 hours)
Remote backup:
Do you want to enable DB backup feature? [y/n]: y
Where do you want to store the backups? [local/remote]: remote #"remote" stores the backups on another server reached over SSH
Please make sure the remote server is reachable via SSH
Enter the remote machine IP: 10.0.0.1
Enter the remote machine username: admin
Enter the remote machine password:
Enter the backup directory: ~/backups #Directory on the remote server where backups are stored
Backup is being done in 10.0.0.1 at ~/backups
Enter the number of backups (between 1 and 100) to retain (Older backups will be deleted): 5 #Number of backups to keep
Enter the backup interval in seconds (3600 seconds or higher): 86400 #Time between backups in seconds (86400 = 24 hours)
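The retention and interval limits in these prompts determine how much history the backups cover: retained backups multiplied by the interval. The following hypothetical helper (not part of the ONES installer) checks planned answers against the limits stated in the prompts and computes that span.

```python
from dataclasses import dataclass

# Limits as stated by the installer prompts.
RETENTION_LIMITS = {"local": (1, 3), "remote": (1, 100)}
MIN_INTERVAL_SECONDS = 3600


@dataclass
class BackupSettings:
    location: str          # "local" or "remote"
    directory: str
    retain: int            # number of backups to keep
    interval_seconds: int  # time between backups


def check_backup_settings(settings):
    """Return a list of problems; an empty list means the answers are acceptable."""
    if settings.location not in RETENTION_LIMITS:
        return [f"location must be one of {sorted(RETENTION_LIMITS)}"]
    problems = []
    low, high = RETENTION_LIMITS[settings.location]
    if not low <= settings.retain <= high:
        problems.append(f"retain must be between {low} and {high} for {settings.location} backups")
    if settings.interval_seconds < MIN_INTERVAL_SECONDS:
        problems.append(f"interval must be at least {MIN_INTERVAL_SECONDS} seconds")
    return problems


def coverage_hours(settings):
    """Approximate span of history covered by the retained backups, in hours."""
    return settings.retain * settings.interval_seconds / 3600
```

With the remote example above (5 backups every 86400 seconds), the retained backups span about 120 hours, or five days; the local example (1 backup) keeps only the most recent day.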
No:
Do you want to enable certificate based authentication between ONES controller and devices? [y/n]: n
Yes:
Do you want to enable certificate based authentication between ONES controller and devices? [y/n]: y
Enter the path to the ca-cert.pem file: ca-cert.pem
Enter the path to the server-cert.pem file: server-cert.pem
Enter the path to the server-key.pem file: server-key.pem
Enter the path to the client-cert.pem file: client-cert.pem
Enter the path to the client-key.pem file: client-key.pem
Proceeding with certificates for Agent Auto Registration
Enter the path to the ca-cert-reg.pem file: ca-cert-reg.pem
Enter the path to the server-cert.pem file: server-cert.pem
Enter the path to the server-key.pem file: server-key.pem
Enter the path to the client-cert.pem file: client-cert.pem
Enter the path to the client-key.pem file: client-key.pem
Enter the ONES App URL: https://192.168.1.1
Enter the ONES App URL: https://ones.aviznetworks.com
Installing Open Networking Enterprise Suite (ONES)
..................................................
....................
....................
....................
....................
....................
Installing ONES application...
Do you want to install domain SSL certificate(if not, installation will proceed with a self signed certificate)? [y/n]: n
Using self signed certificates...
Do you want to enable ONE-DL feature? [y/n]: y
Since ONE-DL configuration has been chosen, please provide the information below...
Enter EC2 ONE-DL Backend Public DNS Endpoint: ec2-69-696-9-69.us-west-2.compute.amazonaws.com
Successfully copied 2.05kB to /home/aviz/ONES2.1/collector_old.json
Previous collector has no excluded tables..........
Do you want to enable DB backup feature? [y/n]: n
Do you want to enable certificate based authentication between ONES controller and devices? [y/n]: n
Enter the ONES App URL [https://<host-ip or domain>]: https://192.168.1.1
Setting up the environment and loading essential dockers...
cfb9c9006507: Loading layer [==================================================>] 68.31MB/68.31MB
a7b5f6d81cf5: Loading layer [==================================================>] 2.56kB/2.56kB
aad629396b2c: Loading layer [==================================================>] 143.1MB/143.1MB
5f70bf18a086: Loading layer [==================================================>] 1.024kB/1.024kB
Loaded image: avizdock/ones-collector:v2.1.0
...
...
...
Name Command State Ports
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
api-server java -jar /app/apiserver.jar Up 0.0.0.0:8080->8080/tcp,:::8080->8080/tcp
broker /etc/confluent/docker/run Up 0.0.0.0:29092->29092/tcp,:::29092->29092/tcp, 0.0.0.0:9092->9092/tcp,:::9092->9092/tcp,
0.0.0.0:9101->9101/tcp,:::9101->9101/tcp
docker python3 app.py Up
kafka-connect /etc/confluent/docker/run Up (healthy) 0.0.0.0:8083->8083/tcp,:::8083->8083/tcp, 9092/tcp
ksqldb-server /etc/confluent/docker/run Up 0.0.0.0:8088->8088/tcp,:::8088->8088/tcp
ones-collector java -jar -XX:MaxGCPauseMi ... Up 0.0.0.0:50053->50053/tcp,:::50053->50053/tcp, 8093/tcp
ones-collector-db /docker-entrypoint.sh postgres Up 0.0.0.0:5432->5432/tcp,:::5432->5432/tcp, 8008/tcp, 8081/tcp
ones-fm /bin/sh -c { gunicorn --wo ... Up 0.0.0.0:8787->8080/tcp,:::8787->8080/tcp
ones-fm-db docker-entrypoint.sh postgres Up 0.0.0.0:2345->5432/tcp,:::2345->5432/tcp
ones-gateway ./gnmi-gateway -TargetLoad ... Up 0.0.0.0:9339->9339/tcp,:::9339->9339/tcp
ones-pty-server docker-entrypoint.sh node ... Up 0.0.0.0:8885->8885/tcp,:::8885->8885/tcp
ones-rule-service java -jar /app/rule-engine.jar Up 8080/tcp
ones-rule-service-db docker-entrypoint.sh postgres Up 5432/tcp
ones-ui docker-entrypoint.sh node ... Up 3002/tcp, 0.0.0.0:443->443/tcp,:::443->443/tcp
schema-registry /etc/confluent/docker/run Up 0.0.0.0:8081->8081/tcp,:::8081->8081/tcp
stream-processor java -jar /app/stream-proc ... Up 8080/tcp
zookeeper /etc/confluent/docker/run Up 0.0.0.0:2181->2181/tcp,:::2181->2181/tcp, 2888/tcp, 3888/tcp
Finishing up ONES Installation...
............................................................
Upgraded ONES application successfully...
....................
Open the ONES application at https://192.168.1.1
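After an install or upgrade, the container status table can be rechecked at any time. The sketch below takes the expected names from the Name column of the table above (assuming those are the actual container names) and uses Docker's standard `docker ps --format` Go-template option; the helper names are illustrative.

```python
import subprocess

# Container names as listed in the installer's status table above.
EXPECTED_CONTAINERS = {
    "api-server", "broker", "docker", "kafka-connect", "ksqldb-server",
    "ones-collector", "ones-collector-db", "ones-fm", "ones-fm-db",
    "ones-gateway", "ones-pty-server", "ones-rule-service",
    "ones-rule-service-db", "ones-ui", "schema-registry",
    "stream-processor", "zookeeper",
}


def parse_docker_ps(output):
    """Map container name -> status from tab-separated `docker ps` output."""
    statuses = {}
    for line in output.splitlines():
        if "\t" in line:
            name, status = line.split("\t", 1)
            statuses[name.strip()] = status.strip()
    return statuses


def unhealthy_containers(statuses, expected=EXPECTED_CONTAINERS):
    """Expected containers that are missing, not 'Up', or reported as unhealthy."""
    return sorted(
        name for name in expected
        if not statuses.get(name, "").startswith("Up")
        or "(unhealthy)" in statuses.get(name, "")
    )


def current_statuses():
    """Query the local Docker daemon (requires permission to run `docker ps`)."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_docker_ps(result.stdout)
```

Running `unhealthy_containers(current_statuses())` on the ONES server should return an empty list when every service from the table is up.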
Document the `instance_id`, `vpc_id`, `region`, and `security_group_id` of the provisioned instance.
ubuntu@ip-172-31-28-5:~/ONES-DL-CLOUD$ tar -xvf one-dl.tar.gz
docker-compose.yml
one-dl-multitenant-installer.sh
.env
VM packages cannot be used to upgrade one VM to another VM.
Once the VM is deployed in the network, use the ONES tarball to upgrade it.
Non-SONiC devices must be added using the YAML editor or a CSV file
Deep Telemetry for ASIC and Switch Hardware
Device Inventory details on
Network Operating System (NOS)
Firmware versions - ONIE, BIOS, and CPLD
Hardware SKU, Model, ASIC, and Serial Number
Platform Components – Fan, PSU, Sensors
Link/Interface Health – Speed, Connectivity, Transceivers/Cables
Inventory Operations
Adding/Removing devices using YAML or CSV file
Agent Status Monitoring
Device Monitoring
Device Up/Down State for agent-based and agent-less devices
Region and Zone Mapping
Device Roles – Access, Leaf-Spine, Super-Spine
Device Storage monitoring
SSD Temperature
SSD health
SSD memory
Network Compliance with version checks on
Telemetry Agent
Orchestrator Agent
ONIE, NOS, and Linux Distros versions
Resource Trends
CPU and Memory Utilization
PSU and Fan Readings
ASIC Capacity for Routes and ACLs
Software and Kernel Route capacity
Packet Counters – IN/OUT, Errors/Discards
Topology View
Device Connectivity view across Roles and Location
Link/Connectivity Status
Device or Component failure count
Routing Protocol
BGP 2 Byte and 4 Byte AS
BGP Neighbors
Advertised and Received Prefixes
Local AS Number
VXLAN
MCLAG
LACP
RoCE
Orchestrator Use Cases
YAML-based Configuration push
Image Management via ZTP
BGP Numbered (IPv4 & IPv6) and Unnumbered Configuration
BGP Peering with Port-Channel
NTP, SNMP, SFLOW, and SYSLOG Configuration
VXLAN
Symmetric/Asymmetric IRB
L2/L3 MC-LAG
EVPN MultiHoming
Layer2 Leaf-Spine (L2/L3 Mode)
Leaf only Deployment
BGP Peering over MC-LAG PeerLink
BGP Peering using separate Link between MC-LAG Peers
DHCP Relay
SAG / SVI
Licensing
Application License
Telemetry Agent License
Orchestrator Agent License
User Management
Add/Edit/Delete User
Role Management
API Access for configurations originating from External Orchestration Tools
Rule Engine
Slack Channel for push notification
Zendesk ticket generation
Rules status
Cloud Integration for DL
Splunk
Amazon S3
ONES lets users start from predefined templates and customize them for Ports, IPv4/IPv6 Routes, BGP-Unnumbered, and Switch Services (NTP, SNMP, SYSLOG, ZTP, etc.)
bash$ ssh username@myswitch
Password: <passw0rd>
myswitch> enable
myswitch# configure terminal
myswitch(config)# management api http-commands
myswitch(config-mgmt-api-http-cmds)# no shutdown
myswitch(config-mgmt-api-http-cmds)# show management api http-commands
Enabled: Yes
HTTPS server: running, set to use port 443
HTTP server: shutdown, set to use port 80
Local HTTP server: shutdown, no authentication, set to use port 8080
Unix Socket server: shutdown, no authentication
VRFs: default
ZOOKEEPER_SERVER_VALUE= //Public DNS
KAFKA_SERVER_VALUE= //Public DNS
SCHEMA_REGISTRY_SERVER_VALUE=
INSTANCE_ID= //Instance ID
VPC_ID= //VPC ID
REGION= //REGION ID
SG_ID= //SG ID
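The `//` notes in this template are annotations, and Docker Compose-style `.env` parsing does not treat `//` as a comment marker, so it is safest to delete them when filling in the values. The hypothetical checker below strips such notes (only when preceded by whitespace, so URLs survive) and reports keys that are still empty.

```python
import re

# Keys from the ONE-DL .env template above.
REQUIRED_KEYS = [
    "ZOOKEEPER_SERVER_VALUE", "KAFKA_SERVER_VALUE", "SCHEMA_REGISTRY_SERVER_VALUE",
    "INSTANCE_ID", "VPC_ID", "REGION", "SG_ID",
]


def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines, '#' comments and ' //' notes."""
    values = {}
    for raw in text.splitlines():
        line = re.sub(r"\s+//.*$", "", raw).strip()  # keeps '//' inside URLs
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def missing_keys(values):
    """Required keys that are absent or still empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Running `missing_keys(parse_env(open(".env").read()))` before starting the ONE-DL installer should return an empty list.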
Does the ONES-agent is integrated with SONiC NOS? (yes/no): no
Do you want to add only Collector IP for auto-discovery and skip the agent installation ?(yes/no): yes
Enter the ip address of collectors to auto-discover. Do not enter more than 2. Eg - 10.1.1.10,10.2.2.5 : 10.4.5.254
Do you want to restrict access only to provided collector ip?
Note: Providing Yes will restrict access to agent only with the provided collector IP Address
Enter Yes/No: No
468025bd9c12: Loading layer [==================================================>] 2.56kB/2.56kB
1961412d5783: Loading layer [==================================================>] 36.56MB/36.56MB
de3513fa22d1: Loading layer [==================================================>] 42.69MB/42.69MB
ca3343e443b5: Loading layer [==================================================>] 1.421MB/1.421MB
Loaded image: avizdock/agent_installer:latest
Docker image 'avizdock/agent_installer:latest' is loaded.
6466655d5e7dc2631ea51df49e41d75661547603aaa0ddd4615778170fa33a70
Docker container 'agent_installer' is running.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6466655d5e7d avizdock/agent_installer:latest "python3" 1 second ago Up Less than a second agent_installer
10.4.5.254
no
no
Selecting ‘Yes’ performs only a day-2 deployment of the ONES-Agent: the existing agent is reconfigured to communicate with the specified collector(s).
Choosing ‘No’ deploys the ONES-Agent as an independent third-party container.
[{'ip': '10.4.4.61', 'user': 'admin', 'passwd': 'YourPaSsWoRd', 'layer': 'Leaf', 'region': 'Nyk', 'azid': '1', 'brickid': '1', 'rackid': '1', 'installation_instance': 1, 'agentip': '10.4.4.61', 'collectorip': '10.4.5.254', 'restrict_collector_ip': 'no'}]
Agent installation skipped successfully........
Adding Collector IP for auto-discovery...........
###############Connecting to switch###############
Connection to switch 10.4.4.61 successful.....................
...
...
...
Docker agent_installer has been stopped
agent_installer
Docker agent_installer has been removed
Untagged: avizdock/agent_installer:latest
Deleted: sha256:6604bbec1feedd9169ea6cca598650bf2f1fe659ae587e9637f80adabe877eeb
Deleted: sha256:4754441bc1128ca32bd0085e2a366e111b4882359cdebc92560d243a671f33e3
Deleted: sha256:c803c23f2ec84e2601fa5e5d954b1cbf406167ae057d7200e9d2f61ba1f402fa
Deleted: sha256:f9279b9fbc87fee822c69ea9cacc2f9d9a10d9c54bbc21732205684ea3bcf0b1
Deleted: sha256:27ea50b3cb2914caa5f99e5268ba0b47e15fb7c490275f560c54e400162e2cc3
Docker image has been removed
Layer
Region
azid
brickid
rackid
Work with your Aviz Sales/Support contact to create an account on the Aviz Networks Support Portal
Open the Downloads section and, under ONES, download ONES Release 2.1
Copy the ONES Release 2.1 package (qcow2) to the KVM hypervisor server
Create the VM using GUI App virt-manager
If your host server has Ubuntu Desktop and virt-manager installed you can use it to deploy the VM. Make sure you can start the Virtual Machine Manager and that it connects successfully to the local hypervisor.
Creating a VM with virt-manager is straightforward. Use the following steps to deploy the ONES application:
File -> New Virtual Machine -> Import existing disk image -> Forward
Now the ONES Application is ready to use
Create the VM using QEMU (XML configuration)
Create an XML configuration file from the following template using vi
Create a Linux bridge configuration file (bridged-network.xml) for libvirt from the following template
Define the Linux bridge for the VM
Start the VM
If you see a permission error, running the virsh command with sudo may fix the issue
This page shows all the links and how the devices are connected to each other
Right-clicking a device lets the user jump to specific feature details:
Traffic, Health, Capacity, Protocols
Connect to the device via SSH or console access
Traffic View
PFC Enabled Device view
Input/output packets (millions per second)
Health of the devices
CPU & Memory Utilization
CPU & PSU Temperature
Capacity of the devices
IPv4 & IPv6 Routes
ASIC/Software/Kernel
Links Page
All the connected devices
Transceivers info
Protocols status
BGP status
VXLAN
Topology
Navigate to Monitor >> Topology
This shows the complete topology view, including how the devices are connected
Topology can be filtered by Underlay/Overlay/RoCE
Apply filters to get a customized view of the topology
Traffic
Using this widget we can check the input and output errors across all the devices
This widget also shows the input and output packets per device
Navigate to Monitor >> Traffic
This page shows the following information:
Device Name & IP
Roles & Region
This page shows the traffic drop rate per interface, which is useful when troubleshooting traffic drops
From here, users can drill down into further details to resolve dropped or discarded packets
Health
This page shows the latest utilization of all the devices
CPU & Memory utilization
Temperature & Voltage of PSU
Fan speed in % and RPM
Health Status
Health Status is reported for the following components
Roles
SKU/ASIC
Ports/Max Speed
Customized View
Device health can also be viewed using custom filters
We can filter the devices by:
Roles
Role-based Customization
We can choose a role using the available Role-based option
4 Roles available
Super Spine
Per Device Status
This Platform Widget also gives the option to check the extended capability view of the device
Apart from this monitoring view, we can also verify/check extended feature sets like:
PSU Current (A)
When we choose a specific device we get an output like this
Device Info Ribbon
Feature
Use
CPU Usage (%)
This widget shows CPU usage across the selected time range, from start to end
To see the value at a specific time, hover the cursor over any point on the graph
Memory Usage (%)
This widget shows the memory usage of the selected device
To see memory utilization at a specific time, hover the cursor over any point on the graph
Services CPU Consumption (%)
This widget shows the CPU consumption percentage of all services or of each individual service.
This widget also offers a CPU-only consumption view
To see the value at a specific time, hover the cursor over any point on the graph
Services Memory Consumption (%)
This widget shows the memory consumption percentage of all services or of each individual service.
This widget also offers a memory-only consumption view.
To see the value at a specific time, hover the cursor over any point on the graph
Services Running
This is the main widget for monitoring services
It also shows the total number of services running on the platform
Red bars on the graph indicate when one of the services went down
CPU Temperature (C)
This widget shows the CPU temperature in degrees Celsius
It shows the temperature of every CPU and core on the device
Components
This page outlines the key metrics for accurately monitoring the performance of various components, specifically focusing on Temperature, Current, Fans, and Power.
SSD
Device Resource Utilization Page
This page allows users to monitor the resource usage of their devices, providing a proactive view of how resources are allocated per switch and the trends in service usage. If you notice any resource usage spiking, you can easily navigate to another utilization page to identify the specific processes contributing to the increased demand. This feature is designed to help users manage their resources more effectively and prevent potential issues before they impact performance.
These metrics can be effectively analyzed through a time series graph.
Capacity
This page shows device capacity along with related device details
This widget shows:
Roles/Region per device
SKU and ASIC details per device
Feature
Details
The Capacity widget can also filter the output by role and region
For example, choose the Leaf role to get a customized view
The view can likewise be customized by region and SKU
This is the extended view of the device capacity for all the IPv4 and IPv6 ASIC routes, ACL utilization, software, and kernel routes
Use this page to troubleshoot protocol issues or other misbehaviour on the devices caused by route capacity limits
Per Device Status
This widget gives us the capability to check the extended view of the Routes & ACL usage with a range of time
Click on any of the devices to get the extended view
Links
This page shows all the links between connected devices, along with additional details
Navigate to Monitor >> Links
Feature
Use
This page shows the number of connections between devices, with link speed and manufacturer details
This page shows the interface name, interface speed, transceivers, and admin and operational status
Users can also view historical transceiver details stored in the time-series database
Protocols
The Protocols page shows metrics for the following features:
BGP (numbered/unnumbered)
VXLAN
MCLAG
LACP
BGP
The BGP section provides accurate details on the following:
BGP 2 byte and 4 byte AS
BGP numbered
BGP unnumbered
Total number of neighbors configured
Feature
Use
This page shows details of the BGP neighbours connected to each device, along with the metrics and values a user can use to troubleshoot a BGP neighbour
Neighbor View
This view shows neighbour details and status: the total number of neighbours, received routes, neighbour router ID (RID), BGP AS number, and more. Users can also check neighbour details and route status, and click a neighbour to see more details about all connected neighbours.
Per device status (Neighbour's & Announcement)
The user can get per-device status by choosing a particular Device
Click on the device name to get the status
The page that opens shows which BGP neighbours are up and which are down
On the right side it shows the BGP announcements and the local prefixes present in the BGP table
VXLAN
This section helps locate devices with VXLAN enabled, along with related details:
L2 & L3 VXLAN metrics
Local VTEP
Remote VTEP and details on how many are up and down
VLAN to VNI Mapping
The output below shows all devices with VXLAN details
VTEP Details
Click a Local VTEP ID to see details of all remote VTEPs connected to that device
Users can click the operational status to get a time-series graph showing when the VTEP was up or down
VLAN to VNI Mapping
Users can get the details on VLAN to VNI mapping using the option on this page
VRF to VNI Mapping
Use the option below to check all VRF-to-VNI mappings
MCLAG
This feature enhances network management by allowing users to access a timescale graph. This graph shows the status of neighbour and peer links over time, indicating periods when they were down or active. Users can further examine the health page to determine whether downtime resulted from process issues or resource utilization problems.
Additionally, the feature provides tools for verifying:
MCLAG (Multi-Chassis Link Aggregation Group) Domain ID
The status of PortChannels associated with Peer Links
MCLAG-L3 (Multi-Chassis Link Aggregation Layer 3) status
With these capabilities, users can effectively track and diagnose network performance and configuration issues.
Time Series Graph with up and down status
Navigate to Monitor >> Protocols >> <Choose Device> >> Click on Active/Passive
LACP
This page shows per-device EtherChannel (port channel) status, including member ports and their status over time in a time-series graph
Selecting a port channel on any device allows users to view its time series data. This feature enables the analysis of the device's status, ranging from the latest hour up to two weeks of metrics.
Inventory
Users can onboard all devices into the application and get a complete view of all the populated tables
Agent-based devices are added automatically through the auto-discovery feature
Agent-less devices must be added from this Inventory page
The Inventory tab has the following features:
Syslog: Access all syslogs and quickly find the relevant logs when a process or any other device module fails
Custom OS upgrade: Upgrade the device OS with any customised image. Provide the correct image path to ensure the OS is updated successfully
OS upgrade via ZTP: Upgrade the device OS via Zero-Touch Provisioning
Reboot devices: Reboot the device with a single click in the UI
Add devices from the dashboard: Onboard non-SONiC devices by uploading a YAML file or using the built-in editor in the UI
Remove devices from the dashboard: Remove auto-discovered (agent-based) and non-SONiC devices
Login Page
To access the ONES application, use Server IP/FQDN with HTTPS
https://<host-ip/FQDN>
Log in with the default credentials; see the Installing ONES Application page for the default credentials
CISCO NXOS (GRPC)
Cisco NX-OS streams telemetry data using its own gRPC mechanism. Enable gRPC on the device to collect the metrics it offers
Enable GRPC
GRPC Verification
Show run GRPC
GRPC Service Statistics
GRPC Summary
Supported Telemetry
Configurations
Navigate to Configurations >> Configure Devices
Allows you to configure new devices
Supports valid YAML files
You can download the sample YAML file, edit it, and upload it again with the desired configuration
YAML Config Illustrator
While configuring the topology, users can utilize the "Visualize YAML" feature to preview the structure and layout.
NOTE: Follow the to know more about configuration and all the possible use cases.
Devices
This section explains how users can add/manage/remove the devices using ONES.
Devices
Navigate to Inventory >> Devices
Rule Engine
Overview
In data centre operations, a rule engine with alerts for various metrics is essential for proactive monitoring and management of critical components and services. Let's see the different types of rule engine alerts for specific metrics in a data centre environment
CPU and Memory Alerts
Adding Devices
Make sure every device has a unique name; otherwise, the full topology view (Topology page) will not be plotted correctly.
Agent-based devices auto-discover the ONES-App and get registered automatically on the ONES Inventory page
Slack Channel Integration
1. Create a Channel for ONES-App push notification
2. Generate API for Channel
login to & choose Your apps
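Before pointing ONES at the channel, it can help to confirm that the channel accepts messages. The sketch below assumes the integration uses a Slack incoming webhook (an assumption; the steps above do not say whether ONES asks for a webhook URL or a bot token). Incoming webhooks accept a JSON body with a `text` field; the message wording and helper names are illustrative.

```python
import json
import urllib.request


def build_slack_message(rule_name, device, metric, value, threshold):
    """Compose a simple incoming-webhook payload (Slack reads the 'text' field)."""
    return {
        "text": (f"Rule '{rule_name}' breached on {device}: "
                 f"{metric} = {value} (threshold {threshold})")
    }


def post_to_slack(webhook_url, payload):
    """POST the payload to an incoming-webhook URL and return the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status  # Slack answers 200 with body "ok" on success
```

A 200 response and a test message in the channel indicate the webhook is ready to be entered in ONES.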
Zendesk Support Integration
Log in to the Zendesk Support Admin panel and follow the steps
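Once an API token has been created in the Admin panel, creating a test ticket through the Zendesk Tickets API (`POST /api/v2/tickets.json`) confirms that the credentials work. This is a minimal sketch; the subdomain, email, and token are your own values, and how ONES itself formats tickets is not documented here.

```python
import base64
import json
import urllib.request

VALID_PRIORITIES = {"urgent", "high", "normal", "low"}  # Zendesk ticket priorities


def build_ticket(subject, body, priority="normal"):
    """Build the JSON body expected by the Zendesk Tickets API."""
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"priority must be one of {sorted(VALID_PRIORITIES)}")
    return {"ticket": {"subject": subject, "comment": {"body": body}, "priority": priority}}


def create_ticket(subdomain, email, api_token, ticket):
    """Create the ticket and return its numeric ID."""
    url = f"https://{subdomain}.zendesk.com/api/v2/tickets.json"
    # API-token authentication uses the username "<email>/token".
    creds = base64.b64encode(f"{email}/token:{api_token}".encode()).decode()
    req = urllib.request.Request(
        url,
        data=json.dumps(ticket).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {creds}"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["ticket"]["id"]
```

If the call returns a ticket ID, the same account and token can be used for the ONES integration.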
Does the ONES-agent is integrated with SONiC NOS? (yes/no): no
Do you want to add only Collector IP for auto-discovery and skip the agent installation ?(yes/no): no
Enter the ip address of collectors to auto-discover. Do not enter more than 2. Eg - 10.1.1.10, 10.2.2.5 : 10.4.4.11
Do you want to restrict access only to provided collector ip?
Note: Providing Yes will restrict access to agent only with the provided collector IP Address
Enter Yes/No : Yes
Does the ONES-agent is integrated with SONiC NOS? (yes/no): no
Do you want to add only Collector IP for auto-discovery and skip the agent installation ?(yes/no): no
Enter the ip address of collectors to auto-discover. Do not enter more than 2. Eg - 10.1.1.10,10.2.2.5 : 10.4.4.11
Do you want to restrict access only to provided collector ip?
Note: Providing Yes will restrict access to agent only with the provided collector IP Address
Enter Yes/No: No
468025bd9c12: Loading layer [==================================================>] 2.56kB/2.56kB
1961412d5783: Loading layer [==================================================>] 36.56MB/36.56MB
de3513fa22d1: Loading layer [==================================================>] 42.69MB/42.69MB
ca3343e443b5: Loading layer [==================================================>] 1.421MB/1.421MB
Loaded image: avizdock/agent_installer:latest
Docker image 'avizdock/agent_installer:latest' is loaded.
ab76dcd0da078a25570efe51a0057040f47e8a4a5c7320a47eb7a63ac5b42d8c
Docker container 'agent_installer' is running.
...
...
...
###############Connecting to switch###############
Connection to switch 10.4.4.61 successful.....................
Looking for previous installation........................
...
...
...
Loading Docker image on the device 10.4.4.61 ###########################################
Docker image loaded successfully on the device 10.4.4.61........
Getting name of the loaded image
image = ##avizdock/ones-agent:v2.1.0##
Running docker.....................
docker run -it -v /var/run/docker.sock:/var/run/docker.sock -v /host/reboot-cause:/host/reboot-cause -v /etc/sonic:/etc/sonic -v /var/run/redis:/var/run/redis -v /var/run:/var/hostrun --log-driver local --log-opt max-size=5m --log-opt max-file=3 --cpu-period=100000 --cpu-quota=50000 --net=host --privileged -dt --name ones-agent avizdock/ones-agent:v2.1.0
b'a01ff31e2f11613e608d7aa425748c69e4d803c5c3cd8a14a1524d173b5e9549\n'
Loading Service file on the device 10.4.4.61........
Service file loaded successfully on the device 10.4.4.61##################
Enabling ones-agent.service 10.4.4.61 ##################
Enabled ones-agent as service successfully on the device 10.4.4.61 ##################
Starting ones-agent service on the device 10.4.4.61........
started ones-agent service successfully on the device 10.4.4.61 ##################
Enabling ones-agent to restart after booting on the device 10.4.4.61........
Made ones-agent immune to booting on the device 10.4.4.61########################
Copying ones-agent.tar file
ones-agent.tar file copied successfully on the device 10.4.4.61........
Copying agent.conf file
agent.conf file copied successfully on the device 10.4.4.61........
Copying ones-agent.service file
ones-agent.service file copied successfully on the device 10.4.4.61........
Copying ones_agent_ip_rule.sh file
ones_agent_ip_rule.sh file copied successfully on the device 10.4.4.61........
Copying ones_agent_start.sh file
ones_agent_start.sh file copied successfully on the device 10.4.4.61........
##################################################################
Status of ones-agent.service is - Active: active (running) since Mon 2024-04-15 19:32:21 IST; 1min 35s ago
Deployment of ones-agent to switch 10.4.4.61 is successful
agent_installer
Docker agent_installer has been stopped
agent_installer
Docker agent_installer has been removed
Untagged: avizdock/agent_installer:latest
Deleted: sha256:6604bbec1feedd9169ea6cca598650bf2f1fe659ae587e9637f80adabe877eeb
Deleted: sha256:4754441bc1128ca32bd0085e2a366e111b4882359cdebc92560d243a671f33e3
Deleted: sha256:c803c23f2ec84e2601fa5e5d954b1cbf406167ae057d7200e9d2f61ba1f402fa
Deleted: sha256:f9279b9fbc87fee822c69ea9cacc2f9d9a10d9c54bbc21732205684ea3bcf0b1
Deleted: sha256:27ea50b3cb2914caa5f99e5268ba0b47e15fb7c490275f560c54e400162e2cc3
Docker image has been removed
Line #2: The name of the VM
Line #3: The maximum amount of system memory for the VM
Line #4: The current amount of system memory allocated to the VM
Line #5: The number of vCPU cores for the VM
Line #25: The path to the qcow2 VM image file
Line #35: The name of the Linux bridge on the host machine
Line #4: The name of the Linux bridge on the host machine
#Execute the commands below to define and start the bridged network used to attach the VM to the Linux bridge
sonic@sonic-39:~$ virsh net-define bridged-network.xml
sonic@sonic-39:~$ virsh net-start br0
sonic@sonic-39:~$ virsh net-autostart br0
sonic@sonic-39:~$ virsh net-list
 Name   State    Autostart   Persistent
----------------------------------------------------------
 br0    active   yes         yes
sonic@sonic-39:~$
virsh create <VM XML configuration file>
#sonic@sonic-39:~$ virsh create ones.xml
#Domain ONES_VM01 created from ones.xml
#sonic@sonic-39:~$
sonic@sonic-39:~$ virsh list
 Id   Name        State
----------------------------------------------------
 8    ONES_VM01   running
sonic@sonic-39:~$
The Rule Engine pushes a notification whenever a device breaches the threshold value configured in a rule. Notifications are sent to:
Slack channel
Zendesk Support ticket
To use the Rule Engine feature, the user must first set up a Slack channel integration or a Zendesk Support integration
switch# show run grpc
!Command: show running-config grpc
!Running configuration last done at: Mon Jan 29 13:59:36 2024
!Time: Mon Jan 29 14:06:27 2024
version 9.3(9) Bios:version 04.18
feature grpc
grpc use-vrf default
switch# show grpc gnmi service statistics
=============
gRPC Endpoint
=============
Vrf : management
Server address : [::]:50051
Status : Running - certificate expired
Cert notBefore : Jan 10 07:07:03 2024 GMT
Cert notAfter : Jan 11 07:07:03 2024 GMT
Max concurrent calls : 8
Listen calls : 1
Active calls : 0
Number of created calls : 32
Number of bad calls : 29
Subscription stream/once/poll : 15/0/0
Max gNMI::Get concurrent : 5
Max grpc message size : 8388608
gNMI Synchronous calls : 20496
gNMI Synchronous errors : 0
gNMI Adapter errors : 0
gNMI Dtx errors : 0
=============
gRPC Endpoint
=============
Vrf : default
Server address : [::]:50051
Status : Running - certificate expired
Cert notBefore : Jan 10 07:07:03 2024 GMT
Cert notAfter : Jan 11 07:07:03 2024 GMT
Max concurrent calls : 8
Listen calls : 1
Active calls : 0
Number of created calls : 1
Number of bad calls : 0
Subscription stream/once/poll : 0/0/0
Max gNMI::Get concurrent : 5
Max grpc message size : 8388608
gNMI Synchronous calls : 0
gNMI Synchronous errors : 0
gNMI Adapter errors : 0
gNMI Dtx errors : 0
switch# show grpc gnmi rpc summary
=============
gRPC Endpoint
=============
Vrf : management
Server address : [::]:50051
Status : Running - certificate expired
Cert notBefore : Jan 10 07:07:03 2024 GMT
Cert notAfter : Jan 11 07:07:03 2024 GMT
Capability rpcs : 20474
Capability errors : 0
Get rpcs : 22
Get errors : 0
Set rpcs : 0
Set errors : 0
Resource Exhausted : 0
Option Unsupported : 0
Invalid Argument : 0
Operation Aborted : 0
Internal Error : 0
Unknown Error : 0
RPC Type State Last Activity Cnt Req Cnt Resp Client
--------------- ---------- -------------- ---------- ---------- ----------------------------------------
Subscribe Listen 01/29 08:42:41 0 0
=============
gRPC Endpoint
=============
Vrf : default
Server address : [::]:50051
Status : Running - certificate expired
Cert notBefore : Jan 10 07:07:03 2024 GMT
Cert notAfter : Jan 11 07:07:03 2024 GMT
Capability rpcs : 0
Capability errors : 0
Get rpcs : 0
Get errors : 0
Set rpcs : 0
Set errors : 0
Resource Exhausted : 0
Option Unsupported : 0
Invalid Argument : 0
Operation Aborted : 0
Internal Error : 0
Unknown Error : 0
RPC Type State Last Activity Cnt Req Cnt Resp Client
--------------- ---------- -------------- ---------- ---------- ----------------------------------------
Subscribe Listen 01/10 08:12:32 0 0
switch#
switch# show grpc gnmi transactions
=============
gRPC Endpoint
=============
Vrf : management
Server address : [::]:50051
Status : Running - certificate expired
Cert notBefore : Jan 10 07:07:03 2024 GMT
Cert notAfter : Jan 11 07:07:03 2024 GMT
RPC DataType Session Time In Duration(ms) Status
------------ ---------- --------------- -------------------- ------------ ------
Capabilities - 0 01/29 12:04:07 0 0
Capabilities - 0 01/29 12:03:47 0 0
Capabilities - 0 01/29 12:03:35 0 0
Get ALL 3698131864 01/29 08:43:34 1186 0
...
...
...
switch# show telemetry yang direct-path cisco-nxos-device
1) Cisco-NX-OS-device:System/lldp-items
2) Cisco-NX-OS-device:System/mac-items
3) Cisco-NX-OS-device:System/intf-items
4) Cisco-NX-OS-device:System/procsys-items
5) Cisco-NX-OS-device:System/ipqos-items/queuing-items/policy-items/out-items
6) Cisco-NX-OS-device:System/ch-items
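Every gRPC endpoint in the output above reports `Running - certificate expired`: the `Cert notAfter` value (Jan 11 2024) is earlier than the time the commands were run (Jan 29 2024). The following minimal Python sketch checks the validity window from the `Cert notBefore` / `Cert notAfter` strings. It assumes the timestamps always use the format shown in the output, and it treats the switch clock as UTC; the function names are illustrative only.

```python
from datetime import datetime, timezone

# Format of the "Cert notBefore" / "Cert notAfter" fields shown above,
# e.g. "Jan 11 07:07:03 2024 GMT" (assumed to be stable across NX-OS output).
CERT_TIME_FORMAT = "%b %d %H:%M:%S %Y GMT"


def parse_cert_time(value: str) -> datetime:
    """Parse a gNMI certificate timestamp as a timezone-aware UTC datetime."""
    return datetime.strptime(value.strip(), CERT_TIME_FORMAT).replace(tzinfo=timezone.utc)


def cert_is_valid(not_before: str, not_after: str, now: datetime) -> bool:
    """Return True if `now` falls inside the certificate validity window."""
    return parse_cert_time(not_before) <= now <= parse_cert_time(not_after)


if __name__ == "__main__":
    # Values from the output above; "now" is the time in the "show run grpc"
    # header, assumed here to be UTC.
    now = datetime(2024, 1, 29, 14, 6, 27, tzinfo=timezone.utc)
    print(cert_is_valid("Jan 10 07:07:03 2024 GMT", "Jan 11 07:07:03 2024 GMT", now))  # prints False
```
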
Statuses
Not streaming, Faulty Fans, Faulty PSUs, Links Down
Metrics
Bandwidth RX & TX
Memory, CPU & ASIC Utilization
Syslogs
Errors and discarded packets per interface
Other related metrics
PSU Voltage & Fan Speed
SSD Temperature, Health and Memory utilization
ASIC ACL capacity
MCLAG
LACP
Count of devices
All devices onboarded
Not streaming
Faulty Fans & PSUs
Links Down
Use Down Links to see which parts of the topology have links in the shutdown state
Hovering over any device and right-clicking it opens a few more controls:
Device details/Ports
Direct Navigation per device
Traffic/Health/Capacity/Protocols
Console connect
SYSLOG
The view can also be filtered by:
Statuses
Not streaming, Faulty Fans, Faulty PSUs, Links Down
Metrics
Bandwidth RX & TX
Memory, CPU & ASIC Utilization
Device details
Interface speed and ports
Errors & Utilization of the links
The Filter Ribbon can be used to get a customized view:
PFC-enabled devices
Operational up/down
Admin down
PFC-enabled interfaces
Clicking any device shows more information about its interface traffic:
Errors per interface
Bandwidth utilization per interface
Clicking a particular interface shows a timeline of input and output packets, including errors, discards, and all other metrics in detail
SSD Temperature, Health and Memory Usage
Navigate to Monitor >> Health
CPU Utilization (%)
Memory Utilization (%)
CPU Temperature (℃)
PSU Temperature (℃)
PSU Voltage (V)
Fan Speed (RPM)
SSD
Temperature(℃)
Health(%)
Memory(%)
CPU temperature across all devices, in degrees Celsius
Any device that breaches the configured acceptable or critical value will be shown here
Click on any device to get the view/status of all the components related to that device
Average PSU Temperature (C)
Power supply temperature in degrees Celsius
Any device that breaches the configured acceptable or critical value will be shown here
Click on any device to get the view/status of all the components related to that device
PSU (Voltage)
Power Supply Voltage readings in volts
Any device that breaches the configured acceptable or critical value will be shown here
Click on any device to get the view/status of all the components related to that device
Average Fan Speed (%)
Fan Speed in % of maximum supported RPM
Any device that breaches the configured acceptable or critical value will be shown here
Click on any device to get the view/status of all the components related to that device
SSD
SSD status is shown here:
SSD Temperature: tracks the SSD temperature
SSD Health: shows the SSD health as a percentage
SSD Memory: shows the SSD utilization
Region
Spine
Leaf
ToR
For example, apply the Leaf filter
After selecting Leaf, the view shows only the devices that belong to the Leaf role
PSU Power (W)
Services Running
Services CPU/Memory Consumption (%)
To view the status of a device with all available widgets, click the device in the list
API Explorer
Device Details
Platform
Number of Ports and Speed
Agent Version
ASIC ACL Capacity utilization
IPv4 Routes (ASIC, Software, Kernel)
IPv6 Routes (ASIC, Software, Kernel)
Device Manufacturer
Date of Manufacture
Admin and operational status
Local and remote status of the link
How many neighbours are up and down
Total number of prefixes and how many are being advertised
BGP neighbour details
Total number of prefixes advertised by the router to its BGP neighbours
Local BGP AS number
Control to view more details about the neighbours
Route Refresh message count:
Tx: how many Route Refresh messages have been transmitted
Rx: how many Route Refresh messages have been received
Update message count:
Tx: how many updates have been transmitted
Rx: how many updates have been received
VRF to VNI mapping
MCLAG-L2 (Multi-Chassis Link Aggregation, Layer 2) status
HOST / IP
Device Name
Device IP
Roles/Region
Device Role
Device Region
SKU/ASIC
SKU (Stock Keeping Unit)
ASIC
Port/Max Speed
Total number of ports available
Speed of ports
CPU Utilization (%)
CPU Utilization reported in 4 states
Normal
Acceptable
Critical - Action needed
Not Streaming - Agent is not up
Click on any device to get the view/status of all the components related to that device
Memory Utilization (%)
Memory Utilization reported in 4 states
Normal
Acceptable
Critical - Action needed
Not Streaming - Agent is not up
Click on any device to get the view/status of all the components related to that device
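The four states above can be pictured as a simple comparison against the acceptable and critical thresholds configured under Settings >> Thresholds. The following Python sketch shows one plausible mapping; the exact boundary rules ONES uses (for example, whether a value equal to a threshold already counts as breaching it) are an assumption, and the function name is hypothetical.

```python
from typing import Optional


def classify_utilization(value: Optional[float], acceptable: float, critical: float) -> str:
    """Map a utilization reading to one of the four widget states.

    Hypothetical rule: below `acceptable` is Normal, from `acceptable` up to
    `critical` is Acceptable, at or above `critical` is Critical, and a missing
    reading (no telemetry from the agent) is Not Streaming.
    """
    if acceptable > critical:
        raise ValueError("acceptable threshold must not exceed critical threshold")
    if value is None:
        return "Not Streaming"
    if value >= critical:
        return "Critical"
    if value >= acceptable:
        return "Acceptable"
    return "Normal"


if __name__ == "__main__":
    # Example thresholds only; real values come from Settings >> Thresholds.
    for reading in (35.0, 72.5, 91.0, None):
        print(reading, classify_utilization(reading, acceptable=70.0, critical=90.0))
```
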
1. Time Frame: Check utilization trends based on a time range. The application can store up to 2 weeks of data.
2. Refresh Component Status
3. Alerts: Show all the alerts triggered by rules
4. Raise a Ticket for Technical Support
5. Documentation
Roles/Region per device
SKU and ASIC details per device
ASIC ACL Capacity utilization
IPv4 Routes (ASIC, Software, Kernel)
IPv6 Routes (ASIC, Software, Kernel)
Hovering over the IPv4 metric shows the IPv4 route usage in:
ASIC
Kernel
Software
Hovering over the IPv6 metric shows the IPv6 route usage in:
ASIC
Kernel
Software
Hovering over the ACL metric shows the ACL usage in:
ASIC
Hostname
Hostname of the managed device
Role
Role of the device
Port/Interface
Interface details
Port Speed
Link speed of connected devices
Transceiver
SFP/QSFP optics status
This column shows:
Device name
Device IP
This column shows:
Roles and Region
This column shows the details of:
SKU
ASIC
This column shows the total count of BGP neighbours
This column shows the status of:
How many BGP neighbours are up and running
How many BGP neighbours are in the Down state
Details of the connected neighbours:
Neighbour device name
Neighbour IP
Neighbour BGP AS number
Neighbour uptime: how long the neighbour has been connected
Time of the last neighbour reset
Count of established and dropped connections per neighbour
Average CPU Temperature (C)
Manufacturer
This column shows the total number of prefixes present in BGP
Keepalive message count:
Tx: how many keepalives have been transmitted
Using this tab, the user can:
Onboard non-SONiC (agent-less) devices to the application using Add Devices
Capture syslogs
Upgrade devices using Custom Upgrade
Upgrade devices using ZTP (Zero Touch Provisioning)
Reboot individual or multiple devices in one click
Remove devices
Download the complete inventory in CSV format
Make sure that every device has a unique name; otherwise, issues will occur while plotting the full topology view (Topology page).
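Before onboarding a batch of devices, the device names can be checked for duplicates. The following Python sketch is a pre-check only; comparing names case-insensitively after trimming whitespace is a deliberately conservative assumption (it may flag names that ONES would treat as distinct), and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_device_names(names: Iterable[str]) -> List[str]:
    """Return the (normalized) device names that appear more than once, sorted.

    Names are trimmed and lower-cased before comparison, so "Leaf-01" and
    "leaf-01 " count as the same name (a conservative assumption).
    """
    counts = Counter(name.strip().lower() for name in names)
    return sorted(name for name, count in counts.items() if count > 1)


if __name__ == "__main__":
    print(duplicate_device_names(["leaf-01", "Leaf-01 ", "spine-01"]))  # prints ['leaf-01']
```
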
SONiC Devices
Agent-based devices auto-discover the ONES-App and get registered automatically on the ONES Inventory page
Add Non-Sonic Devices
To onboard agent-less devices, the user needs to add them manually
Navigate to Inventory
This page provides two options for onboarding devices:
Add devices using the YAML Editor
Upload a CSV file containing the device list
As soon as the user chooses CSV upload, the YAML Editor is disabled
1. Add Devices using YAML
Click on Add Devices
Upload Device Inventory using YAML Editor
Navigate to Inventory >> Devices >> Add Devices >> Use YAML
Use the below format to add devices to the application
To identify the device type, the user needs to specify the platform:
Device Type
Cumulus: cumulus
Arista: arista
Cisco: cisco-nxos
SONiC: sonic
For SONiC-based devices, the user can also leave the type field empty
Make sure to use the correct indentation in the YAML file
Click Save & Apply
ONES Application is now ready to manage the added devices
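The device-type rules above (one type value per platform, with an empty value allowed for SONiC) can be expressed as a small lookup. The following Python sketch resolves the type for one inventory entry before it is pasted into the YAML Editor. It assumes the values are case-sensitive, and the function name is hypothetical; it does not reproduce the ONES YAML schema itself.

```python
from typing import Optional

# Device type values listed for the YAML "type" field, mapped to the platform.
DEVICE_TYPES = {
    "cumulus": "Cumulus",
    "arista": "Arista",
    "cisco-nxos": "Cisco",
    "sonic": "SONiC",
}


def resolve_device_type(value: Optional[str]) -> str:
    """Return the device type to use for one inventory entry.

    An empty or missing value is treated as "sonic", as allowed for
    SONiC-based devices. Values are assumed to be case-sensitive; anything
    not in DEVICE_TYPES raises ValueError.
    """
    normalized = (value or "").strip()
    if not normalized:
        return "sonic"
    if normalized not in DEVICE_TYPES:
        raise ValueError(f"unsupported device type: {value!r}")
    return normalized


if __name__ == "__main__":
    for raw in ("arista", "cisco-nxos", "", None):
        print(repr(raw), "->", resolve_device_type(raw))
```
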
2. Add devices using CSV
Click on Add Devices
Upload Device Inventory using CSV File
Navigate to Inventory >> Devices >> Add Devices >> Use CSV
Use the below format to add devices to the application
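A CSV file can be checked locally before it is uploaded. The following Python sketch parses the file and validates each management IP address with the standard `ipaddress` module. The column names `hostname`, `ip` and `type` are hypothetical placeholders; use the headers from the format shown on the Use CSV page.

```python
import csv
import io
import ipaddress
from typing import Dict, List


def load_device_csv(text: str) -> List[Dict[str, str]]:
    """Parse a device CSV and validate each row before uploading it.

    The column names (hostname, ip, type) are hypothetical placeholders.
    Raises ValueError with the CSV line number on the first invalid row.
    """
    rows = []
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        hostname = (row.get("hostname") or "").strip()
        if not hostname:
            raise ValueError(f"line {line_no}: missing hostname")
        ip = (row.get("ip") or "").strip()
        try:
            ipaddress.ip_address(ip)  # accepts IPv4 or IPv6
        except ValueError as exc:
            raise ValueError(f"line {line_no}: {exc}") from exc
        rows.append({"hostname": hostname, "ip": ip, "type": (row.get("type") or "").strip()})
    return rows


if __name__ == "__main__":
    sample = "hostname,ip,type\nleaf-01,10.4.4.61,sonic\nspine-01,10.4.4.1,arista\n"
    for device in load_device_csv(sample):
        print(device)
```
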
SKU / ASIC: Shows the device hardware SKU and ASIC vendor
Port / Max Speed
Shows the number of ports per device and max port speed on the device
Click on the number of ports to get a detailed view of all the ports on a particular device
PSUs / Fans: Shows the total number of Power supplies and Fans present on a particular device
NOS Image: Shows the details of the network operating system running on the device and when it was last updated
ONIE Version: Shows which ONIE version is running on the device and the last reboot time of the device
Agent Version / Network OS: Shows the agent version running on the device and the current active OS version on the device
Agent status / Last contact: The latest status of the agent and when the application last communicated with it
Connect: Provides direct CLI access to the device:
SSH Connect
Console Connect
Details: Shows the details of the device
Remove Devices from the ONES Application
Navigate to Inventory >> Devices >> Remove Devices
Choose the devices to be removed & confirm
Once the user clicks Confirm, the device is removed from the Inventory page
Agent-based devices are added again automatically after some time.
To remove an agent-based device permanently, the user needs to uninstall the agent from the device
Now the selected devices have been removed from the ONES application
Custom Upgrade
This feature upgrades a device to a new version
An HTTP image link is required to use the custom upgrade
Select any of the devices to upgrade to the new version
Click on Custom Upgrade
Enter the new image URL and click Submit
The status is shown as In Progress
The HTTP image URL must be accessible from the device
This image is downloaded to the device and configured as the next boot image, and the device is reloaded
Once the device comes up with the new image, the ONES application installs the Telemetry and Fabric Manager agents
While a device is being upgraded, it is locked against any further changes; after a successful upgrade, the user can use the device for other tasks again
Once the image is loaded, the ONES application will show the last image details and time stamp
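Because the device downloads the image itself, a broken or unreachable URL only shows up after the upgrade has started. The following Python sketch checks the URL beforehand. The helper names are hypothetical, and the reachability check uses an HTTP HEAD request, which some web servers reject even when a normal download would work.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def is_http_image_url(url: str) -> bool:
    """Return True if the URL uses http/https and names a host and a file path."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc) and parsed.path not in ("", "/")


def image_url_reachable(url: str, timeout: float = 10.0) -> bool:
    """Send an HTTP HEAD request and report whether the server answers with 2xx.

    Run this from a host on the same network as the switch: the image must be
    reachable from the device, not just from a workstation.
    """
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return False


if __name__ == "__main__":
    url = "http://10.4.4.10/images/sonic.bin"  # placeholder URL
    print(is_http_image_url(url))
```
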
Upgrade via ZTP
Using this option, a user can upgrade the device directly
Select any of the devices to upgrade via ZTP
Click on Upgrade via ZTP
Click on Yes
While a device is being upgraded, it is locked against any further changes; after a successful upgrade, the user can use the device for other tasks again
Reboot Device
This section shows how to reboot devices
One or multiple devices can be selected for reboot at a time
Choose the devices to reboot
Click on Reboot
Click on Yes
While a device is rebooting, it is locked against any other task; once the reboot is successful, the lock is removed and the user can take new actions
Allows the user to include or exclude specific devices from the rule
Entity by Property
Allows a user to create rules based on HwSKU, Role, or OS Version across all managed devices
1. Entity Based explained
Possible Values & Description
Rule Name: The user can choose any related name
For: The user can choose one of two options:
Device: When the rule is for devices, the following metrics are shown:
ASIC IPv4 Routes
ASIC IPv6 Routes
Interface: When the rule is for interfaces, the following metrics are shown:
Interface flap
Interface PFC Receive Counters
Metrics: The available metrics depend on the For (Device/Interface) selection above
Measure: Metrics can be measured in three different ways:
MIN
Conditions
When Measured Value is: Selects the condition that the measured value must match:
EQ: Equal to
NEQ: Not equal to
Notification
Notify: The user can choose the integrated Slack channel
Create Ticket: Zendesk users can choose this to raise a Zendesk support ticket
Weekly Digest: Slack users can choose this to send a weekly digest to the Slack channel
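The rule fields (Measure, Period, the condition operators EQ, NEQ, GE, LE, GT and LT, and the Critical and Warning thresholds) combine roughly as follows. The following Python sketch assumes that the samples collected during one Period are reduced with the chosen Measure and then compared with the Critical threshold first and the Warning threshold second. This evaluation order and the function name are assumptions, not the documented ONES implementation.

```python
import operator
from statistics import mean
from typing import Optional, Sequence

# Measure options: how the samples collected in one Period are reduced.
MEASURES = {"MIN": min, "AVG": mean, "MAX": max}

# Condition options for "When Measured Value is".
CONDITIONS = {
    "EQ": operator.eq, "NEQ": operator.ne,
    "GE": operator.ge, "LE": operator.le,
    "GT": operator.gt, "LT": operator.lt,
}


def evaluate_rule(samples: Sequence[float], measure: str, condition: str,
                  critical: float, warning: float) -> Optional[str]:
    """Return "critical", "warning" or None for the samples of one Period.

    Hypothetical semantics: reduce the samples with the chosen measure, then
    test the Critical threshold before the Warning threshold.
    """
    if not samples:
        return None  # nothing streamed in this Period
    value = MEASURES[measure](samples)
    compare = CONDITIONS[condition]
    if compare(value, critical):
        return "critical"
    if compare(value, warning):
        return "warning"
    return None


if __name__ == "__main__":
    cpu_samples = [60, 75, 95]  # example CPU utilization readings (%)
    print(evaluate_rule(cpu_samples, "MAX", "GE", critical=90, warning=80))
    print(evaluate_rule(cpu_samples, "AVG", "GE", critical=90, warning=80))
```

Checking the Critical threshold first also works for "low" rules that use LT or LE (for example, a transceiver power threshold), where the critical value is lower than the warning value.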
2. Entity by Property
Possible Values & Description
Rule Name: The user can choose any related name
Filter: The user can filter the rule for all managed devices by:
HWSKU
Device: When the rule is for devices, the following metrics are shown:
ASIC IPv4 Routes
ASIC IPv6 Routes
Interface: When the rule is for interfaces, the following metrics are shown:
Interface Flap
Interface PFC Counters
Select: Depends on the Filter category; possible values are:
Select HWSKU:
Select ROLE:
Conditions
When Measured Value is: Selects the condition that the measured value must match:
EQ: Equal to
NEQ: Not equal to
Notification
Notify: The user can choose the integrated Slack channel
Create Ticket: Zendesk users can choose this to raise a Zendesk support ticket
Weekly Digest: Slack users can choose this to send a weekly digest to the Slack channel
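Two rule options limit repeated notifications: "Do not notify if the same alert triggers in" (a quiet period) and "Stop notifying after" (a maximum number of occurrences in 24 hours). The following Python sketch models both for each (rule, device) pair. It assumes a rolling 24-hour window. The class name and the exact counting semantics are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


class NotificationGate:
    """Decide whether a repeated alert should be sent again.

    Hypothetical model of two rule options:
      * quiet_period - "Do not notify if the same alert triggers in"
      * max_per_day  - "Stop notifying after" N occurrences in 24 hours
    """

    def __init__(self, quiet_period: timedelta, max_per_day: int) -> None:
        self.quiet_period = quiet_period
        self.max_per_day = max_per_day
        self._sent: Dict[Tuple[str, str], List[datetime]] = {}

    def should_notify(self, rule: str, device: str, now: datetime) -> bool:
        key = (rule, device)
        # Keep only the notifications sent within the last 24 hours.
        recent = [t for t in self._sent.get(key, []) if now - t < timedelta(hours=24)]
        self._sent[key] = recent
        if recent and now - recent[-1] < self.quiet_period:
            return False  # same alert fired again inside the quiet period
        if len(recent) >= self.max_per_day:
            return False  # occurrence limit reached for the 24-hour window
        recent.append(now)
        return True


if __name__ == "__main__":
    gate = NotificationGate(quiet_period=timedelta(minutes=30), max_per_day=2)
    start = datetime(2024, 4, 15, 10, 0)
    for minutes in (0, 10, 40, 120):
        print(minutes, gate.should_notify("cpu-high", "leaf-01", start + timedelta(minutes=minutes)))
```
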
Alerts
Overview
When the threshold value of a rule is exceeded, an alert is generated and displayed on this page.
Alerts
Alert notifications are sent to:
Zendesk Support: If integrated, Zendesk Support receives all push notifications.
Slack Channel: If integrated, notifications are also sent to the configured Slack channel.
Alert Page: Alerts are always displayed on the ONES Alert page.
Alert Management
Count of alerts per feature
Alert name
When the alert was first seen
When the alert was last seen
The Expand option shows the payload and the total number of alerts
Time Scale Alert Updates
Users can choose a time range to view more alerts
The Alert page allows a user to download a report in CSV format for a selected time range
Add Rules: Entity
Add Rules
Navigate to Watcher >> Rules
Click Create New and add the required inputs
Preview & Create
Once a user creates the rule, it appears in the rule list
In this example, once the SSD memory utilization of a device goes above the threshold value, the rule starts pushing notifications to Slack, raising Zendesk Support tickets, and showing alerts on the ONES Alert page
Add Rules: Entity by Properties
Add Rules
Navigate to Watcher >> Rules
Click Create New and add the required inputs
Preview & Create
Once a user creates the rule, it appears in the rule list
In this example, once the CPU utilization of a device goes above the threshold value, the rule starts pushing notifications to Slack, raising Zendesk Support tickets, and showing alerts on the ONES Alert page
Analytics
Hardware
The dashboard gives NetOps an overview of the data centre. It contains the entire hardware inventory of the network and shows whether each switch is streaming or not streaming.
After the ONES Application is installed for the first time, the Dashboard is empty; devices must be onboarded before they appear on it
The Dashboard is used to:
Monitor the status of the agent running on each device
Components
Navigate to Dashboard >> Components
Interfaces
Navigate to Dashboard >> Interfaces
Using this page, a user gets the status of:
The cables used in the network
How many cable pairs can be used for future topology expansion (helps admins with capacity planning)
Software
Navigate to Dashboard >> Software
Settings
Overview
Using this setting, the user can set the acceptable and critical threshold levels for the following device components
This page gives control over the widget refresh timer and the user idle timeout
Users can set a refresh interval for all widgets (default: 120 seconds)
Zendesk Support Integration
Log in to the Zendesk Support Admin panel and follow these steps:
Period: The time window over which the metric is measured:
5 min
10 min
15 min
30 min
1 hour
GE: Greater than or equal to
LE: Less than or equal to
GT: Greater than
LT: Less than
Critical Threshold: The user can set a critical value at which a push notification is triggered
Warning Threshold: The user can set a warning value at which a push notification is triggered
Do not notify if the same alert is triggered again within: 30 min, 1 hour, 2 hours, 10 hours, or 24 hours
Stop notifying after: The number of occurrences after which the same alert is not triggered again for the next 24 hours
ROLE
OS Version
For: The user can choose one of two options:
BGP Neighbours Down
Device CPU Core Temperature
Device CPU Utilization
Device Down
Device Memory Utilization
Device Queue Counter
FAN Speed
Failed FANs
Failed PSUs
PSU Temperature
SSD Health
SSD Temperature
SSD Used Memory Percent
frr CPU Utilization
syncd CPU Utilization
Interface Queue Counters
Traffic InDiscards
Traffic InErrors
Traffic OutDiscards
Traffic OutErrors
Traffic Rx Utilization
Traffic Tx Utilization
Transceiver Rx Power
Transceiver Temperature
Transceiver Tx Power
Transceiver Voltage
Select OS VERSION:
Metrics: The available metrics depend on the For (Device/Interface) selection above
Measure: Metrics can be measured in three different ways:
MIN
AVG
MAX
Period: The time window over which the metric is measured:
5 min
10 min
15 min
30 min
1 hour
GE: Greater than or equal to
LE: Less than or equal to
GT: Greater than
LT: Less than
Critical Threshold: The user can set a critical value at which a push notification is triggered
Warning Threshold: The user can set a warning value at which a push notification is triggered
Do not notify if the same alert is triggered again within: 30 min, 1 hour, 2 hours, 10 hours, or 24 hours
Stop notifying after: The number of occurrences after which the same alert is not triggered again for the next 24 hours
Integrations
The ONES-2.1.0 application allows users to add third-party tools to receive the desired alerts and metrics in raw format
Alerts Push Notification
Whenever a threshold value is breached in ONES, the same notification can be delivered to two different applications:
Slack Channel Integration: Push notification of all alerts in the form of a message
Zendesk Support Integration: A Zendesk email alert with the payload of the triggered value
DataLake 1.0 (Cloud Service)
ONES 2.1 supports DataLake integration with two platforms. ONES stores the raw data of all metrics in the cloud, and the user can then use that raw data for any deployment or other use case.
Splunk
Amazon S3
Cloud Services
DataLake 1.0 (Cloud Service)
ONES DL provides a flexible and scalable platform for storing, managing, and analyzing diverse data types at scale. By using a schema-on-read approach and supporting various analytics tools, ONES DL facilitates advanced data analytics and enables organizations to derive valuable insights from their data assets. However, proper governance, security, and metadata management are crucial to ensure the usability, reliability, and integrity of data lakes.
VXLAN-Symmetric
Option to delete the alerts
Device Roles and associated Regions and
Details of Switch Hardware SKU and ASICs
interface to identify if any power failure is happening over the interface
Devices
Status of Switch
Not Streaming: The device is Inactive/Unreachable
Streaming: The device is in a Working state
Non-Licensed: The device was added beyond the number of licensed devices
Regions
Status of Regions, their Location and Device Mappings
Switch SKUs
Switch Hardware Vendor, Model Number and SKU
ASICs
ASIC Vendor, Model and Hardware version details
Roles
Device Roles in Customer Environment
Super-Spine
Spine
Leaf
ToR
PSUs
Shows the list of
all faulty Power Supplies across managed switches
LED status of managed switches
Fans
Shows the list of
all faulty fans across managed switches
airflow direction of faulty fans for troubleshooting
Transceivers Temperature
Temperature readings and alerts for Optics
Transceivers Voltage
Voltage readings and alerts for Optics
Interfaces
Total number of ports available across devices
Status of Up interfaces across devices
Unused interfaces across the devices
Cabling
Total number of cables used across devices
Cable type used across device
Fiber
Copper
Count of cables required for unused ports
Interface Down
Information on Down Interfaces
Agent Status of a Device for a Down
Historical interface flaps over 5-minute, 15-minute, 30-minute, and 1-hour intervals
Provides two types of Status:
Device Name with interface details
At what time the interface went down
Telemetry Agent version
version across all managed switches
distribution of Agent-based vs Agent-less switches
Device Status (Up/Down) based on Distro
Orchestrator Agent version
version across all managed switches
Agent Health - Up and Down
Network OS
NOS status and version across all managed switches
Distribution based on NOS versions
Device Status (Up/Down) based on NOS versions
Firmware Version
This widget shows the BIOS & ONIE version running on all managed devices
Linux Distro
version across all managed switches
Distribution based on Linux Distros
Device Status (Up/Down) based on Linux Distro
Uptime
CPU Utilization
Memory Utilization
CPU Temperature
Services running on the device
RX: how many keepalives have been received
Users can set after how many minutes of inactivity the ONES-UI logs the user out
Thresholds that can be set for components:
CPU Utilization(%)
Memory Utilization(%)
CPU Temperature(℃)
PSU Temperature(℃)
PSU Voltage(V)
Fan Speed(%)
SSD Health(%)
SSD Temperature(℃)
SSD Used Memory(%)
Different lower and higher threshold values can be set for each component as required, and users can see the number of devices in the acceptable and critical ranges in the Monitor >> Health view
Navigate to Monitor >> Platform when any component breaches the higher value
Thresholds
Navigate to Settings >> Thresholds
Change the values as per your requirements
Click Save Changes to make the new settings live; the updated thresholds are then reflected on the device metric pages under Inventory
Application Control
Using this tab, the user can control the widget refresh interval and the idle timeout of the application
Navigate to Settings >> Application
Using this page, users can change the refresh interval (in seconds) for all widgets
In the dropdown menu, available intervals are:
30 Seconds
60 Seconds
90 Seconds
120 Seconds
Users can set the ONES-UI idle timeout in minutes (2-60); by default, the timeout is disabled
Zendesk API
Enable Token Access
Enter an API token description (optional)
Copy the API Token
Save the Settings
Open ONES-App and select Integration >> Ticketing
Click Add Channel and paste the required details
After saving it will be available to use while creating any rule using Rule Engine feature
Enable the Service
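Once the token is saved, ONES raises the tickets itself. To check the agent email address and API token outside ONES first, the following Python sketch builds a Zendesk "create ticket" request. It uses Zendesk API token authentication, which is HTTP Basic auth with the user name `<email>/token`. The subdomain, email and token values are placeholders, and the helper name is hypothetical. Sending the request with `urlopen` creates a real ticket.

```python
import base64
import json
from urllib.request import Request


def build_zendesk_ticket_request(subdomain: str, email: str, api_token: str,
                                 subject: str, body: str) -> Request:
    """Build (but do not send) a Zendesk "create ticket" API request.

    Zendesk API token auth: Basic auth with "<email>/token" as the user name
    and the API token as the password.
    """
    credentials = f"{email}/token:{api_token}".encode("utf-8")
    payload = {"ticket": {"subject": subject, "comment": {"body": body}}}
    return Request(
        url=f"https://{subdomain}.zendesk.com/api/v2/tickets.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_zendesk_ticket_request("example", "ops@example.com", "API_TOKEN",
                                       "Test from ONES setup", "Token check")
    print(req.get_method(), req.full_url)
    # To actually create the ticket: urllib.request.urlopen(req)
```
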
Slack Channel Integration
1. Create a Channel for ONES-App push notification
Navigate to Cloud Service >> Integrations >> Amazon S3
Add Instance
The user needs to enter the following mandatory details of the Amazon S3 instance:
Splunk
Steps to onboard Splunk
Navigate to Cloud Service >> Integrations >> Splunk
Add Instance
The user needs to enter the following mandatory details of the Splunk instance:
VXLAN-SVI
Enter any app name, choose the workspace that should receive the push notifications, and click Create App
Choose Incoming Webhooks, activate Incoming Webhooks, and click Add New Webhook to Workspace
Select the configured channel and click Allow
Copy the newly created webhook link
Open ONES-App and select Integration >> Messaging
Click Add Channel and paste the Webhook URL
After saving it will be available to use while creating any rule using Rule Engine feature
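A Slack incoming webhook accepts an HTTP POST with a JSON body containing a `text` field. To confirm that the copied webhook link works before adding it to ONES, the following Python sketch posts a test message. The webhook URL is a placeholder, the helper names are hypothetical, and this does not reproduce the message format that ONES itself sends.

```python
import json
from urllib.request import Request, urlopen


def build_slack_message(webhook_url: str, text: str) -> Request:
    """Build a POST request for a Slack incoming webhook ({"text": ...})."""
    return Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_message(webhook_url: str, text: str, timeout: float = 10.0) -> bool:
    """Post the message; Slack answers HTTP 200 with the body "ok" on success."""
    with urlopen(build_slack_message(webhook_url, text), timeout=timeout) as resp:
        return resp.status == 200 and resp.read().decode("utf-8").strip() == "ok"


if __name__ == "__main__":
    url = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder
    req = build_slack_message(url, "ONES webhook test")
    print(req.get_method(), req.data)
```
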
Nickname
ARN Role
Region
Bucket Name
External ID
The user needs to fetch all these values from the Amazon S3 instance
After saving the details, the instance is available for pushing metrics to Amazon S3
Catalog Update
Using this option, the user has control over the metrics and the frequency of streaming them to the Amazon S3 instance
Users can modify the metrics as per interest and can set a frequency from 1 minute to 60 minutes
Nickname (any name)
Endpoint URL
Token
Index
The user needs to fetch all these values from the Splunk instance
After saving the details it will be available to push the metrics to the Splunk Instance
Catalog Update
Using this option, the user has control over the metrics and the frequency of streaming them to the Splunk instance
Users can modify the metrics as per interest and can set a frequency from 1 minute to 60 minutes
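The Endpoint URL, Token and Index fields match the inputs of the Splunk HTTP Event Collector (HEC). Whether ONES streams through HEC is an assumption. The following Python sketch builds a HEC request that can be used to confirm that the token and index accept events. The endpoint, token and index values are placeholders, the event fields are illustrative only, and the helper name is hypothetical.

```python
import json
from typing import Any, Dict, Optional
from urllib.request import Request


def build_hec_request(endpoint_url: str, token: str, index: str,
                      event: Dict[str, Any], timestamp: Optional[float] = None) -> Request:
    """Build (but do not send) a Splunk HTTP Event Collector request.

    HEC expects the header "Authorization: Splunk <token>" and a JSON body
    with the event and, optionally, the target index and an epoch timestamp.
    """
    payload: Dict[str, Any] = {"event": event, "index": index}
    if timestamp is not None:
        payload["time"] = timestamp  # epoch seconds
    return Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Splunk {token}", "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_hec_request(
        "https://splunk.example.com:8088/services/collector/event",  # placeholder
        "HEC_TOKEN", "ones_metrics", {"device": "leaf-01", "cpu_percent": 42},
    )
    print(req.get_method(), req.full_url)
```
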
Input the details
ONES Orchestration
This section explains how large data centers can be designed seamlessly using ONES.
Configuring Devices
Most fabric orchestration solutions available today are complex and often difficult to understand. ONES provides simple and effective tools, such as predefined templates (YAML files), to configure data centers at scale. ONES also allows customized device configuration, including enhancements to the standard configuration.
Configuration Commands:
Save Config:
Copy to File:
Restore Config: //If needed only
If the speed of a host-facing port differs from the default speed configuration, update the port speed before orchestration.
Example: the interface speed is 25G, but a 10G transceiver is used. In such cases, the user needs to set the interface speed to 10G.
You need to provide the following inputs to configure the devices:
Local AS number - Local BGP AS number to use
Subnet Details
IPv4 Subnet - IPv4 address range to use in the domain
IPv4 Loopback - IPv4 address range reserved for device loopback addresses
IPv6 Subnet - IPv6 address range to use in the domain
Connectivity
Link connectivity between SuperSpine, Spine, Leaf, and ToR devices
Link Type (Access or Trunk)
Layer 2 / Layer 3 - interface type
MCLAG Details
VLAN - VLAN to be used for the interfaces
PO Group - PortChannel number used to bundle the interfaces
Keepalive VLAN - VLAN used to send keepalive messages
VRF number - VRF number to be used for MC-LAG
Host Interface
L2 Access & VLAN - Host-facing interface configured as an access port, with its VLAN allocation
L2 Trunk & VLAN - Host-facing interface configured as a trunk port, with its VLAN allocation
L3 - Host-facing interface with Layer 3 properties
Network Service Address
NTP server - NTP server address to add
Syslog - Syslog server IP address
SNMP - SNMP server address to add
VXLAN
VLAN Range - VLAN range to use for VXLAN, e.g. 200-205
VNI Range - VNI range to use for VXLAN, e.g. 20000-20005
Anycast Gateway - Anycast gateway subnet
Host Per VLAN - Number of hosts allocated per VLAN
IRB VLAN Range - A separate input for the IRB VLANs; it must not overlap with the VXLAN VNI range
ONES requires only a minimal set of information from users to configure the devices. The tool is simple to use and allows users to configure a large number of devices simultaneously.
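The VXLAN inputs above are easy to get wrong by hand: a VLAN range such as 200-205 is meant to line up with a VNI range such as 20000-20005, and the IRB VLANs are a separate input. The sketch below parses the range strings, pairs VLANs with VNIs one-to-one, and rejects IRB VLANs that collide with the VXLAN VLANs. The one-to-one pairing and the reading of the overlap rule as "IRB VLANs must stay out of the VXLAN VLAN range" are assumptions, and the function names are hypothetical.

```python
def parse_range(text: str) -> range:
    """Parse an inclusive range written as 'start-end' (e.g. '200-205') or a single number."""
    parts = text.split("-")
    if len(parts) not in (1, 2):
        raise ValueError(f"invalid range {text!r}")
    start, end = int(parts[0]), int(parts[-1])
    if start > end:
        raise ValueError(f"range {text!r} is reversed")
    return range(start, end + 1)


def map_vlans_to_vnis(vlan_range: str, vni_range: str, irb_vlan_range: str) -> dict[int, int]:
    """Pair VXLAN VLANs with VNIs one-to-one and check that the IRB VLANs stay separate."""
    vlans = parse_range(vlan_range)
    vnis = parse_range(vni_range)
    irb_vlans = parse_range(irb_vlan_range)
    if len(vlans) != len(vnis):
        raise ValueError("VLAN and VNI ranges must contain the same number of IDs")
    if any(not 1 <= v <= 4094 for v in (*vlans, *irb_vlans)):
        raise ValueError("VLAN IDs must be between 1 and 4094")
    if set(vlans) & set(irb_vlans):
        raise ValueError("IRB VLAN range overlaps the VXLAN VLAN range")
    return dict(zip(vlans, vnis))


print(map_vlans_to_vnis("200-205", "20000-20005", "3000-3001"))
# {200: 20000, 201: 20001, 202: 20002, 203: 20003, 204: 20004, 205: 20005}
```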
Configuration Overview
This section explains the device configuration procedure, which is applied across the entire fabric.
Navigate to Configurations >> Devices
Click the Configure Devices button in the top right corner. This opens a new screen with a sample device configuration, which you can edit directly in the UI. Alternatively, click the Download YAML button at the bottom of the screen to download the sample YAML file, make your changes, and upload the file using the Upload YAML button.
The intent fields are explained below:
Inventory: Specify how many devices you want to add in each role - Super Spine, Spine, Leaf, and ToR.
Connectivity: Specify the parameters required to establish link connectivity such as:
Device switch ID: unique ID for every device, required to correctly render the topology
Switch name: hostname of the device
IP address: management address of the device
BGP: Specify whether to enable regular BGP peering or BGP unnumbered peering.
PhysicalIfCfg: Enable or disable FEC and change the MTU settings on all the links being configured.
ASN: Assign a BGP ASN (Autonomous System Number) from the specified pool. Dynamic ASN assignment will be implemented in ONES release 2.0. For release 1.0, you need to specify the ASN under the device configuration, as shown in the sample YAML file.
IPv4Pool: Assign IPv4 pools to the different subnets. ONES automatically divides the subnets according to the number of available links.
ONES application uses IPv4 subnets for:
Interfaces
Loopbacks
Host interfaces
IPv6Pool: Assign the IPv6 subnet.
The ONES application uses the IPv6 subnet to:
Address interfaces
Configure BGP neighborships
Automatically advertise these subnets in BGP
Note: IPv6 loopback addresses are not supported.
NTP: Provide the NTP server address to enable NTP, and choose the desired time zone.
Supported Time Zone
Africa/Abidjan
Africa/Accra
Africa/Addis_Ababa
Africa/Algiers
Africa/Asmara
SYSLOG: Provide the SYSLOG server address to enable SYSLOG.
SNMP: Provide the SNMP server address to enable SNMP.
Parameters: Enable VXLAN and set its related parameters.
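Two of the fields above lend themselves to a quick pre-check before a YAML file is uploaded: Connectivity requires a unique switch ID per device, and IPv4Pool must be large enough to be divided across the links. The sketch below checks both. The `switch_id`/`name` keys are hypothetical (the real YAML schema may differ), and carving one /31 per point-to-point link is an assumption about how the pool is divided; pass `prefix=127` to try the same split on an IPv6 pool.

```python
import ipaddress
from itertools import islice


def check_switch_ids(devices: list[dict]) -> None:
    """Raise ValueError if two devices share a switch ID."""
    seen: dict[int, str] = {}
    for dev in devices:
        sid, name = dev["switch_id"], dev["name"]  # hypothetical key names
        if sid in seen:
            raise ValueError(f"switch ID {sid} used by both {seen[sid]} and {name}")
        seen[sid] = name


def carve_links(pool: str, links: int, prefix: int = 31) -> list:
    """Take the first `links` subnets of size /prefix from the pool (assumed /31 per link)."""
    net = ipaddress.ip_network(pool)
    subnets = list(islice(net.subnets(new_prefix=prefix), links))
    if len(subnets) < links:
        raise ValueError(f"{pool} has room for only {len(subnets)} /{prefix} links")
    return subnets


check_switch_ids([{"name": "leaf1", "switch_id": 1}, {"name": "leaf2", "switch_id": 2}])
print([str(n) for n in carve_links("10.0.0.0/29", 4)])
# ['10.0.0.0/31', '10.0.0.2/31', '10.0.0.4/31', '10.0.0.6/31']
```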
Creating Configuration
Navigate to Configurations >> Devices >> Configure Devices
Applying Configuration
Click the Apply Configs button in the bottom right corner of the screen to push the configuration across the entire fabric.
ONES provides real-time updates when the devices are being configured and validates the configurations automatically to ensure the network is ready to use.
The following screen appears after the configurations are successfully verified:
FRR config Issue
With a few versions of FRR, rebooting the device erases the existing configuration and pushes the default configuration to FRR, so the user's configuration is lost.
The workaround is as follows:
Open a shell in the BGP container and open docker_init.sh.
Look for the "split" keyword and comment out the statements inside that if-condition branch.
Add a new line that writes service integrated-vtysh-config to vtysh.conf.
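The edit above can be scripted when several devices need it. The sketch below applies it to the text of docker_init.sh: it comments out the body of the branch whose condition mentions "split" and adds a line that writes `service integrated-vtysh-config` (FRR's integrated-configuration directive) to vtysh.conf. The new line is placed inside the branch because bash rejects an empty `then` body. The sample script is hypothetical; the real docker_init.sh differs between SONiC/FRR versions, nested `if` blocks inside the branch are not handled, and the file should be backed up and inspected before any automated edit.

```python
import re

# Header of the branch to neutralise, e.g.:  elif [ "$CONFIG_TYPE" == "split" ]; then
SPLIT_HEADER = re.compile(r'^\s*(el)?if\b.*"split".*;\s*then\s*$')
# Lines that close the branch. Nested if/fi inside the branch is NOT handled.
BRANCH_END = re.compile(r"^\s*(elif|else|fi)\b")
INTEGRATED = 'echo "service integrated-vtysh-config" > /etc/frr/vtysh.conf'


def patch_split_branch(script: str) -> str:
    """Comment out the body of the "split" branch and make it write the integrated directive."""
    out, in_branch = [], False
    for line in script.splitlines():
        if in_branch and BRANCH_END.match(line):
            in_branch = False
        if in_branch:
            body = line.lstrip()
            indent = line[: len(line) - len(body)]
            # Keep blank lines untouched; prefix everything else with '# '.
            out.append(f"{indent}# {body}" if body else line)
            continue
        out.append(line)
        if SPLIT_HEADER.match(line):
            in_branch = True
            # A bash branch may not be empty, so the new line goes inside it.
            out.append("    " + INTEGRATED)
    return "\n".join(out) + "\n"


# Hypothetical excerpt; check the actual script in your BGP container first.
sample = """if [ "$CONFIG_TYPE" == "unified" ]; then
    echo "unified branch left alone"
elif [ "$CONFIG_TYPE" == "split" ]; then
    echo "no service integrated-vtysh-config" > /etc/frr/vtysh.conf
    rm -f /etc/frr/frr.conf
fi"""
print(patch_split_branch(sample))
```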
The fields shown in the top right corner of the screen are explained below:
Connect
The Configuration page allows a user to connect to a device through console access or SSH.
Navigate to Configurations >> Devices >> Connect
User can choose SSH or console option to access the device.
Console Logs
Console Logs show the exact configuration loaded on the device. An overview of the configuration loaded from the YAML file, including the derived host IP range, can be viewed in the "Derived_Config:" section.
This page allows the user to compare applied configurations to the running configuration of a selected device.
Select only one device from the list and click 'Compare Config' on this page. Fetching the running configuration from the device may take several minutes.
A YAML editor opens with two windows comparing the applied and running configurations, as shown in the picture below.
Differences between the configurations are highlighted with colour coding in the respective window.
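The comparison ONES shows in its YAML editor is a line diff between two texts. If you export an applied and a running configuration to files, a similar comparison can be reproduced offline with Python's standard `difflib` module; the sketch below is an illustration of that idea, not how ONES computes its view, and the sample strings are placeholders.

```python
import difflib


def compare_configs(applied: str, running: str) -> list[str]:
    """Return unified-diff lines: '-' marks applied-only lines, '+' marks running-only lines."""
    return list(
        difflib.unified_diff(
            applied.splitlines(),
            running.splitlines(),
            fromfile="applied",
            tofile="running",
            lineterm="",
        )
    )


applied = "hostname leaf1\nntp server 10.0.0.1\n"  # placeholder text
running = "hostname leaf1\nntp server 10.0.0.2\n"  # placeholder text
for line in compare_configs(applied, running):
    print(line)
```

An empty result means the two configurations match line for line.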
Backup & Restore Configuration
This section describes how to back up and restore configurations across all the managed devices.
With this feature, a user can take multiple backups and restore any of them at any time.
Taking a Backup
Select Backup & Restore Configs
Enter a tag name for the backup and select the device whose configuration you want to back up
Submit the task
Once the task is submitted, the configuration is backed up under the given name and can be used to restore the configuration at any time in the future
Restoring a Backup
Select Backup & Restore Configs
Select Restore Config
Click the drop-down button of the device on which you want to restore the configuration
Select the backup to restore from the Backups Available list
Click Submit, then click Yes to confirm
VXLAN-Symmetric-SAG-no-mclag-vrf
If devices already hold a V1.3 orchestrated configuration, the V2.0 configuration will not be merged with it; when using GA2.0, use only the V2.0 template.
VXLAN-Symmetric-SAG-no-mclag-vrf Standard Template
VXLAN-Symmetric-SVI-no-mclag-vrf
If devices already hold a V1.3 orchestrated configuration, the V2.0 configuration will not be merged with it; when using GA2.0, use only the V2.0 template.
VXLAN-Symmetric-SVI-no-mclag-vrf Standard Template
Accounts
Overview
Use this feature to:
Create new users and roles
Remove or suspend existing users one at a time
Remove or suspend multiple users at once
Reset the passwords of existing users
Only a Super Admin or Enterprise Admin can perform these actions.
Users
Navigate to Accounts >> Users
Initially, ONES provides one default Admin user credential.
This view shows each user's status, assigned role, and last login time.
On the Users tab, you can add new users and remove or suspend existing users.
Users - Add New
Navigate to Accounts >> Users >> Add
The admin can add:
Profile picture
Username
On the user's first login with the given details, the ONES application prompts the user to reset the password.
The user can then log in with valid credentials.
Users - Reset Password <Pending due to a bug>
Navigate to Accounts >> Users >> Edit User >> Reset Password >> Yes
Using admin credentials, you can reset a user's password.
Click Reset Password
Submit & Save
The user can then log in with the temporary password; on the first login, the ONES application prompts the user to change the password.
Users - Remove User
Navigate to Accounts >> Users >> ((Select Users you want to remove)) >> remove >> Yes
Multiple users can be removed at once.
Removed users are deleted from the database and can no longer log in with their credentials.
Users - Suspend User
Instead of removing a user, you can suspend the user.
A suspended user is not removed from the database; the account is kept in an Inactive state.
The user can later be restored to the Active state if needed.
Select the users and click Suspend.
Users - Restore User
Navigate to Accounts >> Users >> ((Select Users you want to restore)) >> Restore >> Yes
Roles
Navigate to Accounts >> Roles
By default, ONES comes with these 4 roles:
Super Admin
Enterprise Admin
Roles - Add User Roles
Navigate to Accounts >> Roles
Now add a few extra permissions, for example:
Add/Remove Devices
After the permissions are assigned, users can be added to this role from the Users section.
VXLAN-Symmetric-SAG-mclag-vrf
If devices already hold a V1.3 orchestrated configuration, the V2.0 configuration will not be merged with it; when using GA2.0, use only the V2.0 template.
The device should not have any IP, VLAN, PortChannel, BGP, SAG, or MCLAG configuration on any of its interfaces; existing configuration can overlap with the orchestrated configuration and cause cleanup issues.
After that, save the configuration to a file so that, in case of an orchestration failure or misconfiguration, the user can roll back to the saved configuration.
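Checking for leftover configuration can be scripted against a device's config database before orchestration. The sketch below takes an already-parsed SONiC `config_db.json` (a dict) and reports every non-empty table that corresponds to the configuration types listed above. The table names are assumptions based on common SONiC builds (SAG and MCLAG table names in particular vary between distributions) and should be verified against your devices; `MGMT_INTERFACE` is deliberately left out because the management address must stay.

```python
# Assumed config_db.json table names for IP, VLAN, PortChannel, BGP, SAG, and MCLAG config;
# verify them against the config_db.json of your SONiC build.
CONFLICTING_TABLES = (
    "INTERFACE", "VLAN", "VLAN_INTERFACE", "VLAN_MEMBER",
    "PORTCHANNEL", "PORTCHANNEL_INTERFACE", "PORTCHANNEL_MEMBER",
    "BGP_NEIGHBOR", "SAG", "MCLAG_DOMAIN", "MCLAG_INTERFACE",
)


def find_conflicts(config_db: dict) -> dict[str, list[str]]:
    """Return {table: sorted keys} for every non-empty conflicting table.

    MGMT_INTERFACE is not checked: the management address must remain configured.
    """
    return {
        table: sorted(config_db[table])
        for table in CONFLICTING_TABLES
        if config_db.get(table)
    }


# Placeholder data: one VLAN is already configured, the PortChannel table is empty.
sample_db = {
    "MGMT_INTERFACE": {"eth0|10.1.1.10/24": {}},
    "VLAN": {"Vlan100": {"vlanid": "100"}},
    "PORTCHANNEL": {},
}
print(find_conflicts(sample_db))  # {'VLAN': ['Vlan100']}
```

On a device, the input would typically come from loading `/etc/sonic/config_db.json` with `json.load`; an empty result suggests the device is clean enough to orchestrate.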