17 January 2019
How to Feed Your Network Monitoring Solution
Introduction
A common challenge I hear from prospective customers is their concern with the number of resources needed for the daily upkeep of a network monitoring solution. Resources are at a premium, and making sure devices are added, updated, and retired from the monitoring platform is commonly a low-priority task, often relegated to inexperienced engineers if not forgotten altogether.
Opmantek was recently selected by NextLink Internet, a Wireless ISP located in Hudson Oaks, Texas, to provide solutions around fault and performance monitoring, event and configuration management, and NetFlow analysis. Like many other clients, a key requirement of Ross Tanner, NextLink’s Director of Network Operations, was automating the general upkeep of devices, or as Ross put it “the daily feeding and watering of the solution”.
Operational Process Automation
Definition
Operational Process Automation (OPA) is all about using digital tools to automate repetitive processes. Sometimes fully autonomous automation can be achieved, but more often complex workflows can make use of partial automation with human intervention at key decision points.
Automating the Feeding and Watering
The key to maintaining the list of devices to be monitored is keeping track of new, existing, and retired devices. Opmantek’s suite of network monitoring tools includes Open-AudIT, an agentless device discovery and auditing platform. While Open-AudIT contains a built-in connection to Opmantek’s NMIS fault and performance platform, the connection required significant manual intervention which could not scale easily to the scope needed by NextLink.
As part of system implementation, Opmantek conducted onsite interviews with NextLink’s engineering teams; everyone from internal architects to field managers, to understand their concerns and requirements. As a result, it was quickly determined that Open-AudIT’s existing link to NMIS needed to be automated in a way that was easy to set up and maintain, even by novice engineers.
As NextLink was deploying a 2-tiered monitoring architecture, comprised of a series of pollers connecting directly with devices and reporting back to one or more primary servers, the solution would need to scale horizontally as well as vertically. While NextLink intended to start with a single server dedicated to device discovery and auditing, the solution would also need to be flexible enough to support multiple Open-AudIT servers.
These conversations resulted in the design and development of opIntegration to intelligently link Open-AudIT with NMIS.
“At Nextlink we care for our customers, we want them to succeed as much as we do, when it comes to partnering with vendors that is a large deciding factor for us. With Opmantek it was never a question… we could not have asked for a better team to work with on going to the next level of monitoring and automation.”
Ross Tanner, Director of Network Operations, NextLink Internet
Use Cases
The first step in developing an automation system is to identify the most common use cases, and if time permits, as many edge cases as possible. For this implementation, Opmantek’s engineering team storyboarded the following as a version 1 release:
New Device Added to Network
A list of devices would be periodically pulled from Open-AudIT via the Open-AudIT API and added to the NMIS server. By maintaining a list of integrated devices, opIntegration will know if a device was new, or if an update was being provided to an existing integrated device.
Existing Device Replaced (same IP and/or device Name)
It is not uncommon, especially in WISPs like NextLink, to regularly swap out in-field equipment due to failure or simply as part of a planned upgrade. Depending on your configuration of Open-AudIT, these replacement devices can either be categorized as a new device (usual) or overwrite an existing device entry (considered an edge case). As a result, opIntegration will either add the new device as previously described or update an existing device entry in NMIS with the new device type.
Device Retired/Removed from Network
Queries for devices not seen for several audit cycles are already included with Open-AudIT. Once a device has exceeded a given period (not seen for y audit cycles or x number of days) then a custom query would be used by opIntegration to retrieve that list, and set those devices to inactive in NMIS, effectively retiring them without deleting their historical data. Permanently removing the device from NMIS would remain a manual, user-initiated step.
Add Device(s) Manually to NMIS
In addition to creating an automation path, it was imperative that the solution allow and account for users manually adding devices to NMIS either through the GUI or some import process.
Building the Feeding System
Creating the Proof of Concept
The initial Proof of Concept (POC) leveraged Open-AudIT’s powerful API to retrieve a list of devices for each poller. This list was created using custom queries built in Open-AudIT. By using custom queries, users would be able to very granularly control the list of devices being sent to each NMIS poller. Once each poller had its list of devices, opIntegration would then utilize NMIS’ Node Administration function to manage adding, updating, and retiring devices from NMIS. A series of configuration files on each NMIS poller would control the Open-AudIT query to be executed, manage specifics like NMIS Group assignment and other parameters. A simple cron job would call opIntegration on whatever cadence the client desired.
User Acceptance Testing (UAT) went well, with only minor changes to the initial code base, primarily in the areas of debugging and visual presentation. After operating successfully on-premise with NextLink for 90-days the solution passed Opmantek’s internal tests and validations and was determined stable enough for inclusion in shipping code.
Next Steps in Automation
opIntegration will be included natively starting with the pending release of Open-AudIT 3.1. While the POC version was driven from the command line, Open-AudIT 3.1 will include a fully detailed GUI under Manage -> Integrations, to make configuration straight forward for all users. While the GUI will be designed to configure a single server (i.e. Open-AudIT and NMIS installed on the same server) the Integration can be used to set up configuration with remote NMIS platforms by copying the resulting configuration and script files to the remote NMIS server. Integrations can also be scheduled like any other Task in Open-AudIT, providing a simple GUI to create a detailed schedule.
How it looks
From the Open-AudIT GUI, navigate to Manage -> Integrations -> List Integrations.
Conclusion
Operational Process Automation is a large concept, often traversing multiple processes and stages. However, by prioritizing problem points, identifying manpower intensive steps, and focusing automation efforts on those items you can achieve significant improvements in performance, reliability, and satisfaction. With the new Integration routines that are built-into Open-AudIT Professional and Enterprise, users can easily automate the feeding and watering of NMIS for live performance and fault monitoring.
“With the integration of these two powerful systems, it has given us the automation that we have dreamed of in Operations. No longer are there missing gaps in monitoring or inventory, nor do you have to worry about the device model being incorrect as the system does it for you.”
Ross Tanner, Director of Network Operations, NextLink Internet