Using Postman to Query The Open-AudIT API

I often utilise Postman to query the Open-AudIT API when developing. Just using a browser, it’s difficult to send anything other than a GET request – but Postman makes it simple to send a POST, PATCH or DELETE as required. You can get it from https://www.getpostman.com/downloads/ for Windows, Mac and Linux.

Install and start Postman. You can elect to create an account or not. You can also elect to create a new item using the wizard, or just close the modal and jump in. Let’s do that!

For the below, my Open-AudIT server is running on 192.168.84.4. You should substitute the IP address of your Open-AudIT server.

First, you need to make a POST to /login to get a cookie. Set the dropdown to POST and the URL to http://192.168.84.4/omk/open-audit/login. Set the header Accept to application/json. Set the Body to form-data and provide the username and password keys, with values as appropriate for your installation. By default, it will look as below. Now click the Send button.

[Screenshot: Postman Open-AudIT API 1]
[Screenshot: Postman Open-AudIT API 2]

You should see the JSON result saying you have been authenticated.
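If you would rather script the same call, here is a minimal sketch using Python’s requests library (the credentials are placeholders – substitute your own):

import requests

# A session keeps the cookie that /login returns, for use on later requests.
s = requests.Session()
s.headers.update({"Accept": "application/json"})

r = s.post("http://192.168.84.4/omk/open-audit/login",
           data={"username": "admin", "password": "password"})
print(r.json())  # should report that you have been authenticated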

Once that’s done, it’s time to request some data. Make a GET request to http://192.168.84.4/omk/open-audit/devices and you should get a JSON response containing a list of devices. You can see the start of the JSON in the screenshot below.

[Screenshot: Postman Open-AudIT API 3]
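Scripted, the same request is a one-liner once you are logged in – again a sketch with placeholder credentials:

import requests

s = requests.Session()
s.headers.update({"Accept": "application/json"})
s.post("http://192.168.84.4/omk/open-audit/login",
       data={"username": "admin", "password": "password"})  # login, as above

r = s.get("http://192.168.84.4/omk/open-audit/devices")
print(r.json())  # the JSON list of devices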

What about changing the attribute of an item? Not too difficult. You’ll need the ID of the device you want to change, along with the attribute name from the database. You can see these in the application by going to menu → Admin → Database → List Tables and clicking on the “system” table. Let’s change the description for our device with ID 14.

You’ll need to create a JSON object and assign it to the “data” item to do this. It’s not too difficult. Your JSON object should look like below (formatted and indented for easy reading).
{
    "data": {
        "id": "14",
        "type": "devices",
        "attributes": {
            "description": "My New Description"
        }
    }
}

It looks worse than it is. Normally you would use code to do this, so it’s a simple two line conversion. Because we’re using Postman, we’ll have to do it ourselves. A useful site is https://jsonlint.com/
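In Python, for instance, the conversion really is about two lines:

import json

payload = {"data": {"id": "14", "type": "devices",
                    "attributes": {"description": "My New Description"}}}
print(json.dumps(payload))  # the string to paste into the "data" form field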

So now you have your payload, let’s send it to Open-AudIT. Make a new PATCH request and use the URL http://192.168.84.4/omk/open-audit/devices/14.
Supply the data attribute in the body → x-www-form-urlencoded section and hit Send. You should see the request as below.

[Screenshot: Postman Open-AudIT API 4]
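For reference, a sketch of the same PATCH in Python (placeholder credentials again; the JSON string is supplied as the “data” form field, just as in Postman):

import json
import requests

s = requests.Session()
s.headers.update({"Accept": "application/json"})
s.post("http://192.168.84.4/omk/open-audit/login",
       data={"username": "admin", "password": "password"})  # login, as above

payload = {"data": {"id": "14", "type": "devices",
                    "attributes": {"description": "My New Description"}}}
r = s.patch("http://192.168.84.4/omk/open-audit/devices/14",
            data={"data": json.dumps(payload)})
print(r.status_code)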

Deleting an item is even easier. Let’s delete an Org. In this case, our Org with ID 2. Make a new DELETE request to http://192.168.84.4/omk/open-audit/orgs/2. That’s it – easy!

And if we want to read a specific entry, it’s just a GET request. Let’s get our default Org – ID 1. Just make a GET to http://192.168.84.4/omk/open-audit/orgs/1.
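Both calls, sketched in Python:

import requests

s = requests.Session()
s.headers.update({"Accept": "application/json"})
s.post("http://192.168.84.4/omk/open-audit/login",
       data={"username": "admin", "password": "password"})  # login, as above

s.delete("http://192.168.84.4/omk/open-audit/orgs/2")   # delete the Org with ID 2
r = s.get("http://192.168.84.4/omk/open-audit/orgs/1")  # read the default Org
print(r.json())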

What about running a query? What’s the HTTP verb used to EXECUTE something? There is none. But we’ll make do by supplying /execute after the ID. So to execute a query, make a GET request to http://192.168.84.4/omk/open-audit/queries/1/execute. To execute a discovery, task or baseline, use the same format – ID/execute.

Remember, we always receive the result in JSON because that is what our request’s Accept header asks for. We could receive it as HTML if we want – just remove that header item. Maybe more useful is CSV output. Remove the Accept header and change the URL for a GET to http://192.168.84.4/omk/open-audit/queries/1/execute?format=csv. Done – CSV output you can copy and paste into Excel.
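A sketch of both variants in Python – the JSON request sets the Accept header, while the CSV request omits it and adds format=csv instead:

import requests

s = requests.Session()
s.post("http://192.168.84.4/omk/open-audit/login",
       data={"username": "admin", "password": "password"},
       headers={"Accept": "application/json"})

# With the Accept header set, the query result comes back as JSON.
r = s.get("http://192.168.84.4/omk/open-audit/queries/1/execute",
          headers={"Accept": "application/json"})

# Without an Accept header, format=csv on the URL returns CSV instead.
csv_result = s.get("http://192.168.84.4/omk/open-audit/queries/1/execute",
                   params={"format": "csv"})
print(csv_result.text)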

It really is that simple. The only one to watch is the PATCH request because you have to create your own JSON. Just about everything else is quite discoverable. Make sure you check the pages for Collections which detail the request formats. And don’t forget the Open-AudIT API page as well.

Onwards and upwards.
Mark Unwin.


Open-AudIT | Device SubSection Data Retention Options

With the release of Open-AudIT 3.1.0, we have massively expanded the options around keeping and processing data from devices. SubSections of a device within Open-AudIT refer to the many tables that hold specific data types – software, netstat ports, processors, memory, disks, users, groups, etc. These options exist (for now at least) in the Configuration of Open-AudIT. The items of interest are create_change_log_* and delete_noncurrent_*. We previously had these options for a select couple of SubSections, but have expanded them to cover every subsection.

Create Change Logs

The items named create_change_log_* use the database table names to specify which subsection they apply to – so create_change_log_software and create_change_log_memory are both valid examples. You can override ALL items by setting create_change_log to “n” – this will stop any change logs being generated, regardless of the individual table setting. So if a device has a piece of software added (for example), a corresponding change log would not be inserted if create_change_log_software was set to “n”. This is set to “y” by default. This matches how Open-AudIT has always worked.
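The precedence works out to something like the following – a hypothetical sketch of the logic in Python, not Open-AudIT’s actual code:

def should_create_change_log(config: dict, table: str) -> bool:
    # The global item is a kill switch: "n" stops all change logs,
    # regardless of the per-table setting.
    if config.get("create_change_log", "y") == "n":
        return False
    # Otherwise the per-table item decides; "y" is the default.
    return config.get("create_change_log_" + table, "y") == "y"

# should_create_change_log({"create_change_log_software": "n"}, "software") -> False
# should_create_change_log({"create_change_log": "n"}, "memory")            -> False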

Special Items

We have also introduced three special configuration items for Netstat Ports. Because ports above 1024 are mostly designed to be dynamic, we now provide three options for keeping this data:

  • create_change_log_netstat_registered
  • create_change_log_netstat_well_known
  • create_change_log_netstat_dynamic

These options correspond to the IANA port ranges: 0-1023 (well known), 1024-49151 (registered) and 49152-65535 (dynamic) – see the Wikipedia list of TCP and UDP port numbers. In particular, Windows DNS servers open a LOT of ports high in the range that are (in my opinion) silly to keep track of. By default, only create_change_log_netstat_registered is set to “y”. We may add to these options in the future for other subsections if required.
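As a hypothetical helper – assuming the option names map onto the standard IANA ranges – classifying a port into the configuration item that governs it might look like:

def netstat_change_log_item(port: int) -> str:
    # IANA ranges: well known 0-1023, registered 1024-49151, dynamic 49152-65535.
    if port <= 1023:
        return "create_change_log_netstat_well_known"
    if port <= 49151:
        return "create_change_log_netstat_registered"
    return "create_change_log_netstat_dynamic"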

Delete NonCurrent Items

Along similar lines, the configuration items for delete_noncurrent* use the database table names to specify which subsection they apply to. If set to “y”, then no historical entries will be kept for that table, only the “current” items as at the last audit (or discovery). Again, these individual items can be overridden by the global “delete_noncurrent” item. If set to “y”, it will remove all noncurrent items from all tables. This is set to “n” by default. This matches how Open-AudIT has always worked.

Hopefully, these options provide some customisability for you to only keep the data you actually need.

Onwards and upwards.

Mark Unwin.


Open-AudIT | The Default Network Address

With the new release of Open-AudIT 3.1.0, we no longer require the configuration item “default_network_address” to be set for Discoveries. It is still required for the “Audit My PC” functionality, but we hope to minimise this dependence going forward as well.

Why was Default Network Address required?

Initially, when we ran a discovery on both Linux and Windows, we ran the audit script in such a way that it needed to know where to submit its results – what URL it should use – hence the requirement for the configuration item. A while back now we changed how Discoveries run under Linux, removing this requirement.

Linux

Linux discoveries send the audit script to the target and run it with the flags “submit_online = n” and “create_file = w” – in other words: do not submit the result to the server; create a file and output the filename to the console. The server waits for the script to finish and captures the console output, so it now has the filename of the result on the target system. It then copies the result from the target to itself and processes it. All good so far.
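Sketched in Python with the paramiko SSH library – the script name, flag syntax, credentials and paths here are assumptions, but the flow is the one described above:

import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("target.example.com", username="audit", password="secret")

# Copy the audit script to the target.
sftp = client.open_sftp()
sftp.put("audit_linux.sh", "/tmp/audit_linux.sh")

# Run it: do not submit online, create a file, print the filename to the console.
stdin, stdout, stderr = client.exec_command(
    "bash /tmp/audit_linux.sh submit_online=n create_file=w")
result_path = stdout.read().decode().strip()  # the filename from the console output

# Copy the result back and process it.
sftp.get(result_path, "audit_result.xml")
client.close()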

Windows

We could never make Windows work this way. The account we use for Apache is the standard “Local System” account. This account has no access to network resources, hence it cannot simply copy the script to or from a target PC. This was always a pain because the Linux way of running the Discovery was so much better and cleaner. After some (more) research we realised we can use network resources via “net use” – we simply don’t assign a drive letter. Yay! So Windows can now copy the audit script to the target, run it, wait for the console output and then copy the result file back and process it, just like Linux.

Finally!

All that is a long explanation for “we don’t need the default network address set”. That’s one less item a user needs to worry about.

We do still require the default network address to be set for the “Audit My PC” functionality on the login page. We have plans to minimise this as well – if you can view the login page, we can use the request URL to work out what the default network address should be.

For now, it’s still required (as at 3.1.0), but look for it to be removed as a requirement in a near future release.

One step at a time, we’re trying to make Open-AudIT as easy to use as possible.

Onwards and upwards.

Mark Unwin.


How to Feed Your Network Monitoring Solution

Introduction

A common challenge I hear from prospective customers is their concern with the number of resources needed for the daily upkeep of a network monitoring solution. Resources are at a premium, and making sure devices are added, updated, and retired from the monitoring platform is commonly a low-priority task, often relegated to inexperienced engineers if not forgotten altogether.

Opmantek was recently selected by NextLink Internet, a Wireless ISP located in Hudson Oaks, Texas, to provide solutions around fault and performance monitoring, event and configuration management, and NetFlow analysis. Like many other clients, a key requirement of Ross Tanner, NextLink’s Director of Network Operations, was automating the general upkeep of devices, or as Ross put it “the daily feeding and watering of the solution”.

 

Operational Process Automation

Definition

Operational Process Automation (OPA) is all about using digital tools to automate repetitive processes. Sometimes fully autonomous automation can be achieved, but more often complex workflows can make use of partial automation with human intervention at key decision points.

 

Automating the Feeding and Watering

The key to maintaining the list of devices to be monitored is keeping track of new, existing, and retired devices. Opmantek’s suite of network monitoring tools includes Open-AudIT, an agentless device discovery and auditing platform. While Open-AudIT contains a built-in connection to Opmantek’s NMIS fault and performance platform, the connection required significant manual intervention which could not scale easily to the scope needed by NextLink.

As part of system implementation, Opmantek conducted onsite interviews with NextLink’s engineering teams – everyone from internal architects to field managers – to understand their concerns and requirements. As a result, it was quickly determined that Open-AudIT’s existing link to NMIS needed to be automated in a way that was easy to set up and maintain, even by novice engineers.

As NextLink was deploying a 2-tiered monitoring architecture, comprised of a series of pollers connecting directly with devices and reporting back to one or more primary servers, the solution would need to scale horizontally as well as vertically. While NextLink intended to start with a single server dedicated to device discovery and auditing, the solution would also need to be flexible enough to support multiple Open-AudIT servers.

These conversations resulted in the design and development of opIntegration to intelligently link Open-AudIT with NMIS.

 

“At Nextlink we care for our customers, we want them to succeed as much as we do, when it comes to partnering with vendors that is a large deciding factor for us. With Opmantek it was never a question… we could not have asked for a better team to work with on going to the next level of monitoring and automation.”

Ross Tanner, Director of Network Operations, NextLink Internet

 

Use Cases

The first step in developing an automation system is to identify the most common use cases, and if time permits, as many edge cases as possible. For this implementation, Opmantek’s engineering team storyboarded the following as a version 1 release:

 

New Device Added to Network

A list of devices would be periodically pulled from Open-AudIT via the Open-AudIT API and added to the NMIS server. By maintaining a list of integrated devices, opIntegration knows whether a device is new or whether an update is being provided to an existing integrated device.

Existing Device Replaced (same IP and/or device Name)

It is not uncommon, especially in WISPs like NextLink, to regularly swap out in-field equipment due to failure or simply as part of a planned upgrade. Depending on your configuration of Open-AudIT, these replacement devices can either be categorized as a new device (usual) or overwrite an existing device entry (considered an edge case). As a result, opIntegration will either add the new device as previously described or update an existing device entry in NMIS with the new device type.

 

Device Retired/Removed from Network

Queries for devices not seen for several audit cycles are already included with Open-AudIT. Once a device has exceeded a given threshold (not seen for y audit cycles or x days), opIntegration would use a custom query to retrieve that list and set those devices to inactive in NMIS, effectively retiring them without deleting their historical data. Permanently removing the device from NMIS would remain a manual, user-initiated step.

 

Add Device(s) Manually to NMIS

In addition to creating an automation path, it was imperative that the solution allow and account for users manually adding devices to NMIS either through the GUI or some import process.

 

Building the Feeding System

Creating the Proof of Concept

The initial Proof of Concept (POC) leveraged Open-AudIT’s powerful API to retrieve a list of devices for each poller. This list was created using custom queries built in Open-AudIT. By using custom queries, users would be able to control, very granularly, the list of devices being sent to each NMIS poller. Once each poller had its list of devices, opIntegration would then utilize NMIS’ Node Administration function to manage adding, updating, and retiring devices from NMIS. A series of configuration files on each NMIS poller would control the Open-AudIT query to be executed and manage specifics like NMIS Group assignment and other parameters. A simple cron job would call opIntegration on whatever cadence the client desired.
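In outline, the pattern looks something like this sketch – it is not opIntegration’s actual code, and the server URL, query ID and credentials are placeholders:

import requests

s = requests.Session()
s.headers.update({"Accept": "application/json"})
s.post("http://oa-server/omk/open-audit/login",
       data={"username": "admin", "password": "password"})

# Execute the custom query that defines this poller's device list.
r = s.get("http://oa-server/omk/open-audit/queries/42/execute")
for device in r.json().get("data", []):
    # Hand each device to the local provisioning step
    # (opIntegration uses NMIS' Node Administration function here).
    print(device)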

User Acceptance Testing (UAT) went well, with only minor changes to the initial code base, primarily in the areas of debugging and visual presentation. After operating successfully on-premises with NextLink for 90 days, the solution passed Opmantek’s internal tests and validations and was determined stable enough for inclusion in shipping code.

 

Next Steps in Automation

opIntegration will be included natively starting with the pending release of Open-AudIT 3.1. While the POC version was driven from the command line, Open-AudIT 3.1 will include a fully detailed GUI under Manage -> Integrations to make configuration straightforward for all users. While the GUI will be designed to configure a single server (i.e. Open-AudIT and NMIS installed on the same server), the Integration can be used to set up configuration with remote NMIS platforms by copying the resulting configuration and script files to the remote NMIS server. Integrations can also be scheduled like any other Task in Open-AudIT, providing a simple GUI to create a detailed schedule.

How it looks

From the Open-AudIT GUI, navigate to Manage -> Integrations -> List Integrations.

[Screenshot: Integrations 1]
This will provide you with a list of all the integrations that have been created.
[Screenshot: Integrations 2]
Clicking the blue details icon will give you a summary of the integration. If you have not run it before, the green execute button will launch the process for you.
[Screenshot: Integrations 3]
By clicking the Devices tab you will see exactly which devices are included in the integration.
[Screenshot: Integrations 4]

Conclusion

Operational Process Automation is a large concept, often traversing multiple processes and stages. However, by prioritizing problem points, identifying manpower-intensive steps, and focusing automation efforts on those items, you can achieve significant improvements in performance, reliability, and satisfaction. With the new Integration routines built into Open-AudIT Professional and Enterprise, users can easily automate the feeding and watering of NMIS for live performance and fault monitoring.

“With the integration of these two powerful systems, it has given us the automation that we have dreamed of in Operations. No longer are there missing gaps in monitoring or inventory, nor do you have to worry about the device model being incorrect as the system does it for you.”

Ross Tanner, Director of Network Operations, NextLink Internet


How to Purchase Open-AudIT Professional

Getting Open-AudIT Professional has never been easier.

 

The Discovery and Audit software used across 95% of the globe can be yours with a few easy steps.

This guide assumes that you have Open-AudIT installed and running; if you aren’t at that stage yet, the Open-AudIT installation guides will help you.

Once you have Open-AudIT installed, you can navigate to the ‘Upgrade Licenses’ menu item and click on ‘Buy More Licenses’.

This will bring up the feature and price list for Open-AudIT. Click on the node count that suits your needs.

Currently, only Professional licenses can be purchased online; if you wish to purchase an Enterprise license, you can request a quote from our team.

The next screen will confirm your selection and you can proceed to the checkout.

Fill in all the details that you would like associated with the account; the email address will be used to create an account that is required to access the licenses and support.
Once the payment has been processed, our team will email you a confirmation and a license key for the software. To add this, navigate back to the ‘Upgrade Licenses’ menu item, this time clicking ‘Restore My Licenses’. N.B. The license will be automatically added to your account if you have an Opmantek User account.

Click on the ‘Enter License Key’ button and that will show a text box for you to paste the license key into and add it to your profile.

After that, you will have full access to Open-AudIT Professional.


A Primer in Root Cause Analysis

We’ve seen it time and time again: a ticket comes into the help desk – a customer is complaining about a slow application or poor voice quality during a call. We start digging into the problem, maybe pull some logs, grab some performance data from the NMS. Everything we find is inconclusive, and when we check back with the client the symptoms have passed. The problem is gone, and another ticket is in the queue, so we move on – no wiser as to what caused the issue – knowing that it will reappear.

The process of getting to the core, or root of a fault or problem is called Root Cause Analysis (RCA). Root Cause Analysis is not a single, stringent process, but rather a collection of steps, often organized specifically by type of problem or industry, which can be customized for a particular problem. When it comes to identifying the root cause of an IT-related event a combination of process-based and failure-based analysis can be employed. By applying an RCA process, and remediating issues to prevent their future occurrence, reactive engineering teams can be transformed into proactive ones that solve problems before they occur or escalate.

In this article I will attempt to outline a general process for troubleshooting network-related events, meaning those issues which directly impact the performance of a computer network or application resulting in a negative impact on user experience. While I will use Opmantek’s Solutions in the examples, these steps can be applied to any collection of NMS tools.

 

Introduction to Root Cause Analysis

Describing the RCA process is like peeling back an onion. Every step of the process is itself comprised of several steps. The three main steps of the RCA process are included below. The first two steps are often completed in tandem, either by an individual or by a team in a post-mortem incident review meeting.

  1. Correctly and completely identify the event or problem,
  2. Establish a timeline from normal operation through to the event,
  3. Separate root causes from causal factors

 

Identifying the Event or Problem

Completely and accurately identifying the problem or event is perhaps the easiest part of RCA when it comes to networking issues.

That’s right, I said easiest.

It’s easy because all you have to do is ask yourself Why. When you have an answer to that question, ask yourself why that thing occurred. Keep asking yourself Why until you can’t ask it anymore – usually that’s somewhere around 4-5 times. This process is often referred to as the 5 Whys.

Many engineers advocate utilizing an Ishikawa, or fishbone diagram to help organize the answers you collect to the 5 Whys. I like this myself, and often utilize a whiteboard and sticky notes while working the problem. If you prefer using a software diagramming tool that’s fine, just use what is comfortable for you.

 

Example – The Power of Why

Here’s a real-world example Opmantek’s Professional Services team encountered while conducting onsite training in system operation. An internal user called into the client’s help desk and reported poor audio quality during a GoToMeeting with a potential customer.

  1. Why? – A user reported poor voice quality during a GoToMeeting (first why)
  2. Why? – Router interface that services switch to user’s desktop experiencing high ifOutUtil (second why)
  3. Why? – Server backups running during business hours (third why)
  4. Why? – cron job running backup scripts set to run at 9 pm local timezone (fourth why)
  5. Why? – Server running cron job is configured in UTC (fifth why)

 

The team started with the initial problem as reported and asked themselves: Why is this happening? From there, they quickly came up with several spot checks and pulled performance data from the switch the user’s desktop was connected to, and from the upstream router for that switch; this identified a bandwidth bottleneck at the router and gave them their second Why.

Once the bandwidth bottleneck was identified, the engineers used our solutions to identify where the traffic through the router interface was originating from. This gave them the backup server, and a quick check of running tasks identified the backup job – and there was the third Why.

System backups were handled by a cron job, which was scheduled for 9 pm. A comment in the cron job suggested this was meant to be 9 pm local time (EST) at the server’s physical location. This gave the team the fourth Why.

A check of the server’s date and time indicated the server was configured for UTC, which gave them the fifth Why.
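The mismatch is easy to demonstrate (a quick sketch; the date is illustrative):

from datetime import datetime
from zoneinfo import ZoneInfo

# The cron job fires at 21:00 server time, but the server clock is set to UTC.
run_utc = datetime(2019, 3, 4, 21, 0, tzinfo=ZoneInfo("UTC"))
run_local = run_utc.astimezone(ZoneInfo("America/New_York"))
print(run_local.strftime("%H:%M %Z"))  # 16:00 EST - the "9 pm" backup actually ran at 4 pm, during business hours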

Not every problem analysis will be this simple, or straightforward. By organizing your Why questions, and their answers, into a fishbone diagram you will identify causes (and causal factors) leading to a problem definition and root cause. In short, keep asking Why until you can’t ask it any further – this is usually where your problem starts.

 

Establish a Timeline

When establishing a timeline it’s important to investigate both the micro (this event’s occurrence) and the macro (has this event occurred in the past).  Thinking back to grade school mathematics, I draw a timeline, a horizontal line, across my whiteboard. In the center I place a dot – this is T0 (time zero) when the event occurred.

To the right of T0, I add tick marks for when additional details were discovered, when the user reported the issue, and when we collected performance or NetFlow information. I also add in marks for when other symptoms occurred or were reported, and for any additional NMS-raised events.

To the left of T0, I place everything we learned from asking Why – when did the backups start, when should they have started? I also review my NMS for events leading up to the performance issue; was interface utilization slowly on the rise, or did it jump dramatically?

Once I have mapped the micro timeline for this one occurrence I begin to look back through my data. This is where having a good depth of time-related performance information comes in handy. Opmantek’s Network Management Information System (NMIS) can store detailed polling data indefinitely which allows fast visual analysis for time-based recurring events.

[Diagram: Timeline]

Example – The Power of Time

As the engineers worked through their Why questions and built a fishbone diagram, they also created a timeline of the event.

They started by defining T0 as when the event was reported but, as they collected data, adjusted this to when the impact on the network actually started.

To the right of T0, they added in when the user reported the problem, when they started the event analysis, when performance data was collected from the switch and router, and when the NetFlow information was pulled from the NetFlow collector. They also added marks for when other users reported performance impacts, and when NMIS raised events for rising ifOutUtil on both the router and backup server interfaces.

To the left of T0, they added when the backups started as well as when they should have started. They reviewed NMIS and found the initial minor, major, and warning events for rising ifOutUtil on the router interface.

Once the timeline was complete the engineering team went on to look for past occurrences of this event. By widening the scale on the router’s interface graphs the engineers could instantly see this interface had been reporting high ifOutUtil at the same time every weekday for several weeks. This cyclic behavior suggested it was a timed process and not a one-time user-related event.

 

Root Causes vs. Causal Factors

As you build out and answer your Why questions you will inevitably discover several possible endpoints, each a potential root cause. However, some of these will simply be issues caused by the root cause – causal factors – and not the root cause itself.

It is critical to effecting long-term improvements in network performance that these causal factors be identified for what they are, and not misrepresented as the root cause.

 

Example – Distracted by a Causal Factor

The engineering team quickly realized that all servers had, at one time in the past, been configured for local timezones and had only recently been standardized to UTC. While a process had been in place to identify schedules, like this cron job, and update them for the change to UTC, this one had been missed.

Some members of the team wanted to stop here and fix the cron schedule. However, the wider group asked: Why was a cron job for a critical process, on a critical server, missed in the update process?

Pulling the list of processes and files from the update team’s records showed this file HAD been identified and updated, and that testing had been completed and verified. This brought about the next question: why was the updated cron job changed, and by whom or by what process?

While you can address causal factors, doing so is often just a temporary fix or workaround for the larger issue. Sometimes this is all that can be done at the time, but if you don’t identify and completely address the root cause, any temporary solutions you apply to causal factors will break down over time.

 

Example – Finding the Root Cause

Upon digging further, the engineers discovered that the cron job had been properly updated, but an older archived version of the cron job had been copied onto the server via a DevOps process. A Tiger Team was put together to research the DevOps archive and determine the extent of the problem. The Tiger Team reported back to the engineering group the next day; other outdated schedule files were found and also corrected. The engineering team worked with the DevOps engineers to put a process in place to keep the DevOps file archive updated.

 

Closing Out the Event

At this point, you’ve completed the Root Cause Analysis and identified the core issue causing the reported performance degradation. Since you’ve addressed the root cause, this issue should not recur. However, you can’t stop here – there are two follow-up steps that are critical to your future success as an engineering organization:

 

  1. Document the issue
    I like to use a centralized wiki, like Atlassian’s Confluence, to capture my organization’s knowledge in a central repository available for the entire engineering team. Here I will document the entire event, what was reported by the user, the performance information we captured, the RCA process and the end result – how and what we did to prevent this from happening again. Using tools like Opmantek’s opEvents, I can then relate this wiki entry to the server, router, interfaces, and ifOutUtil event so if it reoccurs future engineers will have this reference available to them.

 

  2. Follow-Up
    The root cause has been identified, a remediation put in place, and a process developed to preclude it from happening again. However, this doesn’t mean we’re done with the troubleshooting. There are times when what we assume is the root cause is, in fact, just a causal factor. If that is the case, the problem will reassert itself, as the solution was only a workaround for the true problem. The answer is to put a process in place, usually through a working group or engineering team meeting, to discuss user-impacting events and determine whether they relate to either open or remediated RCA processes.

 

What I’ve presented here is a simplified root cause analysis process. There are additional steps that can be taken depending on the type of equipment or process involved, but if you start here you can always grow the process as your team absorbs the methodology.

 

Best,
