The Importance of Beta Testing

Beta testing is a vital phase of the software development life cycle here at Opmantek. Our software forms the critical business platforms for Network Management Software and IT Auditing and Compliance for our customers – hence reliability and performance are what our customers demand and have come to expect from us. We want to launch software that is successful and that continues to build upon our hard work. We want users to have the best experience they can, so we need to remove the bugs and issues the software may have before it is launched.

This is where the Opmantek Early Adopter Program comes in – we work with beta testers to ensure new products and updated versions are up to scratch.

There are a number of different factors that are checked during beta testing – let us look at usability, performance and quality.

Is the product easily usable?

If you want customers to talk about the software for all the right reasons, then the product needs to have great usability.

To do this, you need to consider your target audience and come up with the solutions that would best suit this audience’s problems.

Through beta testing, you will get feedback about the user experience of your software, and through this, the developer will be able to improve the product to ensure it’s meeting the users’ requests.

How strong is the performance of the product?

During beta testing, the speed and performance of the product will be analysed. While we can virtualise a lab network and run all our various automated testing scenarios, we may not always use it in the way a customer does. It needs to be tested by real users trialing the software in their environment.

Why? Put simply, the product will operate on all kinds of hardware, alongside a blend of other installed software all competing for resources. There are many variables, which is why the software needs to be tested by real users – such as you – through the beta phase.

Is the product of high quality?

The overall goal of the beta testing phase is to create a product that is fully functional and of high quality. This involves testing all the features within the product to check that they are all working as they should.

Although it is tempting to add a plethora of features to the software, sometimes it’s best to keep its design simple, as having too many functions can sometimes reduce the overall quality and user experience of the product. Even at the late stage of beta testing, feature feedback can help us decide upon which features are useful, and also ensure that we document any unusual ways the tester applied the feature, so everyone can benefit.

Are there any bugs?

One of the most important steps of beta testing is checking the products for bugs. Officially launching a product on to the market with bugs in it can quickly turn users off the software and vastly reduce the number of users interested in your product.

By identifying bugs early on, you can weed them out. A further benefit is that uncovering one type of bug often leads to other types of bugs being found too. The earlier these are detected, the better.

As an Opmantek Beta Tester in our Early Adopter Program, you will play a key role in helping software developers to launch their products. There are many benefits for the tester: you can try out new products before they are officially launched, contribute to innovative products, and be rewarded with an Amazon gift card or access to the new product for a year.

Our current program is open for Open-AudIT Cloud. Beta Testing officially starts on 20th May 2019.


Top Tips to Achieve Operations Team Buy in For An Automation Project

Gartner projected that “Global spending on robotic process automation (RPA) software is estimated to reach $680 million in 2018, an increase of 57 per cent year over year”.

Currently the biggest adopters of RPA include banks, insurance companies, utilities and telecommunications companies. With these industries being some of the most prevalent users of Opmantek Software to automate their network administration and orchestration, we decided to talk to our partners and customers about their experience in implementing robotic process automation and operational process automation in their businesses.

Here are the top 4 tips they gave for gaining team buy-in for an automation project.

Keep the team focused on their job outputs, not tasks

“We hired you for your communication and problem-solving skills, not for your typing speed”.  When implementing an automation project it is really important that staff who will have their routines impacted understand that the automation will serve them to do the job they were hired to do better.  The reports you used to spend an hour running are now ready to review when you arrive at your desk, giving you an extra hour to do what you do best.

Focus on small, regular, time-consuming tasks

When choosing the first tasks to be automated, keep it simple and measurable, so that staff and management can see tangible results from the exercise.  Daily or weekly reports or tasks that have dedicated maintenance windows such as system upgrades or audits are great places to start.

Get your team to workshop the best practice process steps

Giving the team exposure to, and input into, the work that will be automated allows them to feel some control over the project.  Running a best practice workshop benefits the project, by ensuring that the most efficient and effective paths to resolution are implemented, and helps the staff to understand exactly what the ‘bots’ will be doing for them.

Let the team have some fun with it

One IT manager shared that within his team, each of the bots that completes a specific process was given a name and soon became part of the work team.  If a process failed, the team would laugh and say ‘R2D2 is sleeping on the job again’.  This seemed to help with human-computer interaction quite significantly.

 

Here at Opmantek, we have seen many IT departments transformed through the implementation of our suite of tools to automate regular tasks like software and hardware audits, configuration changes, maintenance routines and IP address allocation and management.  To speak to one of our engineers about your Network automation project contact us here.


Monitor rising tensions in the global cybersecurity landscape

Escalating tensions between the United States and competing countries are reshaping the cyber-security landscape – with severe implications for Australian businesses and government organisations. The United States set down its case in a cyber strategy document released in late 2018 and has escalated its measures from there. According to the document, the United States had adopted a vision of ‘a shared and open cyberspace for the benefit of all,’ but its adversaries had conducted economic espionage and malicious cyber activities that had damaged individuals, commercial and non-commercial interests and governments across the world.

The document listed Russia, China, Iran, and North Korea as challenging the United States in cyberspace, ‘often with a recklessness they would never consider in other domains.’ Since then, the United States Government has issued executive orders to shore up a cybersecurity workforce short of 300,000 practitioners and more importantly, to declare a national emergency and bar United States companies from using foreign telecommunications equipment made by companies it considers a national security risk. The initiative is already causing consternation among local users of smartphones from one supplier – a leading United States-headquartered company is reportedly restricting the supplier’s access to its applications and operating system.

Businesses need to monitor and respond quickly to these measures and the evolving security landscape. Companies exposed to the United States will need to evaluate the risks of doing business with firms targeted by current and future executive orders. Given Australia’s close relationship with the United States, local organisations will also need to remain aware of the ongoing risk of cyberattack or espionage from individuals or groups acting on behalf of ‘cyberspace challenger’ countries.

Some of the steps businesses should take include ensuring that cyber-security planning, including incident response, is up to date and people know their roles and responsibilities; checking that anti-malware products are installed and up to date; and making sure that employees are aware of potential cyber-threats and the steps they need to take to minimise risk. If you would like to learn more, please contact sales@firstwavecloud.com.


Building a Topological Diagram With opCharts

Prerequisites

Please ensure either opCharts or the Opmantek Virtual Machine is installed to use the feature below.

Overview

The Topological Diagram style of Map allows you to dynamically build live, informational diagrams based on the logical Layer 2 connections devices have.

A menu listing of all available Maps can be accessed by selecting Views -> Maps from the opCharts menu bar.

Creating a New Topological Diagram

Join Paul McClendon, an Opmantek Support Engineer, as he demonstrates how to create a topological map in opCharts.

For the letter lovers amongst us

A Topological Diagram must be created before it can be used or added to a Dashboard. To create a new Map, Click the blue button with the “+” icon in the top-left corner from the Maps screen (Views -> Maps).

Next, select Topological Diagram from the Map Type drop-down located in the top-left corner.

Assign your Topological Map a Map Name – this must be unique; no two maps can have the same Map Name. You can also provide a Description of your Map. This will be displayed on the Maps View page, and also when adding a Map to a Dashboard.

Options

Title – This is what will be displayed in the Component window’s title bar.

Background Image – Disabled for Topological Diagram style Maps

Layout – Provides auto arrangement of the icons and their connections. Each has pros and cons, depending on the network architecture, number of devices, and types of connections found

Apply Layout – Applies the currently selected layout to the Topological Map

Auto checkbox – When checked, this will automatically apply the selected Layout option to the map and continue to update the layout as new nodes, neighbours, or subnets are added. Checked by default.

Add Node

The Add Node button allows you to add an individual Node to the Map. You may assign a Display Name, separate from the Node’s internal name, or leave this field blank and no label will be displayed. A specific icon may also be assigned or will be auto-selected from the built-in icon options based on the type of equipment.

Link to Map

If set, the Link to Map option will open a new URL when the link is clicked. You can either select a Map on the current server or, by selecting Custom, use any URL (even to other software/sites). This is especially powerful – allowing you to drill down from a top-level abstract diagram to more in-depth levels of detail.

By default, the Link to Map / Custom option opens the target in the current browser window. However, you can force opCharts to open the link in a new tab/window by enclosing the link URL in double quotes and following it with target=_blank, e.g. "http://someserver.com//en/omk/opCharts/dashboards/myawesomedashboard" target=_blank


Once the node is added it may be moved around the Map by left-clicking and dragging it into position.

Add Group

The Add Group button allows you to add all nodes contained within a Group at one time. The Display Name field has no effect on the individual nodes being added.

Add Link

Note: Links may be added manually. However, the true power of a Topological Map is in dynamically drawing the connections between devices and subnets. See Building the Topological Map below.

The Add Link button adds a physical connecting line between 2 Nodes or 2 Groups. You can assign the Link a Link Name, which will be displayed within a bordered box at the centre point of the line between the 2 Nodes. These links are convenient ways to show relationships between components, without linking those relationships to specific interfaces or data patterns. A link can be deleted by right-clicking on the link line and selecting Delete from the pop-up menu.


Add Interface Link

The Add Interface Link button allows you to add an interactive Link representing an interface’s flow traffic between 2 Node or Group icons. Select your Link Source, the Node providing the Interface, the specific Interface that handles the link, and the Link Destination.


The resulting link will be anchored to the 2 Nodes/Groups and display both the inbound and outbound link speeds as a percentage of the available interface speed. The link is also hinged in the middle, allowing some modicum of adjustment for readability.


Note: The Link Source and the Node/Interface are not required to be the same; the GUI fills in the node name as a suggestion, as it’s the most likely scenario.  If required, the link source and/or link destination can be left blank and the endpoint will remain open for moving to a convenient location.

Add Placeholder

The Add Placeholder button allows you to add an icon to the Network Map that is not linked to a specific Node or Group (like “the Cloud”). Similar to both Nodes and Groups you can assign a Display Name, select a Display Icon, and Link the icon to another Dashboard.

Building the Topological Map

While you can manually add links and Interface Links to a Topological Map, the true power lies in using the logical information the network contains to create those connections.

Add Neighbours

Right-click on a node and select Add Neighbours.  Neighbours are direct connections found between devices but can also be virtual machines hosted by a VMware host.


Add Subnets

Right-click on a node and select Add Subnets.  Subnets are a logical connection between nodes and not a direct “physical” connection but help to organise and understand logical layouts.

Editing a Node

Nodes on the Topological Map can be edited. Simply return to edit mode (open the Map by selecting Edit from the Map view or by clicking the Edit button in the top-right corner of the Component window), then right-click on the Node you want to edit and select Edit from the pop-up menu.


Getting Metrics, Reachability, Availability from your Enterprise Network Monitoring System

Managing a large, complex environment with ever-changing operational states is challenging. To assist, NMIS, a Network Management System which performs performance management and fault management simultaneously, monitors the health and operational status of devices and creates several individual metrics as well as an overall metric for each device. This article explains what those metrics are and what they mean.

Summary

Consider this in the context that a network device offers a service, and the service it offers is connectivity. While a router or switch is up and all the interfaces are available, it is truly up, and when it has no CPU load it is healthy. As the interfaces get utilised and the CPU gets busy, it has less capacity remaining. The following statistics are considered part of the health of the device:

  • Reachability – is it up or not;
  • Availability – interface availability of all interfaces which are supposed to be up;
  • Response Time;
  • CPU;
  • Memory;

All of these metrics are weighted and a health metric is created. This metric, when compared over time, should always indicate the relative health of the device. Interfaces which aren’t being used should be shut down so that the health metric remains realistic. The exact calculations can be seen in the runReach subroutine in nmis.pl.

Metric Details

Many people wanted network availability, and many tools generated availability based on ping statistics and claimed success. This, however, was a poor solution. For example, the switch running the management server could be down and the management server would report that the whole network was down, which of course it wasn’t. Or worse, a device would be responding to a ping but many of its interfaces were down, so while it was reachable, it wasn’t really available.

So, it was determined that NMIS would use Reachability, Availability and Health to represent the network. Reachability is the pingability of the device. Availability is (in the context of network gear) whether the interfaces which should be up are up or not, e.g. interfaces which are “no shutdown” (ifAdminStatus = up) should be up. So for a device with 10 interfaces of ifAdminStatus = up, of which 9 have ifOperStatus = up, the device would be 90% available.

Health is a composite metric, made up of many things depending on the device (for a router, for example, CPU and memory). Something interesting here is that part of the health is made up of an inverse of interface utilisation, so an interface which has no utilisation will have a high health component, while an interface which is highly utilised will reduce that metric. The health is therefore a reflection of load on the device and will be very dynamic.

The overall metric of a device is a composite metric made up of weighted values of the other metrics being collected. The formula for this is configurable so you can weight Reachability to be higher than it currently is, or lower, your choice.

Availability, ifAdminStatus and ifOperStatus

Availability is the interface availability, which is reflected in the SNMP metric ifOperStatus. If an interface has ifAdminStatus = up and ifOperStatus = up, that is 100% for that interface. If a device has 10 interfaces and all have ifAdminStatus = up and ifOperStatus = up, that is 100% for the device.

If a device has 9 interfaces with ifAdminStatus = up and ifOperStatus = up, and 1 interface with ifAdminStatus = up and ifOperStatus = down, that is 90% availability. It is the availability of the network services which the router/switch offers.
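The availability rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the NMIS implementation: the function name, the input format and the handling of a device with no enabled interfaces are all assumptions made for the example.

```python
def interface_availability(interfaces):
    """Percentage of administratively-up interfaces that are operationally up.

    `interfaces` is a list of (ifAdminStatus, ifOperStatus) string pairs.
    Interfaces that are administratively down are ignored.
    """
    expected_up = [oper for admin, oper in interfaces if admin == "up"]
    if not expected_up:
        # Assumption for this sketch: nothing is expected to be up,
        # so nothing is counted as unavailable.
        return 100.0
    up_count = sum(1 for oper in expected_up if oper == "up")
    return 100.0 * up_count / len(expected_up)


# 9 interfaces up/up, 1 interface up/down, plus 2 shut-down interfaces
# which do not count towards availability.
sample = [("up", "up")] * 9 + [("up", "down")] + [("down", "down")] * 2
print(interface_availability(sample))  # 90.0
```

Shutting down unused interfaces removes them from the calculation, which is why doing so keeps the metric realistic.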

Configuring Metrics Weights

In the NMIS configuration file, Config.nmis, there are several configuration items for the metrics. These are as follows:
'metrics' => {
  'weight_availability' => '0.1',
  'weight_cpu' => '0.2',
  'weight_int' => '0.3',
  'weight_mem' => '0.1',
  'weight_response' => '0.2',
  'weight_reachability' => '0.1',
  'metric_health' => '0.4',
  'metric_availability' => '0.2',
  'metric_reachability' => '0.4',
  'average_decimals' => '2',
  'average_diff' => '0.1',
},

The health metric uses items starting with “weight_” to weight the values into the health metric. The overall metric combines health, availability and reachability into a single metric for each device and for each group and ultimately the entire network.

If more weight should be given to interface utilisation and less to interface availability, these metrics can be tuned. For example, weight_availability could become 0.05 and weight_int could become 0.35. The resulting weights (weight_*) should still add up to 1.0, that is, 100%.
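A quick sanity check after tuning is to confirm that the weight_* values still total 1.0 (100%). The sketch below is an illustration only: the dictionary mirrors the default values listed in Config.nmis, while the helper name weights_are_balanced is hypothetical and not part of NMIS.

```python
import math

# Default weight_* values as listed in the configuration above.
default_weights = {
    "weight_availability": 0.1,
    "weight_cpu": 0.2,
    "weight_int": 0.3,
    "weight_mem": 0.1,
    "weight_response": 0.2,
    "weight_reachability": 0.1,
}


def weights_are_balanced(weights):
    """True when the weights total 1.0, allowing for floating point error."""
    return math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9)


# Move 0.05 of weight from interface availability to interface utilisation.
tuned_weights = dict(default_weights, weight_availability=0.05, weight_int=0.35)

print(weights_are_balanced(default_weights))  # True
print(weights_are_balanced(tuned_weights))    # True
```

Whatever is taken away from one weight has to be added to another, otherwise the health metric can no longer reach 100%.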

Other Metrics Configuration Options

Introduced in NMIS 8.5.2G are some additional configuration options to tune how this all works, and to make it more or less responsive. The first two options are metric_comparison_first_period and metric_comparison_second_period, which are by default -8 hours and -16 hours.

These are the two main variables which control the comparisons you see in NMIS, the real-time health baselining. NMIS compares calculations made over the period from now back to metric_comparison_first_period (8 hours ago) with calculations made over the period from metric_comparison_first_period (8 hours ago) back to metric_comparison_second_period (16 hours ago).

This means NMIS is comparing, in real time, data from the last 8 hours to the 8 hour period before that. You can make these periods shorter or longer. In the lab I am running -4 hours and -8 hours, which makes the metrics a little more responsive to load and change.
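To make the two comparison windows concrete, here is a small Python sketch that turns the two period settings into time ranges. The function name and the use of plain hour offsets are assumptions for illustration; NMIS itself takes these values from its configuration file.

```python
from datetime import datetime, timedelta


def comparison_windows(now, first_period_hours=-8, second_period_hours=-16):
    """Return (current_window, previous_window) as (start, end) tuples."""
    first_boundary = now + timedelta(hours=first_period_hours)
    second_boundary = now + timedelta(hours=second_period_hours)
    current_window = (first_boundary, now)
    previous_window = (second_boundary, first_boundary)
    return current_window, previous_window


now = datetime(2019, 5, 20, 16, 0)
current, previous = comparison_windows(now)
print(current)   # 08:00 to 16:00 on 20 May
print(previous)  # 00:00 to 08:00 on 20 May

# The lab settings mentioned above: -4 hours and -8 hours.
lab_current, lab_previous = comparison_windows(now, -4, -8)
```

Shorter windows mean each comparison covers less history, which is why the metrics react faster to change.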

The other new configuration option is metric_int_utilisation_above, which is -1 by default. This means that interfaces with 0 (zero) utilisation will be counted in the overall interface utilisation metrics. So if you have a switch with 48 interfaces, all active but with basically no utilisation, and two uplinks with 5 to 10% load, the average utilisation of the 48 interfaces is very low. NMIS now picks the highest of input and output utilisation and only adds interfaces with utilisation above this configured amount. Setting it to 0.5 should produce more dynamic health metrics.
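The effect of this threshold can be illustrated with a simplified Python sketch. It assumes a plain average of per-interface peaks and a strict "greater than" comparison; the actual NMIS calculation in runReach may differ in detail, and the sample switch data is hypothetical.

```python
def average_interface_utilisation(interfaces, utilisation_above=-1):
    """Average the peak (max of in/out) utilisation of counted interfaces.

    `interfaces` is a list of (input_percent, output_percent) pairs.
    Only interfaces whose peak is above `utilisation_above` are counted.
    """
    peaks = [max(in_util, out_util) for in_util, out_util in interfaces]
    counted = [peak for peak in peaks if peak > utilisation_above]
    if not counted:
        return 0.0
    return sum(counted) / len(counted)


# Hypothetical switch: 48 idle access ports plus two uplinks under load.
switch = [(0.0, 0.0)] * 48 + [(5.0, 10.0), (8.0, 6.0)]

# Default of -1 counts every interface, so the idle ports dilute the average.
print(round(average_interface_utilisation(switch), 2))       # 0.36
# A threshold of 0.5 counts only the two uplinks.
print(round(average_interface_utilisation(switch, 0.5), 2))  # 9.0
```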

Metric Calculations Examples

Health Example

At the completion of a poll cycle for a node, some health metrics which have been cached are ready for calculating the health metric of a node, so let’s say the results for a router were:

  • CPU = 20%
  • Availability = 90%
  • All Interface Utilisation = 10%
  • Memory Free = 20%
  • Response Time = 50ms
  • Reachability = 100%

The first step is that the measured values are weighted so that they can be compared correctly. So if the CPU load is 20%, the weight for the health calculation will become 90%. If the response time is 100ms it will become 100%, but a response time of 500ms would become 60%. There is a subroutine, weightResponseTime, for this calculation.

So the weighted values would become:

  • Weighted CPU = 90%
  • Weighted Availability = 90% (does not require weighting, already in % where 100% is good)
  • Weighted Interface Utilisation = 90% (100 less the actual total interface utilisation)
  • Weighted Memory = 60%
  • Weighted Response Time = 100%
  • Weighted Reachability = 100% (does not require weighting, already in % where 100% is good)

NB. For servers, the interface weight is divided by two and used equally for interface utilisation and disk free.

These values are now dropped into the final calculation:

weight_cpu * 90 + weight_availability * 90 + weight_int * 90 + weight_mem * 60 + weight_response * 100 + weight_reachability * 100

which becomes “0.2 * 90 + 0.1 * 90 + 0.3 * 90 + 0.1 * 60 + 0.2 * 100 + 0.1 * 100” resulting in 90% for the health metric
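The same arithmetic can be reproduced with a short Python sketch. It uses the default weight_* values and the weighted values from the example above; the dictionary layout and variable names are illustrative only and do not mirror the NMIS source.

```python
# Default weight_* values from Config.nmis.
weights = {
    "weight_cpu": 0.2,
    "weight_availability": 0.1,
    "weight_int": 0.3,
    "weight_mem": 0.1,
    "weight_response": 0.2,
    "weight_reachability": 0.1,
}

# Weighted (normalised) values for the example router, where 100 is best.
weighted_values = {
    "cpu": 90,
    "availability": 90,
    "int": 90,
    "mem": 60,
    "response": 100,
    "reachability": 100,
}

# Multiply each weighted value by its configured weight and add them up.
health = sum(
    weights[f"weight_{name}"] * value for name, value in weighted_values.items()
)
print(round(health, 2))  # 90.0
```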

The calculations can be seen in the collect debug, nmis.pl type=collect node=<NODENAME> debug=true
09:08:36 runReach, Starting node meatball, type=router
09:08:36 runReach, Outage for meatball is
09:08:36 runReach, Getting Interface Utilisation Health
09:08:36 runReach, Intf Summary in=0.00 out=0.00 intsumm=200 count=1
09:08:36 runReach, Intf Summary in=0.06 out=0.55 intsumm=399.39 count=2
09:08:36 runReach, Intf Summary in=8.47 out=5.81 intsumm=585.11 count=3
09:08:36 runReach, Intf Summary in=0.00 out=0.00 intsumm=785.11 count=4
09:08:36 runReach, Intf Summary in=0.06 out=0.56 intsumm=984.49 count=5
09:08:36 runReach, Intf Summary in=0.00 out=0.00 intsumm=1184.49 count=6
09:08:36 runReach, Intf Summary in=8.47 out=6.66 intsumm=1369.36 count=7
09:08:36 runReach, Intf Summary in=0.05 out=0.56 intsumm=1568.75 count=8
09:08:36 runReach, Calculation of health=96.11
09:08:36 runReach, Reachability and Metric Stats Summary
09:08:36 runReach, collect=true (Node table)
09:08:36 runReach, ping=100 (normalised)
09:08:36 runReach, cpuWeight=90 (normalised)
09:08:36 runReach, memWeight=100 (normalised)
09:08:36 runReach, intWeight=98.05 (100 less the actual total interface utilisation)
09:08:36 runReach, responseWeight=100 (normalised)
09:08:36 runReach, total number of interfaces=24
09:08:36 runReach, total number of interfaces up=7
09:08:36 runReach, total number of interfaces collected=8
09:08:36 runReach, total number of interfaces coll. up=6
09:08:36 runReach, availability=75
09:08:36 runReach, cpu=13
09:08:36 runReach, disk=0
09:08:36 runReach, health=96.11
09:08:36 runReach, intfColUp=6
09:08:36 runReach, intfCollect=8
09:08:36 runReach, intfTotal=24
09:08:36 runReach, intfUp=7
09:08:36 runReach, loss=0
09:08:36 runReach, mem=61.5342941922784
09:08:36 runReach, operCount=8
09:08:36 runReach, operStatus=600
09:08:36 runReach, reachability=100
09:08:36 runReach, responsetime=1.32

Metric Example

The metric calculations are much more straightforward. These calculations are done in a subroutine called getGroupSummary in NMIS.pm. For each node, the availability, reachability and health are extracted from the node’s “reach” RRD file and then weighted according to the configuration weights.

So based on our example before, the node would have the following values:

  • Health = 90%
  • Availability = 90%
  • Reachability = 100%

The formula would become “metric_health * 90 + metric_availability * 90 + metric_reachability * 100”, resulting in “0.4 * 90 + 0.2 * 90 + 0.4 * 100 = 94”. So this node has a metric of 94, which is averaged with all the other nodes in its group, or in the whole network, to produce the metric for each group and for the entire network.
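As a rough sketch of this step, the Python below computes the metric for the example node and then averages it with a second node. The function name and the second, fully healthy node are hypothetical, and a simple mean is assumed for the group figure.

```python
# Default metric_* weights from Config.nmis.
metric_weights = {
    "metric_health": 0.4,
    "metric_availability": 0.2,
    "metric_reachability": 0.4,
}


def node_metric(health, availability, reachability, cfg=metric_weights):
    """Combine health, availability and reachability into one node metric."""
    return (
        cfg["metric_health"] * health
        + cfg["metric_availability"] * availability
        + cfg["metric_reachability"] * reachability
    )


example_node = node_metric(health=90, availability=90, reachability=100)
print(round(example_node, 2))  # 94.0

# Hypothetical second node in the same group, fully healthy.
other_node = node_metric(health=100, availability=100, reachability=100)
group_metric = (example_node + other_node) / 2
print(round(group_metric, 2))  # 97.0
```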


Understanding the NMIS KPI interface

What is a KPI and why is it relevant for network monitoring?

Key Performance Indicators (KPIs) were introduced into NMIS to provide insight into why the health of a node was getting better or worse.  As discussed in the article on NMIS Metrics, Reachability, Availability and Health, NMIS tracks the health of a node and provides a single number which indicates what that health is; this is called the Health Metric.  To make up the Health Metric, NMIS tracks many aspects of a node’s health, including:

  • Reachability – Node availability or pingability
  • Availability – Interface availability
  • Response time
  • CPU Utilisation
  • Memory Utilisation
  • Interface Utilisation
  • Disk Utilisation
  • Swap Utilisation

NOTE: Not all nodes have disk and swap, so for some nodes these values are blank, e.g. a Cisco Router will have no value for the disk and swap KPIs.

NMIS has a history of being a Network Management System. The generation of the Metrics and KPIs is something that makes NMIS more than a Network Monitoring System, and helps IT professionals by providing better information about their environment to support their decisions. With more information about devices, troubleshooting or improving the health of devices is much easier.

As of NMIS 8.5G, we started storing the individual KPI scores so that it was possible to see the health metric break down over time.  This is now shown at the top of a node view panel in NMIS8 and looks like the image below.

KPI Scores

You can think of the KPI Scores like a report card: the student (node) has received 10/10 for English (reachability), 10/10 for Maths (availability) and so on. The KPI Scores in the screenshot above come from the polled data and are scored out of the weighted value. This weighted value is a percentage, so a configuration value of 0.1 means 10%, or a maximum possible KPI score of 10/10.  The table below shows the configuration value and the resulting KPI Score value.

| KPI Item | Configuration Item | Configured Weighting | Maximum KPI Score |
|---|---|---|---|
| Reachability | weight_reachability | 0.1 | 10 (10%) |
| Availability | weight_availability | 0.1 | 10 (10%) |
| Response | weight_response | 0.2 | 20 (20%) |
| CPU | weight_cpu | 0.2 | 20 (20%) |
| Memory | weight_mem | 0.1 | 10 (10%) |
| Interface | weight_int | 0.3 | 30 (30%) |

Because they are not present in all node types, there are two additional KPI values, Swap and Disk, which overload onto the Memory and Interface KPI values. These split the weighting of each in half and track that separately, e.g. the Interface KPI by default is 30%, so when the Disk KPI is present the Interface KPI gets a value of 15% and the Disk KPI gets a value of 15%.  So the table would look like this when all 8 KPIs are present, as they are for Linux Servers.

| KPI Item | Configuration Item | Configured Weighting | Maximum KPI Score |
|---|---|---|---|
| Reachability | weight_reachability | 0.1 | 10 (10%) |
| Availability | weight_availability | 0.1 | 10 (10%) |
| Response | weight_response | 0.2 | 20 (20%) |
| CPU | weight_cpu | 0.2 | 20 (20%) |
| Memory | weight_mem | 0.1 x 50% | 5 (5%) |
| Swap | weight_mem | 0.1 x 50% | 5 (5%) |
| Interface | weight_int | 0.3 x 50% | 15 (15%) |
| Disk | weight_int | 0.3 x 50% | 15 (15%) |

The result is that the maximum KPI Score for a node will be 100 or 100%.
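The way the maximum scores are derived, including the Swap and Disk split, can be sketched in Python as follows. The function name and the has_swap and has_disk flags are hypothetical and exist only for this illustration.

```python
# Default weight_* values from Config.nmis.
weights = {
    "weight_reachability": 0.1,
    "weight_availability": 0.1,
    "weight_response": 0.2,
    "weight_cpu": 0.2,
    "weight_mem": 0.1,
    "weight_int": 0.3,
}


def kpi_maximums(weights, has_swap=False, has_disk=False):
    """Return the maximum KPI score per KPI item, on a 0 to 100 scale."""
    maxima = {
        "Reachability": weights["weight_reachability"] * 100,
        "Availability": weights["weight_availability"] * 100,
        "Response": weights["weight_response"] * 100,
        "CPU": weights["weight_cpu"] * 100,
        "Memory": weights["weight_mem"] * 100,
        "Interface": weights["weight_int"] * 100,
    }
    if has_swap:
        # Swap takes half of the Memory weighting.
        maxima["Memory"] = maxima["Swap"] = maxima["Memory"] / 2
    if has_disk:
        # Disk takes half of the Interface weighting.
        maxima["Interface"] = maxima["Disk"] = maxima["Interface"] / 2
    return {name: round(value, 2) for name, value in maxima.items()}


router = kpi_maximums(weights)
linux_server = kpi_maximums(weights, has_swap=True, has_disk=True)
print(linux_server["Memory"], linux_server["Disk"])  # 5.0 15.0
print(sum(linux_server.values()))                    # 100.0
```

In both cases the maximums total 100, which matches the statement that the maximum KPI Score for a node is 100%.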

Interpreting Health and KPI Values

Suppose you are looking at the main NMIS dashboard and you see that a node has a Health score of 92.2%, as in the example below. There is also a red arrow beside it, which is the result of the longstanding NMIS auto baselining feature; this red arrow is pointing down, meaning that the health now is lower than in the last period. So why is this node less healthy now than it was before? Clicking on the node will reveal the KPI scores and you can start looking at what is changing.

Here you see the KPI summary again, with the overall breakdown of the health metric represented in the KPI values. The MEM KPI has a red arrow pointing down: the auto baselining is showing us that the Memory score is lower than previously, with a score of 2.04 out of a possible 5.  If we look at the graph for the last 2 days, we can see that the average value for the MEM KPI is 2.28%, showing us that the memory utilisation has increased a little.

If you want to know why the health from the front page is 92.2%, look at all the KPI values, such as the Disk KPI of 10.50/15, the CPU KPI of 19.98/20 and the SWAP KPI of 4.75/5. We can take 100% and subtract the remainders, so:

| KPI Item | KPI Score | Remainder Calculation | Health Remainder |
|---|---|---|---|
| Reachability | 10/10 | 10 - 10 | 0 |
| Availability | 10/10 | 10 - 10 | 0 |
| Response | 20/20 | 20 - 20 | 0 |
| CPU | 19.98/20 | 20 - 19.98 | 0.02 |
| Memory | 2.04/5 | 5 - 2.04 | 2.96 |
| Swap | 4.75/5 | 5 - 4.75 | 0.25 |
| Interface | 15/15 | 15 - 15 | 0 |
| Disk | 10.5/15 | 15 - 10.5 | 4.5 |

Adding together the Health Remainder results and subtracting from 100 gives us: 100 – (0.02 + 2.96 + 0.25 + 4.5) = 92.27%
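The remainder calculation can be checked with a few lines of Python. The scores and maximums are the ones quoted above for this node; the variable names are illustrative only.

```python
# (KPI score, maximum KPI score) for each KPI item of the example node.
kpi_scores = {
    "Reachability": (10, 10),
    "Availability": (10, 10),
    "Response": (20, 20),
    "CPU": (19.98, 20),
    "Memory": (2.04, 5),
    "Swap": (4.75, 5),
    "Interface": (15, 15),
    "Disk": (10.5, 15),
}

# How far each KPI falls short of its maximum.
remainders = {
    name: maximum - score for name, (score, maximum) in kpi_scores.items()
}

health = 100 - sum(remainders.values())
print(round(health, 2))  # 92.27

# Equivalently, the health is simply the sum of the KPI scores.
print(round(sum(score for score, _ in kpi_scores.values()), 2))  # 92.27
```

Sorting the remainders from largest to smallest is a quick way to see which KPI is costing the node the most health, here Disk followed by Memory.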

The difference between this result and the displayed number is due to rounding precision.

Conclusion

NMIS KPI Scores are a powerful way to get to the bottom of the health of your infrastructure. They help you see where resources are being used and identify operational problems very quickly.
