Event Correlation with opEvents

As a support engineer with Opmantek, I work with many organizations that monitor thousands of devices across their networks. In complex network environments, thousands or even millions of events are generated in a short period. These events range from critical to informational, and distinguishing between the two is key to keeping a network running efficiently.

Looking through event logs is tedious, and with so many events, it’s easy to miss the critical ones. Engineers have told me they stopped looking at event notifications because there were so many that they became nonchalant about them. Opmantek’s event management solution, opEvents, not only reduces event spam but can also be used for effective time and event management.

A team of engineers at a large organization was being bombarded by events from hundreds of machines during their regularly scheduled Windows update period. The team ignored event notifications during this time because they occurred so frequently. At the same time, however, they received multiple notices that a group of servers, along with their services, had gone down. The event logs indicated that this was happening and the IT staff were notified, but since it occurred during a typically busy event period, the notices were ignored. As a result, the servers stayed down until someone finally noticed the event hours later. This downtime resulted in lost revenue for the company and some very unhappy managers.

It was discovered that the problem was caused by a router that had stopped working. The team looked for a solution and came upon Opmantek’s opEvents. opEvents intelligently analyzes, sorts, and correlates multiple events from various sources into a single event, reducing noise before any alert is created. This cuts event spam and clutter, helping your team quickly identify which events are important and which are not. The team of engineers can now quickly identify not only when a router is entirely dead but also when a router is underperforming, preventing future downtime and making the team more proactive.

The team of engineers in the example above discussed how opEvents could be used to prevent a situation like this from occurring again. They came up with an event correlation rule to notify them in similar cases. 

To create this type of correlation rule, start by navigating to the conf directory of your opEvents install and creating an entry in EventRules.nmis.

A simple event correlation rule consists of:

  • An event name, specifying the name of your newly created event.
  • A list of event names to correlate.
  • A minimum count of events that must be detected to trigger the rule.
  • An optional list of groupby clauses. These define whether the count is interpreted globally for all named events, or separately within smaller groups.
  • An optional enrich clause. This adjusts the content of the newly created event.
  • Lastly, a window parameter, which defines the time window to examine for the events.

An example event correlation rule is shown below:

  '3' => {
      name    => 'Customer Outage',
      events  => [ "Node Down", "SNMP Down" ],
      window  => '60',
      count   => 5,
      groupby => [ 'node.customer' ],             # count separately for every observed value of customer
      enrich  => { priority => 3, answer => 42 }, # any such items get inserted in the new event
  },

This rule states that when the events “Node Down” and “SNMP Down” are triggered within a 60-second window, they are counted separately per customer; if a group reaches 5 or more events, a new event called Customer Outage is created. This is only one example of a custom event correlation rule. Many more examples, use cases, and features are discussed on our opEvents Wiki page.
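For contrast, omitting the groupby clause makes the count apply globally across all matching events in the window. The sketch below assumes the same EventRules.nmis syntax as above; the rule key '4' and the event name 'Site Outage' are illustrative, not part of a shipped configuration:

```perl
# Hypothetical rule: with no groupby clause, the count of 5 applies to
# all "Node Down" / "SNMP Down" events in the 60-second window combined,
# regardless of which customer's nodes produced them.
'4' => {
    name   => 'Site Outage',                 # illustrative event name
    events => [ "Node Down", "SNMP Down" ],
    window => '60',
    count  => 5,
},
```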

Using this event management tool will reduce event spam, allowing your team to quickly notice critical events that need action. Important events will be harder to overlook during event storms, and redundant events can be reduced by automating event handling. Save time, reduce operational costs, gain network insight, and keep your network performing smoothly. Expand your toolkit with these features and more in opEvents and take control of your network.

For more information on Opmantek’s Event Management Tools, other Opmantek solutions, or to schedule a demonstration, please visit our website at www.opmantek.com. You can also email us at contact@opmantek.com.


Benefits of Developing a Strategic NOC Service

Integrating automation is a crucial step in developing your IT department into a valuable contributor to the business. However, automation alone will not achieve this; there also has to be a shift in ideology towards improving the user experience inside the business.

In a traditional Network Operations Center (NOC), the fault response is reactionary, monitoring focuses on equipment state, and roles are usually split between fault resolution and routine maintenance. Pressure on NOCs has never been higher: the demand for greater network performance, combined with the need to reduce downtime, puts increased stress on a NOC.

By using a strategic NOC model, this stress can be significantly reduced. A strategic model focuses on improving collaboration between all lines of business, increasing user satisfaction, and ensuring end-to-end quality of the network. This model looks at application performance rather than equipment performance. A vital example: the internet connection may be up while Office365 is down; this affects the user experience and decreases productivity even though the hardware state is unaffected.

To facilitate this transition to a strategic NOC, automation is required, with the goal of improving the user experience (UX) rather than saving on overheads; the emphasis on improving UX is what leads to increased productivity. This can be exemplified by Visa’s 75% reduction in time to resolve incidents and JPMorgan Chase’s 75% first-call resolution rate (Reference). Both figures were attained by applying the core principles that a strategic NOC operates on.

To find out more about these principles, understand how to develop a service catalogue, or learn how to architect a solution for fast client on-boarding, register your interest at our free webinar (Link).


Open-AudIT Helps Solve Your Software Asset Management Needs

“How many installs of MS Office do we have?” This was the question that started it all and forced Open-AudIT founder Mark to develop the Open-AudIT software. Fast-forward almost 20 years and similar questions are still being asked in many organisations worldwide. What has changed is the process of acquiring this information: 20 years ago, Mark drove to each location and manually counted each install. Today, the same information comes from a report that takes a few seconds to run. The more proactive user will have this report scheduled and waiting in an inbox whenever desired.

The goal extends further than a counting exercise; there is now a lot more at stake. Formerly, software licensing was simply about the number of installs, but the process has expanded and become more complicated. Gartner has presented guidelines for leveraging the software licensing already in place using Software Asset Management.

The first steps, however, are fundamental to what Open-AudIT does as an open-source program. Device Discovery and Inventory Management are two core principles behind Open-AudIT, and they also coincide with the first two steps in minimising your software licensing spend.


The Importance of Network Visibility in Response to The Internet of Things

The Internet of Things (IoT) has led many businesses to capitalise on the computational potential and the increase in data available in everyday objects. The breadth of devices with internet connectivity has been increasing exponentially; CEB (CEBglobal – IoT Security Primer) suggests that the number of connections will grow from 6 million in 2015 to 27 billion by 2025. This increase has led to many new products and many new vendors operating in a market that can be vulnerable to catastrophic attacks. CEB also notes that almost 40% of businesses believe that poor visibility and understanding is their leading risk-management challenge.

The underlying problem with a network that is considered to have poor visibility is the limited ability to discover everything that is connected to it. NMIS can manage any device that has an IP address, so if it is connected to your network, directly or indirectly, NMIS will know.

With the evolution of devices, there should be equal or greater sophistication in the practices used to monitor them. NMIS collects information from any device on your network, and by using the ‘sysObjectId’ variable it can attribute a vendor to the device from the Enterprise list. The list of vendors is continually expanding; you can peruse the most common list here. However, the true power of NMIS is the ability to add new vendors yourself. This process is better explained – Here!
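As an aside, the vendor lookup rests on the IANA enterprise number embedded in the sysObjectID. The sketch below is not NMIS code; it is a minimal, standalone illustration with a tiny hand-built vendor table (NMIS maintains its own, far larger Enterprise list):

```python
# Minimal sketch: map an SNMP sysObjectID to a vendor via the IANA
# enterprise number. The vendor table here is a tiny hand-built sample.
ENTERPRISE_VENDORS = {
    9: "Cisco Systems",
    2636: "Juniper Networks",
    8072: "net-snmp",
}

def vendor_from_sysobjectid(oid: str) -> str:
    """Extract the enterprise number from a sysObjectID such as
    .1.3.6.1.4.1.9.1.516 and look up the vendor name."""
    parts = [int(p) for p in oid.strip(".").split(".")]
    prefix = [1, 3, 6, 1, 4, 1]  # iso.org.dod.internet.private.enterprises
    if parts[:6] != prefix or len(parts) < 7:
        return "Unknown"
    return ENTERPRISE_VENDORS.get(parts[6], f"Enterprise {parts[6]}")

print(vendor_from_sysobjectid(".1.3.6.1.4.1.9.1.516"))  # Cisco Systems
```

Anything under the .1.3.6.1.4.1 prefix identifies a vendor by its registered enterprise number, which is why a single variable is enough to attribute a device.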

This increased visibility, combined with custom thresholding in NMIS, gives you greater control over your network. Users of NMIS will be familiar with SNMP and device modelling, but there are more custom controls available. Watch Keith Sinclair (Opmantek CTO) present a webinar that walks through the use of MIBs for custom functions, device modelling, and custom thresholding. This webinar is located – Here!

 

Here at Opmantek, we are constantly looking for new ways to help your workday. If you have any feature requests, webinar topics or ideas you would like to see get developed, don’t hesitate to reach out.

 


20 Years of Open-AudIT

A long, long time ago, in a town far, far away, I used to work for a financial institution. A small financial institution. Quite small. As in no IT management software small. As in if we wanted to update our desktops, we had to write a batch script and copy it “by hand” to individual devices and run it one at a time.

Once upon a time, my manager approached me and asked: “How many installs of MS Office do we have?”. I could not reliably answer the question, so I set about finding out how I would find out. At the time Microsoft had a product called SMS Server. Its purpose was to manage your Microsoft Windows PCs. It was also expensive. Well, it was expensive for a small financial institution. Expensive enough that my manager denied the funding and put me in a car to drive from north to south and record by hand the MS Office installs on 100 PCs across 12 branches and 200 kilometres. Good times!

I’ve always been the kind of guy who likes to write code. I think I first wrote some BASIC back in about 1982. Damn, I’m showing my age now! Obviously, I was thinking – well, if Microsoft can retrieve the information, then how? How are they doing that? That led me to VBScript and WMI. For our Windows NT machines, these were optional components, but for our new Windows 98 machines, they were built in, yay! Yes – Windows NT and 98. Things are a little different now, but back then a lot of businesses looked at IT as a simple expense they didn’t want. Hence as little money as possible was spent on it. Windows NT and 98 it was. And no management software for you.

OK, so I found VBScript and WMI. So what? I somehow needed to write a script to retrieve details from PCs and actually store it somewhere. The obvious answer was a database. We were a Microsoft shop, so SQL Server. Uh oh – that costs money. No way. Funding denied. Sigh. Well, guess what? Further research turned up this thing called “open source”. I could have a web server, a database and even an entire operating system FOR FREE. What? What is this voodoo? Oh, and the kicker – it would run on an old desktop PC we had retired. Call me sold.

I was so enamoured with the idea of open source that when requesting the project approval I stated that the code should be licensed under an Open Source license. I would write it by night at home and use it at work. The copyright would stay with me, but the business would benefit from having a tool to be able to list what software was on our machines. It would cost the business $0. Project approved!

And so was born WINventory. Windows Inventory. It was designed first and foremost to retrieve details from Windows machines. Along the way came a name change to Open-AudIT, a healthy community, the ability to audit network devices (routers, switches, printers, etc) as well as computers running various operating systems (Windows, Linux, MacOS, AIX, Solaris, etc). Open-AudIT has grown and grown.

We added the ability to run reports on the data. Even to make your own reports. To “discover” a network as opposed to running the audit scripts on individual PCs and so much more.

Today, almost 20 years later, I couldn’t be more proud of how far this little spare time project has come and what we’ve achieved. Nowadays I work for Opmantek and develop Open-AudIT for a full-time job. Since arriving at Opmantek, Open-AudIT has gone from strength to strength and shows no signs of slowing down. Indeed we have so many ideas that I don’t know how I’m ever going to realise them all!

So many ideas, so little time.

So that’s how Open-AudIT came to be. We’re not slowing down so get in, sit down, shush up and hang on!

Onwards and upwards.

Mark Unwin.