7 Steps to Network Management Automation & Engineer Sleep Insurance

Quietly, somewhere in an office downtown, bearings designed to last for 25k hours have been running non-stop for over forty-three thousand. The fan was cheaply made by machine from components sourced over several years across a dozen providers. It sat boxed for weeks before it was installed in the router chassis, which was itself boxed up. Two months at sea, packed tight in a shipping container, then more months bounced around and shuffled from truck to warehouse, and back to a parcel delivery. Finally, the device was configured, boxed and shipped to its final installation point. Stuffed into a too-tight closet with no air circulation, this mission-critical router has been running non-stop for the past five years. It’s a miracle, really, that it worked this long.

Fan speed was the first thing to be affected by the bearing failure.

Building friction on the fan’s impeller shaft caused the amperage draw to increase to compensate and maintain rotational speed. When the amperage draw maxed out, rotations per minute (RPM) dropped. With the slower fan speed came less airflow, and with less airflow the chassis temperature increased.

Complex devices, like routers, require low operating temperatures. The cooler it is, the easier it is for electrons to move. As the chassis temperature increased the router experienced issues processing the data packets traversing the interfaces. At first it was an error here or there, then routine traffic routing ran into problems and the router began discarding packets. From there things got much worse.

It’s late Saturday evening and your weekend has been restful so far. A night out with your significant other, a movie and dinner. You’re ready for bed when your phone chirps. The text message is short:

Device: Main Router

Event: Chassis high temperature with high discard output packets

Action Taken: Rerouted traffic by increasing OSPF cost

Action Required: Fan speed low, amperage high. Engineer investigate for repair/replacement.

A fan went bad, what’s next?

The system had responded as you would: it rerouted traffic off the affected interface, preventing a possible impact to system operation. Adding a note to your calendar to investigate the router first thing Monday morning, you turned in for a good night’s sleep.

Our Senior Engineer in Asia-PAC, Nick Day, likes to refer to Opmantek’s solutions as “engineer sleep insurance”. Coming from a background in managed service providers I can appreciate the situation. Equipment always breaks on your vacation time, often when the on-call engineer is as far away as possible, and with little useful information from the NMS. This was a prime scenario we used when building out our Operational Process Automation (OPA) solution.

Building a Solution

opTrend identifies operational parameters that fall outside trended norms; opEvents correlates the resulting events and automates remediation. With the addition of opConfig, configuration changes to network devices can also be automated. Operational Process Automation (OPA) builds on this statistical analysis and rules-based heuristics to automate troubleshooting and remediation of network events. This in turn reduces the negative impact on user experience.

Magicians never reveal their secrets
but we’ll make an exception.

Now let’s see how this was accomplished using the above example. At its roots opTrend is a statistical analysis engine. It collects performance data from NMIS, Opmantek’s fault and performance system, and looks back over several weeks, usually twenty-six, to determine what is normal for each parameter it processes. It does this hour by hour, considering each day of the week individually. So, Monday morning 9-10am has its own calculation, which is separate from 3-4pm Saturday afternoon. By looking across several weeks opTrend can normalize things like holidays and vacation time.

Once a mean for each parameter is determined, opTrend calculates the standard deviation for the parameter and creates a window of three standard deviations above and below the mean. Any activity above or below this window triggers an opTrend event into NMIS. These events can be in addition to, or in place of, those generated by NMIS’s Thresholding and Alert system.
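To make the windowing idea concrete, here is a minimal Python sketch. It is not opTrend's actual implementation; the function names, the `(weekday, hour, value)` sample layout and the use of the sample standard deviation are all assumptions made for illustration. It keeps one window per hour-of-week slot and flags a reading that falls more than three standard deviations from that slot's mean.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_windows(samples, k=3.0):
    """Build a (low, high) window per (weekday, hour) slot.

    samples: iterable of (weekday, hour, value) tuples, e.g. one reading
    per slot per week over the look-back period (hypothetical layout).
    """
    buckets = defaultdict(list)
    for weekday, hour, value in samples:
        buckets[(weekday, hour)].append(value)

    windows = {}
    for slot, values in buckets.items():
        if len(values) < 2:
            continue  # a standard deviation needs at least two readings
        centre = mean(values)
        spread = stdev(values)
        windows[slot] = (centre - k * spread, centre + k * spread)
    return windows


def is_anomalous(windows, weekday, hour, value):
    """True when value lies outside the window for its slot."""
    window = windows.get((weekday, hour))
    if window is None:
        return False  # no baseline yet for this slot
    low, high = window
    return value < low or value > high


# Six Saturdays (weekday 5) of 22:00 chassis temperature readings.
history = [(5, 22, t) for t in [40, 41, 39, 40, 42, 38]]
windows = build_windows(history)
print(is_anomalous(windows, 5, 22, 41))  # within the normal window
print(is_anomalous(windows, 5, 22, 55))  # well above the window
```

Keeping a separate window per slot is what lets a quiet Saturday night and a busy Monday morning each have their own definition of normal.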

In the example above, opTrend would have seen the chassis temperature exceed the normal window of operation. Had fan speed and/or amperage also been processed by opTrend (they are not by default, but can be configured if desired), these would have been reported as low fan speed and high amperage.

This event from opTrend would have been sent to NMIS, then shared with opEvents for processing. A set of rules, or Event Actions, looked for events that could be caused by high temperature; often related to interface packet errors or discards. With wireless devices (WiFi and RF) this may affect signal strength and connection speed. A similar result could be handled using a Correlation Rule, which would group multiple events across a window of time into a new parent event. Both methods are relevant and have their own pros and cons.

opEvents now uses the high temperature / high discards event to start a troubleshooting routine. This may include directing opConfig to connect to the device via SSH and execute CLI commands to collect additional troubleshooting information. The result of these commands can have their own operational life – being evaluated for error conditions, firing off new events and themselves starting Event Actions.

Let’s review the process flow:

  1. NMIS collects performance data from the device, including fan speed, temperature and interface performance metrics.
  2. opTrend processes the collected performance data from NMIS and determines what is normal/abnormal behavior for each parameter.
  3. Events are generated by opTrend in NMIS, which are then shared with opEvents.
  4. opEvents receives events from opTrend identifying out of normal temperature and interface output discards. These events are then correlated into a single synthetic event, given a higher priority, and evaluated for action.
  5. An Event Action rule matches for a performance impacting event on a Core device running a known OS. This calls opConfig to initiate Hourly and Daily configuration backups, then execute a configuration change to increase the OSPF cost on the interface forcing traffic to be rerouted off this interface.
  6. opEvents also opens a helpdesk ticket via a RESTful API, then texts the on-call technician with the actions taken, and recommended follow-on activities.
  7. Once traffic across the interface drops the discards error will clear, generating an Up-Notification text to the on-call technician.
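The decision made in steps 4 to 6 can be sketched in a few lines of Python. This is purely illustrative and is not opEvents rule syntax: the `Event` class, the event names, the `KNOWN_OS` set and the action labels are all hypothetical stand-ins for whatever your own Event Actions define.

```python
from dataclasses import dataclass

# Hypothetical list of operating systems the automation knows how to configure.
KNOWN_OS = {"IOS", "IOS-XE", "JunOS"}


@dataclass
class Event:
    node: str
    role: str          # e.g. "core", "distribution", "access"
    os: str
    names: frozenset   # names of the events correlated into this one


def plan_actions(event):
    """Return the ordered list of actions for a correlated event."""
    impacting = {"High Temperature", "Output Discards"} <= event.names
    if not (impacting and event.role == "core" and event.os in KNOWN_OS):
        return ["notify_only"]  # anything else just informs a human
    return [
        "backup_config",     # step 5: take configuration backups first
        "raise_ospf_cost",   # step 5: steer traffic off the interface
        "open_ticket",       # step 6: helpdesk ticket via REST
        "text_on_call",      # step 6: tell the engineer what was done
    ]


event = Event("main-router", "core", "IOS",
              frozenset({"High Temperature", "Output Discards"}))
print(plan_actions(event))
```

Backing up before changing anything, and always telling a human afterwards, mirrors the order of the steps above.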

 

This is an example of what we would consider a medium complexity automation. It comprises several Opmantek solutions, each configured (mostly automatically) to work together. These solutions share and process fault and performance information, correlate resulting events, and apply a single set of event actions to gather additional information and configure around the event. When applying solution automations, we advocate a crawl-walk-run methodology where you start by collecting troubleshooting information (crawl), then automate simple single-step remediations (walk), then slowly deploy multi-path remediations with control points (run).

Contact Us & Start Automating Your Network Management

Contact our team of experts if you would like to know more about how this solution was developed, or how Operational Process Automation can be leveraged to save on manhours and reduce Mean Time to Resolve (MTTR).

Uncategorized

How to Manage Capacity, Before it Becomes a Problem.

Capacity Management is the proactive management of any measurable finite resource.

This blog gives you a simple-to-follow outline on how to properly manage capacity, so that if you ever have to resolve capacity issues, you are ahead of the curve and ready to implement remediation.

Capacity management has been considered by many as difficult to achieve. But all worthwhile achievements take discipline to execute and accomplish. So, with careful consideration, monitoring and planning you can ensure that it becomes manageable and deliverable.

Don’t forget that as part of any new deployment or upgrade, and as budget allows, additional demand should be incorporated into the design, with additional capacity ready to service the new capacity peaks. The new peak load is accounted for and new baselines are created.

Analysis Paralysis

The overall concept is that you don’t create reports just to create reports. People might read them once and never again. But because the reports are automated, they will continue being sent and remain unopened, filtered or archived. This is not the result you want.

The behaviour you want to drive is for people to use your reports. So, you create reports that drive actions. For example, node health reports can provide checklists to drive daily troubleshooting, flag maintenance check-ups, and prompt upkeep or repair of devices. Use daily event reports to help the engineering team understand what the normal background noise and static is across your network, or to drive a cleanup. Then, of course, there are weekly or monthly reports. For example, a WAN/interface report to support bandwidth and equipment investment might only need to be produced monthly, but a report on a resource whose capacity consumption is growing faster should be produced weekly.

Detecting capacity issues through threshold management.

The problem with capacity issues is that they can present themselves in so many different ways, with the result that something isn’t working the way it was, or should be. Just as I described in my blog on bandwidth congestion, a user will report that “some application” doesn’t work like it did yesterday, or a capacity threshold alarm will have escalated. If you want to learn about root cause analysis, check out Mark’s webinar.

Using Opmantek Products to manage capacity

Add your devices to NMIS (and while you’re at it, ensure that you have a naming convention to follow, have all your SNMP done and your network documented):

  1. IP, Name and Community String
  2. Assign roles to devices (use the built-in Core, Distribution, Access)

Preparing Visibility

  1. Set up regular reports using opReports
    1. If you manage a network choose the network reports
    2. If you manage servers use the capacity report
    3. If you manage both servers and networks, use both the network reports and the capacity report
    4. Set up the scheduling – Have them emailed once a week in time for your planning and performance review session.
  2. Set up capacity dashboards using TopN views in opCharts
    1. Add TopN and Network Maps to your view (good practice)
    2. Create charts for your most important resources

 

Simple Alarming and Notifications

  1. Enable notifications for critical resource capacity issues. Of the available levels (Normal, Warning, Minor, Major, Critical, Fatal), start with Critical and Fatal only, and add more later as you gain insight.
  2. Set up email notifications for Threshold events so that they reach the right teams, based on the Role (Core, Distribution, Access) or Type (Server, Router, Switch) of the device.

Trending – for predictive capacity planning

  1. Enable opTrend to find anomalies in usage (events) and resources which are continuously trending outside of normal (Billboard)
    1. Notify on critical opTrend threshold events.
    2. Review opTrend Top of The Pops Billboard at your regular capacity review meetings.

Simple steps when managing capacity issues as incidents.

While not ideal, issues/incidents seen at the helpdesk could potentially originate from a change that took place on the network or in the environment. In the real world, even the best-managed change or outage may cause a capacity issue somewhere and trigger an alarm.

Ask. What has changed? Has something in the environment changed?

Typically, a capacity threshold breach is an indicator of:

  1. A new service being added
  2. A new demand
  3. A network change
  4. Some other change
  5. A finite asset reaching a predetermined capacity

Approaches to Baselining for Monitoring and Support:

Look at all your resources, then review and categorise your resource types, e.g. Internet connections, site links etc. For each category, settle on baseline usage levels as percentages (Fatal, Critical, Major etc.), which will be your starting baseline. It is critical to know your baseline, as all your threshold alarms will be triggered at the levels you set, and your Notifications of Threshold Alarms should only be for the more serious alarms. You don’t want to “cry wolf.”

Consider grouping your resources, for example: Core, Application, DMZ, Edge, Branch, Internet Links, General WAN etc.

And within each group, consider the following resources you want to monitor:

CPU, Memory, Bandwidth Utilisation

Start by using general thresholds for each based on the peak demands you have seen.

These are your proactive warnings that will send an alarm to your management platform. You may want to set some escalation rules for the resource for example:

85% – 95% → Major → Alarm Notification (business hours) → to the capacity team

>95% → Critical → Alarm Notification (24×7) → helpdesk/NOC
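The two escalation rules above can be expressed as a small lookup. This is only a sketch in Python; the function name and the tuple layout are assumptions, and in practice you would configure the levels in your management platform rather than in code.

```python
def classify_utilisation(percent):
    """Map a utilisation percentage to (severity, hours, recipient).

    Returns None below 85%, where no notification is sent.
    """
    if percent > 95:
        return ("Critical", "24x7", "helpdesk/NOC")
    if percent >= 85:
        return ("Major", "business hours", "capacity team")
    return None


for reading in (70, 90, 97):
    print(reading, classify_utilisation(reading))
```

Checking the highest level first keeps the rules from overlapping at the 95% boundary.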

Using the trend analysis provided by opTrend, you can identify anomalous usage (for example, it’s low when it should normally be high at that time of day) or proactively look at resources consistently trending up or down against their normal levels. That way you can start reviewing the resource for appropriate modification (upgrade, downgrade, offloading work etc.) ahead of time. As the network continues to grow and support new services, the baseline will change over time (a sliding baseline), so capacity issues may “creep up” on you, because alarm thresholds may not be breached often enough to send an alert. It is important to look at the baseline “rate of change” over time as well to determine capacity needs (e.g. a 10% change over a one-week timeframe). When planning to increase capacity, be sure to allow for the procurement and provisioning time.

I mentioned the sliding baseline and tracking the rate of change of the baseline so that capacity issues don’t “creep up” on you.
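As a rough illustration of tracking the baseline's rate of change, the Python sketch below compares two weekly baseline values and projects how many weeks remain before a threshold is reached. The function names and the straight-line projection are assumptions for the example, not an opTrend feature; real growth is rarely perfectly linear.

```python
import math


def weekly_change_percent(previous, current):
    """Percentage change of the baseline from one week to the next."""
    return (current - previous) / previous * 100.0


def weeks_until(threshold, current, weekly_growth):
    """Weeks until `threshold` is reached, assuming linear growth.

    Returns None when the baseline is flat or falling.
    """
    if current >= threshold:
        return 0
    if weekly_growth <= 0:
        return None
    return math.ceil((threshold - current) / weekly_growth)


# Baseline utilisation (%) of a WAN link over two consecutive weeks.
last_week, this_week = 52.0, 58.0
change = weekly_change_percent(last_week, this_week)
if change >= 10:
    remaining = weeks_until(85, this_week, this_week - last_week)
    print(f"Baseline grew {change:.1f}% in a week; "
          f"about {remaining} weeks until the 85% alarm level")
```

Compare the projected number of weeks with your procurement and provisioning lead time; if the lead time is longer, the upgrade needs to start now.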


Performance & Fault Management for MSPs & Enterprise Class Business

The primary objective of fault management and performance management for MSPs (managed service providers) and enterprise-class businesses is to reduce downtime, so the quicker a network manager can identify a network error, the better. In this blog, we’ll provide a comprehensive overview of how you can optimise your performance and fault management services.

Packaging performance and fault monitoring as a service

In network management services, it’s vital to be proactive in order to maintain user satisfaction, so it’s a good idea to have separate requirements for your remote management, such as patching and remote desktop services.

When you’re offering your services to clients, it’s important to clearly define the assistance you provide in your service-level agreement. Place your focus on how your services can improve reaction times and reduce the amount of time it takes to predict outages and come to a resolution. The pricing of your services should be tied to their value, so ask your customers what one hour of lost network time costs them and explain how you can reduce it. To make sure that your services are always evolving and improving, review the results of your automated network monitoring services and the actions taken, first weekly and then monthly.

Handling overlapping IP addresses and multi-tenancy

Overlapping IP addresses may be out of your control, but it’s critical for you to prepare for them, both between new and existing clients. Segment client data into separate tenants so that your logical firewalls prevent the exposure of confidential information to your other tenants.

Putting the control back in your hands, Opmantek’s NMIS system handles overlapping IP addresses and multi-tenancy by supporting Fully Qualified Domain Names (FQDN), IP addresses for devices, metadata tags (such as department and customer) and tables. Opmantek’s system also provides network scalability by using opHA to deploy multiple polling services.

Leveraging trending data to intelligently adjust fault management

IT equipment requirements and functionalities can fall short in the ‘real world’ in comparison to a vendor’s best-case lab, which is why Dynamic Trending has now replaced static thresholds for alerting customers. This has been achieved by understanding what’s normal for each device. Opmantek’s opTrend can replace static thresholds with what’s normal, and automation can then create focused, purpose-driven dashboards by client and task.

Customer portal with customized dashboards

Self-service dashboards reduce client interruptions while giving clients a feeling of control and transparency; for billable customers, this can be an up-sell or a service differentiator. An implementation of opCharts is exposed to the internet via a reverse proxy. Client accounts are then created within it, which can be scripted. Custom dashboards, maps, charts and business services are then assigned to each user, who can only see the elements you give them access to.

 

Here at Opmantek, we have seen many IT departments transformed through the implementation of our automated network performance and fault management tools. To start making data-led decisions, book a demo and speak to one of our engineers about your next performance and fault management projects.

To hear more about how our automated solutions can help optimise your performance and fault management services, check out Senior System Engineer Mark Henry’s full webinar for further insights and downtime reductions.


Telemedicine is here to stay.

Usually, when we feel ill or have a concern about our health, we turn to our family doctor or even to a hospital. However, with advances in communications technology, our menu of healthcare options has expanded. With the help of telemedicine, we can receive medical consultations wherever and whenever we need them, without leaving home.

Thanks to telemedicine, we can discuss our symptoms and medical questions with a healthcare professional in real time, and so receive a diagnosis, treatment options and even a prescription when necessary. Likewise, doctors can monitor our vital signs remotely to keep track of our condition.

There are three types:

  1. Live telemedicine: also called “interactive”, this is when doctors and patients communicate in real time.
  2. Remote patient monitoring: allows caregivers to monitor patients who use medical equipment to measure certain vital signs, such as blood pressure and blood sugar levels.
  3. Store and forward: this is when doctors share a patient’s health data with other professionals or specialists.

One of the many consequences of the worldwide COVID-19 outbreak is that telemedicine will no longer be regarded as a tool of the future; it is here to stay as an alternative to in-person consultation.

Today, a doctor can attend to a patient remotely thanks to the evolution of telecommunications, which has reduced the number of people in waiting rooms and, consequently, the chances of contagion.

In this way, technological progress has become key to the healthcare industry, improving services through this innovative form of remote medical care and generating all kinds of benefits for everyone involved.

If you are a healthcare professional and would like to evolve your business, remember that you can rely on Opmantek’s tools to take this step.

Don’t wait any longer, contact us!


Agile RMM Solutions For MSPs

Remote monitoring and management (RMM) is the process of tracking, monitoring, and managing endpoints for multiple clients. It is mostly used by managed service providers (MSPs) to provide IT services to organisations who outsource their IT requirements. Read on to find out how a self-hosted RMM solution can help MSPs to increase functionality and save on operational costs.

Are you an MSP that wants to replace expensive RMM systems with a better solution?

As an MSP, did you know that you can replace multi-million dollar RMM systems by combining NMIS with opHA and opCharts? FirstWave offers a full-service software solution that is made to scale. Our products can be used in synergy, as a complete solution.

What do our RMM software solution products include?

NMIS

NMIS is one of the world’s most popular network management systems. Manage anything at any scale. Extend NMIS with our modules and increase your performance, awareness and control.

opHA

opHA allows you to boost the performance of applications and deliver high scale and high availability environments, which includes the geographical distribution of the system and overlapping IP address ranges.

opCharts

Featuring dynamic charting, custom dashboards and a RESTful API to visualize NMIS data and more, opCharts provides a single pane of glass through which you can view all managed customer equipment. This allows engineers to drill down to a single device in a remote location, while still enabling customers to view their own sites privately and in the moment.

opEvents

opEvents effectively helps to reduce the impact of network faults and failures using proactive event management.

Why should I choose FirstWave over a cloud-based SaaS solution?

In recent times, there has been a shift towards Software as a Service (SaaS) and one-size-fits-all cloud-based solutions. However, we have found that our customers require flexibility and bespoke solutions that can grow with each individual business. Disappointed by current SaaS offerings, more and more MSPs are now looking for evolved solutions.

Facilitates scalability

As you have the control, scalability potential is naturally increased, to enable your RMM to grow with your business. The scalability of the software allows for your needs to be met in the future, not just at this present moment. In today’s unpredictable business landscape, scalability is essential for success. However, as businesses grow and change, many SaaS providers force their users into unnecessary paid upgrades.

More visibility and control over your network

Opmantek software can be deployed in the cloud or on-premise, but because you retain ownership of the database and have access to the source code at the core of NMIS, you have more control over your managed devices and network data. Data ownership is another key security concern for many companies, a concern which Opmantek directly addresses.

Easy to integrate with other services

If you already have multiple products performing unique functions within your network environment, it is unlikely that you will want to, or be able to, replace them all at once. At Opmantek, our RMM software is easy to integrate for a fully cohesive solution. We offer multiple integration options, including REST APIs (HTTP(S)), batch operations, and data provided as JSON files and CSV.

Unmatched automation technology

Our automated network monitoring is above industry standard and allows you to provide the best service possible to clients.

We make it easy for you to increase profitability

You can save money for your MSP, with a solution that grows with and adapts to your business, removing the regular expensive upgrade fees charged by SaaS software providers. As part of the changeover period, we offer a full onboarding service. Your designated team will be there with you along the way, answering your questions and making the transition seamless. Our support services can be easily accessed at any time.

A bespoke solution for your business

If you want to experience an RMM solution that is tailored to your business requirements, you can try it out for yourself with no commitment! Simply request a FirstWave RMM software demo to get started.


How To Fix Bandwidth Issues: Detect, Diagnose & Solve Your Network Congestion

Network bandwidth has always been a precious commodity and given our current circumstances with so many people working from home, many companies have not had the bandwidth they need in the right places. This blog will help you with some strategies on how to detect bandwidth issues, further diagnose those issues, and what actions you can take to relieve those bandwidth issues.

Detecting network bandwidth issues through congestion management.

Most issues related to network bandwidth will present as congestion; that is, there is not enough bandwidth to satisfy the demands of the users and applications. Users will report that “some application” doesn’t work like it did yesterday. After you have confirmed the application is up, and the user reports are correct, where do you look next? Check the network:

  1. Monitor the helpdesk cases raised, in particular where users are reporting problems with applications across the network; these are likely to indicate network congestion. Knowing whether the reports come from a branch, a remote site or from home will shorten troubleshooting.
  2. Monitor utilisation of network links and raise alerts when bandwidth becomes heavily utilised.
  3. Make sure you monitor packet discards and errors.
  4. And finally, monitor Quality of Service (QoS) parameters available in the network device; in particular, you are looking for where QoS caused packet loss.
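For the utilisation check in item 2, the usual calculation takes two readings of an interface octet counter and divides the bits transferred by what the link could have carried in that interval. The Python sketch below shows the arithmetic only; the function name and parameters are hypothetical, and it is not how NMIS is implemented. The modulo handles a counter that wrapped between polls.

```python
def link_utilisation(octets_before, octets_after, seconds,
                     speed_bps, counter_bits=64):
    """Percentage utilisation of a link between two counter readings.

    octets_before/octets_after: octet counter values from two polls.
    seconds: time between the polls.
    speed_bps: link speed in bits per second.
    counter_bits: counter width (32 or 64), used to handle wrap-around.
    """
    delta_octets = (octets_after - octets_before) % (2 ** counter_bits)
    return delta_octets * 8 / (seconds * speed_bps) * 100.0


# A 100 Mbit/s link polled every 5 minutes.
util = link_utilisation(0, 3_000_000_000, 300, 100_000_000)
if util >= 80:
    print(f"Link heavily utilised: {util:.0f}%")
```

Which utilisation level counts as "heavily utilised" is a judgment for your own network; the 80% used here is only an example value.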

The first step to detection is to get NMIS installed and let it start collecting data now.

Diagnosing Network Bandwidth Issues

What issues are users reporting about the network or internet speed? Is the application slow due to a slow internet connection, or is it unusable? For example, is there a problem with voice over IP or video conferencing? Does it occur during file transfers? Are users connected with a wired connection via an Ethernet cable or by Wi-Fi? Is bandwidth throttling being used? The more qualified information you get from your helpdesk, the faster you can get to work.

By monitoring the network for issues related to congestion, you are ready to start further diagnosis to determine what is causing those issues and look for possible solutions: first to avoid the congestion, and second to control it.

Depending on the tools available to you, you should have an idea of those causes. For example, putting aside transmission errors, format errors and device health issues, packet discards will generally be caused by QoS classes dropping packets, so the solution is to refine the QoS configuration to prevent the desired traffic from being discarded.

Depending on the application, the dropped packets will be causing retransmissions if they are using TCP, while voice and video symptoms are voice clipping or slow refreshing video or video and voice not keeping sync.

Depending on the wired connections or wireless devices and operating systems being used, you should be able to see key performance indicators for this, which will be collected by your monitoring system, like NMIS. For example, you could monitor for TCP retransmissions on servers; these would indicate issues with internet bandwidth performance or low bandwidth for those applications.

Using systems like Cisco IPSLA is a great way to monitor for changes in latency or variability in latency (jitter). NMIS can collect your IPSLA data, providing graphs as well as alerts when it detects issues.
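To show what "variability in latency" means in numbers, here is a small Python sketch that takes a series of latency samples and reports the average change between consecutive samples. This is one simple way to express jitter, chosen for illustration; it is an assumption of this example and not necessarily the calculation a given probe or vendor tool uses.

```python
def mean_jitter(latencies_ms):
    """Average absolute difference between consecutive latency samples."""
    diffs = [abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0


steady = [20, 21, 20, 21]   # latency barely moves
bursty = [20, 22, 21, 30]   # one sample jumps by 9 ms
print(mean_jitter(steady))
print(mean_jitter(bursty))
```

Two paths can have the same average latency and very different jitter, which is why voice and video quality often degrades before average latency looks alarming.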

Monitoring these metrics will guide where you need to look deeper. You might need to collect more detailed information from the devices to determine what the issues are, e.g. looking at command outputs for QoS or interface information to decide what changes are available to resolve the helpdesk reports.

If you identify the QoS Classes which are exceeding their configuration limits with resulting packet loss, you will need to consider changing the bandwidth allocations for those classes, increasing the available bandwidth for voice and video, for example.

HOW TO DIAGNOSE: Use NMIS and opConfig to collect data, which can then be analysed. 

OPA can help with the detection and diagnosis of congestion problems.

Actions to fix network bandwidth problems

Ultimately, to fix a bandwidth issue you should upgrade the overall capacity at the site. If you are not able to upgrade, or need to buy time, then implement QoS features to identify which traffic is less important to the business and have it shaped or dropped during times of congestion.

Contrary to popular belief, QoS does not create more throughput. It does create better “goodput,” with critical applications protected, and applications that are hogging bandwidth, controlled.

Two standard policy options for QoS are shape or police. Policing will ensure bandwidth is never exceeded and drop the offending traffic. Shaping will delay traffic to smooth out the traffic over time. Note that as shaping limits are exceeded, it may result in dropped traffic.
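The difference between the two policies can be seen in a toy simulation. The Python sketch below is a simplification under stated assumptions, not a model of any vendor's QoS implementation: time advances in fixed ticks, `rate` is the number of bytes allowed per tick, the policer uses a token bucket of size `burst`, the shaper uses a queue of at most `queue_limit` bytes, and no single packet is larger than `rate`.

```python
from collections import deque


def police(ticks, rate, burst):
    """Token-bucket policer: traffic over the limit is dropped at once.

    ticks: list of lists of packet sizes (bytes) arriving in each tick.
    Returns (bytes_sent, bytes_dropped).
    """
    tokens = burst
    sent = dropped = 0
    for packets in ticks:
        tokens = min(burst, tokens + rate)  # refill, capped at burst size
        for size in packets:
            if size <= tokens:
                tokens -= size
                sent += size
            else:
                dropped += size
    return sent, dropped


def shape(ticks, rate, queue_limit):
    """Shaper: excess traffic waits in a queue and is sent in later ticks.

    Returns (bytes_sent, bytes_dropped, bytes_still_queued).
    """
    queue = deque()
    queued = sent = dropped = 0
    for packets in ticks:
        for size in packets:
            if queued + size <= queue_limit:
                queue.append(size)
                queued += size
            else:
                dropped += size  # queue full: shaping limit exceeded
        budget = rate
        while queue and queue[0] <= budget:
            size = queue.popleft()
            budget -= size
            queued -= size
            sent += size
    return sent, dropped, queued


# A burst of four 500-byte packets, followed by two quiet ticks.
traffic = [[500, 500, 500, 500], [], []]
print(police(traffic, rate=1000, burst=1000))
print(shape(traffic, rate=1000, queue_limit=1500))
print(shape(traffic, rate=1000, queue_limit=2000))
```

With these inputs the policer discards whatever exceeds the limit in the first tick, while the shaper spreads the burst across later ticks and only drops traffic when its queue is too small to hold the excess.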

Talk to us about how our solutions can give you the insight you need to make data-based decisions. You’ll reduce helpdesk stress and own your infrastructure, all while improving the user experience.

Frequently Asked Questions

What is the main issue related to network bandwidth?

The main issue related to network bandwidth is congestion, where there is not enough bandwidth to satisfy the demands of the users and applications.

This can result in slow or unusable applications, dropped packets, retransmissions, and issues with voice and video quality.

With so many people working from home, many companies have struggled to have the necessary bandwidth in the right places to support their employees.

To detect, diagnose, and fix network bandwidth issues, it is important to monitor network utilization, packet discards and errors, and Quality of Service (QoS) parameters.

How can I detect issues with my network bandwidth?

To detect bandwidth issues, you can monitor helpdesk cases raised by users reporting problems with applications across the network.

You can also monitor the utilization of network links and raise alerts when bandwidth becomes heavily utilized, monitor packet discards and errors and monitor Quality of Service (QoS) parameters available in the network device.

For a better and more automated approach, install NMIS and let it start collecting data from your devices; this will also help in detecting network bandwidth issues.

What is NMIS?

NMIS (Network Management Information System) is a comprehensive network management system that assists with fault, performance, and configuration management.

It provides performance graphs and threshold alerting, as well as customizable notification policies with different types of notification methods.

NMIS monitors the status and performance of an organization’s IT environment, identifies faults and assists in their rectification, and provides valuable information for IT departments to plan expenditures and IT changes.

It features a sophisticated business rules engine, automated health live baselining, configurable alert thresholds, policy-based actions, escalations, and planned outage management.

NMIS is customizable, scalable, and has pre-configured out-of-the-box solutions, and can be used by Telco and Internet Service Providers, enterprises, and governments.

It offers community support and has predictable and transparent pricing that scales with the user’s requirements.

How can I diagnose issues with my network bandwidth?

Diagnosing network bandwidth issues requires gathering qualified information from your helpdesk and monitoring the network for issues related to congestion.

Depending on the tools available to you, you can determine the causes of the issues.

For example, packet discards will generally be caused by QoS classes dropping packets, so refining the QoS configuration to prevent the desired traffic from being discarded can solve the issue.

Using systems like Cisco IPSLA can also help monitor for changes in latency or variability in latency (Jitter).

What is QoS?

QoS stands for Quality of Service. It is a set of techniques and mechanisms that aim to ensure that network traffic is prioritized according to certain criteria in order to meet the requirements of different applications and users.

QoS mechanisms are used to manage network congestion, reduce latency, and ensure that important applications receive the necessary bandwidth and resources.

QoS can be used to prioritize different types of traffic, such as voice and video, over other types of traffic, such as file transfers and email.

This is done by assigning different levels of priority to different types of traffic and using mechanisms such as traffic shaping and prioritization to ensure that higher-priority traffic is given preferential treatment.

QoS is particularly important in real-time applications such as voice and video conferencing, where delays or dropped packets can seriously affect the quality of the service.

QoS mechanisms can help to ensure that these types of applications receive the necessary resources and are not affected by other types of traffic on the network.

Overall, QoS is an important tool for network administrators to manage and prioritize network traffic, ensuring that important applications and services receive the necessary resources and perform as expected.

What are the steps to fix issues with network bandwidth?

Actions to fix network bandwidth problems include upgrading the overall capacity at the site or implementing QoS features to manage which traffic is less important to the business and have it shaped or dropped during times of congestion.

QoS does not create more throughput but creates better “goodput,” with critical applications protected, and applications that are hogging bandwidth controlled.

Two standard policy options for QoS are shape or police: shaping delays traffic to smooth it out over time, while policing ensures bandwidth utilization is never exceeded and drops the offending traffic.

