03 April 2020
Identify And Remedy a Failing Web Server
A customer recently asked us to help diagnose and reduce the outages they were experiencing on their public website. The first step was to identify the root cause of the fault.
Digging into the logs, we found what was effectively a (perhaps accidental) Distributed Denial of Service (DDoS) attack: crawlers from around 1200 IP addresses had overloaded both the web server and the application, requiring a server reboot. The immediate fix was to block that IP address range so the same crawlers could not cause another outage. This, however, was only a partial solution, as the same thing could happen again from a different range.
This is where the power of Opmantek software began to shine.
Firstly, the engineering team must shift from a reactive mindset to a proactive one: identify the issue before it becomes a problem and take automated action to prevent an outage. Depending on how your network is set up, your staffing situation and personal preferences, you may tackle this in a variety of ways.
There are several ways to detect the condition behind the service impact. From NMIS, you could run a service check on the web server that tests whether the number of connections exceeds a preset threshold. You can list the number of open connections per remote IP address on the web server with a command such as:
netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n
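The threshold test itself could look something like the sketch below. It is only an illustration: the function name conns_over_threshold and the default limit of 100 are assumptions, it handles IPv4 rows only, and how such a script is wired into an NMIS service check is not shown here.

```shell
# Hypothetical helper (name and default threshold are assumptions).
# Reads `netstat -ntu` output on stdin and prints "<count> <ip>" for every
# remote IPv4 address that has more than $1 connections, busiest first.
conns_over_threshold() {
    threshold="${1:-100}"
    awk -v max="$threshold" '
        # Count only tcp/udp rows; this skips the two netstat header lines
        # and the tcp6/udp6 rows, whose addresses contain extra colons.
        $1 == "tcp" || $1 == "udp" {
            split($5, parts, ":")   # foreign address is "ip:port"
            count[parts[1]]++
        }
        END {
            for (ip in count)
                if (count[ip] > max)
                    print count[ip], ip
        }
    ' | sort -rn
}

# Example: netstat -ntu | conns_over_threshold 100
```

Any output at all means at least one remote address is over the limit, which is the condition a check would alert on.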
Going one step further, we can use a combination of NMIS and opTrend to monitor for a sudden increase in CPU or memory utilization on the server and raise an event from there.
Once the event condition is satisfied, the next step is to identify the attack vector and remediate. In this case, opEvents could retrieve and parse the Apache logs to identify the IP address range, then instruct opConfig to reconfigure the necessary firewalls and applications to block it. Nick Day, Opmantek’s Senior Network Engineer in Asia-PAC, helped another customer by leveraging automated remediation; you can find out how in this blog.
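As a rough illustration of the log-parsing step, the sketch below counts requests per /24 network in an Apache access log. It is not the opEvents implementation: the function name busy_subnets, the default of 1000 requests and the choice of /24 grouping are all assumptions, and it expects the common or combined log format with the client IPv4 address in the first field.

```shell
# Hypothetical helper (name, default and /24 grouping are assumptions).
# Reads an Apache access log on stdin and prints "<requests> <a.b.c>.0/24"
# for every /24 network with at least $1 requests, busiest first.
busy_subnets() {
    min_requests="${1:-1000}"
    awk -v min="$min_requests" '
        {
            # Keep the first three octets so clients are grouped by /24;
            # lines whose first field is not a dotted quad are ignored.
            if (split($1, o, ".") == 4)
                hits[o[1] "." o[2] "." o[3] ".0/24"]++
        }
        END {
            for (net in hits)
                if (hits[net] >= min)
                    print hits[net], net
        }
    ' | sort -rn
}

# Example: busy_subnets 1000 < access.log
```

The resulting list of networks is the kind of input a firewall reconfiguration step would need. A real attack may not align with /24 boundaries, so the grouping should be adjusted to what the logs actually show.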
Not comfortable with this level of automation? Once the event is properly identified, engineers could be notified of the situation and use opConfig’s Virtual Operator to reconfigure the firewalls and applications to block the DDoS attack and restart any services, applications or servers, all without giving those operators command-line access or sudo/root privileges.