Thursday, February 15, 2018

K For Troubleshooting

People think of Kibana as this awesome data visualization and exploration tool.  What does that even mean?  Considering the breadth of logs that can be fed into Kibana, that can mean many things.

Today, I'm going to explore a real use that may not be normally considered.  Troubleshooting.

Fortigate VPN tunnels, for example, have fairly explicit error logs.  If you aren't used to reading them, they can be annoying to understand.  For example, "vpn SA peer proposal does not match local policy" - in other words, "Hey, the encryption and authentication settings the other side is proposing don't match what this side is configured to accept."  At least some are easily understood, like "probable preshared key mismatch".

If you have these logs going into the ELK stack, you can use Kibana to find these errors for you.  All you have to do is look at a Visualization or Dashboard when you arrive at work and periodically throughout the day.  Fix the VPNs before anyone even knows there is a problem and have an awesome day, instead of fighting those fires when some random person mentions them.

To show only the down VPNs, I filtered the Discover page to just the firewall logs and searched for "probable preshared key mismatch".  I saved that search.
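For anyone who would rather see the search as a query than as clicks in Discover, here is a rough sketch of a request body that does approximately the same thing.  It is not what Kibana generates behind the scenes, just a minimal Elasticsearch query DSL body built in Python.  The field names "message" and "@timestamp" and the helper name build_error_search are my assumptions; use whatever your firewall logs are actually indexed with.

```python
import json


def build_error_search(phrase, message_field="message",
                       time_field="@timestamp", window="now-15m"):
    """Build a query body that matches one exact error phrase.

    message_field and time_field are assumed names; adjust them to
    match the fields in your own index.
    """
    return {
        "query": {
            "bool": {
                # match_phrase requires the words to appear together, in order
                "must": [{"match_phrase": {message_field: phrase}}],
                # limit results to the recent time window
                "filter": [{"range": {time_field: {"gte": window}}}],
            }
        }
    }


if __name__ == "__main__":
    body = build_error_search("probable preshared key mismatch")
    print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the Kibana Dev Tools console under a search request against your firewall index to check that it returns the same events as the saved search.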

When I created the visualization, I chose the option to create the visualization from a saved search, and selected the "Probable Preshared Key Mismatch" saved search.

I used a data table because if you're working in a large environment, there might not be just a couple of VPN tunnels down; there could be a lot of them.

For the metric, I used count - this tells the number of times that this error was seen per bucket.

For the bucket, I used the Terms bucket - VPNDeviceName.  For the sub-bucket, I used the Terms bucket - VpnTunnelName so that we knew which specific tunnels were down.  (No sense in fixing every tunnel on the device if only one is down.)  These make up the columns in the data table.
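The bucket and sub-bucket above correspond to a terms aggregation nested inside another terms aggregation.  Here is a minimal sketch of that request body, plus a small helper that flattens a response into the same rows the data table shows.  The aggregation names by_device and by_tunnel, the helper names, and the sample response are made up for illustration, and I am assuming both fields are indexed as keyword fields so they can be aggregated on.

```python
def build_tunnel_table_aggs(device_field="VPNDeviceName",
                            tunnel_field="VpnTunnelName", size=50):
    """Terms bucket per device, with a terms sub-bucket per tunnel."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "aggs": {
            "by_device": {
                "terms": {"field": device_field, "size": size},
                "aggs": {
                    "by_tunnel": {
                        "terms": {"field": tunnel_field, "size": size}
                    }
                },
            }
        },
    }


def flatten_buckets(response):
    """Turn the nested buckets into (device, tunnel, count) rows."""
    rows = []
    for device in response["aggregations"]["by_device"]["buckets"]:
        for tunnel in device["by_tunnel"]["buckets"]:
            rows.append((device["key"], tunnel["key"], tunnel["doc_count"]))
    return rows


if __name__ == "__main__":
    # Hypothetical response, shaped like an Elasticsearch aggregation result
    sample = {
        "aggregations": {
            "by_device": {
                "buckets": [
                    {
                        "key": "fw-example-01",
                        "doc_count": 7,
                        "by_tunnel": {
                            "buckets": [
                                {"key": "tunnel-a", "doc_count": 5},
                                {"key": "tunnel-b", "doc_count": 2},
                            ]
                        },
                    }
                ]
            }
        }
    }
    for row in flatten_buckets(sample):
        print(row)
```

Each row is one device, one tunnel, and the number of times the error was logged for that tunnel, which is the count metric from the data table.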

I tested the visualization by changing the time frame from the last fifteen minutes to the last day.  (If there weren't any down in the last day, change the time frame to the last few days - trust me, they go down quite a bit, and you will eventually see at least one down.)  Sure enough, it showed VPN tunnels that had been down in the last day because of a "preshared key mismatch".

Then I did the same steps for the other common errors that happen when VPN tunnels go down.
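Repeating the steps per error can also be pictured as one request with a named filter per error message.  This is only a sketch of that idea using the Elasticsearch filters aggregation, not how the saved searches work.  The phrases are the two quoted earlier in this post; the dictionary keys, the "message" field, and the helper name are my assumptions.

```python
# Keep every error phrase in one place so there is a single spot to edit.
ERROR_PHRASES = {
    "preshared_key_mismatch": "probable preshared key mismatch",
    "proposal_mismatch": "vpn SA peer proposal does not match local policy",
}


def build_error_breakdown(phrases, message_field="message",
                          device_field="VPNDeviceName",
                          tunnel_field="VpnTunnelName"):
    """One named filter bucket per error, split by device and tunnel."""
    named_filters = {
        name: {"match_phrase": {message_field: phrase}}
        for name, phrase in phrases.items()
    }
    return {
        "size": 0,  # buckets only, no individual log lines
        "aggs": {
            "by_error": {
                "filters": {"filters": named_filters},
                "aggs": {
                    "by_device": {
                        "terms": {"field": device_field},
                        "aggs": {
                            "by_tunnel": {"terms": {"field": tunnel_field}}
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    import json
    print(json.dumps(build_error_breakdown(ERROR_PHRASES), indent=2))
```

With the phrases in a single dictionary, a reworded error message means changing one string instead of editing every saved search.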

If they ever change the error messages, I will have to change these, so if there is a better way to do it, please let me know.

Once I saved the visualizations, I saved them to a Dashboard so that I could easily see what was down and why.  This saved a lot of troubleshooting time.

Another awesome thing about this:  If you change this visualization a small amount, it can be used as a metric to show how often VPN tunnels are down and how often each error occurs.  It can also help you find out whether these VPNs going down is a symptom of an even larger problem.
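One way to picture that "how often" metric is a date histogram: count the error per day, and count how many distinct tunnels logged it.  This is a rough sketch under the same assumptions as before (the "message" and "@timestamp" field names and the helper name are mine).  Newer Elasticsearch releases use calendar_interval for the bucket size, while the 6.x releases that were current when I wrote this use interval, so that key is a parameter.

```python
def build_error_trend(phrase, message_field="message",
                      time_field="@timestamp",
                      tunnel_field="VpnTunnelName",
                      interval="1d", interval_key="calendar_interval"):
    """Count one error per time bucket, plus distinct tunnels affected.

    Pass interval_key="interval" for older Elasticsearch versions.
    """
    return {
        "size": 0,  # buckets only
        "query": {"match_phrase": {message_field: phrase}},
        "aggs": {
            "errors_over_time": {
                "date_histogram": {"field": time_field,
                                   interval_key: interval},
                "aggs": {
                    # approximate number of distinct tunnels per bucket
                    "tunnels_affected": {
                        "cardinality": {"field": tunnel_field}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    import json
    print(json.dumps(
        build_error_trend("probable preshared key mismatch"), indent=2))
```

A day where the error count jumps across many tunnels at once is the kind of pattern that hints at a larger problem rather than one misconfigured tunnel.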

What other ways have people found to use Kibana?

