Michael Petrov
Co-Founder, CEO
1/13/2014
Monitoring Software Auto-Discovery Isn’t All It’s Cracked Up to Be

Auto-discovery is a well-known feature of monitoring packages such as Nagios and OpenNMS. First, these packages scan your infrastructure for IP addresses and host names. Then each host is scanned for known ports or SNMP structures based on the device type. The host can be a network device, a server, or even a storage device. Afterwards, the monitoring software automatically builds a profile for the device.
 

The logic works something like this:

1. I am a monitoring package starting up on 192.168.1.211 with subnet mask 255.255.255.0, so my IP block is 192.168.1.0/24.
2. I have the default configuration, so let me run discovery on my IP block.
   a. Ping the IP block and find “something” on 192.168.1.5.
   b. Try to understand what that “something” on 192.168.1.5 is.
      i. Scan known ports on that “something” and get its hostname.
      ii. It looks like a Linux server.
      iii. Retrieve all known Linux SNMP counters, enumerate disks, etc.
   c. Build a profile for 192.168.1.5, such as:
      i. Host name
      ii. Monitoring parameters: CPU, RAM, swapping, I/O waits, etc.
      iii. Disks, sizes, current allocation
      iv. Running processes
      v. Available ports
   d. Move to the next IP and check whether it is pingable. See whether any ports or other access are available so that a profile can be built for it.
3. Go through all configured network subnets, discovering all the devices on the network.
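
The discovery logic above can be sketched in Python roughly as follows. This is a minimal illustration, not how Nagios or OpenNMS actually implement discovery: the `discover` function, the `Profile` class, and the port-to-device-type heuristic are all hypothetical, and the ping and port-scan steps are passed in as callables so the loop can run without touching a real network.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class Profile:
    """What the sketch records about one discovered host (hypothetical shape)."""
    address: str
    open_ports: List[int] = field(default_factory=list)
    device_type: str = "unknown"


def guess_device_type(open_ports: Sequence[int]) -> str:
    # Assumed heuristic for illustration only; real packages rely on much
    # richer fingerprints than a couple of open ports.
    if 22 in open_ports:
        return "linux-server"
    if 3389 in open_ports:
        return "windows-server"
    return "unknown"


def discover(
    interface: str,
    is_alive: Callable[[str], bool],
    scan_ports: Callable[[str, Sequence[int]], List[int]],
    known_ports: Sequence[int] = (22, 80, 443, 3389),
) -> Dict[str, Profile]:
    """Walk the IP block of `interface` and build a profile per live host."""
    iface = ipaddress.ip_interface(interface)
    profiles: Dict[str, Profile] = {}
    for host in iface.network.hosts():
        if host == iface.ip:
            continue  # skip the monitoring server itself
        address = str(host)
        if not is_alive(address):
            continue  # nothing answered the ping
        ports = scan_ports(address, known_ports)
        profiles[address] = Profile(address, ports, guess_device_type(ports))
    return profiles


# Stand-in probes so the example runs without a network.
fake_network = {"192.168.1.5": [22, 80]}

found = discover(
    "192.168.1.211/255.255.255.0",
    is_alive=lambda addr: addr in fake_network,
    scan_ports=lambda addr, ports: [p for p in fake_network[addr] if p in ports],
)
for address, profile in found.items():
    print(address, profile.device_type, profile.open_ports)
```

Notice that nothing in this loop knows which hosts matter or what "working" means for them. The profile is built purely from whatever happens to answer at scan time.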

This sounds like a great feature, right? Lots of vendors build automatic network maps and graphs this way. Lazy IT guys are thrilled with the outcome because it's effortless and the reports look pretty. But if you are a serious enterprise IT organization and want a strong grip on your infrastructure, this feature can work against you.

Here's why:

A serious enterprise IT organization knows that the real value of monitoring is not in the CPU, memory, or disk parameters that come out of the box, but in custom monitoring scripts. These scripts monitor the availability of “business functionality” rather than simply confirming that a process is running or a port is available. I am sure IT professionals see situations left and right where a process is running and the port is available, but the application is not working. The real goal of an enterprise IT organization is to see that actual functionality is working correctly. None of the standard software packages provides this out of the box. So it hardly matters which monitoring package you use: you will still have to customize and script around it to really monitor your infrastructure.
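
As a rough sketch of what such a custom script might look like, the Python below checks that a service returns a sensible answer instead of only confirming that its port accepts connections. The URL, the expected JSON shape (`{"status": "ok"}`), and the function names are assumptions made for illustration; a real check would exercise whatever your application actually does.

```python
import json
import urllib.request
from typing import Callable, Tuple


def http_get(url: str, timeout: float = 5.0) -> Tuple[int, str]:
    """Fetch a URL and return (status code, body text)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", errors="replace")


def check_business_function(
    url: str, fetch: Callable[[str], Tuple[int, str]] = http_get
) -> Tuple[bool, str]:
    """Return (ok, reason) based on the content of the answer, not the open port."""
    try:
        status, body = fetch(url)
    except OSError as exc:  # urllib's URLError and HTTPError derive from OSError
        return False, f"request failed: {exc}"
    if status != 200:
        return False, f"unexpected HTTP status {status}"
    try:
        payload = json.loads(body)
    except ValueError:
        return False, "service answered, but not with valid JSON"
    # Hypothetical contract: a healthy application reports {"status": "ok"}.
    if not isinstance(payload, dict) or payload.get("status") != "ok":
        return False, "service answered, but reports a failure"
    return True, "ok"


# Stand-in fetchers: the port is "open" in both cases, but only one works.
healthy = check_business_function(
    "http://app.example/health", fetch=lambda url: (200, '{"status": "ok"}')
)
broken = check_business_function(
    "http://app.example/health", fetch=lambda url: (200, "<html>Error</html>")
)
print(healthy)
print(broken)
```

A plain port check would report both cases as healthy. Only the second kind of check catches the application that is up but not working.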

Digital Edge’s position and approach: we make sure we know what we are monitoring, what is important, what is not, and what the policies are for each device. We don’t want devices magically disappearing and reappearing, so we have to know exactly what each device's dependencies, SLAs, thresholds, important services, and not-so-important services are.

Historically, Digital Edge has co-managed out-of-the-box monitoring packages for our clients and integrated them with our Digital Edge Enterprise Monitoring System. In doing this, we’ve observed multiple problems with the default auto-discovery configurations. For example, imagine configuring your custom script to work with a certain device ID and then having that device ID disappear or its profile change completely. What would happen then? There have been instances where the primary node in a clustered environment rebooted and auto-discovery began picking up the secondary nodes, completely changing the profile of the device.
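
One way to guard against this kind of silent change is to keep a pinned inventory and compare it with what discovery reports, rather than letting discovery overwrite the profile. The Python below is a minimal sketch of that idea, not a description of Digital Edge's system or of any monitoring package; the device ID, hostnames, and the `profile_drift` function are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical pinned inventory: device ID -> attributes we expect to see.
EXPECTED: Dict[str, Dict[str, str]] = {
    "db-cluster-01": {"hostname": "db-primary", "address": "192.168.1.5"},
}


def profile_drift(
    device_id: str,
    observed: Optional[Dict[str, str]],
    expected: Dict[str, Dict[str, str]] = EXPECTED,
) -> List[str]:
    """List the differences between the pinned profile and the discovered one."""
    pinned = expected.get(device_id)
    if pinned is None:
        return [f"{device_id}: not in the pinned inventory"]
    if observed is None:
        return [f"{device_id}: disappeared from discovery"]
    return [
        f"{device_id}: {key} changed from {want!r} to {observed.get(key)!r}"
        for key, want in pinned.items()
        if observed.get(key) != want
    ]


# After a failover, discovery reports the secondary node under the same ID.
after_failover = {"hostname": "db-secondary", "address": "192.168.1.6"}
for problem in profile_drift("db-cluster-01", after_failover):
    print(problem)
```

An empty list means the discovered profile still matches the pinned one. Anything else is worth raising as an alert before custom scripts that depend on that device ID start failing silently.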

What is even more frustrating is that the whole monitoring system can stop working altogether. This means that the entire system would stop reporting alerts for the device and even stop collecting stats. We have seen this behavior with OpenNMS. How is this possible? Simple: networks that are grouped mistakenly can close off OpenNMS's visibility into certain network segments. That means no one would even see or know that there are alerts in those blind folders.

Sure, there are ways to work around this with a few configuration tweaks. If you are in love with a certain monitoring package, you might say that you can completely disable auto-discovery.

But this blog is not about OpenNMS or any other monitoring package. This blog is to make TWO points:

  1. It doesn't really matter what package you use; you still need to put a lot of effort into configuring monitoring policies. You will have to maintain, adjust, and test your policies all the time. This is a full-time job for any sizable organization.
  2. The benefits of auto-discovery aren't always worth the potential problems it introduces. In our opinion, it should not be used, regardless of which monitoring package you choose.

   
