Michael Petrov
Co-Founder, CEO
Monitoring… What really matters.


This blog post has been pending for a long time, and the guys below cover most of my ideas. But I have some more: see the original statement.

What are we really monitoring?

Do we really care about CPU utilization? Or memory? Disk, yes, we do care, because it can simply stop processing. But most of this information is only indirectly related to what we really want to know.

What we really want to know is “how the system works for the business”. I call it “business experience”. If the business experience is fine, and end users or end business process flows feel good, who cares about CPU, memory and all the other things? They could be reviewed once in a while to see if we are really frying that CPU and are maybe on the edge of performance. But otherwise, if the business feels good, we should be OK too. I know I am simplifying things, but still, you have to go down to simplicity to see the problem.

Our biggest problem as IT is that we measure technological aspects of IT and try to correlate them to business experience. Yes, it is relevant, but we as IT often think that we are smart and can monitor “everything”, so that when something out of that “everything” goes out of the norm, we should react because the business experience may suffer.


Having been in IT too long, I don’t think that I am too smart anymore. Not because I don’t know lots of things: I know that I know a lot, but there is a lot I don’t know too. Also, the complexity of IT makes that “everything” almost unmanageable. Sometimes 2 or 3 parameters that are all individually in check combine to produce conditions in which the business experience suffers. Sometimes you think that you monitor everything, but the one parameter that would indicate a problem was not included. Then the business would say “we are stuck”, but IT would see that everything is OK.


So my thought is that we really need to be rigorous and say that we have to monitor business experience and correlate IT sensors to it. Why do I say rigorous? Because it is hard. There is no easy solution to this. No “WhatsUp” or OpenNMS. WhatsUp detects whether it is UP. It doesn’t check whether “it” is up AND PRODUCING. What if it is UP but NOT producing?
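To make the difference between UP and PRODUCING concrete, here is a minimal sketch in Python. It is only an illustration, not how our EMS or any other product works: the function name business_status, the three status labels and the 15-minute idle threshold are all assumptions made up for this example. The idea is simply that a process counts as healthy only if it has delivered business output recently, not merely because it answers.

```python
from datetime import datetime, timedelta
from typing import Optional


def business_status(is_up: bool,
                    last_output_at: Optional[datetime],
                    now: datetime,
                    max_idle: timedelta) -> str:
    """Classify a process by what it delivers, not only by whether it responds.

    is_up          -- result of a classic availability check (hypothetical input)
    last_output_at -- time of the last completed business transaction, if any
    max_idle       -- longest acceptable gap between outputs (an assumed threshold)
    """
    if not is_up:
        return "DOWN"
    # The process responds, but it has never produced anything
    # or its last output is older than the allowed idle time.
    if last_output_at is None or now - last_output_at > max_idle:
        return "UP_BUT_NOT_PRODUCING"
    return "UP_AND_PRODUCING"


if __name__ == "__main__":
    now = datetime(2012, 9, 1, 12, 0)
    max_idle = timedelta(minutes=15)  # assumed threshold, tune per business process

    # Same "UP" answer from the availability check, different business experience.
    print(business_status(True, now - timedelta(minutes=5), now, max_idle))
    print(business_status(True, now - timedelta(hours=1), now, max_idle))
```

In this sketch an availability check alone would report both cases as fine; only the timestamp of the last real output tells them apart.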


So I think most IT managers just look away from the real problem, as there is no out-of-the-box solution and real monitoring is hard to implement. But in fact there is a solution. It is not easy, and there are no wizards or magic one-step deployments. There is work involved, but the solution exists. Check our EMS system. This solution has helped us to deliver 100% uptime for our clients for many years.




Namari on 9/9/2012 2:06:05 PM

BION I'm impressed! Cool post!


