I have a Java application behind a web server that I want to monitor.
I think Splunk can do what I want, but maybe I am wrong and there are better solutions for my problem. Tell me if you know one!
Here is what I want to do:
My Linux machine (CentOS, 64-bit) runs an Apache web server and one big Java application.
I want to monitor Apache, Java (as much as possible, e.g. heap, old/new gen, GC, threads, loaded classes, etc.) and the machine itself (network traffic (separated by interface, e.g. eth0, [url removed, login to view]), disk I/O, CPU, memory).
The Apache logs include more than the Apache standard format. We also log %b %I %F and more. (I will give you the details when we discuss things in more depth.)
The Java application writes a GC log (-Xloggc) and uses [url removed, login to view] (remote JMX with JSON over HTTP)!
The monitoring data has to go into a searchable database, so that I can use a web GUI to see nice graphs of what happened with the Java application between, e.g., '2013-10-01 09:15:14' and '2013-10-09 19:15:01'. Or with Apache. Or with the machine's memory. I also want to be able to combine this data! E.g. show me what happened in this period to the response time of each .html document, while also showing the GC and the network traffic on interface xyz. This output has to be combined into one graph, or into one graph per 'service', but the graphs must be drawn one below the other so that I can see the time relationship. Show me what is possible!
Additionally I want
- to define predefined searches
- to save individual searches
- to automatically send alerts when 'something special' happens (e.g. CPU higher than X). How complicated can this expression be? (e.g. "(CPU higher than X AND new-gen contains Y MB) OR Apache response time bigger than Z ms")
... and how do I do it?
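To make the kind of alert expression meant above concrete, here is a minimal sketch in Python. It is not Splunk syntax. The field names and the threshold values standing in for X, Y and Z are hypothetical, and "new-gen contains Y MB" is read here as "the new generation holds at least Y MB", which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One snapshot of the monitored values (all field names are hypothetical)."""
    cpu_percent: float         # CPU utilisation of the machine, 0-100
    new_gen_mb: float          # memory currently held by the JVM new generation, in MB
    apache_response_ms: float  # Apache response time, in milliseconds


# Placeholder thresholds standing in for X, Y and Z from the text.
CPU_LIMIT_X = 90.0
NEW_GEN_LIMIT_Y_MB = 512.0
RESPONSE_LIMIT_Z_MS = 2000.0


def should_alert(s: Sample) -> bool:
    """(CPU higher than X AND new-gen at least Y MB) OR response time bigger than Z ms."""
    jvm_under_pressure = (
        s.cpu_percent > CPU_LIMIT_X and s.new_gen_mb >= NEW_GEN_LIMIT_Y_MB
    )
    apache_too_slow = s.apache_response_ms > RESPONSE_LIMIT_Z_MS
    return jvm_under_pressure or apache_too_slow
```

Whatever tool is chosen would need to be able to express at least this kind of nested AND/OR condition over values coming from different sources (machine, JVM, Apache).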
Because I want to do this with more than just one machine:
I want to be able to define users who only have access to their own zone/environment (or whatever this is called). In the end, a user logs in and "sees" only the data he is allowed to search, plus some buttons for predefined searches.
Normally I want to send the data via syslog. So how do I configure all of this so that the data for each machine/service goes into the correct datastore?
Additionally I need a solution for complicated situations; e.g. where the production machine and the monitoring machine cannot talk to each other, so they need a third machine to communicate!
So the best solution is to have an 'injector script' (or whatever this is called).
This script reads from stdin and writes the data into the database at the right places.
This way I can handle the above scenario very easily on my own! So: how do I build an 'injector script'?
BTW: This could use the syslog setup described above (if possible)!
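To show roughly what is meant by an 'injector script', here is a minimal sketch in Python. Everything specific in it is an assumption made for illustration: the input line format ("machine service payload"), the machine and service names, the datastore names, and the `write` callback that stands in for whatever actually stores the data.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical mapping from (machine, service) to the datastore that should
# receive the data; every name in here is made up for illustration.
DATASTORES = {
    ("prod01", "apache"): "prod01_apache",
    ("prod01", "java-gc"): "prod01_java",
}


def route(line: str) -> Optional[Tuple[str, str]]:
    """Return (datastore, payload) for one input line, or None if it cannot be routed.

    Assumed line format: "<machine> <service> <payload...>".
    """
    parts = line.rstrip("\n").split(" ", 2)
    if len(parts) < 3:
        return None  # malformed line
    machine, service, payload = parts
    datastore = DATASTORES.get((machine, service))
    if datastore is None:
        return None  # unknown machine/service combination
    return datastore, payload


def inject(lines: Iterable[str], write: Callable[[str, str], None]) -> int:
    """Route every line, pass it to write(datastore, payload), return the number written."""
    written = 0
    for line in lines:
        routed = route(line)
        if routed is None:
            continue  # skip lines that cannot be routed
        write(*routed)
        written += 1
    return written
```

Fed from a pipe, this would be called as `inject(sys.stdin, write)`, where `write` is whatever function puts one record into the real database. On the intermediate third machine, the same script could simply sit at the end of the pipe that carries the data across.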
Does your solution have a real-time feature? Will it work in all of the above scenarios?
('real time': The user looks at some search output, which has something like 'up to now' as the end time and sees the new monitoring data arriving in 'real time' (= without clicking 'update' or something like that))
How does this real-time feature cooperate with the above 'injector script'?
If you want this job, show me an example of your work in that area.
If some of my thoughts above are unrealizable dreams, talk to me! If I award the project to you and hear later that this or that isn't possible, we (and especially you) will have a problem!
And no, you won't get access to the production machine!
And be aware that I will ask some stupid newbie questions about how to do some searches, how to change the look of the graphs, etc.!
I will install Splunk and the solution you build myself, so that I am able to do it again and again!
So be prepared to write it down! (It doesn't need to be perfect, ready-to-print documentation! It just needs to be understandable!)