Squid log processing
This project was awarded to varun580 for $799 USD.
We currently use a customized mysar data collector (a C program) to load regular squid [url removed, login to view] data into MySQL for processing. According to the mysar program's reports, we are able to load around 1000 - 1200 records per second.
This is almost at the limit. We are looking for suggestions on what kind of DB to use. We are currently looking into Hadoop, MongoDB and so on, preferably something as close to MySQL as possible (in terms of querying the data). The DB system should be able to scale out across multiple servers (horizontal expansion).
What we are looking for in this job is:
1. A suggestion on what kind of DB to use, plus the server requirements (processor + RAM)
2. A new data loader (script or C program) to load the default squid [url removed, login to view] into the DB. This should be a high-performance loader handling at least 10k lines per second. -> We will provide data for testing. Our data is currently around 10GB-15GB per day.
Requirements - basically the same functions as mysar, as below
-> Cut the domain to at most 3 segments, e.g. [url removed, login to view] -> save as [url removed, login to view], except for IP addresses
-> Tag each access as cached (TCP_HIT, xxx, xxx) or not (TCP_MISS, xxx, xxx) as in mysar
-> If possible, be able to stop and restart processing in the middle of a file.
-> If possible, no parameters should be saved in the DB. Mysar stores certain parameters in the DB; it is better if this can be avoided. Only traffic info should be in the DB.
-> If possible, store the whole URL for keyword searching. Sometimes we search for a certain keyword such as porn, and the application returns the results as Host, URL, Byte. This search is usually for 1 particular day only.
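The per-line requirements above could be sketched roughly as follows. This is only an illustration of the logic, not a loader that meets the 10k lines per second target. It assumes the log is in Squid's native format (timestamp, elapsed, client, result code/status, bytes, method, URL, ...), that "3 segments" means the last three labels of the host name, and that any result code containing HIT counts as cached, which may not match mysar's exact list. The function names trim_host, parse_line and load are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def trim_host(url, max_labels=3):
    """Return the host of `url` cut to its last `max_labels` labels; IPs stay whole."""
    # CONNECT requests are logged as "host:port" without a scheme, so prefix
    # "//" to make urlsplit treat the value as a network location.
    parts = urlsplit(url if "://" in url else "//" + url)
    host = (parts.hostname or "").lower()
    try:
        ipaddress.ip_address(host)
        return host  # literal IP address: keep as-is
    except ValueError:
        pass
    return ".".join(host.split(".")[-max_labels:])


def parse_line(line):
    """Parse one log line into a dict, or return None if it is too short."""
    f = line.split()
    if len(f) < 7:
        return None
    result_code = f[3].split("/")[0]  # e.g. "TCP_MISS" from "TCP_MISS/200"
    return {
        "ts": float(f[0]),
        "client": f[2],
        "bytes": int(f[4]),
        "url": f[6],
        "site": trim_host(f[6]),
        # Assumption: every *_HIT code is a cache hit; adjust to mysar's list.
        "cached": "HIT" in result_code,
    }


def load(path, offset=0):
    """Yield (record, next_offset) for each complete line starting at `offset`."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        while True:
            raw = fh.readline()
            if not raw.endswith(b"\n"):
                break  # end of file, or a partial line still being written
            offset += len(raw)
            rec = parse_line(raw.decode("utf-8", "replace"))
            if rec is not None:
                yield rec, offset
```

To stop and restart in the middle of a file without storing parameters in the DB, the caller could write the last next_offset to a small state file beside the log and pass it back in as offset on the next run.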
Additional requirements (not available in mysar)
-> Translate/tag IP to zone. We have a table that maps IP to zone. Data in the table looks like
172.30.10.0/23 Zone A
172.30.12.0/23 Zone A
172.30.14.0/23 Zone B
We have about 20 zones. A few IP subnets point to the same zone. The zone list must be expandable in the future.
-> Tag with server IP. We have about 10 servers, so we need to know which server each log came from.
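The IP-to-zone tagging could be sketched as below, using the three sample subnets from the table. The names ZONE_TABLE, build_zone_index and zone_for, and the "Unknown" fallback for unmapped addresses, are assumptions made for illustration. With about 20 zones a linear scan is probably sufficient; a real loader would read the table from wherever it is maintained so that new zones can be added without code changes.

```python
import ipaddress

# Sample rows from the posting; the full table would hold about 20 zones.
ZONE_TABLE = [
    ("172.30.10.0/23", "Zone A"),
    ("172.30.12.0/23", "Zone A"),
    ("172.30.14.0/23", "Zone B"),
]


def build_zone_index(table):
    """Convert (cidr, zone) rows into (network, zone) pairs, narrowest first."""
    nets = [(ipaddress.ip_network(cidr), zone) for cidr, zone in table]
    # Longest prefix first, so a narrower subnet wins if ranges ever overlap.
    nets.sort(key=lambda pair: pair[0].prefixlen, reverse=True)
    return nets


def zone_for(ip, index, default="Unknown"):
    """Return the zone whose subnet contains `ip`, or `default` if none does."""
    addr = ipaddress.ip_address(ip)
    for net, zone in index:
        if addr in net:
            return zone
    return default
```

For example, zone_for("172.30.13.1", build_zone_index(ZONE_TABLE)) falls inside 172.30.12.0/23 and so maps to Zone A.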
3. Sample PHP reports (how to query the data)
1) Top 20 web sites (URL destination) for a particular month -> Site, No of Access, No of User, Total Byte, % cache hit, % cache byte hit
2) Top 20 hosts for a particular month -> IP, No of Site, Total Byte, % cache hit, % cache byte hit
3) Reports 1 & 2, but for each zone
4) Total User, Bytes, Request, % cache hit, % cache byte hit for a particular month
5) Report 4 for each squid server
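As a rough idea of what report 1 might look like in SQL, here is a sketch run against an in-memory SQLite database as a stand-in for whichever DB is chosen. The table name access, its columns, and the sample rows are all hypothetical. It also assumes that "No of User" means distinct client IPs and that the top 20 is ranked by number of accesses, neither of which is stated above.

```python
import sqlite3

# Report 1: top 20 sites for one month, with hit ratios by request and by byte.
TOP_SITES_SQL = """
SELECT site,
       COUNT(*)               AS accesses,
       COUNT(DISTINCT client) AS users,
       SUM(bytes)             AS total_bytes,
       100.0 * SUM(cached) / COUNT(*) AS pct_hit,
       100.0 * SUM(CASE WHEN cached = 1 THEN bytes ELSE 0 END)
             / NULLIF(SUM(bytes), 0)  AS pct_byte_hit
FROM access
WHERE day >= ? AND day < ?
GROUP BY site
ORDER BY accesses DESC
LIMIT 20
"""


def top_sites(conn, month_start, next_month_start):
    """Return report 1 rows for the half-open range [month_start, next_month_start)."""
    return conn.execute(TOP_SITES_SQL, (month_start, next_month_start)).fetchall()


# Hypothetical schema and rows, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access (day TEXT, client TEXT, site TEXT, zone TEXT,"
    " server TEXT, bytes INTEGER, cached INTEGER)"
)
conn.executemany(
    "INSERT INTO access VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("2010-10-01", "172.30.10.5", "www.example.com", "Zone A", "10.0.0.1", 1000, 0),
        ("2010-10-02", "172.30.10.5", "www.example.com", "Zone A", "10.0.0.1", 3000, 1),
        ("2010-10-02", "172.30.14.9", "www.example.com", "Zone B", "10.0.0.2", 1000, 1),
        ("2010-10-03", "172.30.14.9", "cdn.example.net", "Zone B", "10.0.0.2", 500, 0),
        ("2010-11-01", "172.30.14.9", "cdn.example.net", "Zone B", "10.0.0.2", 700, 1),
    ],
)

for row in top_sites(conn, "2010-10-01", "2010-11-01"):
    print(row)
```

Adding a condition such as zone = ? or server = ? to the WHERE clause would restrict the same query to one zone or one squid server, and grouping by client instead of site gives the shape of report 2.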