This web crawler will only be used to gather URL and backlink information, similar to the crawler used by SEOMoz, which has over 60 billion URLs indexed. The results will not be publicly available; they will only be used by us for a reporting suite that is in development.
- The crawler needs to be written in a language that can scale to index billions of URLs.
- The crawler needs to be built so that it does not slow down as the database grows.
- The crawler needs to recognise and remove duplicate URLs.
- The crawler needs to automatically discover and index new links.
- The crawler needs to record where each link comes from, where it points to, any anchor text used, and whether the link is follow or nofollow.
- The crawler will need to show how many outbound links are on each page.
- All information needs to be stored in a MySQL database.
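As a rough illustration of the duplicate-URL requirement, the Python sketch below canonicalises each URL before checking it against the set of URLs already seen. The normalisation rules (lower-casing scheme and host, dropping default ports and #fragments) and the names `normalize_url` and `SeenUrls` are assumptions for illustration only; at a scale of billions of URLs the in-memory set would need to be replaced by a persistent store, for example a unique index on the URL column in MySQL.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Return a canonical form of an absolute http(s) URL (hypothetical rules).

    Ignores userinfo and does not handle IPv6 literal hosts.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Drop default ports so http://a.com:80/ and http://a.com/ compare equal
    default = (scheme == "http" and port == 80) or (scheme == "https" and port == 443)
    if port is not None and not default:
        host = f"{host}:{port}"
    path = parts.path or "/"
    # Fragments never reach the server, so they do not identify a distinct page
    return urlunsplit((scheme, host, path, parts.query, ""))


class SeenUrls:
    """In-memory record of normalised URLs (hypothetical; not billion-scale)."""

    def __init__(self) -> None:
        self._seen = set()

    def add(self, url: str) -> bool:
        """Return True if the URL is new, False if it is a duplicate."""
        key = normalize_url(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

With these rules, `SeenUrls().add("HTTP://Example.com:80/a#top")` registers the page, and a later `add("http://example.com/a")` on the same instance is reported as a duplicate.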
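The link-level requirements (source, target, anchor text, follow/nofollow, and a per-page outbound count) can be sketched with Python's standard `html.parser`. This is a minimal sketch under stated assumptions: `Link`, `LinkExtractor` and `count_external` are hypothetical names, only `<a href>` tags with http(s) targets are considered, page-level `<meta name="robots">` directives are ignored, and "outbound" is read as "pointing to a different host", which is only one possible interpretation of the brief.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin, urlsplit


@dataclass
class Link:
    source: str        # page the link appears on
    target: str        # absolute URL the link points to
    anchor_text: str   # visible text inside <a>...</a>, whitespace-collapsed
    nofollow: bool     # True if the rel attribute contains "nofollow"


class LinkExtractor(HTMLParser):
    """Collects <a href> links from one page (hypothetical helper, stdlib only)."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.links: List[Link] = []
        # (absolute target, nofollow flag, text pieces) for the currently open <a>
        self._open: Optional[Tuple[str, bool, List[str]]] = None

    def _flush(self) -> None:
        # Record the open anchor, if any (also covers a missing </a>)
        if self._open is not None:
            target, nofollow, parts = self._open
            text = " ".join("".join(parts).split())
            self.links.append(Link(self.page_url, target, text, nofollow))
            self._open = None

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        self._flush()
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        target = urljoin(self.page_url, href.strip())
        # Skip mailto:, javascript: and similar; only web pages are crawlable
        if urlsplit(target).scheme not in ("http", "https"):
            return
        rel = (attr.get("rel") or "").lower().split()
        self._open = (target, "nofollow" in rel, [])

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)

    def handle_endtag(self, tag):
        if tag == "a":
            self._flush()

    def close(self) -> None:
        super().close()
        self._flush()


def count_external(page_url: str, links: List[Link]) -> int:
    """Count links whose host differs from the page's host."""
    host = urlsplit(page_url).hostname
    return sum(1 for link in links if urlsplit(link.target).hostname != host)
```

Feeding a page into `LinkExtractor(page_url)` and calling `close()` leaves one `Link` row per anchor in `links`, which maps naturally onto a links table; `len(links)` gives the total number of links on the page if that is the intended meaning of "outbound".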
We are aware that this is something that can be built fairly quickly; however, our developer is working on other projects, so we are looking to bring someone else in to complete the task.
Before commencing we will need to discuss this project via email or Skype messenger to ensure that all of the boxes are ticked and we are not missing anything that could be vital to the project.
9 freelancers are bidding an average of $219 for this job.
Hi, I recently developed a web crawler with data processing and insertion into a MySQL database (see my review for details). I would like to help you with your project. Best regards, Viktor