Java-based web crawler

This project was successfully completed by pazis for $76 USD in 4 days.

Project Budget
$30 - $250 USD
Completed In
4 days
Project Description

I have started a small Java project in Eclipse that uses the Crawler4J web crawler ([url removed, login to view]). This crawler stores its data in Berkeley DB ([url removed, login to view]). I don't have time to finish it. At this time it compiles, runs, and creates the DB, but I have not checked what gets saved to the DB. The first part of this project is to add a class that reads the Berkeley DB and outputs its contents, so I can tell whether the crawler is getting the data I want.

The second part of this project will be to add methods to the custom crawler class I created, each one specifying a different type of data to be extracted.
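As a rough illustration of what "one method per type of data" could look like, here is a minimal sketch that uses only the Java standard library. The class name `PageExtractors` and its methods are hypothetical and not part of the existing project. It assumes the page HTML is already available as a string; in Crawler4J that would typically come from the parse data handed to the crawler's `visit` method. The regular expressions are a simplification, and a real HTML parser would be more robust.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: one static method per kind of data to pull from a page.
final class PageExtractors {
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);
    private static final Pattern EMAIL = Pattern.compile(
            "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    private PageExtractors() {}

    // Returns the trimmed text of the first <title> element, or "" if there is none.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    // Returns the distinct href values of <a> tags, in order of first appearance.
    static List<String> extractLinks(String html) {
        return findAll(HREF, html, 1);
    }

    // Returns the distinct e-mail-like strings found anywhere in the HTML.
    static List<String> extractEmails(String html) {
        return findAll(EMAIL, html, 0);
    }

    // Collects every match of the given capture group, dropping duplicates.
    private static List<String> findAll(Pattern pattern, String text, int group) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group(group));
        }
        return new ArrayList<>(found);
    }
}
```

The custom crawler class could then call whichever extractor matches the data wanted for a given crawl, which keeps each data type in its own small, separately testable method.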

The last part of this project is to be able to feed in a list of URLs to crawl.
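One possible way to feed in the URL list is a plain text file with one URL per line. The sketch below is only an assumption about the format: the `SeedList` class, the `#` comment convention, and the http/https-only rule are all hypothetical choices, not requirements from the project. Each URL it returns would then be handed to the crawler as a seed (in Crawler4J, via the controller's `addSeed` method).

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper that turns a text file of URLs into a clean seed list.
final class SeedList {
    private SeedList() {}

    // Reads the file as UTF-8 and parses its lines.
    static List<String> load(Path file) throws IOException {
        return parse(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    // Keeps valid http/https URLs, skipping blanks, '#' comments, and duplicates.
    static List<String> parse(List<String> lines) {
        Set<String> seeds = new LinkedHashSet<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            if (isHttpUrl(line)) {
                seeds.add(line);
            }
        }
        return new ArrayList<>(seeds);
    }

    // True only for syntactically valid URLs with an http or https scheme and a host.
    static boolean isHttpUrl(String candidate) {
        try {
            URI uri = new URI(candidate);
            String scheme = uri.getScheme();
            return uri.getHost() != null
                    && ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme));
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

Filtering and de-duplicating before crawling means a malformed or repeated line in the list is dropped up front instead of failing partway through a crawl.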

You can go to the Crawler4J website at [url removed, login to view] and review the code. You can also go to the DB site and review the basics of that DB if you are not familiar with it. You can open the project in Eclipse. It is configured to run in Juno 3.8+, Java 1.7, and Maven 3.0.4.

