The project's goal is to develop a focused web crawler that produces categorizable web links to be structured within an open source database

This project received 3 bids from talented freelancers with an average bid price of €2645 EUR.

Project Budget
€1500 - €3000 EUR
Total Bids
3
Project Description

Project Summary:

The project's goal is to develop a focused web crawler that produces categorizable web links to be structured within an open source database. The web crawler should be language-independent and give users high flexibility in both the sources and the keyword combinations to be crawled on a daily basis.

From an IT perspective, this could mean programming a "focused web crawler" that can search specific domains (mostly news and specific industry sources), index the content of the resulting pages, and filter this content with an intelligent algorithm ("text search") that takes into account a given selection of keyword combinations. We are open to discussing other ways of realizing the project if the freelancer can convincingly argue for a better, easier, or more cost-efficient methodology.
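To make the filtering step concrete, here is a minimal Python sketch of the two checks described above: restricting a crawl to given domains and testing page text against keyword combinations. It is only an illustration under stated assumptions, not a proposed design: the function names are hypothetical, each keyword is assumed to be a single word, and a combination is assumed to match only when all of its keywords occur in the text. The posting itself does not define the matching rule.

```python
import re
from collections import Counter
from urllib.parse import urlparse


def in_allowed_domains(url, allowed_domains):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


def match_combinations(text, combinations):
    """Return {combination: {keyword: count}} for every matching combination.

    Assumption: a combination (a tuple of single-word keywords) matches
    only if every one of its keywords occurs at least once in the text.
    """
    # \w+ is Unicode-aware in Python 3, so this tokenizes any language
    # that separates words with whitespace or punctuation.
    words = Counter(re.findall(r"\w+", text.lower()))
    hits = {}
    for combo in combinations:
        counts = {kw: words[kw.lower()] for kw in combo}
        if counts and all(counts.values()):
            hits[combo] = counts
    return hits
```

A call such as match_combinations(page_text, [("solar", "spain")]) would return the per-keyword frequencies for each matching combination, which is the kind of metadata the posting asks to store. Languages without word separators would need a different tokenizer.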

A typical daily crawl for one language could involve about 500 sources and about 200 keyword combinations. We would expect the crawler to find about 5-50 new results (links) in each such daily crawl. The resulting links and metadata (such as frequency of keywords found, date, source, mime-type) should subsequently be stored in a database for further analysis.
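As a rough illustration of the storage step, the sketch below keeps each link with the metadata listed above and reports whether a link is new. It assumes SQLite through Python's standard sqlite3 module purely as an example of an open source database; the posting does not name one, and the table, column, and function names are hypothetical.

```python
import sqlite3
from datetime import date

# Hypothetical schema: one row per discovered link.
SCHEMA = """
CREATE TABLE IF NOT EXISTS results (
    url           TEXT PRIMARY KEY,
    source        TEXT NOT NULL,
    found_on      TEXT NOT NULL,
    mime_type     TEXT,
    keyword_hits  INTEGER NOT NULL
)
"""


def open_store(path=":memory:"):
    """Open (or create) the result store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_result(conn, url, source, mime_type, keyword_hits, found_on=None):
    """Insert a result; return True if the link is new, False if already stored."""
    found_on = found_on or date.today().isoformat()
    cur = conn.execute(
        "INSERT OR IGNORE INTO results "
        "(url, source, found_on, mime_type, keyword_hits) "
        "VALUES (?, ?, ?, ?, ?)",
        (url, source, found_on, mime_type, keyword_hits),
    )
    conn.commit()
    # rowcount is 0 when the URL already existed and the insert was ignored.
    return cur.rowcount == 1
```

Using the URL as the primary key means a link found again on a later day is not counted twice, so the daily count of new results can be taken from the True return values.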

Required capabilities:

• Experience in Python as the preferred programming language, alternatively Java

• Experience in Lucene/Solr/Nutch as the preferred frameworks and technologies to be used. Alternative search technologies may be considered.

• Experience with necessary open source databases for the input (keyword combinations, web sources) and the output data (links, meta data)


• The project's time frame is estimated at around 4 weeks (20 working days): 12 days for developing the application and 8 days for testing and modifying it.

• The proposed fee would range between [url removed, login to view]€, depending on the candidate's experience. Part of the fee will also depend on the quality and completeness of the resulting links.

• The IP rights to the final product and its entire code will remain with the customer. During the testing phase, the customer should have full access to the final test version, without any limitations.

If interested, please write an e-mail to: [REMOVED BY [url removed, login to view] ADMIN] with your comments and conditions.

Thank you
