Configure crawlers using regular expressions

  • Status Closed
  • Budget $2 - $8 USD / hour
  • Total Bids 9

Project Description

We have a crawling technology that we are using to find all relevant content on specific sites. We need someone who can help us configure those crawlers using regular expressions so that:

1) The crawlers find ALL RELEVANT content. That means following all relevant links but not the irrelevant ones.

2) The crawlers extract all relevant metadata from a specific HTML page/URL. We have a system for this too, but we need the regular expressions from you.
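
As a rough illustration of the two tasks above, the following Python sketch uses one set of patterns to decide which links to follow and another to pull metadata out of a page. The URL shapes, the names `FOLLOW`, `SKIP`, `TITLE`, `AUTHOR`, and the helper functions are all hypothetical. Real patterns depend on the target site, and the actual work targets the .NET engine rather than Python's `re`.

```python
import re

# Hypothetical link rules: follow dated article pages, skip login/tag/print links.
FOLLOW = re.compile(r"^https?://example\.com/news/\d{4}/[\w-]+/?$", re.IGNORECASE)
SKIP = re.compile(r"/(login|tag|print)(/|\?|$)", re.IGNORECASE)

# Hypothetical metadata rules: page title and the author meta tag.
TITLE = re.compile(r"<title>\s*(.*?)\s*</title>", re.IGNORECASE | re.DOTALL)
AUTHOR = re.compile(r'<meta\s+name="author"\s+content="([^"]*)"',
                    re.IGNORECASE | re.DOTALL)


def should_follow(url: str) -> bool:
    """Return True if the URL looks relevant and is not explicitly excluded."""
    return bool(FOLLOW.search(url)) and not SKIP.search(url)


def extract_metadata(html: str) -> dict:
    """Collect whichever metadata fields can be found in the page source."""
    result = {}
    for key, pattern in (("title", TITLE), ("author", AUTHOR)):
        match = pattern.search(html)
        if match:
            result[key] = match.group(1)
    return result
```

Keeping the "follow" and "skip" rules as separate patterns makes it easier to tighten one without loosening the other when a site changes its URL scheme.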

We will do this in several countries all around the world, so if you perform well here there will be a lot of work to do in the months and years to come.

Applicants to this job will be invited to take a small test before they get selected.

The test has 10 Regular Expression questions and 1 open question, and it takes between 30 minutes and 1 hour. Expressions will be tested with the .NET Regex engine, with the Case Insensitive and Single Line options enabled.
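
The two options change what a given pattern matches. The short Python sketch below shows the effect, with `re.IGNORECASE` and `re.DOTALL` standing in for .NET's `RegexOptions.IgnoreCase` and `RegexOptions.Singleline`. This is an approximation: the sample HTML and pattern are made up, and the two engines differ in other details.

```python
import re

# Made-up sample: upper-case tags and a line break inside the element.
html = '<DIV class="summary">First line\nsecond line</DIV>'
pattern = r'<div class="summary">(.*?)</div>'

# Without options: the case differs and "." stops at the newline, so no match.
plain = re.search(pattern, html)

# Case-insensitive only: the tags match, but "." still cannot cross the newline.
case_only = re.search(pattern, html, re.IGNORECASE)

# Case-insensitive plus single-line: "." matches the newline too.
flagged = re.search(pattern, html, re.IGNORECASE | re.DOTALL)
```

With both options set, `flagged.group(1)` holds the full two-line text, while `plain` and `case_only` are both `None`.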

We will provide a tool for testing the Regular Expressions during the test.

We are using [url removed, login to view], so this should be taken into consideration to some degree. Take a look at the attached file.
