Data Extraction/Transformation w/ continual monthly milestones long-term

This project was successfully completed by evidyarthi for $347 USD in 5 days.

Project Budget: $250 - $750 USD
Completed In: 5 days
Project Description

We are looking to build an Open Access archive of freely available scholarly journals, for which we need the article-level data and associated metadata defined below. [url removed, login to view] gives a good explanation of the content and the field this project relates to. We would like this project to continue long-term.


A. Create a harvesting engine in the programming language of your choice (parallel processing has given the best results) that can:

1.) Crawl specific Internet sites (targets). We will help choose the targets; OAI is one method that some sites support

2.) Where crawling is not an option, read from an input file, which some sites supply, to glean the data

3.) Ensure the data is accurate and test URLs for correctness

4.) Dump the defined data to a delimited text file

5.) Transfer the data to us via FTP

B. Work with us to find new resources and refresh existing sources on a monthly basis at $500 USD/month.

C. Provide new and updated data feeds continually

D. Provide your own platform to run the harvests; a multi-core processor should be sufficient

E. The data provided will be article-level data for each journal. The detail data needs these output fields:

"Publisher", "Journal Title", "Article Title", "ISSN", "Alternate ISSN", "Journal Year", "JournalVol", "JournalIssue", "HTML URL", "PDF URL", "Start Page", "End Page"

NOTE: Journal-level data is easy to get; collecting all the articles in each journal is more of a challenge.
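
To illustrate the article-level harvesting mentioned above for sites that support OAI, here is a minimal Python sketch that pulls article records out of an OAI-PMH ListRecords response in the standard oai_dc (Dublin Core) format. The sample XML, the function name parse_records, the mapping of Dublin Core elements to the output fields, and the rule of treating an identifier ending in ".pdf" as the PDF URL are all assumptions for illustration; real targets will need their own mapping, and volume, issue and page numbers often have to come from another source.

```python
import xml.etree.ElementTree as ET

# Standard namespaces used by OAI-PMH responses in oai_dc format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up ListRecords response (hypothetical values, not real data).
SAMPLE_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:article/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Article</dc:title>
          <dc:publisher>Example Press</dc:publisher>
          <dc:identifier>http://example.org/article/1</dc:identifier>
          <dc:identifier>http://example.org/article/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Return one dict per article record found in a ListRecords response."""
    root = ET.fromstring(xml_text)
    rows = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            # Records without a metadata block (e.g. deleted ones) are skipped.
            continue
        # Keep only identifiers that look like web addresses.
        urls = [
            e.text.strip()
            for e in dc.findall("dc:identifier", NS)
            if e.text and e.text.strip().startswith(("http://", "https://"))
        ]
        # Assumption: a URL ending in ".pdf" is the PDF link, the other is HTML.
        pdf = next((u for u in urls if u.lower().endswith(".pdf")), "")
        html = next((u for u in urls if u != pdf), "")
        rows.append({
            "Publisher": dc.findtext("dc:publisher", default="", namespaces=NS).strip(),
            "Article Title": dc.findtext("dc:title", default="", namespaces=NS).strip(),
            "HTML URL": html,
            "PDF URL": pdf,
        })
    return rows


rows = parse_records(SAMPLE_XML)
print(rows)
```

Each returned dict uses the output field names from section E, so it can be passed straight to whatever step writes the delimited file.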

Sample data attached to project.
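
The attached sample defines the real layout. Purely as a sketch of the delimited output described in step 4 and section E, the following Python writes article records with the twelve listed fields. The tab delimiter, the quoting of every value, the function name write_records, and the sample record are assumptions for illustration and should be adjusted to match the sample data.

```python
import csv
import io

# Output fields, in the order given in the project description.
FIELDS = [
    "Publisher", "Journal Title", "Article Title", "ISSN", "Alternate ISSN",
    "Journal Year", "JournalVol", "JournalIssue", "HTML URL", "PDF URL",
    "Start Page", "End Page",
]


def write_records(records, out, delimiter="\t"):
    """Write dict records as delimited text; missing fields become empty."""
    writer = csv.DictWriter(
        out,
        fieldnames=FIELDS,
        delimiter=delimiter,      # assumption: tab-delimited
        quoting=csv.QUOTE_ALL,    # assumption: every value is quoted
        restval="",               # fill fields absent from a record
        extrasaction="ignore",    # drop keys that are not output fields
        lineterminator="\n",
    )
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)


# Hypothetical sample record (made-up values, not real data).
sample = {
    "Publisher": "Example Press",
    "Journal Title": "Journal of Examples",
    "Article Title": "An Example Article",
    "ISSN": "0000-0000",
    "Journal Year": "2010",
    "JournalVol": "1",
    "JournalIssue": "2",
    "HTML URL": "http://example.org/article/1",
    "PDF URL": "http://example.org/article/1.pdf",
    "Start Page": "1",
    "End Page": "10",
}

buf = io.StringIO()
write_records([sample], buf)
output = buf.getvalue()
print(output)
```

Writing to an in-memory buffer keeps the sketch self-contained; in practice the same function could be given an open file that is then transferred via FTP.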
