In Progress

Data Extraction / Transformation with continuing monthly milestones (long-term)

Project Description:

We are looking to build an Open Access archive of freely available scholarly journals, for which we want the article-level data and associated metadata defined below. [url removed, login to view] gives a good explanation of the content and the field this project relates to. We would like this project to continue long-term.


A. Create a harvesting engine in the programming language of your choice (parallel processing has produced the best results) that can:

1.) Crawl specific Internet sites (targets). We will help choose the targets; OAI is one method that some sites support.

2.) If not crawling, read from an input file, which some sites supply, to glean the data

3.) Ensure the data is accurate and test URLs for correctness

4.) Dump the defined data to a delimited text file

5.) Transfer the data to us via FTP

B. Work with us to find new resources and refresh existing sources on a monthly basis at $500 USD/month.

C. Provide new and updated data feeds continually

D. Provide your own platform to run the harvests; a multi-core processor should be sufficient.

E. The data provided will be article-level data for each journal. Each record needs these output fields:

"Publisher", "Journal Title", "Article Title", "ISSN", "Alternate ISSN", "Journal Year", "JournalVol", "JournalIssue", "HTML URL", "PDF URL", "Start Page", "End Page"
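
As an illustration of step A.4 with the field list above, here is a minimal Python sketch that writes article records to a delimited text file. The column order comes from the posting; the tab delimiter, the quoting of every value, and the function name `write_articles` are assumptions made for the example, not requirements stated here.

```python
import csv
import io

# Column order taken from the output field list in the posting.
FIELDS = [
    "Publisher", "Journal Title", "Article Title", "ISSN", "Alternate ISSN",
    "Journal Year", "JournalVol", "JournalIssue", "HTML URL", "PDF URL",
    "Start Page", "End Page",
]


def write_articles(rows, out, delimiter="\t"):
    """Write article dicts to `out` as delimited text and return the row count.

    Assumptions: every value is quoted, missing fields are written as empty
    strings, and keys that are not in FIELDS are silently dropped.
    """
    writer = csv.DictWriter(
        out,
        fieldnames=FIELDS,
        delimiter=delimiter,
        quoting=csv.QUOTE_ALL,
        restval="",
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical record, for demonstration only.
    buffer = io.StringIO()
    write_articles([{"Publisher": "Example Press", "Article Title": "An Example"}], buffer)
    print(buffer.getvalue())
```

Passing an open file instead of `io.StringIO` produces the file that would then be transferred in step A.5.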

NOTE: Journal-level data is easy to get; getting all the articles in each journal is more of a challenge.
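
For targets that support OAI (step A.1), article-level records can often be read from an OAI-PMH `ListRecords` response in Dublin Core format. The sketch below is a rough illustration under that assumption: it parses an already-downloaded response and keeps only identifiers that look like HTTP(S) URLs. The helper names and the sample XML are hypothetical, and the URL check is purely syntactic; the "test URLs for correctness" step (A.3) would also need an actual request to each URL.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Standard OAI-PMH and Dublin Core namespace URIs.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response body, for demonstration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Article</dc:title>
          <dc:publisher>Example Press</dc:publisher>
          <dc:identifier>not a url</dc:identifier>
          <dc:identifier>https://journal.example.org/article/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def is_well_formed_url(url):
    """Syntactic check only: http/https scheme and a non-empty host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def parse_list_records(xml_text):
    """Extract publisher, title, and first URL-like identifier per record."""
    root = ET.fromstring(xml_text)
    articles = []
    for record in root.iterfind(".//oai:record", NS):
        urls = [
            e.text.strip()
            for e in record.iterfind(".//dc:identifier", NS)
            if e.text and is_well_formed_url(e.text.strip())
        ]
        articles.append({
            "Publisher": record.findtext(".//dc:publisher", default="", namespaces=NS),
            "Article Title": record.findtext(".//dc:title", default="", namespaces=NS),
            "HTML URL": urls[0] if urls else "",
        })
    return articles


if __name__ == "__main__":
    for article in parse_list_records(SAMPLE):
        print(article)
```

Dublin Core records do not always carry volume, issue, or page numbers, so those output fields may still have to come from crawling the journal site itself.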

Sample data attached to project.

Skills: C# Programming, Java, Perl, Ruby on Rails, Web Scraping


About the Employer:
( 5 reviews ) Windsor, United States

Project ID: #3996484

Awarded to:


Hi, I am very interested in the project. Please check your PMB.

$347 USD in 5 days
(2 Reviews)

12 freelancers are bidding on average $391 for this job


Hello, I am a Java expert from China. I have a lot of experience with this kind of project (extraction and transformation), and I like long-term, stable cooperation, so I am very interested in this project. Please let me ...

$550 USD in 10 days
(58 Reviews)

Hello, please refer to your INBOX. Thank You .

$500 USD in 10 days
(49 Reviews)

Hi sir, please check your PM. Thanks, Kimi.

$250 USD in 5 days
(68 Reviews)

Your main requirements are clear. We can do this for you and we have mentioned how we are going to do this in the message we sent you. Please go through it and contact us for more details. Solution Infinity.

$700 USD in 10 days
(19 Reviews)

Hi, I am interested in taking this project. Please check my PMB for more details.

$550 USD in 15 days
(1 Review)

Please check PM

$250 USD in 8 days
(9 Reviews)

Fast epoll-based search engine with modular site parsers (Perl, Linux).

$500 USD in 7 days
(21 Reviews)

Hello! I'm very interested!

$500 USD in 10 days
(5 Reviews)

Please check my PM.

$250 USD in 4 days
(3 Reviews)

We are a team of 4 PHP experts. We provide good-quality work and give satisfaction to our clients by our [url removed, login to view] claim and guarantee 100% satisfaction and deliver the results exactly on time and per ...

$250 USD in 10 days
(0 Reviews)

We are freelance software developers. If you contact me I can give a quote for your project and we can discuss the details. [Removed by Admin]

$500 USD in 1 day
(0 Reviews)

Easy and fun! I will gladly get this job done for you. I'm highly experienced with this kind of job.

$250 USD in 7 days
(0 Reviews)