Datascrape and Search Program

  • Status Closed
  • Budget $250 - $750 AUD
  • Total Bids 21

Project Description

Project requires: scraping data from predefined URLs, custom search of the stored pages, and Google Translate of the results.

1) For a predefined list of URLs (obtained from a database, by reference to specific fields in each record);

2) Scrape and store the entire website, text only ("Level 0 Copy"). Avoid duplication of, or recursion into, pages generated from URL-local content databases. Time and depth limits are to be admin-editable, and the run flagged if either limit is reached;

3) Search the stored Level 0 Copy of each page for certain keywords and keep only those pages for which the search gives a positive result ("Level 1 Copy");

4) Search the remaining Level 1 Copy pages for a separate set of keywords and keep only those pages for which the search gives a positive result ("Level 2 Copy");

Up to 5 iterations of search, each keeping only the positive results. The structure is multiple filters, applied sequentially, to produce a small number of remaining pages that match all keyword sets;

5) Final step: if the keyword searches were conducted in a non-English language (defined at the beginning of the process), send the resulting pages (after all filters have been applied) to Google Translate for translation into English;

6) Output: an Excel spreadsheet report on search metadata and results (template will be provided), and a PDF of all remaining pages at the end of the process, with a translated copy if applicable.
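
To make steps 2 to 4 concrete for bidders, here is a minimal Python sketch of one possible shape for the crawl and the filter cascade. It is illustrative only and not a required design. The names `crawl_site`, `apply_filters` and the injected `fetch` callable are hypothetical. It also assumes things the description leaves open: the crawl stays on the starting host, duplicates are detected by identical extracted text, and a page passes a filter if it contains any keyword in that set.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _TextAndLinks(HTMLParser):
    """Collects visible text and href targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text, self.links, self._skip = [], [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def crawl_site(start_url, fetch, max_depth=3, max_seconds=60.0):
    """Breadth-first, text-only copy of one site ("Level 0 Copy").

    fetch(url) is a hypothetical callable returning HTML or None.
    Returns (pages, flags): pages maps URL -> text, flags records
    whether the depth or time limit was reached.
    """
    host = urlparse(start_url).netloc
    deadline = time.monotonic() + max_seconds
    pages, seen_text = {}, set()
    flags = {"depth_limit_hit": False, "time_limit_hit": False}
    queue, visited = deque([(start_url, 0)]), {start_url}
    while queue:
        if time.monotonic() > deadline:
            flags["time_limit_hit"] = True
            break
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = _TextAndLinks()
        parser.feed(html)
        text = " ".join(parser.text)
        if text in seen_text:  # assumption: same text means duplicate page
            continue
        seen_text.add(text)
        pages[url] = text
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc != host or link in visited:
                continue
            if depth + 1 > max_depth:
                flags["depth_limit_hit"] = True
                continue
            visited.add(link)
            queue.append((link, depth + 1))
    return pages, flags


def apply_filters(pages, keyword_sets):
    """Apply keyword sets in order; levels[n] is the "Level n Copy"."""
    levels = [dict(pages)]
    for keywords in keyword_sets:
        wanted = [k.casefold() for k in keywords]  # assumption: any keyword matches
        kept = {url: text for url, text in levels[-1].items()
                if any(k in text.casefold() for k in wanted)}
        levels.append(kept)
    return levels
```

Passing `fetch` in as a parameter keeps the sketch testable without network access; in a real run it would wrap whatever HTTP client the bidder prefers. `max_depth` and `max_seconds` stand in for the admin-editable limits, and the returned `flags` show whether either was reached.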

This is stage 1 of a larger project and must be completed quickly. Clean code is required because of the long-term nature of the project. The next stage will follow on immediately.

You must advise on any external processing (e.g. cloud) requirements. All development and testing is to be done on your own account. Only the final large-scale test will be conducted on my accounts.

Your bid must include at least a price estimate, subject to confirmation following discussion and provision of a more detailed description of sources and process.
