web data extractor / filter

In Progress Posted Aug 8, 2007 Paid on delivery

I need a PHP application that does the following:

A) Extract

Access a specified HTML page consisting of 100 numbered pieces of data (each linked to a separate page).

Extract from the top-level page:

a1) data matching 5 specific endings

a2) data matching two other patterns (each of the 100 pieces of data that is a single word only, and each that is a group of two words only)

Extract from sub pages:

Follow each of the 100 links on the top-level page, only one link deep (these links are dynamic and potentially different every time the top-level page is accessed), and extract

a3) data matching 5 specific endings (same as above)
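The extraction rules in A) could be sketched roughly as follows. This is only an illustration under several assumptions that the posting does not state: each piece of data is taken to be the text of a link on the page, the five endings in `ENDINGS` are placeholders, ending matches are case-insensitive, and an item that matches an ending is counted as a1 even if it is also a single word. The function names are hypothetical.

```php
<?php
// Placeholder endings; the posting does not say which five are meant.
const ENDINGS = ['.com', '.net', '.org', '.info', '.biz'];

// Collect the text and target of every link on the page.
// Assumption: each piece of data is the text of an <a> element.
function extractItems(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate real-world, non-validating HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $items = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $text = trim(preg_replace('/\s+/', ' ', $a->textContent));
        if ($text !== '') {
            $items[] = ['text' => $text, 'href' => $a->getAttribute('href')];
        }
    }
    return $items;
}

// True if $text ends with one of the given endings (case-insensitive).
function hasEnding(string $text, array $endings): bool
{
    foreach ($endings as $ending) {
        $len = strlen($ending);
        if ($len > 0 && strlen($text) >= $len
            && strcasecmp(substr($text, -$len), $ending) === 0) {
            return true;
        }
    }
    return false;
}

// Number of whitespace-separated words in $text.
function wordCount(string $text): int
{
    return count(preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY));
}

// Returns 'a1' (ending match), 'a2' (one or two words), or null (ignored).
function classify(string $text, array $endings): ?string
{
    if (hasEnding($text, $endings)) {
        return 'a1';
    }
    $n = wordCount($text);
    return ($n === 1 || $n === 2) ? 'a2' : null;
}
```

For a3), the same `hasEnding()` check would be applied to whatever text is pulled from each sub page reached through the collected `href` values; how that text is located on the sub pages is not described in the posting.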

B) Data Manipulation:

Raw data matching the a2) patterns above needs to be manipulated: remove spaces and append one specific ending.
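A minimal sketch of the manipulation step, assuming "spaces" means any whitespace and using a placeholder ending (the posting does not name the ending to append). The function name is hypothetical.

```php
<?php
// Remove all whitespace from an a2 item and append the given ending.
// The ending is passed in because the posting does not specify it.
function manipulate(string $raw, string $ending): string
{
    return preg_replace('/\s+/', '', $raw) . $ending;
}

// Example with a placeholder ending:
echo manipulate('blue sky', '.com'), "\n";   // prints "bluesky.com"
```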

C) Store:

Store the data in a database with the date of (first) retrieval (duplicates should not be stored), and an extra attribute marking data that has been manipulated (a2 with the added ending).
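One possible way to meet the storage requirement is a uniqueness constraint on the stored value, so that a repeated insert is ignored and the first retrieval date is kept. The sketch below uses PDO with SQLite only so that it is self-contained; on MySQL the same idea would typically be a UNIQUE key combined with `INSERT IGNORE`. The table, column, and function names are assumptions, not part of the posting.

```php
<?php
// Create the (hypothetical) items table; value is unique, so duplicates are rejected.
function createSchema(PDO $db): void
{
    $db->exec('CREATE TABLE IF NOT EXISTS items (
        value       TEXT    NOT NULL PRIMARY KEY,
        source      TEXT    NOT NULL,            -- "a1", "a2" or "a3"
        manipulated INTEGER NOT NULL DEFAULT 0,  -- 1 if the ending was appended
        first_seen  TEXT    NOT NULL             -- date of first retrieval
    )');
}

// Insert a value unless it is already stored.
// Returns true if a new row was written, false if it was a duplicate.
function storeItem(PDO $db, string $value, string $source, bool $manipulated, string $date): bool
{
    $stmt = $db->prepare(
        'INSERT OR IGNORE INTO items (value, source, manipulated, first_seen)
         VALUES (?, ?, ?, ?)'
    );
    $stmt->execute([$value, $source, $manipulated ? 1 : 0, $date]);
    return $stmt->rowCount() === 1;
}
```

Because the duplicate insert is ignored rather than applied as an update, the `first_seen` date of an existing row is left untouched on later runs.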

D) Output:

Create/update two daily txt files with the data retrieved that day: one for a1+a3 combined, one for a2 data.
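The daily output could be produced by rewriting each file from that day's stored values on every run, which keeps repeated runs idempotent. This is a sketch only: the file naming scheme (`<type>_<date>.txt`), the type labels, and the one-value-per-line layout are assumptions.

```php
<?php
// Build a file name such as "a1a3_2007-08-08.txt" (naming scheme is an assumption).
function dailyFileName(string $type, string $date): string
{
    return sprintf('%s_%s.txt', $type, $date);
}

// Write (or overwrite) the daily file for one type with one value per line.
// $values is expected to hold everything retrieved on $date for that type.
function writeDailyFile(string $dir, string $type, string $date, array $values): string
{
    $path = rtrim($dir, '/') . '/' . dailyFileName($type, $date);
    $values = array_values(array_unique($values));
    sort($values, SORT_STRING);
    $content = $values ? implode("\n", $values) . "\n" : '';
    file_put_contents($path, $content, LOCK_EX);
    return $path;
}
```

The same function could back the web interface's export by date range, called once per day and type in the range.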

Other requirements:

* A simple web interface to create data output by date range and type (a1+a3 and/or a2)

* Script should run every X minutes/hours (cron job)

* Possibility to specify a list of proxies (with optional username/password auth), which the script will cycle through for web access. It must be able to skip non-responding proxies, and use no proxy if the list is empty.

* Development/Testing on your own server, complete installation on my server when finished (CentOS / WHM / Cpanel)
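The proxy requirement in the list above might be handled along these lines: try each proxy in turn starting from a rotating offset, skip any that fail or time out, and fall back to a direct request only when the list is empty. The array layout for a proxy (`host` as "host:port", optional `auth` as "user:password"), the timeout values, and the function names are assumptions for illustration. The cURL options used are standard ones.

```php
<?php
// Fetch a URL with cURL, optionally through one proxy. Returns null on failure.
function curlFetch(string $url, ?array $proxy): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 5,    // give up quickly on dead proxies
        CURLOPT_TIMEOUT        => 20,
    ]);
    if ($proxy !== null) {
        curl_setopt($ch, CURLOPT_PROXY, $proxy['host']);             // "host:port"
        if (!empty($proxy['auth'])) {
            curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxy['auth']);  // "user:password"
        }
    }
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Try the proxies in order, starting at $start, and return the first successful body.
// With an empty list the request is made directly. $fetch can be swapped for testing.
function fetchViaProxies(string $url, array $proxies, int $start = 0, ?callable $fetch = null): ?string
{
    $fetch = $fetch ?? 'curlFetch';
    $count = count($proxies);
    if ($count === 0) {
        return $fetch($url, null);
    }
    for ($i = 0; $i < $count; $i++) {
        $body = $fetch($url, $proxies[($start + $i) % $count]);
        if ($body !== null) {
            return $body;               // this proxy responded
        }
    }
    return null;                        // every proxy failed
}
```

Passing a different `$start` on each request (for example, a counter incremented per fetch) spreads the 100 sub-page requests across the list.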

I was thinking about PHP/cURL/MySQL as I am familiar with these, but feel free to suggest other methods if you know of far superior ones.

Thanks for looking :)

Data Processing Linux PHP

Project ID: #166201

About the project

10 proposals Remote project Active Aug 21, 2007

Awarded to:

krt

Please see PMB for detailed bid description.

$150 USD in 5 days
(12 Reviews)
3.9

10 freelancers are bidding on average $196 for this job

NishantBamb

Hello, please refer your PMB. Thank you.

$300 USD in 7 days
(144 Reviews)
7.3

gaffapi

If it is not solely for PHP and Linux, feel free to contact me.

$100 USD in 0 days
(94 Reviews)
6.4

h114

Hi, kindly glance through your PMB. Regards, Rakesh

$200 USD in 4 days
(22 Reviews)
6.0

wasimsohail

Please see PM for details.

$200 USD in 10 days
(54 Reviews)
5.3

visionary7

Hi, I can do this easily and have worked on several similar projects. I agree with you that php/mysql/curl is probably the best way forward. I am 24, live in south east UK and take pride in the high quality of m...

$150 USD in 4 days
(2 Reviews)
3.0

garydtaylor

I have a script which nearly suits your requirement. Please see PM for details.

$300 USD in 5 days
(1 Review)
3.0

souravpkl

I am interested in working with you. I have a total of 4 years of experience in PHP, MySQL, JavaScript and AJAX. I assure you that you will get 100% satisfaction. Hope to hear from you soon.

$275 USD in 5 days
(1 Review)
2.6

softlance

Please check PM

$180 USD in 4 days
(0 Reviews)
0.0

haseeb157

Hi, I can perform well while completing your project.

$100 USD in 10 days
(0 Reviews)
0.0