In Progress

Automated server-based web scraping application to develop

Developer needed to build an "intelligent", server-based, automated web scraping application which can distinguish business websites from non-business websites in a large list of website URLs (over 200k).

(A business website is a website which belongs to a business providing services.)

The proposed way to do this is to

1) develop a server-based application which will carry out the following steps:

a) verify whether the URL corresponds to an active website

b) browse the website and identify "intra site" links (internal links)

c) determine whether the text of the link includes a particular keyword (from a pre-determined set of keywords, such as "about us", "services", "company", "clients"...)

for example: www. website .com/[url removed, login to view] - this link will give a "positive" result since the word "services" appears in the link (the word "services" would have been pre-determined by the user).

2) a web interface with the following user features:

from the web interface, the user must be able to:

- upload a list of URLs to scrape (up to 200k or more if possible)

- add keyword/remove keyword

- start the "mining" process, pause it, stop it, resume it

- see a real-time count of URLs processed, along with counts of active websites, positive results and negative results

- download the URL list of active websites, positive-identified websites and negative ones
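As a rough illustration of steps (a) to (c) of the server-side application, here is a minimal Python sketch using only the standard library. The function names (`is_active`, `internal_links`, `is_positive`) are hypothetical, and two points are assumptions rather than part of the brief: a site counts as "active" if it answers with an HTTP status below 400, and a keyword counts as a match if it appears in either the link's URL path or its visible text.

```python
import http.client
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen

# Example keyword set; in the real application the user edits this list.
KEYWORDS = ["about us", "services", "company", "clients"]


def is_active(url, timeout=10):
    """Step (a): assume a site is active if it answers with a status below 400."""
    try:
        req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        return False


class _LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def internal_links(html, base_url):
    """Step (b): keep only links that stay on the same host as base_url."""
    collector = _LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc.lower()
    result = []
    for href, text in collector.links:
        absolute = urljoin(base_url, href)
        if urlparse(absolute).netloc.lower() == host:
            result.append((absolute, text))
    return result


def is_positive(links, keywords=KEYWORDS):
    """Step (c): positive if any keyword appears in a link's URL path or text."""
    for url, text in links:
        haystack = (urlparse(url).path + " " + text).lower()
        if any(kw.lower() in haystack for kw in keywords):
            return True
    return False
```

A page fetched after `is_active(url)` succeeds would be passed through `internal_links(html, url)` and then `is_positive(...)`; a production version would also need to handle redirects, character encodings and sub-domains.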

IMPORTANT NOTES:

The application needs to be multi-threaded and efficient, for maximum processing speed.
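One possible shape for the multi-threaded processing, with the pause/resume/stop controls and live counts the web interface asks for, is sketched below in Python. The `MiningJob` class and the `classify` callback are hypothetical names; `classify(url)` is assumed to return one of "inactive", "positive" or "negative". This is only an outline under those assumptions, not a tuned implementation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class MiningJob:
    """Runs classify(url) over many URLs with pause/resume/stop and live counts."""

    def __init__(self, urls, classify, workers=32):
        self._urls = list(urls)
        self._classify = classify  # assumed to return "inactive", "positive" or "negative"
        self._workers = workers
        self._running = threading.Event()  # cleared while the job is paused
        self._running.set()
        self._stopped = threading.Event()
        self._lock = threading.Lock()
        self.counts = {"processed": 0, "active": 0, "positive": 0, "negative": 0}
        self.results = {"inactive": [], "positive": [], "negative": []}

    def pause(self):
        if not self._stopped.is_set():
            self._running.clear()

    def resume(self):
        self._running.set()

    def stop(self):
        self._stopped.set()
        self._running.set()  # wake any paused workers so they can exit

    def _process(self, url):
        self._running.wait()  # blocks here while the job is paused
        if self._stopped.is_set():
            return
        outcome = self._classify(url)
        with self._lock:  # counters are shared between worker threads
            self.counts["processed"] += 1
            if outcome != "inactive":
                self.counts["active"] += 1
                self.counts[outcome] += 1
            self.results[outcome].append(url)

    def run(self):
        with ThreadPoolExecutor(max_workers=self._workers) as pool:
            list(pool.map(self._process, self._urls))
        return self.counts
```

A web front end could call `run()` in a background thread, poll `counts` for the real-time display, and write the lists in `results` out as the downloadable URL files. Threads suit this workload because most of the time is spent waiting on network responses.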

PLEASE ONLY BID IF YOU ARE THE DEVELOPER. (NO AGENCIES PLEASE)

PLEASE INDICATE IN PMB WHAT DEVELOPMENT LANGUAGE YOU INTEND TO USE

Thanks for your bid

Skills: ASP, Data Mining, MySQL, PHP, Web Scraping

About the Employer:
( 102 reviews ) London, United Kingdom

Project ID: #2505024

Awarded to:

luisurraca

I would like to work on this project. Planning on using Ruby on Rails and MySQL for the web server and Nokogiri (very popular Ruby gem for web scraping). I would use background jobs so the application is usable during...

$720 USD in 7 days
(2 Reviews)
1.6

5 freelancers are bidding on average $624 for this job

mantislin

Hi sir, please check PM, thx Kimi.

$750 USD in 6 days
(184 Reviews)
6.9

zk230182

Hi, I hope you'll be fine. I've studied your project specifications and I'm ready to provide you solution that fits our requirement. I am best in coding. I will do my best to make your project more effective. I will giv...

$750 USD in 7 days
(45 Reviews)
6.5

phpXpertbd

I worked on many similar projects, I have big experience in data mining projects. I can finish this task in short time, with the best quality.

$750 USD in 15 days
(30 Reviews)
6.2

aoefmpes

pl check your inbox

$350 USD in 10 days
(44 Reviews)
5.1

johnrio

Let us get this done for you

$550 USD in 10 days
(21 Reviews)
3.7