Closed

Looking for a specialist in web site crawling

We are looking for freelancers who specialize in web site crawling. We are working on several projects that require a full crawl of web sites such as http://www.parlament.ch. For large web sites we typically define several subsites that serve as better starting points for the crawler. The result should be the complete text contained in the web site; text in PDF files and in HTML tables also needs to be crawled and included in the result. Once the crawler is correctly set up for a specific site, we typically expect the site contents to be crawled periodically (e.g. once a week).
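As a rough illustration of what "full crawling from several subsite starting points" involves, here is a minimal breadth-first crawler sketch using only the Python standard library. It assumes a hypothetical `fetch(url)` function supplied by the caller (returning HTML as a string, or `None`), restricts the crawl to the hosts of the seed URLs, and keeps visible text including `<td>`/`<th>` cell contents while skipping `<script>` and `<style>`. PDF text extraction would need an extra library (for example pdfminer.six) and is deliberately left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class PageParser(HTMLParser):
    """Collects visible text (table cells included) and href links."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.links = []
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside <td>/<th> arrives here like any other text node.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def crawl(seeds, fetch, max_pages=1000):
    """Breadth-first crawl limited to the seeds' hosts.

    `fetch(url)` is a caller-supplied (hypothetical) function returning
    HTML as str, or None if the page is unavailable.
    Returns {url: extracted_text}.
    """
    allowed_hosts = {urlparse(s).netloc for s in seeds}
    queue = deque(seeds)
    seen = set(seeds)
    results = {}
    while queue and len(results) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        results[url] = "\n".join(parser.chunks)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc in allowed_hosts and link not in seen:
                seen.add(link)
                queue.append(link)
    return results
```

Injecting `fetch` keeps the traversal logic testable offline; in practice it would wrap an HTTP client with politeness delays and robots.txt handling, which this sketch does not attempt.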

We are looking for someone who is experienced in this data-gathering process and can manage all steps (setting up the crawler, improving the crawler, managing document content updates, and transferring the data to our server).
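"Managing document content updates" for a weekly recrawl usually comes down to detecting which pages are new, changed, or gone since the last run, so that only those need to be transferred. A minimal sketch, assuming each crawl is stored as a `{url: fingerprint}` snapshot (the function names here are hypothetical), hashes whitespace-normalised text with SHA-256 and compares two snapshots:

```python
import hashlib


def fingerprint(text):
    """SHA-256 of whitespace-normalised text, so reflowed markup isn't a change."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def diff_snapshots(previous, current):
    """Compare {url: fingerprint} snapshots from two consecutive crawls.

    Returns three sorted lists: URLs added, changed, and removed.
    """
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    changed = sorted(
        url for url in current.keys() & previous.keys()
        if current[url] != previous[url]
    )
    return added, changed, removed
```

Under this scheme only the added and changed documents would be sent to the target server, and the removed URLs deleted there; whether that matches the employer's server setup is an assumption.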

Crawling can be done with Apache Nutch or other crawling software that the specialist recommends.
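In Apache Nutch, keeping a crawl inside one site is typically done through `conf/regex-urlfilter.txt`, where each rule is a regular expression prefixed with `+` (include) or `-` (exclude), the first matching rule decides, and a URL matching no rule is ignored. The Python sketch below only mimics those semantics so the rule logic can be checked offline; it is not Nutch code, it assumes match-anywhere (`re.search`) behaviour, and the example rules are hypothetical.

```python
import re


def load_rules(text):
    """Parse '+regex' / '-regex' lines; blank lines and '#' comments are skipped."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accept(url, rules):
    """First matching rule decides; a URL that matches no rule is rejected."""
    for include, regex in rules:
        if regex.search(url):
            return include
    return False


# Hypothetical rules: skip image/binary files, then allow only *.parlament.ch.
# PDFs are deliberately not excluded, since their text must be extracted.
RULES = r"""
-\.(gif|jpg|png|zip|exe)$
+^https?://([a-z0-9-]+\.)*parlament\.ch/
"""
```

Rule order matters: because the exclusion comes first, an image on the allowed host is still rejected.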

Skills: Apache, XML


About the Employer:
(0 reviews) Sweden

Project ID: #4136140

4 freelancers are bidding on average $16/hour for this job

sharpsoft

Please check PM for details

$12 USD / hour
(2 Reviews)
5.4
clivelim07

I have previously worked on a project like this, creating a Java program to crawl websites such as http://www.microsoft.com/en-us/download/default.aspx and https://www.bit9.com/. I was able to successfully extract infor…

$9 USD / hour
(0 Reviews)
0.0
johnhwardjr

I assume you are crawling your sites to generate indexes for an internal search engine. I can set up a crawler that will index your sites. This includes .pdf, .doc, .docx, .xls, .ppt, and other Microsoft formats…

$35 USD / hour
(0 Reviews)
0.0
poginato

Hi, we have a team of six developers who are experts in Java, with 3 to 8 years of experience. We have good experience in object-oriented commercial software design and software development. We are resul…

$8 USD / hour
(0 Reviews)
0.0