News Aggregator/Crawler for 27 Sites - Perl/Python

Status: CANCELLED
Bids: 6
Avg Bid (USD): N/A
Project Budget (USD): $250 - $750

Project Description:
Hi there,

We need experienced professionals in the following languages: Perl, or possibly Python. We are not sure whether this can be done in PHP, though we are open to suggestions.

The work will involve three main aspects:

1) A bot to crawl each site (27 sites).

2) A script to find related/similar news through linguistic text analysis (pattern analysis), or by any other method you know will work.

3) What is crawled then has to be indexed into a database and made searchable, possibly using this open source search software: http://www.sphinxsearch.com
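
As a rough illustration of aspect 3, the following Python sketch stores crawled stories in a database and offers a basic keyword search. It is only a sketch under assumptions: SQLite with a plain `LIKE` query stands in for a real search engine such as Sphinx, and the table layout and the function names `open_store`, `save_story` and `search_stories` are hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the story database. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stories ("
        " url TEXT PRIMARY KEY,"
        " site TEXT NOT NULL,"
        " title TEXT NOT NULL,"
        " description TEXT NOT NULL DEFAULT '')"
    )
    return conn


def save_story(conn, url, site, title, description=""):
    """Insert a story; return True if it is new, False if the URL is already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO stories (url, site, title, description)"
        " VALUES (?, ?, ?, ?)",
        (url, site, title, description),
    )
    conn.commit()
    return cur.rowcount == 1


def search_stories(conn, term):
    """Very simple substring search over title and description (stand-in for Sphinx)."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT url, title FROM stories"
        " WHERE title LIKE ? OR description LIKE ?"
        " ORDER BY url",
        (pattern, pattern),
    )
    return rows.fetchall()
```

Using the URL as the primary key means a story seen again on a later crawl is skipped rather than stored twice, which matters when the same front page is fetched every few minutes.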

We have 27 news sites that we want crawled/spidered by a bot. Because the 27 sites are all different, we assume the code for each may have to be slightly different as well.
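
One possible way to keep the per-site differences small is to share a single parser and put only the site-specific details in a rules table. The Python sketch below assumes each front page marks its headline links with a CSS class; the site name, base URL and class name in `SITE_RULES` are hypothetical, and real sites may well need different rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HeadlineParser(HTMLParser):
    """Collect (url, title) for <a> tags whose class attribute contains link_class."""

    def __init__(self, base_url, link_class):
        super().__init__()
        self.base_url = base_url
        self.link_class = link_class
        self.headlines = []
        self._href = None   # set while we are inside a matching <a>
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.link_class in classes and attrs.get("href"):
            self._href = urljoin(self.base_url, attrs["href"])
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = " ".join("".join(self._text).split())
            if title:
                self.headlines.append((self._href, title))
            self._href = None


# Hypothetical rules; one entry per site would be needed.
SITE_RULES = {
    "example-news": {"base_url": "https://news.example.com/", "link_class": "headline"},
}


def extract_headlines(site, html):
    """Return the headline links found on one site's front page."""
    rule = SITE_RULES[site]
    parser = HeadlineParser(rule["base_url"], rule["link_class"])
    parser.feed(html)
    parser.close()
    return parser.headlines
```

With this layout, adding a site that follows the same pattern means adding one entry to `SITE_RULES`; only sites with unusual markup would need their own parser.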

Most sites will need to be crawled every 10 to 15 minutes and some others every 30 minutes. Only the front page of each site is to be checked, although for some sites we may be able to just check the RSS feed and get the data from there.
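
The schedule and the RSS option described above could be sketched in Python roughly as follows. This is an illustration only: the site names and settings in `SITES` are hypothetical, the feed is assumed to be plain RSS 2.0, and fetching the pages over HTTP is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-site settings; the real list of 27 sites is not given here.
SITES = {
    "site-a": {"interval_min": 15, "use_feed": True},
    "site-b": {"interval_min": 30, "use_feed": False},
}


def due_sites(sites, last_run, now):
    """Return the names of sites whose crawl interval has elapsed.

    last_run maps site name to the time of its last crawl, in seconds;
    a site that has never been crawled is always due.
    """
    due = []
    for name, cfg in sites.items():
        last = last_run.get(name)
        if last is None or now - last >= cfg["interval_min"] * 60:
            due.append(name)
    return due


def parse_rss(xml_text):
    """Extract title, link and description from each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items
```

A small loop or a cron job could call `due_sites` once a minute and crawl whatever it returns, using `parse_rss` for sites with a usable feed and front-page parsing for the rest.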

Which language to use: we know that Perl is a very good language for text processing, which is why we suggest doing the work (the crawling plus the related-stories script) in Perl. The other reason is that we know sites like www.techmeme.com and www.megite.com have been coded in Perl, and as you can see, the stories under “RELATED” on techmeme.com, for example, are very good. It therefore seems that Perl, as a text-oriented language, can achieve good results here. At the end of the day we leave it to your expertise. We also know that a web spider can be written in PHP, but we are not sure how capable it would be.

Note: all the work you do has to be documented, because if in the future either you or another coder has to fix something, they must be able to read through the code and understand it. THEREFORE, WE WANT VERY PROFESSIONAL WORK.

So let us know your experience/expertise both with scripts that spider web pages and with scripts that do linguistic text analysis and can find related news by analysing the news titles and news descriptions. We want to build a relationship with you as well. One of the reasons we will probably not work with our previous suppliers is that they don’t have expertise in Perl or Python, so this is an opportunity for you or your company to join us as a future long-term partner.
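
To make the "related news" requirement more concrete, here is a minimal Python sketch of one common approach: TF-IDF weighting with cosine similarity over the title and description text. We do not know what method techmeme.com or any other site actually uses; the stopword list, the threshold of 0.2 and all function names are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "for", "on", "is", "as", "at", "by", "with"}


def tokenize(text):
    """Lowercase the text and split it into words, dropping stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict of word to weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed inverse document frequency, so no weight is ever zero.
        vectors.append({
            w: c * (math.log((1 + n) / (1 + doc_freq[w])) + 1)
            for w, c in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related(docs, index, threshold=0.2):
    """Return indices of documents similar to docs[index], most similar first."""
    vectors = tfidf_vectors(docs)
    scores = [(cosine(vectors[index], v), i) for i, v in enumerate(vectors) if i != index]
    return [i for score, i in sorted(scores, reverse=True) if score >= threshold]
```

Each entry in `docs` would be a story's title and description joined together. Stories that share many uncommon words score close to 1, while unrelated stories score near 0, so the threshold controls how strict the "related" grouping is.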

You will also be working on your local production server until we are ready to move to our live server, which we still need to set up; once we are happy with your work, we will move it to our server.

You will only be doing programming; another company will do all the graphics and some other small things such as the user account section, etc. So the three aspects mentioned above are all you are bidding for: crawling + related script + search. In other words, you will do the core of the whole project.

Send your bid as soon as possible; we have a detailed description of each section and can provide it upon request. If possible, it would be nice to know whether you have extensive experience with web crawlers or aggregators, and in particular whether you can accomplish the “Related” news script, which is very important to us.

We can probably use an escrow account and pay as each step is completed.

PB.

Skills required:
Perl, Python
Project posted by:
programmingbids United Kingdom
Verified