News Aggregator/Crawler for 27 Sites - Perl/Python
- Status Closed
- Budget N/A
- Total Bids 6
We need experienced professionals in the following languages: Perl, or possibly Python. We are not sure whether this can be done in PHP, though we are open to suggestions.
The work will involve three main aspects:
1) A bot to crawl each site (27 sites).
2) A script to find related/similar news through text analysis (linguistic pattern analysis), or any other method you know can do this.
3) What is crawled then has to be indexed into a database and made searchable, possibly using this open source search software: [url removed, login to view]
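To make aspect 3 concrete, here is a minimal sketch of indexing crawled stories into a database and searching them. It uses Python's built-in sqlite3 module with a simple inverted index rather than the search software mentioned above, purely as an illustration. The table names, the function names (open_index, add_story, search) and the example URLs are all assumptions, not part of the project specification.

```python
import re
import sqlite3


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def open_index(path=":memory:"):
    """Open (or create) the story database. Table layout is hypothetical."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS stories (
            id INTEGER PRIMARY KEY,
            url TEXT UNIQUE,
            title TEXT,
            description TEXT
        );
        CREATE TABLE IF NOT EXISTS postings (
            term TEXT,
            story_id INTEGER,
            PRIMARY KEY (term, story_id)
        );
    """)
    return conn


def add_story(conn, url, title, description):
    """Store a story once (keyed by URL) and index its distinct terms."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO stories (url, title, description) "
        "VALUES (?, ?, ?)",
        (url, title, description),
    )
    if cur.rowcount == 0:
        return None  # URL already stored by an earlier crawl pass
    story_id = cur.lastrowid
    terms = set(tokenize(title + " " + description))
    conn.executemany(
        "INSERT INTO postings (term, story_id) VALUES (?, ?)",
        [(term, story_id) for term in terms],
    )
    conn.commit()
    return story_id


def search(conn, query):
    """Return URLs of stories that contain every term in the query."""
    terms = sorted(set(tokenize(query)))
    if not terms:
        return []
    placeholders = ",".join("?" for _ in terms)
    rows = conn.execute(
        f"""SELECT s.url
            FROM postings p JOIN stories s ON s.id = p.story_id
            WHERE p.term IN ({placeholders})
            GROUP BY s.id
            HAVING COUNT(*) = ?
            ORDER BY s.id""",
        (*terms, len(terms)),
    )
    return [row[0] for row in rows]
```

Keying stories by URL means that re-crawling a front page every few minutes does not create duplicates. A dedicated search engine would add ranking and stemming, which this sketch leaves out.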
We have 27 news sites that we want crawled/spidered by a bot. Since the 27 sites are all different, we assume the code for each may have to be slightly different as well.
Most sites will need to be crawled every 10 to 15 minutes, and some others every 30 minutes. Only the front page of each site is to be checked, though for some sites we may be able to just check the RSS feed and get the data from there.
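As an illustration of the schedule and the RSS option described above, here is a minimal Python sketch. It assumes plain RSS 2.0 feeds (no XML namespaces on the item elements) and a per-site configuration table. The site names, feed URLs and function names are placeholders, and the actual fetching of pages is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-site settings: each site gets its own feed URL and
# crawl interval in minutes (10 to 15 for most sites, 30 for others).
SITES = {
    "site-a": {"feed": "http://example.com/a/rss", "interval_min": 15},
    "site-b": {"feed": "http://example.com/b/rss", "interval_min": 30},
}


def parse_rss(xml_text):
    """Extract title, link and description from each RSS 2.0 <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


def due_sites(sites, last_crawled, now_min):
    """Return the names of sites whose crawl interval has elapsed.

    last_crawled maps a site name to the minute it was last crawled;
    a site that has never been crawled is always due.
    """
    due = []
    for name, cfg in sites.items():
        last = last_crawled.get(name)
        if last is None or now_min - last >= cfg["interval_min"]:
            due.append(name)
    return due
```

A cron job or a simple loop could call due_sites once a minute and fetch only the sites it returns. Sites without a usable feed would need their own front-page parser in place of parse_rss.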
Which language to use: we understand that Perl is a very good language for text processing, which is why we suggest that the crawling and the related-stories script be written in Perl. The other reason is that we know sites like [url removed, login to view] and [url removed, login to view] have been coded in Perl, and as you can see, the stories listed under “RELATED”, for example on [url removed, login to view], are very good. It therefore seems that Perl, as a text-oriented language, can achieve good results here. In the end we leave the choice to your expertise. We also know that a web spider can be written in PHP, but we are not sure how capable it would be.
Note: all the work you do has to be documented, because if in the future either you or another coder has to fix something, we want them to be able to read through the code and understand it. THEREFORE, WE WANT VERY PROFESSIONAL WORK.
So let us know your experience and expertise both with scripts that spider web pages and with scripts that perform text analysis and can find related news from the news titles and news descriptions. We want to build a relationship with you as well. One of the reasons we will probably not work with our previous suppliers is that they do not have expertise in Perl or Python, so this is an opportunity for you or your company to join us as a long-term partner.
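To give bidders a sense of what we mean by finding related news from titles and descriptions, here is one common approach sketched in Python: TF-IDF weighting with cosine similarity. This is only an assumption about how it could be done, not a description of how the sites mentioned above do it. The stopword list, the 0.2 threshold and the function names are all illustrative.

```python
import math
import re
from collections import Counter

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase, split into word tokens, and drop stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter()  # number of documents each term appears in
    for c in counts:
        df.update(c.keys())
    vectors = []
    for c in counts:
        vectors.append({
            term: tf * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, tf in c.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related(docs, index, threshold=0.2):
    """Return indexes of docs similar to docs[index], best match first."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[index], v), i)
        for i, v in enumerate(vectors)
        if i != index
    ]
    return [i for score, i in sorted(scores, reverse=True) if score >= threshold]
```

Each entry in docs would be a story's title joined with its description. Exact word matching like this misses variants such as "announces" and "announced", so stemming or a better linguistic method is where a bidder's expertise would come in.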
You will also be working on your own local server until we are ready to move to our live server, which we still need to set up. Once we are happy with your work, we will move it to our server.
You will only be doing programming. Another company will do all the graphics and some other small things such as the user account section, so you are bidding only on the three aspects mentioned above: crawling + related script + search. In other words, you will do the core of the whole project.
Send your bid as soon as possible. We have a detailed description of each section and can provide it upon request. If possible, please tell us whether you have extensive experience with web crawlers or aggregators, and in particular whether you can accomplish the “Related” news script, which is very important to us.
We can probably use an escrow account and pay as each step is completed.