I want to make a website like [url removed, login to view] (or [url removed, login to view]).
It's basically just a news aggregator script like Google News (with an algorithm to detect duplicate news). Currently [url removed, login to view] covers 22 websites that are checked every 5 minutes for new articles (most of the websites have an RSS feed, but it might be necessary to parse HTML for some). I will provide you with a list of the websites I want to be checked. It's also possible that an existing news item is edited after publication, so the script has to check whether the content of older items has changed (e.g. for all news from the last 30 days).
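To make the "new or edited" check concrete, here is a minimal sketch, assuming RSS 2.0 feeds and assuming the item link is a stable identifier. The function names (`parse_rss`, `content_hash`, `classify`) and the in-memory `known_hashes` dict are hypothetical; in the real script the hashes would live in the database, and sites without a feed would need a separate HTML parser.

```python
import hashlib
import xml.etree.ElementTree as ET


def parse_rss(xml_text):
    """Return a list of dicts (link, title, description) from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "link": (item.findtext("link") or "").strip(),
            "title": (item.findtext("title") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


def content_hash(item):
    """Fingerprint the visible text so that later edits can be detected."""
    text = item["title"] + "\n" + item["description"]
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def classify(items, known_hashes):
    """Split items into (new, changed) and update known_hashes (link -> hash)."""
    new, changed = [], []
    for item in items:
        digest = content_hash(item)
        previous = known_hashes.get(item["link"])
        if previous is None:
            new.append(item)        # link never seen before
        elif previous != digest:
            changed.append(item)    # same link, different text
        known_hashes[item["link"]] = digest
    return new, changed
```

Running `classify` on every feed every 5 minutes, and keeping hashes for the last 30 days, would cover both new items and edits to older ones that are still listed in the feed. Items that have dropped out of the feed would have to be re-fetched from their own URL.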
It should look like [url removed, login to view] (see attachment).
The user should be able to search for certain terms (search form at the top of the page).
I also need a filter to show only news from certain websites (via HTML checkboxes).
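The search form and the website checkboxes could both map onto a single database query. The following is only a sketch: the table name `news` and the columns `title`, `content`, `source` and `published` are assumptions, and `build_search_query` is a hypothetical helper. The default placeholder `%s` is the style used by common Python MySQL drivers; values are passed separately so that user input is never pasted into the SQL string.

```python
def build_search_query(term, sources, placeholder="%s"):
    """Build a parameterized SELECT for the search box and the source checkboxes.

    term:    text from the search form (empty string means no text filter)
    sources: list of website names ticked in the checkbox filter
    Returns (sql, params) for cursor.execute(sql, params).
    """
    clauses, params = [], []
    if term:
        # Simple substring match; wildcard characters in term are not escaped here.
        clauses.append(f"(title LIKE {placeholder} OR content LIKE {placeholder})")
        pattern = f"%{term}%"
        params += [pattern, pattern]
    if sources:
        marks = ", ".join([placeholder] * len(sources))
        clauses.append(f"source IN ({marks})")
        params += list(sources)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = f"SELECT title, source FROM news{where} ORDER BY published DESC"
    return sql, params
```

With no search term and no boxes ticked, the query simply returns all news, newest first.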
In my opinion I need a script (in Python or Java or ...) that runs continuously and checks whether there are new news items. If so, it should write the content and time into a MySQL database (just to mention one thing: since this is a German project, the three special characters ä, ö and ü need to be encoded correctly, e.g. by using UTF-8 throughout).
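As an illustration of the storage step, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server. The schema and the helpers `open_store` and `save_item` are assumptions. With MySQL, the equivalent would be to create the table and open the connection with the `utf8mb4` character set so that ä, ö and ü survive the round trip.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema; the link serves as the unique key of a news item.
SCHEMA = """
CREATE TABLE IF NOT EXISTS news (
    link       TEXT PRIMARY KEY,
    source     TEXT NOT NULL,
    title      TEXT NOT NULL,
    content    TEXT NOT NULL,
    fetched_at TEXT NOT NULL
)
"""


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_item(conn, link, source, title, content):
    """Insert a news item, or overwrite it if the link is already stored."""
    fetched_at = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT OR REPLACE INTO news (link, source, title, content, fetched_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (link, source, title, content, fetched_at),
    )
    conn.commit()
```

Python 3 strings are Unicode, so no manual escaping of the umlauts is needed as long as the database and the connection agree on the character set.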
Another script with the duplicate-detection algorithm needs to scan the MySQL database for duplicates, so that in the last step the news can be shown on the website (e.g. via PHP).
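The duplicate-detection algorithm is not specified here, so the following is just one possible approach, not necessarily what Google News or the reference site does: compare articles by the overlap of their word triples (shingles) using Jaccard similarity. The function names and the `threshold` of 0.5 are assumptions that would need tuning on real data.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences in text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def find_duplicates(articles, threshold=0.5):
    """articles: list of (id, text). Returns (id1, id2, score) for similar pairs."""
    sets = [(article_id, shingles(text)) for article_id, text in articles]
    pairs = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            score = jaccard(sets[i][1], sets[j][1])
            if score >= threshold:
                pairs.append((sets[i][0], sets[j][0], score))
    return pairs
```

This compares every pair of articles, which is fine for 22 websites and a window of a few days. For much larger volumes a technique such as MinHash would be a more scalable alternative.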
11 freelancers are bidding on average $505 for this job
Hello, my name is David Stanek (Google me!) and I'd like the opportunity to work on this project with you. I am a Python expert and can get this done quickly and efficiently.