This is a complex project (similar to Google News). Please bid only if you have solid experience in scraping, data mining, databases (such as MySQL) and HTML.
*Scraping: I need to scrape 40 news websites continuously for each article's title, image and content. I want to be able to add new news websites easily myself (e.g. with XPath expressions), which means I will have to understand your code. A fancy frontend with nice-looking buttons is not needed; I'm a computer scientist with 5+ years of industrial experience.
*Data mining: Similar news articles should be identified and clustered into a single news story.
*Database: Everything should be stored in a database.
*View: A website should show the clustered news with links to each source website. The layout of the website is very simple; it's no big deal.
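The scraping requirement (add a site by writing XPath expressions, no code changes) could be met with a per-site rule table. The sketch below is a minimal illustration under stated assumptions: `SiteConfig`, `extract` and the example rules are hypothetical names, and it uses the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset and needs well-formed markup. Real news pages are rarely well-formed, so a production version would more likely parse with a tolerant HTML parser such as `lxml.html` and its full `.xpath()` support.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SiteConfig:
    """Hypothetical per-site rule set: adding a news website means adding one entry."""
    name: str
    article_xpath: str  # selects every article container on the page
    title_xpath: str    # relative to the article container
    image_xpath: str    # relative; must select an element with a "src" attribute
    content_xpath: str  # relative; selects the element holding the body text


@dataclass
class Article:
    site: str
    title: str
    image: Optional[str]
    content: str


def _text(elem: Optional[ET.Element]) -> str:
    """All text inside an element, with whitespace collapsed ("" if missing)."""
    if elem is None:
        return ""
    return " ".join("".join(elem.itertext()).split())


def extract(config: SiteConfig, page: str) -> List[Article]:
    """Apply one site's XPath rules to a page (must be well-formed XML here)."""
    root = ET.fromstring(page)
    articles = []
    for node in root.iterfind(config.article_xpath):
        img = node.find(config.image_xpath)
        articles.append(Article(
            site=config.name,
            title=_text(node.find(config.title_xpath)),
            image=img.get("src") if img is not None else None,
            content=_text(node.find(config.content_xpath)),
        ))
    return articles


# Example rules for a made-up site; real rules would live in a config file.
EXAMPLE_SITE = SiteConfig(
    name="example-news",
    article_xpath=".//div[@class='story']",
    title_xpath="h2",
    image_xpath=".//img",
    content_xpath="p",
)
SAMPLE_PAGE = """<html><body>
  <div class="story"><h2>Rates rise</h2><img src="/a.jpg"/>
    <p>The central bank raised rates.</p></div>
  <div class="ad">Buy now</div>
</body></html>"""

articles = extract(EXAMPLE_SITE, SAMPLE_PAGE)
```

Keeping the rules as data rather than code is what lets the owner add the 41st site by writing four XPath expressions; the `ad` block is skipped simply because no rule selects it.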
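For the storage and view requirements, one possible layout is a site table, a cluster table and an article table that points at both, so the view page becomes a single join. This is a hypothetical schema, not one the posting specifies; it runs on Python's built-in `sqlite3` so the sketch is self-contained, and a MySQL version would differ in dialect (for example `AUTO_INCREMENT` keys and `INSERT IGNORE` instead of SQLite's `INSERT OR IGNORE`).

```python
import sqlite3

# Hypothetical schema; SQLite stands in for MySQL so the sketch runs anywhere.
SCHEMA = """
CREATE TABLE site (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE,
    base_url TEXT NOT NULL
);
CREATE TABLE news_cluster (
    id         INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE article (
    id         INTEGER PRIMARY KEY,
    site_id    INTEGER NOT NULL REFERENCES site(id),
    cluster_id INTEGER REFERENCES news_cluster(id),  -- NULL until clustered
    url        TEXT NOT NULL UNIQUE,                  -- de-duplicates re-scrapes
    title      TEXT NOT NULL,
    image_url  TEXT,
    content    TEXT NOT NULL,
    fetched_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX article_by_cluster ON article(cluster_id);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    return conn


def cluster_links(conn, cluster_id):
    """(site name, article URL) pairs for one cluster: what the view page lists."""
    return conn.execute(
        "SELECT s.name, a.url FROM article AS a JOIN site AS s ON s.id = a.site_id "
        "WHERE a.cluster_id = ? ORDER BY s.name",
        (cluster_id,),
    ).fetchall()


conn = open_db()
conn.executemany(
    "INSERT INTO site (id, name, base_url) VALUES (?, ?, ?)",
    [(1, "site-a", "https://a.example"), (2, "site-b", "https://b.example")],
)
conn.execute("INSERT INTO news_cluster (id) VALUES (1)")
conn.executemany(
    "INSERT OR IGNORE INTO article (site_id, cluster_id, url, title, content) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (2, 1, "https://b.example/rates", "Rates rise", "..."),
        (1, 1, "https://a.example/rates", "Bank raises rates", "..."),
        (1, 1, "https://a.example/rates", "Bank raises rates", "..."),  # same URL: ignored
    ],
)
links = cluster_links(conn, 1)
```

The unique `url` column means a continuous scraper can re-insert everything it sees on each pass without creating duplicates.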
This might sound easy to you, but it's more complex than I've described. PHP's similar_text or levenshtein functions are *not* an adequate solution, because they are very limited and produce very poor clustering results.
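The posting does not say which technique should replace character-level string distance. One common alternative, shown here only as an illustration, compares articles as bags of words: each text becomes a TF-IDF vector, pairs whose cosine similarity reaches a threshold are linked, and linked articles are merged into one cluster with union-find (single-link clustering). The threshold of 0.3 and the sample headlines are assumptions; a real system would also stem words, compare only articles from a recent time window instead of all O(n²) pairs, and tune the threshold on labelled data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-cased alphanumeric words; no stemming in this sketch."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Unit-length sparse TF-IDF vectors (dicts) with smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two unit vectors = their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def cluster(docs, threshold=0.3):
    """Group document indices whose similarity chain reaches the threshold."""
    vecs = tfidf_vectors(docs)
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


HEADLINES = [
    "Central bank raises interest rates to fight inflation",
    "Interest rates raised by central bank as inflation climbs",
    "Local team wins championship final",
]
clusters = cluster(HEADLINES)
```

Unlike similar_text or levenshtein, this ignores word order and down-weights words that appear everywhere, so two headlines that reorder the same key terms still score as similar.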
This is how it will go down: place your bid and write me a PM. I will write back with all the information about the project so you can see if you are up to it.
The budget is $3,000-3,500. I could do the job myself, but I don't have time. I don't care about the user interface; I care about the business logic. You will have to provide the full source code. To show me that you read everything, write "Project1" when bidding. Thanks.
22 freelancers are bidding an average of $3858 for this job
"Project1" Please see PMB for examples of my previous projects related to web scraping. Very interested in this project, please provide details. Thank you, Zeke