I need a data extraction script. The data I want extracted are 'news article' links from any webpage. Extracting links from a webpage on its own isn't much of a challenge (there are numerous scripts online that do it). What I need is for the script to recognize whether each link is a news article link or not.
Again, news articles are a very broad category. I mainly want India-related news extracted. Once you reply to this bid request, I will tell you in more detail what kind of news I want extracted.
I don't want a fancy admin interface or anything like that, just something plain and simple: I pass the page URL to the script as an argument, and it parses the page and displays the links. I need your code to be well documented, and you must be willing to explain any parts of it I cannot follow, as I will need to understand it thoroughly to integrate it elsewhere.
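For illustration only, here is a rough Python sketch of the plain command-line flow described above: take a page URL, collect every link, and print the ones that look like articles. It assumes Python 3's standard library only. The URL patterns in `looks_like_article` are hypothetical guesses (a date in the path, a long hyphenated slug), not a specification, and the sketch does not yet judge whether an article is India-related.

```python
"""Sketch: list the links on a page and keep those that look like news articles.

Usage: python find_article_links.py <page-url>
"""
import re
import sys
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative links like "/2009/05/story" into absolute URLs.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return all <a href> targets in the HTML as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical heuristics, for illustration only.
DATE_IN_PATH = re.compile(r"/(19|20)\d{2}/\d{1,2}/")        # e.g. /2009/05/
LONG_SLUG = re.compile(r"/[a-z0-9]+(-[a-z0-9]+){3,}")       # e.g. /india-monsoon-rains-arrive
NON_ARTICLE = re.compile(r"/(tag|tags|category|author|login|search)/")


def looks_like_article(url):
    """Guess from the URL path alone whether a link points to an article."""
    path = urlparse(url).path.lower()
    if NON_ARTICLE.search(path):
        return False
    return bool(DATE_IN_PATH.search(path) or LONG_SLUG.search(path))


def print_article_links(url):
    """Download the page at url and print each likely article link once."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    # dict.fromkeys removes duplicates while keeping the page order.
    for link in dict.fromkeys(extract_links(html, url)):
        if looks_like_article(link):
            print(link)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print_article_links(sys.argv[1])
    else:
        print("usage: python find_article_links.py <page-url>", file=sys.stderr)
```

URL patterns alone will misclassify some links (many sites use opaque IDs instead of dates or slugs), which is why a claim of 100% accuracy deserves scrutiny; a real solution would likely also need to look at the linked page's content.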
Also, let me know how accurately you can recognize article links. If you are claiming 100% accuracy, please give it some thought and make sure it is possible. Let me know if you need any further details. If you give me some information about how you plan to go about this, you will increase your chances of my accepting your bid.
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment: Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others, including desktop software or software the Buyer intends to distribute: a software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).