I have a data scraping project. It's built on a back-end PHP threading script that runs continuously on my computer and downloads data into a cache folder every ten minutes. WinSCP then detects any changes in the cache folder and automatically uploads them to the web. When a user visits my site, the site pulls its data from the cache folder on the web server.
I know how to scrape basic text data, but I'm not sure how to scrape text that keeps its original href link paths, so it is still clickable. In a beta test for a site I'm trying to scrape, I worked out how to grab all the contents of a given tag that holds the news headlines and output that data. The output looks like:
"Brady breaks silence on concussion comments
The Presidential hug: embracing personal diplomacy
Amtrak train derails in Washington State"
The cool thing is, I was also able to make those headlines clickable on my site, and each one goes to the individual story page on the source news site, which is perfect.
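To illustrate the kind of link-preserving scrape I mean, here is a minimal sketch, not my actual code. It uses PHP's built-in DOMDocument and DOMXPath. The function name `extractHeadlineLinks`, the container id `headlines`, and the `https://news.example` domain are all made up for the example; the real site's markup will differ.

```php
<?php
// Hypothetical helper: collect headline links (text + href) from one container.
// $containerId and $baseUrl are assumptions; adjust them to the real page.
function extractHeadlineLinks(string $html, string $containerId, string $baseUrl): array
{
    $doc = new DOMDocument();
    // Real-world markup rarely validates, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    // The XML prolog is a common trick to make loadHTML treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query("//*[@id='" . $containerId . "']//a[@href]") as $a) {
        // Collapse whitespace; a link that wraps only an image has no text.
        $text = trim(preg_replace('/\s+/', ' ', $a->textContent));
        if ($text === '') {
            continue;
        }
        $href = $a->getAttribute('href');
        // Only root-relative paths ("/news/1") are resolved in this sketch.
        if (strncmp($href, '/', 1) === 0 && strncmp($href, '//', 2) !== 0) {
            $href = rtrim($baseUrl, '/') . $href;
        }
        $links[] = ['text' => $text, 'href' => $href];
    }
    return $links;
}

// Inline sample markup so the sketch runs without any network access.
$sample = '<div id="headlines">'
    . '<a href="/news/1">Brady breaks silence on concussion comments</a>'
    . '<a href="/news/1"><img src="thumb.jpg" alt=""></a>'
    . '<a href="https://news.example/us/2">Amtrak train derails in Washington State</a>'
    . '</div>';

$links = extractHeadlineLinks($sample, 'headlines', 'https://news.example');
foreach ($links as $link) {
    echo $link['text'], ' -> ', $link['href'], PHP_EOL;
}
```

Keeping the text and the href together in one array means the links can be rebuilt on my site exactly as they point on the source site.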
There are two issues, however. First, images from that site are also output, which I don't want, so I need code that IGNORES all scraped images. Second, I can get the desired output in a standalone .php file, but I'm not sure how to fit it into my PHP thread framework so that it works with the cache file system I have set up.
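For the image problem, something along these lines is what I have in mind. This is only a rough sketch under my own assumptions: the function name `stripImages` is invented, and it assumes the scraped block is an HTML fragment that DOMDocument can parse. It removes `img` and `picture` elements, then drops any links left empty once their image is gone.

```php
<?php
// Hypothetical helper: return the scraped fragment with all images removed.
function stripImages(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a known container so it can be serialized back out.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Remove the image elements themselves.
    foreach (iterator_to_array($xpath->query('//img | //picture')) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // Remove links that contained nothing but an image.
    foreach (iterator_to_array($xpath->query('//a[not(normalize-space())]')) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $wrap = $xpath->query("//div[@id='wrap']")->item(0);
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$scraped = '<ul><li>'
    . '<a href="/news/1"><img src="a.jpg" alt=""></a>'
    . '<a href="/news/1">The Presidential hug: embracing personal diplomacy</a>'
    . '</li></ul>';

$clean = stripImages($scraped);
echo $clean, PHP_EOL;
```

Removing the nodes from the parsed document, rather than using a regular expression on the raw HTML, keeps the remaining headline links and their href values intact.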
So this project is quite small and simple.
For the news website I want scraped, add a PHP script that pulls the headlines from the main page, the business news page, and the technology news page. Remember that I have some of this coding done already: I have a working demonstration that scrapes this site, but it doesn't work in my framework yet. I will hand this code over to you.
Additionally, this scrape must work with my PHP thread framework and cache folder system. The desired result is that when a user opens my site, the scraped headline text appears on it. The user can click any of these headlines and be taken to the actual story page on the news site.
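Roughly, the cache side could look like the sketch below. Everything here is an assumption made for illustration: my real framework's cache format and file names are not shown in this post, so the JSON files, the `writeCacheFile` and `renderHeadlines` function names, the `headline_cache_demo` folder, and the `news.example` URLs are all invented. The idea is one cache file per section (main, business, technology), written by the scraper and read by the site.

```php
<?php
// Hypothetical scraper side: write one section's headlines to the cache folder.
function writeCacheFile(string $cacheDir, string $section, array $links): string
{
    if (!is_dir($cacheDir)) {
        mkdir($cacheDir, 0777, true);
    }
    $path = $cacheDir . DIRECTORY_SEPARATOR . $section . '.json';
    $tmp  = $path . '.tmp';
    // Write to a temp file first, then rename, so a half-written file
    // never sits under the final name while a sync tool is watching.
    file_put_contents($tmp, json_encode($links, JSON_PRETTY_PRINT), LOCK_EX);
    rename($tmp, $path);
    return $path;
}

// Hypothetical site side: turn a cached section into a list of clickable links.
function renderHeadlines(string $path): string
{
    if (!is_file($path)) {
        return '';
    }
    $links = json_decode(file_get_contents($path), true);
    if (!is_array($links)) {
        return '';
    }
    $html = "<ul>\n";
    foreach ($links as $link) {
        $html .= sprintf(
            "  <li><a href=\"%s\">%s</a></li>\n",
            htmlspecialchars($link['href'], ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($link['text'], ENT_QUOTES, 'UTF-8')
        );
    }
    return $html . "</ul>\n";
}

// Demo with made-up data for the three sections mentioned above.
$cacheDir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'headline_cache_demo';
$sections = [
    'main'       => [['text' => 'Amtrak train derails in Washington State', 'href' => 'https://news.example/story/1']],
    'business'   => [['text' => 'Example business headline', 'href' => 'https://news.example/story/2']],
    'technology' => [['text' => 'Example technology headline', 'href' => 'https://news.example/story/3']],
];

$html = '';
foreach ($sections as $section => $links) {
    $path = writeCacheFile($cacheDir, $section, $links);
    $html .= renderHeadlines($path);
}
echo $html;
```

Escaping with `htmlspecialchars` on output keeps scraped text from injecting markup into my page. Whether the temporary `.tmp` files should be excluded from the WinSCP sync is something to confirm against my actual setup.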
The framework is already written. All you're doing is modifying some code to make it fit, because I don't know how to do it properly myself.
If you apply to this job, I will email you more details and my entire source code, so you can confirm you can accomplish the job before we agree. I will be paying via milestones, and I will create the full milestone up front. Here is how it will work: once you're done with the job, you'll demonstrate via TeamViewer, in your environment, that the project is 100% working and complete. I'll then release 70% of the funds to you, and you send me the code. The last 30% will be released once I confirm, through a day or two of testing, that the new code works 100% in my environment. That is the way I worked on the last freelancer job I set up, and it worked out perfectly.