Screen Scraper - Gov't Website - Check MySQL for duplicates
- Status Completed
- Budget $30 - $250 USD
- Total Bids 8
We need one public gov't website scraped. It's a simple scrape; nothing special like a captcha, password, etc. The gov't site is updated every time there is new information. The Screen-Scraper scraping session (.sss), written in Java, would need to detect new information and write it out in TSV format.
The scraped data and the scraping session need to:
(1) Have a unique ID, which is compared against the MySQL database to detect duplicates
(2) Write the scraped data in TSV format (approx. 10 fields and 1 image)
(3) Have resilient extractor patterns
(4) Have commented and documented Java code
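As a rough illustration of requirement (2), a minimal sketch of a TSV row formatter in plain Java follows. The class name `TsvRow` and the choice to backslash-escape tabs and line breaks are assumptions made for this example; they are not part of Screen-Scraper or of the posting, and the real field list is not specified here.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: formats one scraped record as a single TSV line. */
final class TsvRow {
    private TsvRow() {}

    /**
     * Escapes characters that would break the one-record-per-line layout.
     * A null field (for example, a missing image path) becomes an empty field.
     */
    static String clean(String field) {
        if (field == null) {
            return "";
        }
        return field.replace("\\", "\\\\")  // escape backslashes first
                    .replace("\t", "\\t")   // tab would split the field
                    .replace("\r", "\\r")
                    .replace("\n", "\\n");  // newline would split the record
    }

    /** Joins the cleaned fields with tab separators; no trailing newline is added. */
    static String format(List<String> fields) {
        return fields.stream()
                     .map(TsvRow::clean)
                     .collect(Collectors.joining("\t"));
    }
}
```

Escaping rather than dropping the offending characters keeps the original text recoverable by whoever loads the file later; if the consumer expects a different convention, only `clean` would need to change.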
The unique ID is incremental, and this is how you get to the details page.
The extractor patterns are simple.
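Because the ID is incremental, one way to combine the ID walk with the MySQL duplicate check could look like the sketch below. The table name `scraped_records`, the column `source_id`, and the class `NewIdFinder` are hypothetical and exist only for this example; the JDBC calls are standard `java.sql`, but a MySQL driver and a live connection are assumed for `existsInDb`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

/** Hypothetical helper: decides which incremental IDs still need scraping. */
final class NewIdFinder {
    private NewIdFinder() {}

    // Assumed schema: table scraped_records with a unique source_id column.
    static final String EXISTS_SQL =
            "SELECT 1 FROM scraped_records WHERE source_id = ? LIMIT 1";

    /** Returns true if a record with this ID is already stored in MySQL. */
    static boolean existsInDb(Connection conn, long id) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(EXISTS_SQL)) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }

    /** Returns the IDs in [firstId, firstId + count) that are not yet stored. */
    static List<Long> idsToScrape(long firstId, int count, LongPredicate alreadyStored) {
        List<Long> result = new ArrayList<>();
        for (long id = firstId; id < firstId + count; id++) {
            if (!alreadyStored.test(id)) {
                result.add(id);
            }
        }
        return result;
    }
}
```

Passing the duplicate check in as a `LongPredicate` keeps the ID walk testable without a database; in production the predicate would wrap `existsInDb` and decide how to treat an `SQLException`.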
(1) The session must check for new information (scrapable data) within a short period, or the data will no longer be available.
(2) Sometimes the data exists but the image doesn't yet. With that said, here is the challenge: sometimes the image will never exist, at which point we need to keep the scraped data anyway (i.e. iterate, and if the image still doesn't exist after so many tries, keep the scraped data).
(3) It may seem like a simple site to scrape at first glance, but please don't underestimate it and leave it for the last day before the project is due, as it has to be production ready when you submit it.
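The image rule in item (2) above can be sketched as a bounded retry that never discards the record. The class `ImageRetry`, the `fetchImage` supplier, and the attempt count and pause are all assumptions for illustration; the posting does not say how many tries are acceptable or how long to wait between them.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper: tries to fetch an image a limited number of times. */
final class ImageRetry {
    private ImageRetry() {}

    /**
     * Calls fetchImage up to maxAttempts times, pausing between attempts.
     * Returns the image if any attempt finds it, otherwise an empty Optional.
     * The caller keeps the scraped record in both cases.
     */
    static Optional<byte[]> fetchWithRetries(Supplier<Optional<byte[]>> fetchImage,
                                             int maxAttempts,
                                             long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<byte[]> image = fetchImage.get();
            if (image.isPresent()) {
                return image;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up on the image only.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

An empty result would simply be written as an empty image field in the TSV row, so the other fields are saved even when the image never appears.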
(1) Please only bid if you have experience with [url removed, login to view]
Project Due Date:
3-4 days after bid acceptance
This is my first post here with [url removed, login to view], so please bear with me as I learn the ropes. I work for an attorney firm that specializes in clients in direct marketing, so I will have more projects similar to this. We need this right away and production ready, as it is an integral part of a larger pilot program we are launching.
Thanks for reading this. Look forward to the bids.