Many people include phone numbers in their Craigslist posts. I'd like to know what percentage of posts do. I won't be calling anyone, but I'd like a log of the advertisements you scrape (URLs) and a list of the phone numbers found.
I would like this code written in Python, and I expect to be given the source and the rights to it. If you prefer, the code can be open-sourced and put on GitHub under an MIT license.
This project should comply with the Craigslist TOU, since it does not display or distribute their data; it only produces a statistical summary.
* License your code as described above and provide it to me.
* Scrape the Los Angeles for-sale-by-owner postings, at least 1000 posts.
* Log the following for each post, separated by tabs: URL, sale title, identified phone numbers (if any).
* Set a User-Agent that includes your email address, so Craigslist can contact you if the scraping becomes abusive.
* Use the python-phonenumbers library, which makes extracting phone numbers from text very easy.
* Save each page scraped so you can rerun the analysis locally instead of scraping Craigslist again.
* Test with a small number of pages until it is working correctly.
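To make the requirements above concrete, here is a rough sketch of how the pieces might fit together. It is only an illustration under stated assumptions, not a required design: the contact address, the `pages` cache directory, the helper names, the "US" default region, and the E.164 output format are all placeholders I made up. Finding the posting URLs and pulling the title out of each page are left out.

```python
import csv
import hashlib
import urllib.request
from pathlib import Path

# Assumption: placeholder contact address; use your own email here.
USER_AGENT = "craigslist-phone-stats/0.1 (contact: you@example.com)"
CACHE_DIR = Path("pages")  # hypothetical directory for saved pages


def cache_path(url, cache_dir=CACHE_DIR):
    """Map a URL to a stable local file name."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.html"


def fetch(url, cache_dir=CACHE_DIR):
    """Return the page body, reading the saved copy if one exists."""
    path = cache_path(url, cache_dir)
    if path.exists():
        return path.read_text(encoding="utf-8")
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        html = response.read().decode("utf-8", errors="replace")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return html


def find_phone_numbers(text, region="US"):
    """Return the distinct phone numbers found in text, in E.164 form."""
    # Imported here so the caching and logging helpers run without the
    # third-party python-phonenumbers package installed.
    import phonenumbers

    found = []
    for match in phonenumbers.PhoneNumberMatcher(text, region):
        number = phonenumbers.format_number(
            match.number, phonenumbers.PhoneNumberFormat.E164
        )
        if number not in found:
            found.append(number)
    return found


def log_posting(writer, url, title, numbers):
    """Write one tab-separated row: url, sale title, phone numbers."""
    writer.writerow([url, title, ",".join(numbers)])


def percent_with_phone(log_file):
    """Percentage of logged postings with at least one phone number."""
    with open(log_file, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    if not rows:
        return 0.0
    with_phone = sum(1 for row in rows if len(row) > 2 and row[2])
    return 100.0 * with_phone / len(rows)
```

In this sketch, `writer` would be a `csv.writer(handle, delimiter="\t")` opened on the log file. Because `fetch` checks the saved copy first, rerunning the script after the first pass does not hit Craigslist again. It would also be sensible to pause between live requests, though that is not shown here.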
This project should take only a few hours, perhaps 2-4. I'd love to get it done ASAP!