You have chosen to sponsor your bid up to a maximum amount of .
We are conducting early-stage research for a business idea relating to Wordpress.
We have identified a pre-crawled corpus of web documents here:
We are seeking someone who can make an Elastic Mapreduce application to search the corpus using techniques explained here: https://aws.amazon.com/amis/common-crawl-quick-start
We want to grep for the URLs that are Wordpress sites (perhaps by searching for the text wp-content in the HTML code or some other similar technique).
Once we have the large list of Wordpress sites there will be further jobs available but the first step is producing this list.