We are conducting early-stage research for a business idea relating to WordPress.
We have identified a pre-crawled corpus of web documents here:
[url removed, login to view]
We are seeking someone who can build an Elastic MapReduce application to search the corpus using the techniques explained here: [url removed, login to view]
We want to grep for the URLs of WordPress sites (perhaps by searching the HTML code for the text wp-content, or by some similar technique).
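As a rough illustration of this filtering step, a mapper along the following lines could be adapted for a streaming-style MapReduce job. It is only a sketch: the tab-separated "URL, then HTML" record layout, the function names `is_wordpress` and `map_records`, and the extra `wp-includes` marker are our assumptions for the example, not the actual format of the corpus.

```python
import re

# Heuristic markers that commonly appear in WordPress-generated HTML.
# "wp-content" comes from the description above; "wp-includes" is an
# assumed extra marker. Neither is a guarantee that a site runs WordPress.
WP_PATTERN = re.compile(r"wp-content|wp-includes", re.IGNORECASE)


def is_wordpress(html):
    """Return True if the HTML contains one of the WordPress markers."""
    return WP_PATTERN.search(html) is not None


def map_records(lines):
    """Yield the URL of every record whose HTML looks like WordPress.

    Each input line is assumed (hypothetically) to be "<url>\\t<html>".
    Lines without a tab separator are skipped as malformed.
    """
    for line in lines:
        url, sep, html = line.rstrip("\n").partition("\t")
        if sep and is_wordpress(html):
            yield url


# Small in-memory example; a real job would read records from its input.
sample = [
    "http://a.example/\t<link href='/wp-content/themes/x/style.css'>\n",
    "http://b.example/\t<html>plain page</html>\n",
]
matches = list(map_records(sample))
```

In a streaming job the same function could be fed `sys.stdin` and its results printed one URL per line, with a reducer that removes duplicate URLs. A text match of this kind will produce some false positives (pages that merely mention wp-content), so the resulting list should be treated as candidates.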
Once we have the large list of WordPress sites, there will be further jobs available, but the first step is producing this list.
Hello Sir/Madam, I read your whole description but did not understand what exactly you want to extract. If you can explain it to me in the PMB, I can do all kinds of web scraping; I have 3+ years of web scraping experience. Thanks and regards