I have a list of 5,000 names and countries of social media influencers, and I want a script to automatically look up their Instagram account information. If I google "site:[login to view URL] [name]", 99% of the time the first search result is the correct one, and if I go to their profile ([login to view URL][username]), Instagram returns a JSON object with their information (name, follower count, etc.) inside the HTML. I want a script to do this automatically. I created a basic one myself that simply ran curl "[login to view URL][name]", pulled the first "[login to view URL]" link, and fetched that URL from Instagram to get the JSON object with the profile information. However, Google blocked the script after about 50 requests, even though I could still search manually using Chrome. I assume there is a better way to do this so the script can run without getting blocked.
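For reference, the two parsing steps described above (pull the first matching link from a results page, then read the JSON object embedded in the profile HTML) could be sketched roughly as follows. This is only an illustration under assumptions: the function names, the `href="..."` pattern, and the marker string that precedes the JSON are all hypothetical, and the real pages may be structured differently.

```python
import json
import re


def first_matching_link(html, prefix):
    """Return the first href in html that starts with prefix, or None."""
    # Assumption: links appear as href="..." with double quotes.
    for match in re.finditer(r'href="([^"]+)"', html):
        url = match.group(1)
        if url.startswith(prefix):
            return url
    return None


def embedded_json(html, marker):
    """Decode the first JSON object that follows marker in html, or None."""
    # Assumption: the caller knows some marker text that precedes the object.
    pos = html.find(marker)
    if pos == -1:
        return None
    start = html.find("{", pos)
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores whatever follows it.
        obj, _end = json.JSONDecoder().raw_decode(html, start)
    except json.JSONDecodeError:
        return None
    return obj
```

Both functions work on HTML that has already been downloaded, so they can be tried on saved pages before any live requests are made.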
I will provide a spreadsheet with 2 columns, an ID and the names to search, and 5,000 rows for the influencers. The deliverable is:
1. A spreadsheet with 3 columns: the same ID from my spreadsheet, the Instagram ID, and the JSON object with the profile information.
2. The script that does this, or an explanation of how I can do this again myself later if needed.
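A minimal sketch of the input and output handling for this deliverable, assuming both spreadsheets are exchanged as CSV files and the input has no header row. The `write_results` function, the output column names, and the `lookup` callable (which would take a name and return an Instagram ID and a profile dict, or `None` for either) are hypothetical names chosen for illustration.

```python
import csv
import json


def write_results(input_path, output_path, lookup):
    """Read (id, name) rows and write (id, instagram_id, profile_json) rows."""
    with open(input_path, newline="", encoding="utf-8") as src, \
         open(output_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(["id", "instagram_id", "profile_json"])
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed rows
            row_id, name = row[0], row[1]
            instagram_id, profile = lookup(name)
            # Failed lookups keep the row but leave the two result cells empty.
            profile_text = json.dumps(profile) if profile is not None else ""
            writer.writerow([row_id, instagram_id or "", profile_text])
```

Writing one output row per input row, including failed lookups, keeps the IDs aligned with the original spreadsheet.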
It is not important that this be completely accurate. You do not need to do any validation that the Instagram account is the correct one, and if the search result does not return an Instagram ID or returns the wrong one, that is ok. I just need a script that will do this and correctly simulate a human searching with a browser or similar so that it gets past the bot filters. The script can be bash, python, perl, etc. I have access to Windows, Linux, and Cygwin, so whatever language and o/s you use is ok.
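One common way to make a script behave less like a burst of automated requests is to pause for a random interval before each request and to wait longer after a failure. The sketch below shows only that pacing idea; it is not known to be sufficient to avoid being blocked. The function name, the default delays, and the `fetch` callable (assumed to return `None` when a request fails or is refused) are hypothetical.

```python
import random
import time


def fetch_with_pacing(fetch, url, min_delay=5.0, max_delay=15.0,
                      max_attempts=4, sleep=time.sleep):
    """Call fetch(url) after a random pause, doubling the pause on each retry."""
    for attempt in range(max_attempts):
        # Random jitter avoids a fixed rhythm; 2 ** attempt backs off on failure.
        sleep(random.uniform(min_delay, max_delay) * (2 ** attempt))
        result = fetch(url)
        if result is not None:
            return result
    return None  # give up after max_attempts failed tries
```

With the default delays, 5,000 lookups at two requests each would take roughly a day or more, so the values would need tuning against how quickly results are needed.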
In your offer, please indicate what o/s and language you will use.
18 freelancers are bidding on average $127 for this job
I am experienced with web scraping using Python. I have completed similar jobs without getting blocked. I am also conversant with Excel data processing in Python, so I will format the output in the required structure.
I have a Python script that is able to extract the JSON you are looking for, pretty much ready to go - this is the reason why I put the offer price as low as I did.
Hi, I have already done something similar, and you do indeed get blacklisted for scraping on some sites. I can try to scrape everything you need using Python with Selenium/BeautifulSoup and provide you with a highly reusable script.