The goal of this project is to deliver a Python script that scrapes Wikipedia pages. The input is a CSV file containing links to the Wikipedia pages to scrape.
The script should scrape the following information where it exists. If a page is not in English, the information should still be scraped:
- Location or headquarters
- Number of employees
- Number of volunteers
- Official Language
- Director general
- Board of directors
- Parent organization
- All the info in the infobox
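One possible way to pull these fields out of an infobox is sketched below. It assumes the page's raw wikitext is already available as a string and that the infobox lists one `| name = value` parameter per line. The alias table `FIELD_ALIASES` and both function names are hypothetical: parameter names differ between infobox templates and between language editions, so the table would need extending for real pages.

```python
import re

# Hypothetical alias table: maps each requested field to infobox parameter
# names that may carry it. Real templates (and non-English wikis) use other
# names, so treat these lists as a starting point only.
FIELD_ALIASES = {
    "location": ["headquarters", "hq_location", "location"],
    "employees": ["num_employees", "num_staff", "employees"],
    "volunteers": ["num_volunteers", "volunteers"],
    "official_language": ["official_language", "language", "languages"],
    "director_general": ["director_general", "leader_name"],
    "board_of_directors": ["board_of_directors", "main_organ"],
    "parent_organization": ["parent_organization", "parent_organisation", "parent"],
}

# Matches a single "| name = value" line of an infobox.
PARAM_RE = re.compile(r"^\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$")


def parse_infobox_params(wikitext):
    """Collect every "| name = value" line into a dict (all infobox info)."""
    params = {}
    for line in wikitext.splitlines():
        match = PARAM_RE.match(line)
        if match:
            key = match.group(1).strip().lower().replace(" ", "_")
            params[key] = match.group(2)
    return params


def pick_fields(params):
    """Return one value per requested field, or "" when nothing matches."""
    row = {}
    for field, aliases in FIELD_ALIASES.items():
        row[field] = next((params[a] for a in aliases if params.get(a)), "")
    return row


sample = "{{Infobox organization\n| headquarters = Geneva\n| num_staff = 1200\n}}"
print(pick_fields(parse_infobox_params(sample)))
```

Because `pick_fields` always returns the same keys, missing information shows up as an empty string rather than a missing column. Values that span several lines or contain nested templates are not handled by this line-based sketch.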
>>Mandatory Technical requirements<<
- The script should be written in Python
- The last step of the script should write pandas DataFrame(s) to CSV files.
- Outputs must always have the same number of columns (columns might be empty)
- The script should allow stopping and restarting from the latest completed scrape (via a table of contents, for example).
- The script should allow for the usage of proxies
- The script should be commented
- No Selenium
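The fixed-column and stop/restart requirements above could be met together along the following lines. This is a minimal sketch, not a prescribed design: the column list `COLUMNS`, the `infobox_raw` catch-all column, and the function names are assumptions, and the output CSV itself doubles as the record of completed scrapes.

```python
import os

import pandas as pd

# Assumed output schema; every CSV written below has exactly these columns.
COLUMNS = [
    "url", "location", "employees", "volunteers", "official_language",
    "director_general", "board_of_directors", "parent_organization",
    "infobox_raw",
]


def append_rows(rows, out_path):
    """Append scraped rows (a list of dicts) to the CSV with a fixed schema."""
    # reindex adds any missing column as empty and drops unexpected ones.
    df = pd.DataFrame(rows).reindex(columns=COLUMNS)
    write_header = not os.path.exists(out_path)
    df.to_csv(out_path, mode="a", header=write_header, index=False)


def pending_urls(all_urls, out_path):
    """Return the input URLs that are not yet present in the output CSV."""
    if not os.path.exists(out_path):
        return list(all_urls)
    done = set(pd.read_csv(out_path, usecols=["url"])["url"])
    return [url for url in all_urls if url not in done]
```

On restart, the script would call `pending_urls` with the URLs from the input CSV and only scrape what is returned. Writing after each page (or small batch) keeps the amount of lost work small if the run is interrupted.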
>>Important technical requirements<<
- The script should use some "stealth" techniques (sleep, user-agent switches, …)
- API scraping is preferred
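The sketch below shows one way API scraping, proxies, and the "stealth" items could fit together without Selenium. It asks the MediaWiki `action=parse` endpoint of the page's own language edition for the raw wikitext, sleeps a random interval before each request, and picks a user-agent at random. The strings in `USER_AGENTS`, the delay values, and the function names are placeholders chosen for illustration, and only the standard library is used.

```python
import json
import random
import time
import urllib.request
from urllib.parse import unquote, urlencode, urlsplit

# Placeholder user-agent strings; replace with real values before use.
USER_AGENTS = [
    "example-scraper/0.1 (contact: you@example.com)",
    "example-scraper/0.1 research-run (contact: you@example.com)",
]


def api_url_for(page_url):
    """Turn a /wiki/<Title> link into a MediaWiki API URL on the same host."""
    parts = urlsplit(page_url)
    title = unquote(parts.path.split("/wiki/", 1)[1])
    query = urlencode({
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "redirects": "1",
        "format": "json",
        "formatversion": "2",
    })
    return f"{parts.scheme}://{parts.netloc}/w/api.php?{query}"


def fetch_wikitext(page_url, proxy=None, min_delay=1.0, max_delay=3.0):
    """Fetch a page's wikitext, optionally through a proxy URL."""
    # Random pause between requests (assumed values, tune as needed).
    time.sleep(random.uniform(min_delay, max_delay))
    handlers = []
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    request = urllib.request.Request(
        api_url_for(page_url),
        headers={"User-Agent": random.choice(USER_AGENTS)},
    )
    with opener.open(request, timeout=30) as response:
        payload = json.load(response)
    # With formatversion=2 the wikitext is returned as a plain string.
    return payload.get("parse", {}).get("wikitext", "")
```

Because the API host is taken from the input link, a French or German page is requested from its own language edition without extra configuration.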
>>Nice to have technical requirements<<
- Possibility to do parallel scraping with different proxies
- One (or more) Python scripts
- CSVs containing the scraped data (for verification)
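For the parallel scraping item, a thread pool with proxies assigned round-robin is one simple option. The sketch below assumes a caller-supplied `fetch(url, proxy)` function; that name, `scrape_in_parallel`, and the worker count are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def scrape_in_parallel(urls, proxies, fetch, max_workers=4):
    """Run fetch(url, proxy) concurrently, cycling through the proxies."""
    # Fall back to a direct connection (proxy=None) if no proxies are given.
    assignments = list(zip(urls, cycle(proxies or [None])))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with urls.
        results = pool.map(lambda pair: fetch(pair[0], pair[1]), assignments)
        return dict(zip(urls, results))


# Usage with a stand-in fetch function (no network access needed):
demo = scrape_in_parallel(
    ["page-a", "page-b", "page-c"],
    ["proxy-1", "proxy-2"],
    fetch=lambda url, proxy: f"{url} via {proxy}",
)
print(demo)
```

Threads suit this workload because the time is spent waiting on the network. An exception raised by `fetch` propagates when the results are collected, so a real script would likely catch errors per URL and record the failure instead.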
21 freelancers are bidding on average €156 for this job
Hello, William. How are you today? I am a professional Python web scraping developer, so I can help you with this. I can start work right now. Thanks, Kateryna
Hello, my name is Caleb Sawe and I'm a Python data scientist. I can build a web scraper for you to obtain the data you require. Kindly contact me @ +254762985779. Thank you.