As part of my work, I frequently need to parse a lot of data from three different websites.
So I would like to have a Python script that can perform this task and pull all the data into CSV files.
Three CSV outputs: one for PTE Academic Test Centers, one for IELTS Test Centers and one for IELTS Institutions Accepting.
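As a rough idea of the CSV step, here is a minimal sketch assuming Python 3. The column names in FIELDS and the function name write_rows are hypothetical; the real columns should follow the enclosed sample files.

```python
import csv

# Hypothetical column names; the real ones should match the enclosed sample CSVs.
FIELDS = ["country", "city", "name", "address"]


def write_rows(path, rows, fields=FIELDS):
    """Write a list of dicts to a CSV file with a header row."""
    # newline="" prevents the csv module from producing blank lines on Windows.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        # extrasaction="ignore" silently drops keys that are not in `fields`.
        writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Opening the file with newline="" is the detail that matters most for Windows, since otherwise every row is followed by an empty line.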
-- WEBSITES --
- PTE Academic Test Centers
Go country by country, city by city (some countries are split into cities; check the USA for example)
- IELTS Institutions Accepting
Go country by country, page by page (some countries have more than one page of accepting institutions; check the USA for example)
- IELTS Test Centers & Prices
Need to parse the global list of all the test centers
AND need to go test center by test center to parse the price for each (please look at the enclosed files)
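The "page by page" and "test center by test center" loops described above could be sketched roughly as follows. This is only an outline: fetch_page and fetch_price are hypothetical callables that would do the actual downloading and HTML parsing, and the "price" key is an assumed column name.

```python
def iter_pages(fetch_page, max_pages=1000):
    """Yield rows page by page until a page comes back empty.

    fetch_page is a hypothetical callable: page number -> list of row dicts.
    max_pages is a safety limit so a broken site cannot loop forever.
    """
    for page in range(1, max_pages + 1):
        rows = fetch_page(page)
        if not rows:
            break
        for row in rows:
            yield row


def attach_prices(centers, fetch_price):
    """Visit each test center and add its price to a copy of the row.

    fetch_price is a hypothetical callable: center dict -> price string.
    """
    result = []
    for center in centers:
        row = dict(center)  # copy so the input rows are left untouched
        row["price"] = fetch_price(center)
        result.append(row)
    return result
```

Keeping the network code behind two small callables means the loops can be checked with fake data before pointing them at the real sites.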
YOU NEED TO PAY ATTENTION TO THE SOURCE CODE OF THESE PAGES:
For PTE Academic Test Centers (http://www.pearsonvue.com/pte/locate/) you will need to deal with content that is loaded through iframes.
For IELTS Test Centers & Institutions Accepting (http://bandscore.ielts.org/ + http://www.ielts.org/test_centre_search/search_results.aspx) you will need to deal with cryptic hidden ASP.NET form values such as __VIEWSTATE and the like.
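To illustrate both points, here is a minimal sketch (Python 3, standard library only) that finds iframe URLs in a page and collects the hidden form fields that an ASP.NET page expects to be sent back with each POST. The names PageScanner, scan and postback_fields are hypothetical, and the actual form field names on the IELTS pages have not been checked.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScanner(HTMLParser):
    """Collect iframe URLs and hidden form fields from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.iframes = []  # absolute URL of every <iframe src=...>
        self.hidden = {}   # name -> value of every <input type="hidden">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "iframe" and attrs.get("src"):
            # Relative src values are resolved against the page URL.
            self.iframes.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            if attrs.get("name"):
                self.hidden[attrs["name"]] = attrs.get("value") or ""


def scan(html, base_url):
    """Parse one page and return the scanner holding what was found."""
    scanner = PageScanner(base_url)
    scanner.feed(html)
    return scanner


def postback_fields(html, base_url, **form_values):
    """Build POST data: hidden fields echoed back, plus our own values."""
    data = dict(scan(html, base_url).hidden)
    data.update(form_values)
    return data
```

The idea is that each iframe URL is fetched as a page of its own, and that every ASP.NET request sends back the hidden values (such as __VIEWSTATE) taken from the previous response.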
THIS SCRIPT NEEDS TO RUN UNDER WINDOWS. I want a script that works on Windows, not only on Linux.
So the use of the Grab module (http://packages.python.org/grab/) is forbidden, as it does not work on Windows.
Scrapy, Twisted and others are welcome.
The script should offer the user three options:
1) Parsing IELTS (Test Centers & Prices + Institutions Accepting)
2) Parsing PTE Test Centers
3) Parsing 1) and 2)
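The menu could be wired up along these lines. This is only a sketch: parse_ielts and parse_pte are hypothetical placeholders for the real parsing jobs, and the choice keys "1", "2" and "3" (with "3" as the combined option) are an assumption.

```python
def parse_ielts():
    """Placeholder for the IELTS job (test centers, prices, institutions)."""
    print("Parsing IELTS...")


def parse_pte():
    """Placeholder for the PTE test centers job."""
    print("Parsing PTE...")


# Each menu choice maps to the list of jobs it should run, in order.
ACTIONS = {
    "1": [parse_ielts],
    "2": [parse_pte],
    "3": [parse_ielts, parse_pte],
}


def run(choice):
    """Run the jobs for one menu choice and return the names of the jobs run."""
    jobs = ACTIONS.get(choice.strip())
    if jobs is None:
        raise ValueError("Unknown option: %r" % choice)
    for job in jobs:
        job()
    return [job.__name__ for job in jobs]
```

In the real script the choice could come from the keyboard, for example run(input("Choose 1, 2 or 3: ")).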
Enclosed you will find an archive with five CSV documents that represent the five outputs I want from this script.
Please BID on this project ONLY if you have the skills to perform this job, and I will contact you by Private Message to find out how you plan to do it.
Thank you in advance :-)