I need someone who can write a program, script, or crawler of some sort that can crawl through an existing
website and extract a large amount of data from that site. The data that I need to
collect is very organized on the website and should not be that hard to extract. The finished product should
produce at least 700 Excel files organized in folders on a server that I will specify later. Also, once the
master list is organized, the data inside the Excel files will need to be updated when the information on the
website changes. New Excel files will also need to be added when a scan for updates finds something that does
not exist yet. Here is how it needs to go. Of course, this can change if you think you know a more efficient way
of doing it:
1. Go to url: [url removed, login to view]
2. Log in (we will provide the username and password)
3. Choose State in middle
4. Choose School in middle
5. Choose Courses on Right
6. (record the list of departments that the school offers; these will be in alphabetical order: example below)
ADED Adult Education
AERO Aerospace Engineering
AFRI Africana Studies
AGEC Agric Economics
FOUN Foundations Of Educ
FOWS Forestry & Wildlife Sci.
GDES Graphic Design
7. Choose first department in middle
8. (record the list of classes for each department: example below)
2110 Principles Of Financial Accounting
2117 Honors Principles Of Financial Accounting
2210 Principles Of Managerial Accounting
2810 Fundamentals Of Accounting
9. Repeat the process until every school, every department, and every department's classes have been crawled.
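
To make the steps above concrete, here is a rough sketch in Python of the state, school, department, class loop. It is only an illustration, not a required design. The four `list_*` callables are hypothetical stand-ins for the real page requests (login and page fetching are left out because the site address is not given here), and the sketch assumes every listing line looks like the examples above: a code, a space, then a title.

```python
from typing import Callable, Dict, List, Tuple

Entry = Tuple[str, str]  # (code, title), e.g. ("AERO", "Aerospace Engineering")


def parse_entry(line: str) -> Entry:
    """Split a listing line such as 'AERO Aerospace Engineering' at the first space."""
    code, _, title = line.strip().partition(" ")
    return code, title.strip()


def crawl(
    list_states: Callable[[], List[str]],
    list_schools: Callable[[str], List[str]],
    list_departments: Callable[[str, str], List[str]],
    list_classes: Callable[[str, str, str], List[str]],
) -> Dict[Tuple[str, str, str], dict]:
    """Walk state -> school -> department and record each department's classes.

    The callables are assumptions: each one should return the raw text lines
    that the corresponding page of the site shows (steps 3 to 8).
    """
    catalog: Dict[Tuple[str, str, str], dict] = {}
    for state in list_states():
        for school in list_schools(state):
            for dept_line in list_departments(state, school):
                dept_code, dept_name = parse_entry(dept_line)
                class_lines = list_classes(state, school, dept_code)
                catalog[(state, school, dept_code)] = {
                    "name": dept_name,
                    "classes": [parse_entry(c) for c in class_lines],
                }
    return catalog


if __name__ == "__main__":
    # Made-up state and school names; the listing lines are taken from the examples above.
    demo = crawl(
        lambda: ["Example State"],
        lambda state: ["Example School"],
        lambda state, school: ["ADED Adult Education", "AERO Aerospace Engineering"],
        lambda state, school, dept: ["2110 Principles Of Financial Accounting"],
    )
    for key, value in demo.items():
        print(key, value)
```

Keeping the page fetching behind plain functions means the loop can be tested with canned data before it is pointed at the real site.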
As I said before, this is what needs to happen. I really don't care how it is programmed. The final product
should produce Excel files, or update them. If, when updating, you find that an Excel file is no longer
listed in the master list, the program should keep the previous Excel file and not delete it.
There should also be a text file that summarizes what was completed: whether an Excel file was updated or
a new one was added. We need to know when an Excel file is added, updated, or deleted. This is very important,
considering that this list will have at least 700 separate files.
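
One possible way to build that summary, sketched in Python under some assumptions: each Excel file is represented by its path plus the rows it should contain, `previous` is what the last run produced, and `current` is what the new crawl found. The names `plan_changes` and `format_summary` and the path layout are made up for illustration. Files that are no longer listed are only reported as missing; nothing in this sketch deletes anything.

```python
from typing import Dict, List, Tuple

Rows = List[Tuple[str, str]]  # (class number, class title)


def plan_changes(previous: Dict[str, Rows], current: Dict[str, Rows]) -> Dict[str, List[str]]:
    """Classify every file path as added, updated, unchanged, or missing."""
    plan: Dict[str, List[str]] = {"added": [], "updated": [], "unchanged": [], "missing": []}
    for path in sorted(current):
        if path not in previous:
            plan["added"].append(path)
        elif previous[path] != current[path]:
            plan["updated"].append(path)
        else:
            plan["unchanged"].append(path)
    # No longer listed on the site: report it, but keep the existing file.
    plan["missing"] = sorted(set(previous) - set(current))
    return plan


def format_summary(plan: Dict[str, List[str]]) -> str:
    """Render the plan as plain text suitable for the summary file."""
    lines: List[str] = []
    for status in ("added", "updated", "missing"):
        lines.append(f"{status.upper()} ({len(plan[status])})")
        lines.extend(f"  {path}" for path in plan[status])
    lines.append(f"UNCHANGED ({len(plan['unchanged'])})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical paths and rows, for demonstration only.
    previous = {
        "Example State/Example School/ADED.xlsx": [("2110", "Old Title")],
        "Example State/Example School/AFRI.xlsx": [("2210", "Some Class")],
    }
    current = {
        "Example State/Example School/ADED.xlsx": [("2110", "New Title")],
        "Example State/Example School/AERO.xlsx": [("2810", "Another Class")],
    }
    print(format_summary(plan_changes(previous, current)))
```

The text returned by `format_summary` could be written to the summary file after each run, so every one of the 700+ files is accounted for.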
A finished Excel file will be provided as an example. We need this to be done by some type of program.
Doing this by hand normally takes about 3-4 hours per Excel file. Besides being unproductive, it is very boring.
We would like to get this done in one week. Please let us know if you have any questions. If you are really interested, please
use the code IIcucucoolio in the message you send us. All other messages will be ignored. We don't want auto-generated
messages. We will give you a username and password that will be needed to enter the site. We will send the website address in a private