I need a program, with full source code, built in Embarcadero Delphi XE using Indy or ICS components.
There is a web directory of companies from different cities. The program must produce an Excel file with the contact details of companies scraped from the website (name, address, phone numbers, etc.).
1. The program should have a list of the cities available on the website, with a multiselect option. Next to the list there must be a button to refresh (sync) the list of cities from the website.
2. The program should have a list (or a tree) of the categories available on the website, with a multiselect option. Next to the list (tree) there must be a button to refresh (sync) the list (tree) of categories from the website.
3. The program should have a progress bar showing how many pages have been processed and how many pages are left.
4. The program should support multithreading: several pages of the web directory must be processed at the same time in concurrent threads. The program reads the maximum number of threads from an .ini file.
5. The program should have a "START" button. When it is clicked, the program starts scraping contacts from the selected cities and selected categories and writes them to a temporary Excel file. When the work is done, the program opens the temporary Excel file and the user can save it anywhere they like.
1. The list of cities and button to refresh list.
2. The list of categories and button to refresh list.
3. The progress bar.
4. Multithreading with .ini for settings.
5. START button.
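To make requirements 3 and 4 concrete, here is a rough sketch of the behaviour I have in mind. It is written in Python purely for illustration (the deliverable itself must be Delphi), and the section name `scraper`, the key `max_threads`, the default of 4 and both function names are assumptions of this sketch, not a required format.

```python
import configparser
from concurrent.futures import ThreadPoolExecutor, as_completed


def read_max_threads(ini_text, default=4):
    """Read the thread limit from .ini content.

    The [scraper] section and max_threads key are hypothetical names.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getint("scraper", "max_threads", fallback=default)


def process_pages(urls, fetch, max_threads, on_progress=None):
    """Fetch every URL using at most max_threads concurrent workers.

    fetch(url) is supplied by the caller; on_progress(done, total) is
    called after each page finishes, which is what would drive the
    progress bar.
    """
    results = {}
    done = 0
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
            done += 1
            if on_progress:
                on_progress(done, len(urls))
    return results
```

The progress callback runs in the calling thread, not in the workers, so in a GUI program the same pattern keeps progress bar updates out of the worker threads.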
Comprehensive details about the web directory will be sent to those who are interested.
I expect that you already have a similar scraper and will need only minor changes to adapt it to my website.
This is the first of several projects; you can consider it a ticket to further work with me.
What the program is supposed to do:
1. Determine the city IDs from the user's selection in the list.
2. Determine the category IDs from the user's selection in the list.
3. Go to the web directory at a URL like http://site/CITYID/category/CATEGORYID
4. Go through all available pages http://site/CITYID/category/CATEGORYID/page[1..N] and collect the profile links http://site/CITYID/profile/SOMEID1, http://site/CITYID/profile/SOMEID2 and so on.
5. Go through all profile URLs, take the phone/site/etc. and write them to the Excel file.
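The crawl in steps 1 to 4 can be sketched roughly as follows, again in Python only for illustration. Several things here are assumptions: the page URL is taken literally as `.../page1`, `.../page2` and so on, a page that yields no profile links is treated as the end of a category (the real site may signal N differently), duplicate profiles found under several categories are skipped, and `list_profile_links` is a hypothetical caller-supplied function that downloads one page and returns the profile URLs found on it.

```python
from itertools import product

BASE = "http://site"  # placeholder host, as in the URL patterns above


def category_page_url(city_id, category_id, page):
    """Build a listing page URL (assumes the literal 'pageN' suffix)."""
    return f"{BASE}/{city_id}/category/{category_id}/page{page}"


def collect_profile_links(city_ids, category_ids, list_profile_links):
    """Walk every city/category pair page by page and gather profile URLs.

    list_profile_links(url) must return a list of profile URLs and an
    empty list once the page number runs past the last page (assumed).
    """
    seen = set()
    ordered = []
    for city_id, category_id in product(city_ids, category_ids):
        page = 1
        while True:
            url = category_page_url(city_id, category_id, page)
            links = list_profile_links(url)
            if not links:
                break  # assumed end-of-category signal
            for link in links:
                if link not in seen:  # skip profiles already collected
                    seen.add(link)
                    ordered.append(link)
            page += 1
    return ordered
```

Step 5 would then visit each returned profile URL, extract the contact fields and append one row per company to the Excel file.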