1- Scrape the XML file of one agency and iterate over all the ad listings.
2- For every agent, store their name, mobile number and email address in a .vcf file.
3- Write the name in the .vcf file as Name @ Company Name - Area [number of listings in the agent's top community area]. For example, suppose Bilal Hameed works for a company called Engel Voelkers and has 3 ads in JLT and 8 ads in Jumeirah Golf Estates. Since most of his ads are in Jumeirah Golf Estates, his name in the .vcf file should be written as Bilal Hameed @ Engel Voelkers - Jumeirah Golf Estates 8. You will have to iterate through all the ad listings to find the area with the most ads for each agent. Repeat this for every agent in the company.
4- When building the agents' names, use the format FirstName @ Agency's FirstName - Area.
5- This process has to be repeated for cl=1 to cl=10,000.
6- If a URL returns neither XML data nor the message "Client does not exist or is no longer active on Propspace.", the scraper should try the URLs listed below until one of them returns the XML feed. Please note that in these URLs the CL stays the same; only the pid and acc change, which generates a separate XML feed for each portal.
7- For every URL that returns an XML feed, store the feed in a folder.
[url removed, login to view] - Generic
[url removed, login to view] - Just Property
[url removed, login to view] - Dubizzle
[url removed, login to view] - Dubizzle Setup
[url removed, login to view] - Generic 2
[url removed, login to view] - Property Finder
[url removed, login to view] - Company Own Website
Attachment: sample .vcf file.
URL to iterate: [url removed, login to view] - Generic 2
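Steps 1 to 4 can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the feed's XML schema is not shown in this brief, so the sketch starts from listings that have already been parsed into dictionaries with hypothetical keys (name, company, email, mobile, area), and it identifies an agent by email address. The contact details in the example are placeholders. The vCard layout is a plain 3.0 card and may differ from the attached sample.

```python
from collections import Counter, defaultdict


def top_area_per_agent(listings):
    """Map each agent (keyed by email, an assumption) to (area, ad_count)
    for the area where that agent has the most ads."""
    per_agent = defaultdict(Counter)
    for ad in listings:
        per_agent[ad["email"]][ad["area"]] += 1
    # most_common(1) returns the first-seen area on a tie; the brief does not
    # say how ties should be broken.
    return {email: areas.most_common(1)[0] for email, areas in per_agent.items()}


def display_name(name, company, area, count):
    """Build 'Name @ Company Name - Area N' as described in step 3."""
    return f"{name} @ {company} - {area} {count}"


def escape(value):
    """Escape the characters that are special inside vCard text values."""
    return value.replace("\\", "\\\\").replace(",", "\\,").replace(";", "\\;")


def vcard(display, mobile, email):
    """Return one vCard 3.0 entry as a string with CRLF line endings."""
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        f"FN:{escape(display)}",
        f"N:{escape(display)};;;;",
        f"TEL;TYPE=CELL:{mobile}",
        f"EMAIL:{email}",
        "END:VCARD",
    ]
    return "\r\n".join(lines) + "\r\n"


# Example data mirroring the brief: 3 ads in JLT, 8 in Jumeirah Golf Estates.
agent = {"name": "Bilal Hameed", "company": "Engel Voelkers",
         "email": "bilal@example.com", "mobile": "+000000000"}  # placeholders
listings = ([dict(agent, area="JLT")] * 3
            + [dict(agent, area="Jumeirah Golf Estates")] * 8)

area, count = top_area_per_agent(listings)[agent["email"]]
card = vcard(display_name(agent["name"], agent["company"], area, count),
             agent["mobile"], agent["email"])
```

Appending each returned card to a single file opened in text mode with newline="" would keep the CRLF line endings intact.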
1- .vcf file
2- .txt file listing all the URLs that return XML feeds, together with the names of the companies that own those accounts. The company names can be derived from the email addresses in the XML feeds.
3- Folder containing all the XML feeds of different agencies.
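One possible way to derive a company name from an email address for the .txt file is to take the first label of the domain. This is a rough heuristic rather than a confirmed rule: it returns a lowercase token, not a formatted company name, and it gives poor results for shared mail providers or multi-part domains. The tab-separated line layout and the example address and URL are assumptions, since the brief does not specify them.

```python
def company_from_email(email):
    """Guess a company token from the domain part of an email address."""
    domain = email.rsplit("@", 1)[-1].lower()
    return domain.split(".")[0]


def url_list_line(url, email):
    """Format one line of the .txt deliverable (tab-separated, an assumption)."""
    return f"{url}\t{company_from_email(email)}"


# Hypothetical address and URL, for illustration only.
line = url_list_line("https://feeds.invalid/7", "bilal@engelvoelkers.example")
```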
NOTE: Propspace stops serving XML data to an IP address after a certain number of fetches, so the script will have to keep changing its IP address in order to fetch all the potential XML feeds.
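The fetch rules in steps 5 and 6, combined with the IP limit in the note above, might be sketched like this in Python. Several things here are assumptions: the real feed URLs are not visible in this brief, so the portal templates are hypothetical placeholders; the proxy addresses are made up; a response that is neither XML nor the "no longer active" message is treated as a possible block, so the sketch switches proxy and retries before moving on to the next portal URL; and a page whose root element is html is not counted as a feed.

```python
import urllib.request
import xml.etree.ElementTree as ET
from itertools import cycle

INACTIVE_MSG = "Client does not exist or is no longer active on Propspace."

# Hypothetical proxies; a real run would need working ones.
PROXIES = cycle(["http://proxy-a.invalid:8080", "http://proxy-b.invalid:8080"])


def classify(body):
    """Return 'inactive', 'xml' or 'other' for a response body."""
    if INACTIVE_MSG in body:
        return "inactive"
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "other"
    # Assumption: a well-formed HTML error page is not a feed.
    return "other" if root.tag.lower() == "html" else "xml"


def http_fetch(url, proxy):
    """Fetch a URL through the given proxy and return the body as text."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_feed(cl, portals, fetch, proxy_pool, max_attempts=3):
    """Try each (label, url_template) for one cl value.

    Returns (label, xml_text) for the first portal that yields XML, or None
    if the client is inactive or no portal returns a feed.
    """
    proxy = next(proxy_pool)
    for label, template in portals:
        url = template.format(cl=cl)
        for _ in range(max_attempts):
            try:
                kind, body = None, fetch(url, proxy)
                kind = classify(body)
            except OSError:
                pass  # network or proxy failure: fall through and rotate
            if kind == "xml":
                return label, body
            if kind == "inactive":
                return None
            proxy = next(proxy_pool)  # possible block: change IP and retry
    return None
```

A full run would call fetch_feed(cl, portals, http_fetch, PROXIES) for cl from 1 to 10,000 and write each returned feed into the output folder. Passing the fetch function in as a parameter keeps the loop testable without network access.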
Hi, I have worked on multiple similar tasks using PHP scripts and Python. I can complete your task on time and within budget. Please contact me to discuss my past projects. Thanks, Ashwin