Web Scraping a Business Directory - repost

This project was successfully completed by ashok7925 for $105 CAD in 2 days.

Project Budget
$30 - $250 CAD
Completed In
2 days
Project Description

I've tried scraping the following website but can't parse the data for some reason: [url removed, login to view]

I want the following information on Real Estate companies in Calgary, Alberta, Canada (here's an example [url removed, login to view]):

Company Name

Phone Number

Company Description

SIC Code

Contact Name

Contact Position

Location Type

Years in Business

This information can be selected using the following code:

item['company'] = [url removed, login to view]('//h1[@class="company-name"]/text()').extract()

item['address'] = [url removed, login to view]('//div[@itemprop="streetAddress"]/text()').extract()

item['website'] = [url removed, login to view]('//dl[@class="website_info"]/dd/span/text()').extract()

item['phone'] = [url removed, login to view]('//dd[@class="tel"]/text()').extract()

item['description'] = [url removed, login to view]('//p[@itemprop="description"]/text()').extract()

item['contact_name'] = [url removed, login to view]('//span[@itemprop="name"]/text()').extract()

item['contact_position'] = [url removed, login to view]('//em[@itemprop="jobTitle"]/text()').extract()

item['location_type'] = [url removed, login to view]('//table[@class="table-data"]/tr[1]/td/text()').extract()

item['SIC'] = [url removed, login to view]('//table[@class="table-data"]/tr[4]/td/text()').extract()

item['NAICS'] = [url removed, login to view]('//table[@class="table-data"]/tr[5]/td/text()').extract()

item['revenue'] = [url removed, login to view]('//table[@class="table-data"]/tr[2]/td/text()').extract()

item['employees'] = [url removed, login to view]('//table[@class="table-data"]/tr[3]/td/text()').extract()

item['years_business'] = [url removed, login to view]('//table[@class="table-data"]/tr[8]/td/text()').extract()
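In Scrapy, each .extract() call returns a list of strings rather than a single value, so a field can come back empty, padded with whitespace, or split across several text nodes. Below is a minimal sketch of flattening such an item into one string per field before it goes into a spreadsheet row. The helper name flatten_item and the sample values are hypothetical and are not part of the code above.

```python
def flatten_item(item, sep=" "):
    """Collapse each list of extracted strings into one clean string."""
    flat = {}
    for field, values in item.items():
        # Strip whitespace and drop fragments that are empty after stripping.
        parts = [v.strip() for v in values if v and v.strip()]
        flat[field] = sep.join(parts)
    return flat


# Hypothetical values, shaped like the lists that .extract() returns.
sample = {
    "company": ["\n  Example Realty Ltd.  "],
    "phone": [],
    "description": ["Residential sales ", "\n", " and leasing."],
}
flat = flatten_item(sample)
```

A field with no match ends up as an empty string, so every row keeps the same set of columns.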

This MUST be done using Scrapy, the web crawling framework written in Python.

Your deliverable is an Excel spreadsheet with the above information on each company. There are 1,355 Real Estate companies in Calgary, Canada, so I'm expecting that many rows.
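One way to produce a file Excel can open is to write the rows as CSV and compare the row count with the expected total. The sketch below uses Python's standard csv module and takes its column names from the item keys in the code above. The sample rows, the function name write_rows, and the use of CSV instead of a native Excel file are assumptions made for illustration.

```python
import csv
import io

# Column names taken from the item keys used in the selector code.
FIELDS = ["company", "address", "website", "phone", "description",
          "contact_name", "contact_position", "location_type",
          "SIC", "NAICS", "revenue", "employees", "years_business"]
EXPECTED_ROWS = 1355  # number of companies stated in the posting


def write_rows(rows, handle):
    """Write dict rows as CSV and return how many data rows were written."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, restval="")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)  # missing fields are filled with restval
        count += 1
    return count


# Hypothetical rows standing in for scraped items.
rows = [{"company": "Example Realty %d" % i, "phone": "403-555-0100"}
        for i in range(3)]
buffer = io.StringIO()
written = write_rows(rows, buffer)
missing = EXPECTED_ROWS - written  # rows still to be scraped
```

In a real run the handle would be a file opened with newline="" instead of the in-memory buffer used here.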

As proof that you scraped the appropriate information, I require a screenshot of the Excel spreadsheet showing the last 20 rows. If the information on those last 20 companies matches what is found on Manta, I will pay you the agreed price in exchange for the Excel file.
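The last 20 rows can also be pulled out programmatically for a quick spot check before taking the screenshot. This is a rough sketch that assumes the data has been saved as CSV with a header row. The function name last_rows and the generated sample data are hypothetical.

```python
import csv
import io
from collections import deque


def last_rows(handle, n=20):
    """Return the final n data rows of a CSV file as dicts."""
    reader = csv.DictReader(handle)
    # A bounded deque keeps only the most recent n rows while streaming.
    return list(deque(reader, maxlen=n))


# Hypothetical CSV content with 25 data rows.
text = "company,phone\n" + "".join(
    "Company %d,403-555-%04d\n" % (i, i) for i in range(25)
)
tail = last_rows(io.StringIO(text))
```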

I may have additional work for you if you complete this task successfully.
