Closed

Web Scraping Highly Secure Business Directory of Public Information

This project received 18 bids from talented freelancers with an average bid price of $156 CAD.

Project Budget: N/A
Total Bids: 18
Project Description

I want the following information on Real Estate companies in Calgary, Alberta, Canada (here's an example [url removed, login to view]):

Company Name
Address
Website
Phone Number
Company Description
SIC Code
NAICS Code
Contact Name
Contact Position
Location Type
Revenue
Employees
Years in Business

This information can be selected using the following code:

item['company'] = [url removed, login to view]('//h1[@class="company-name"]/text()').extract()
item['address'] = [url removed, login to view]('//div[@itemprop="streetAddress"]/text()').extract()
item['website'] = [url removed, login to view]('//dl[@class="website_info"]/dd/span/text()').extract()
item['phone'] = [url removed, login to view]('//dd[@class="tel"]/text()').extract()
item['description'] = [url removed, login to view]('//p[@itemprop="description"]/text()').extract()
item['contact_name'] = [url removed, login to view]('//span[@itemprop="name"]/text()').extract()
item['contact_position'] = [url removed, login to view]('//em[@itemprop="jobTitle"]/text()').extract()
item['location_type'] = [url removed, login to view]('//table[@class="table-data"]/tr[1]/td/text()').extract()
item['SIC'] = [url removed, login to view]('//table[@class="table-data"]/tr[4]/td/text()').extract()
item['NAICS'] = [url removed, login to view]('//table[@class="table-data"]/tr[5]/td/text()').extract()
item['revenue'] = [url removed, login to view]('//table[@class="table-data"]/tr[2]/td/text()').extract()
item['employees'] = [url removed, login to view]('//table[@class="table-data"]/tr[3]/td/text()').extract()
item['years_business'] = [url removed, login to view]('//table[@class="table-data"]/tr[8]/td/text()').extract()
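
Because `.extract()` returns a list of strings for every field, each value will likely need to be collapsed into a single string before it can sit in one spreadsheet cell. The following is only a sketch of one way to do that. The helper names `first_clean` and `flatten_item` and the sample values are hypothetical and are not part of the posted code.

```python
def first_clean(values, default=""):
    """Return the first non-empty string in a list, with whitespace collapsed."""
    for value in values:
        text = " ".join(value.split())
        if text:
            return text
    return default


def flatten_item(item):
    """Collapse each extracted list into a single cell-ready string."""
    return {field: first_clean(values) for field, values in item.items()}


# Hypothetical raw item, shaped like the lists the selectors above would yield.
raw = {
    "company": ["\n  Example Realty Ltd.  "],
    "phone": [" ", "(403) 555-0100"],
    "website": [],
}
print(flatten_item(raw))
```

Taking the first non-empty match is an assumption. If a field can legitimately hold several values, such as multiple contacts, joining them with a separator may be a better choice.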

This MUST be done using Scrapy, the web crawling framework written in Python.

Your deliverable is an Excel spreadsheet with the above information for each company, one company per row. There are 1,355 Real Estate companies in Calgary, Canada, so I'm expecting that many rows.
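
As a rough illustration of the expected output, the sketch below writes one row per company in the column order listed above, counts the rows against the 1,355 expected, and pulls the last 20 rows for a spot check. It assumes a CSV file that Excel can open rather than a native .xlsx file, and the names `COLUMNS`, `write_rows`, and `last_rows` and the sample rows are hypothetical.

```python
import csv
import io

# Column order follows the field list in the project description.
COLUMNS = [
    "company", "address", "website", "phone", "description",
    "SIC", "NAICS", "contact_name", "contact_position",
    "location_type", "revenue", "employees", "years_business",
]
EXPECTED_ROWS = 1355  # company count stated in the posting


def write_rows(rows, handle):
    """Write a header plus one line per company; return the number of rows written."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, restval="",
                            extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)  # missing fields are written as empty cells
        count += 1
    return count


def last_rows(rows, n=20):
    """Return the final n rows, e.g. for the requested screenshot check."""
    return rows[-n:]


# Hypothetical sample data standing in for scraped companies.
rows = [{"company": "Example Realty %d" % i, "phone": "(403) 555-0100"}
        for i in range(25)]
buffer = io.StringIO()  # a real run would open a file with newline=""
written = write_rows(rows, buffer)
if written != EXPECTED_ROWS:
    print("row count %d differs from expected %d" % (written, EXPECTED_ROWS))
```

Comparing the written count with the expected total is a cheap way to notice missed pages before the file is handed over.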

As proof that you scraped the appropriate information, I require a screenshot of the Excel spreadsheet showing the last 20 rows. If the information on those last 20 companies matches what is found on Manta, I will pay you the agreed price in exchange for the Excel file.
I may have additional work for you if you complete this task successfully.
