Make a crawler in Scrapy with the following features:
- Find the imprint or contact page of a website and extract addresses, contact data and company names
• Respect [login to view URL], the robots meta element and common wording that objects to the processing of contact data.
• Crawling should be possible in many languages. The crawler should import the relevant languages and wordings from a CSV file.
• Crawl http and https sites.
• recognition of VAT IDs
• optional searching of emails on the website with variable search depth
• crawl description of website, crawl main keywords
• search for social links on the website and then crawl the contact information from the social website
- All data should be stored in a database, which has to be part of the project, and should also be exportable as a CSV or Excel list.
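
To make the feature list more concrete, here is a minimal sketch of the wording import, the imprint link detection and the VAT ID / email recognition. It is only an illustration, not a required design: the CSV layout (`language,wording`), the function names and the simplified VAT patterns (German, Austrian and French formats only) are assumptions.

```python
import csv
import io
import re

# Hypothetical CSV layout: one row per language code and imprint/contact wording.
WORDINGS_CSV = """language,wording
de,impressum
de,kontakt
en,imprint
en,contact
fr,mentions légales
"""

# Simplified patterns (assumption): real VAT ID formats differ per country.
VAT_RE = re.compile(r"\b(DE\d{9}|ATU\d{8}|FR[0-9A-Z]{2}\d{9})\b")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def load_wordings(csv_file, languages):
    """Return lower-cased wordings for the selected language codes."""
    reader = csv.DictReader(csv_file)
    return [
        row["wording"].strip().lower()
        for row in reader
        if row["language"] in languages
    ]


def find_imprint_links(links, wordings):
    """Return the hrefs whose URL or anchor text contains one of the wordings.

    `links` is a list of (href, anchor_text) pairs taken from a page.
    """
    hits = []
    for href, text in links:
        haystack = f"{href} {text}".lower()
        if any(wording in haystack for wording in wordings):
            hits.append(href)
    return hits


def extract_contact_data(page_text):
    """Collect unique VAT IDs and email addresses from plain page text."""
    return {
        "vat_ids": sorted(set(VAT_RE.findall(page_text))),
        "emails": sorted(set(EMAIL_RE.findall(page_text))),
    }
```

In a Scrapy spider, `find_imprint_links` could be fed with the links of the start page and `extract_contact_data` with the text of the imprint page. Address and company name extraction would need more than regular expressions and is left out here.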
Input for the crawler should be:
• A CSV List with domains
• A keyword like „motorcycles white“ and a selection of one or more countries. The crawler should then visit all results of one or more popular search engines. It should be selectable whether the keyword is searched only in the URL or in the full text. The crawler should then search within each resulting domain.
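
As an illustration of the first input type, the following sketch reads a CSV list of domains and builds start URLs for both schemes. It assumes a file without a header row and with the domain in the first column. The function names are hypothetical.

```python
import csv
import io
from urllib.parse import urlsplit


def read_domains(csv_file):
    """Yield unique bare host names from the first column of a CSV file."""
    seen = set()
    for row in csv.reader(csv_file):
        if not row or not row[0].strip():
            continue  # skip empty lines
        raw = row[0].strip().lower()
        # Accept both "example.com" and "https://example.com/path".
        host = urlsplit(raw if "//" in raw else f"//{raw}").hostname
        if host and host not in seen:
            seen.add(host)
            yield host


def start_urls(domains, schemes=("https", "http")):
    """Build one start URL per domain and scheme."""
    return [f"{scheme}://{domain}/" for domain in domains for scheme in schemes]
```

The resulting list could be assigned to the `start_urls` attribute of a Scrapy spider. Trying https first and falling back to http is one possible design choice.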
The success rate of finding the imprint should be around 75%.
You have to provide a browser-based GUI for all of this.
Of course, clean code and documentation have to be provided.
19 freelancers are bidding on average €592 for this job
Hello! I am a python developer. I looked at your project and it seems interesting. I have all necessary skills required for this project. Ping me to discuss in detail.