I need a crawler/scraper to scrape travel information from all available channels.
1. The crawler must accept the following inputs:
-- A list of travel-specific keywords
-- A list of locations, e.g. Maharashtra, India or Pune, Maharashtra, India. There can be multiple entries.
-- An optional wildcard list of specific websites; if none is given, search the whole web
2. The output must consist of:
-- A MySQL table with one record per crawled URL, holding:
   -- The URL of the crawled website
   -- Count of the location-specific keywords found in the URL content
   -- Count of the travel-specific keywords found in the URL content
   -- Author name and contact details of the article (email ID or social media contact), if available
   -- Heading of the article
   -- Crawled/scraped status
-- A text file per crawled URL, holding:
   -- Content of the article with all the details
   -- The text file name must be tagged with the URL ID generated in the MySQL table for that record
3. The crawler must follow links to the end of the URL tree and search the content of every page it reaches.
4. URLs that have already been crawled must be excluded when the crawler is started again.
5. A simple interface is needed to start the crawler once the inputs have been uploaded to the server. This can be done manually.
6. The program must run on a hosted server.
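
To make points 2 and 4 concrete, here is a rough Python sketch of how one crawled page could be recorded. It is only an illustration under stated assumptions, not a required design: the standard-library sqlite3 module stands in for MySQL so the example runs on its own, and the table name, column names and function names are all hypothetical. Fetching pages and following links are not shown.

```python
import re
import sqlite3
from pathlib import Path

# Hypothetical schema; sqlite3 is used here only as a stand-in for MySQL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS crawled_pages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT NOT NULL UNIQUE,
    location_keyword_count INTEGER NOT NULL,
    travel_keyword_count INTEGER NOT NULL,
    author_contact TEXT,
    heading TEXT,
    status TEXT NOT NULL
)
"""


def count_keywords(text, keywords):
    """Return the total number of case-insensitive whole-phrase matches."""
    total = 0
    for keyword in keywords:
        # Lookarounds keep "Pune" from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total


def already_crawled(conn, url):
    """True if the URL was stored by an earlier run (point 4)."""
    row = conn.execute(
        "SELECT 1 FROM crawled_pages WHERE url = ?", (url,)
    ).fetchone()
    return row is not None


def record_page(conn, out_dir, url, heading, text,
                travel_keywords, locations, author_contact=None):
    """Store one page; return its URL ID, or None if it was crawled before."""
    if already_crawled(conn, url):
        return None
    cursor = conn.execute(
        "INSERT INTO crawled_pages (url, location_keyword_count, "
        "travel_keyword_count, author_contact, heading, status) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (url, count_keywords(text, locations),
         count_keywords(text, travel_keywords),
         author_contact, heading, "scraped"),
    )
    conn.commit()
    page_id = cursor.lastrowid
    # The text file is named after the URL ID generated for the record.
    (Path(out_dir) / f"{page_id}.txt").write_text(text, encoding="utf-8")
    return page_id
```

In this sketch, calling record_page a second time with the same URL returns None and writes nothing, which is one possible way to skip already-crawled URLs on a restart.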
I will not be available for discussions during the bidding. Any updates will be posted on the message board. A freelancer will not be selected if these criteria are not met.