+Start crawling from a list of URLs specified by the user.
+Supports a wide range of character sets with automated character set and language detection. Correct phrase segmenting (tokenizing) for Chinese, Japanese, Korean, and other languages. SGML entities like 'à' and ISO-Latin-1 characters can be indexed. No problem crawling any Unicode character encoding (Chinese, Japanese, Korean, Arabic, Hebrew, Turkish, Thai, Greek, Baltic, Cyrillic; UTF-8, Windows-12xx).
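A minimal, stdlib-only sketch of the character-set detection step described above: check the HTTP `Content-Type` header first, then sniff the HTML `<meta>` declaration. The function name `detect_charset` is hypothetical; a production crawler would likely add statistical detection for pages that declare nothing.

```python
import re

def detect_charset(headers: dict, body: bytes) -> str:
    """Guess a page's character set from the Content-Type header,
    falling back to an HTML <meta> declaration, then UTF-8."""
    ctype = headers.get("Content-Type", "")
    m = re.search(r"charset=([\w-]+)", ctype, re.I)
    if m:
        return m.group(1).lower()
    # Sniff the first 2 KB of the body for <meta charset=...>
    head = body[:2048].decode("ascii", errors="ignore")
    m = re.search(r'<meta[^>]+charset=["\']?([\w-]+)', head, re.I)
    if m:
        return m.group(1).lower()
    return "utf-8"
```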
+Spider picture and video source code and export it to a correctly structured MySQL file (CREATE TABLE statements).
+Checks website source code and returns: site title, site meta description, site keywords, site page size, search-term site URL, and more.
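A sketch of the metadata-extraction step above using only the standard library's `html.parser`; the class and field names are assumptions, and page size is simply the byte length of the HTML.

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect <title>, meta description, and meta keywords from HTML."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.keywords = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name == "description":
                self.description = attrs.get("content", "")
            elif name == "keywords":
                self.keywords = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def page_info(html: str) -> dict:
    parser = MetaExtractor()
    parser.feed(html)
    return {"title": parser.title.strip(),
            "description": parser.description,
            "keywords": parser.keywords,
            "page_size": len(html.encode("utf-8"))}
```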
+Reasonable duplicate-domain and duplicate-content detection to avoid re-crawling identical sites on different domains ([url removed] vs [url removed], and a million other sites that use multiple domains for the same content).
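One common way to implement the duplicate-content detection above is to hash a normalized version of each page and skip pages whose hash has been seen; a sketch, with the tag-stripping regex being a deliberately rough assumption:

```python
import hashlib
import re

def content_fingerprint(html: str) -> str:
    """Hash a normalized version of the page so the same content served
    from different domains produces the same fingerprint."""
    text = re.sub(r"<[^>]+>", " ", html)       # strip tags (rough)
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha1(text.encode("utf-8")).hexdigest()
```

A crawler would keep a set of seen fingerprints and skip any page whose fingerprint is already present.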
+Understands GET parameters and what counts as a "search result" across many site-specific search engines. For example, a page may link to a search-result page on another site's internal search with certain GET parameters; these result pages should not be crawled.
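A minimal sketch of filtering out internal search-result URLs by their GET parameters. The parameter names below are an assumption; a real crawler would maintain per-site rules.

```python
from urllib.parse import urlparse, parse_qs

# Query-parameter names that commonly mark a site's internal search
# results (an assumption, not an exhaustive list).
SEARCH_PARAMS = {"q", "query", "search", "s", "keyword"}

def looks_like_search_result(url: str) -> bool:
    """Return True when the URL carries a query parameter that usually
    identifies an internal search-result page."""
    params = parse_qs(urlparse(url).query)
    return any(name in SEARCH_PARAMS for name in params)
```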
+Block unwanted URLs and manage cookies for anonymous access. Cache crawled pages; page caching gives a significant time reduction in search. HTML cleaning algorithm.
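The page-caching idea above can be sketched as a simple cache-aside lookup; `fetch` stands in for whatever single-URL download function the crawler uses, injected here so the sketch stays network-free.

```python
def cached_fetch(url, cache, fetch):
    """Return a cached copy of the page when available; otherwise
    download it via `fetch` and store the result for next time."""
    if url not in cache:
        cache[url] = fetch(url)
    return cache[url]
```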
+Detect broken links (should automatically ignore broken links). Duplicate data detection and removal; duplicate detection stops web scraping when previously seen data is reached.
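The stop-on-old-data behavior above can be sketched as a generator that halts as soon as a previously seen record appears; the `page_id` notion is an assumption about how records are identified.

```python
def crawl_until_old(pages, seen_ids):
    """Yield new (page_id, payload) pairs and stop at the first
    previously seen page id, assuming everything after it is old data
    from an earlier run."""
    for page_id, payload in pages:
        if page_id in seen_ids:
            break          # reached data from a previous run; stop
        seen_ids.add(page_id)
        yield page_id, payload
```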
+Crawling rules and multithreaded downloading (up to 50 threads). Can perform parallel, multi-threaded indexing for faster updating.
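The multithreaded downloading above maps naturally onto a thread pool; a sketch where `fetch` is the crawler's single-URL download function, injected so the example stays network-free.

```python
from concurrent.futures import ThreadPoolExecutor

def download_all(urls, fetch, max_threads=50):
    """Download all URLs with up to `max_threads` worker threads,
    returning a url -> result mapping in input order."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))
```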
+Apply regular expressions (RegEx) to the text or HTML source of web pages and scrape the matching portion. Extract using XPath.
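Both extraction modes above can be sketched with the standard library, noting that `xml.etree.ElementTree` supports only a limited subset of XPath 1.0 (a full implementation would need something like lxml):

```python
import re
from xml.etree import ElementTree

def scrape_regex(html: str, pattern: str):
    """Return every portion of the page matching the given pattern."""
    return re.findall(pattern, html)

def scrape_xpath(xml: str, path: str):
    """Return the text of nodes matched by a limited XPath expression;
    only works on well-formed XML/XHTML input."""
    root = ElementTree.fromstring(xml)
    return [el.text for el in root.findall(path)]
```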
+Update every N minutes, i.e. specify how often the program will scrape the target website.
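A minimal sketch of the "update every N minutes" loop; the `iterations` cap is an addition purely so the sketch can terminate, since a real crawler would loop until stopped.

```python
import time

def run_every(minutes, job, iterations=None):
    """Run `job` every `minutes` minutes; `iterations` caps the loop
    for testing (None means run forever)."""
    count = 0
    while iterations is None or count < iterations:
        job()
        count += 1
        if iterations is None or count < iterations:
            time.sleep(minutes * 60)
```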
+Export a configurable number of results per file (100; 1,000; 10,000; 100,000; ...).
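The per-file export limit above amounts to chunking the result set; a sketch:

```python
def export_in_chunks(rows, per_file):
    """Split result rows into per_file-sized batches, one batch per
    output file (e.g. per_file = 100, 1000, 10000, ...)."""
    return [rows[i:i + per_file] for i in range(0, len(rows), per_file)]
```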
+Export crawled information to SQL/MySQL files (automatic MySQL CREATE TABLE and INSERT INTO ... VALUES statements for title, meta, keywords, page size, search-term site URL, etc., and much more SQL functionality).
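The SQL export above can be sketched as plain statement generation. The table schema and column names (`title`, `meta`, `keywords`, `page_size`, `url`) are assumptions based on the fields listed earlier, and the quoting here is minimal; production code should use a proper SQL library.

```python
def mysql_dump(table, rows):
    """Render a minimal MySQL dump: one CREATE TABLE statement plus
    one INSERT per row of crawled data."""
    cols = ["title", "meta", "keywords", "page_size", "url"]
    lines = [
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "title TEXT, meta TEXT, keywords TEXT, page_size INT, url TEXT);"
    ]
    for row in rows:
        values = ", ".join(
            str(row[c]) if c == "page_size"
            else "'" + str(row[c]).replace("'", "''") + "'"
            for c in cols
        )
        lines.append(
            f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({values});"
        )
    return "\n".join(lines)
```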