Web scraper crawler software

Closed

+Start crawling from a list of URLs specified by the user

+Supports a wide range of character sets, with automated character set and language [url removed, login to view] character sets [url removed, login to view] phrase segmenting (tokenizing) for Chinese, Japanese, Korean and [url removed, login to view] SGML entities like 'à' and ISO-Latin-1 characters can be indexed and [url removed, login to view] problem crawling any Unicode character encoding (Chinese, Japanese, Korean, Arabic, Hebrew, Turkish, Thai, Greek, Baltic, Cyrillic; UTF-8, windows-12xx)

+Spider picture and video sources in the page source code and extract them to a proper MySQL file (with CREATE TABLE statements)

+Checks website source code and returns: site title, site meta description, site keywords, site page size, search term site URL, and much more

+Reasonable duplicate domain and duplicate content detection to avoid re-crawling of identical sites on different domains. ([url removed, login to view] vs [url removed, login to view], and a million other sites that use multiple domains for the same content.)

+Understands GET parameters and what counts as a "search result" across many site-specific search engines. For example, a page may link to a search result page on another site's internal search with some GET parameters. These result pages should not be crawled.

+Block unwanted [url removed, login to view] and manage cookies for anonymous access; cache crawled [url removed, login to view] caching gives a significant time reduction in search [url removed, login to view] cleaning algorithm

+Detect broken links and automatically ignore them. Detect and remove duplicate data. Use duplicate detection to stop scraping when previously scraped data is reached.

+Crawling rules and multithreaded downloading (up to 50 threads). Can perform parallel, multithreaded indexing for faster updating.

+Apply regular expressions (RegEx) to the text or HTML source of web pages and scrape the matching portion. Extract using XPath.

+Update every N minutes: specifies how often the program scrapes the target website

+Export a selectable number of results per file (100; 1000; 10000; 100000; ...)

+Export crawled information to SQL and MySQL files (automatic MySQL CREATE TABLE and INSERT INTO ... VALUES statements for title, meta, keywords, page size, search term site URL, etc., and much more functionality in SQL)
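
The export requirements above (a fixed number of results per file, with automatic CREATE TABLE and INSERT INTO statements) could be sketched roughly as follows. This is only an illustration under assumptions: the table name crawled_pages, the column names, and the helper names are hypothetical, since the posting names the fields but not a schema, and the string escaping assumes MySQL's default backslash-escape mode.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical schema: the posting lists the fields but does not define a table.
CREATE_TABLE = """CREATE TABLE IF NOT EXISTS crawled_pages (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  url VARCHAR(2048) NOT NULL,
  title TEXT,
  meta_description TEXT,
  keywords TEXT,
  page_size INT UNSIGNED
) DEFAULT CHARSET=utf8mb4;"""

COLUMNS = ["url", "title", "meta_description", "keywords", "page_size"]


def sql_literal(value) -> str:
    """Render a Python value as a MySQL literal: NULL, a number, or a quoted string."""
    if value is None:
        return "NULL"
    if isinstance(value, int):
        return str(value)
    # Escape backslashes first, then single quotes (default MySQL escape mode assumed).
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def chunked(rows: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield consecutive batches of at most `size` rows."""
    batch: List[Dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def render_sql_files(rows: Iterable[Dict], per_file: int) -> List[str]:
    """Return one SQL script per batch; each script creates the table if needed."""
    files = []
    column_list = ", ".join(COLUMNS)
    for batch in chunked(rows, per_file):
        lines = [CREATE_TABLE]
        for row in batch:
            # Missing fields become NULL via row.get().
            values = ", ".join(sql_literal(row.get(col)) for col in COLUMNS)
            lines.append(
                f"INSERT INTO crawled_pages ({column_list}) VALUES ({values});"
            )
        files.append("\n".join(lines) + "\n")
    return files
```

Each returned string can be written to its own .sql file (for example with open(path, "w", encoding="utf-8")), so that choosing 100, 1000, or more results per file only changes the per_file argument.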

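For the duplicate-content and search-result requirements in the list above, one possible starting point is to skip URLs whose GET parameters look like an internal search and to fingerprint page text so identical content on different domains is stored once. The parameter names in SEARCH_PARAMS and all function and class names here are assumptions for illustration, not part of the posting; an exact hash also catches only identical text, so near-duplicates would need a fuzzier method.

```python
import hashlib
import re
from urllib.parse import parse_qs, urlsplit

# Assumed query parameter names that commonly mark an internal search page.
SEARCH_PARAMS = {"q", "query", "search", "s", "keyword", "keywords"}


def looks_like_search_result(url: str) -> bool:
    """Return True if the URL carries a query parameter commonly used for search."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return any(name.lower() in SEARCH_PARAMS for name in params)


def content_fingerprint(html: str) -> str:
    """Hash the page text with tags removed, whitespace collapsed, and case folded."""
    text = re.sub(r"<[^>]+>", " ", html)
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class DuplicateFilter:
    """Remembers fingerprints of pages that have already been stored."""

    def __init__(self) -> None:
        self._seen = set()

    def is_new(self, html: str) -> bool:
        """Return True the first time a given page text is seen, False afterwards."""
        fingerprint = content_fingerprint(html)
        if fingerprint in self._seen:
            return False
        self._seen.add(fingerprint)
        return True
```

A crawler loop could call looks_like_search_result before queuing a link and DuplicateFilter.is_new before saving a page; a False result from is_new is also a natural signal for stopping a scrape once old data is reached.
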
Skills: C Programming, C# Programming, C++ Programming, MySQL

Project ID: #5482264

8 freelancers are bidding on average $193 for this job

SigmaVisual

Dear Client, I can help with your project. We already have experience working on similar projects. Please see below to get an idea of our experience: Amazon/Ebay Bots: http://sigma-dns.sigmavirtual.com/PDemo1/Am ...

$144 USD in 3 days
(78 Reviews)
6.9

trustus

Hello, We have .NET professionals with experience and expertise in the following: MVC versions: 2, 3, 4. Visual Studio: 2010, 2012, 2013. Expertise in MVC: • Used along with C# and VB.NET. • Razor/aspx view engine ...

$250 USD in 15 days
(32 Reviews)
6.6

akhila27

We have created this website crawler! It's very similar and needs only a few modifications. Please contact us for a live demo. Waiting to hear from you. Regards, Akhila (SolutionInfinity.net).

$257 USD in 7 days
(21 Reviews)
6.5

vietnamboy

Hi, Web crawling is a field that interests me. My thesis includes a feature for crawling and extracting data using XPath. I can show it to you if you are interested. Best regards, An

$250 USD in 3 days
(2 Reviews)
2.9

mmadi

Hi, I'll be happy to do that for you. I have rich experience in scraping using cURL, regular expressions, DOM and Selenium RC. I worked for the travelfox.com and planeandtrain.com search engine, where I gained my experience ...

$206 USD in 3 days
(4 Reviews)
2.6

super2lao

A proposal has not yet been provided

$237 USD in 3 days
(4 Reviews)
2.5

Gogamers

Hi there! I'm an experienced programmer in C#, Java, Python and databases (MSSQL, MySQL), and I'm currently working on ERP systems, which involves web scraping and then inserting data into a database. I have a lot of e ...

$144 USD in 5 days
(1 Review)
2.5

zotiger

Hi, Dear sir! I am very interested in your project. I also have extensive experience in web scraping. I have written a program that scrapes data from a site to a CSV file. If you want, I can show it to you. I thin ...

$200 USD in 5 days
(2 Reviews)
1.7

threadnix

A proposal has not yet been provided

$111 USD in 10 days
(0 Reviews)
0.0