I need a desktop application that can harvest URLs from the Google search engine using its API. The following requirements must be met:
- Must be Unicode capable, i.e. support ALL Unicode characters, including Chinese, Japanese, Thai, etc.
- Must be multi-threaded (up to 30 connections)
- Must have proxy support, including multiple AUTOMATICALLY rotating proxies, e.g. switch to the next proxy automatically after every X URLs harvested
- Must be able to remove duplicates at BOTH the domain level and the URL level
- Must be fast, i.e. able to harvest QUICKLY and run for EXTENDED PERIODS of time
- Must be able to retrieve the maximum of 1000 URLs per keyword allowed by the Google API
- Must save URLs automatically to a text file as harvesting progresses. Once a file reaches 1 million URLs, a new text file must be created, and so on
- Must be able to adjust connection timeout settings
- Must be able to handle large lists of URLs (up to 1 million URLs) and large lists of keywords (up to 1 million lines)
- Must show progress at the bottom of the screen, either as a progress bar or as a keyword count, e.g. if there are 1000 keywords, progress would read 3/1000 while harvesting the third keyword
- Must have two input boxes for keywords: one box for a custom footprint, e.g. "Powered by Wordpress", and another box for the keywords themselves, e.g. "dogs, cats, bears, computers", etc.
- The developer must be willing to fix problems if the software does not work or develops problems in the future. However, additional payment will be made if changes to the Google search API require major changes to the software.
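To make three of the requirements above concrete (automatic proxy rotation every X URLs, duplicate removal at both the URL and domain level, and rolling output files capped at N URLs), here is a rough sketch of the kind of logic expected. All class names, parameters, and the file-naming scheme are illustrative assumptions, not part of this specification:

```python
from urllib.parse import urlparse

class ProxyRotator:
    """Cycles through a proxy list, advancing to the next proxy
    automatically after every `per_proxy` requests."""
    def __init__(self, proxies, per_proxy):
        self.proxies = proxies
        self.per_proxy = per_proxy
        self.uses = 0

    def next(self):
        # Integer division groups requests into blocks of `per_proxy`,
        # so each block uses one proxy before rotating to the next.
        proxy = self.proxies[(self.uses // self.per_proxy) % len(self.proxies)]
        self.uses += 1
        return proxy

class Deduper:
    """Filters duplicates at either the full-URL level or,
    when by_domain=True, the domain level."""
    def __init__(self, by_domain=False):
        self.by_domain = by_domain
        self.seen = set()

    def accept(self, url):
        key = urlparse(url).netloc if self.by_domain else url
        if key in self.seen:
            return False
        self.seen.add(key)
        return True

class RollingWriter:
    """Writes one URL per line, starting a new numbered text file
    every `max_per_file` URLs (e.g. 1,000,000 in the spec above)."""
    def __init__(self, basename, max_per_file):
        self.basename = basename
        self.max_per_file = max_per_file
        self.count = 0
        self.file_index = 0
        self._fh = None

    def write(self, url):
        if self.count % self.max_per_file == 0:
            self._open_next()
        self._fh.write(url + "\n")  # UTF-8 handles all Unicode keywords/URLs
        self.count += 1

    def _open_next(self):
        if self._fh:
            self._fh.close()
        self.file_index += 1
        path = f"{self.basename}_{self.file_index}.txt"
        self._fh = open(path, "w", encoding="utf-8")

    def close(self):
        if self._fh:
            self._fh.close()
```

For example, `ProxyRotator(["p1", "p2"], per_proxy=100)` would return `"p1"` for the first 100 calls and `"p2"` for the next 100, then wrap around; `Deduper(by_domain=True)` would reject a second URL from the same host even if the path differs.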