Experienced developer needed to fix bugs in a Ruby/Ruby on Rails web application and to "fine-tune" the script if needed.
The application has already been developed but is not usable, because bugs make it impossible to generate exportable .csv files (see below).
The application is an automated web scraper that takes large lists of website URLs (over 500K) and separates business websites from non-business websites by checking all internal (intra-site) links against a set of keywords that have been entered beforehand.
The application script works the following way:
1. It verifies whether each URL corresponds to an active website.
2. It browses each website's homepage (only the homepage) and identifies "intra site" links (internal links).
3. It determines whether the text in any of the homepage internal links includes a particular keyword from a pre-determined set of keywords (such as "about us", "services", "company", "clients"...).
For example: www. website .com/[url removed, login to view] - this link gives a "positive" result, since the word "services" appears in the link (the word "services" would have been pre-determined by the user).
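The three steps above can be sketched roughly as follows. This is only an illustration, not the existing application's code: the method names (active?, internal_links, matching_keywords) are hypothetical, "active" is assumed to mean an HTTP status below 400, links are pulled out with a simple regex rather than an HTML parser, and keywords are assumed to be matched against the path of each internal link after stripping punctuation (so "about us" matches /about-us).

```ruby
require "uri"
require "net/http"

# Step 1 (assumption): a site is "active" if a HEAD request to it returns
# a status below 400. URLs are assumed to include http:// or https://.
def active?(url, timeout: 5)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https",
                  open_timeout: timeout, read_timeout: timeout) do |http|
    http.head(uri.request_uri).code.to_i < 400
  end
rescue StandardError
  false
end

# Step 2: collect the homepage links that point to the same host.
# The regex is a simplification; it only sees <a ... href="..."> tags.
def internal_links(html, base_url)
  base_host = URI.parse(base_url).host
  html.scan(/<a\s[^>]*href\s*=\s*["']([^"']+)["']/i).flatten.filter_map do |href|
    begin
      link = URI.join(base_url, href)
    rescue URI::Error
      next # skip hrefs that cannot be parsed
    end
    link.to_s if link.host == base_host
  end.uniq
end

# Lowercase and drop everything except letters and digits, so that
# "About Us" and "/about-us" compare equal.
def normalize(text)
  text.downcase.gsub(/[^a-z0-9]/, "")
end

# Step 3: return the keywords found in the path of any internal link.
# A non-empty result would make the site "positive".
def matching_keywords(links, keywords)
  paths = links.map { |link| normalize(URI.parse(link).path.to_s) }
  keywords.select do |keyword|
    needle = normalize(keyword)
    paths.any? { |path| path.include?(needle) }
  end
end
```

Matching on the path only (rather than the full URL) is a design choice made here so that a domain such as webservices.example does not make every link on the site positive; the real application may behave differently.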
The application is multi-threaded and runs on a high-powered dedicated server for maximum processing speed.
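For reference, one common way to structure this kind of multi-threaded processing in plain Ruby is a fixed pool of worker threads pulling URLs from a queue. The sketch below is an assumption about the general approach, not a description of the existing code; process_in_parallel and worker_count are hypothetical names, and the per-URL work is passed in as a block.

```ruby
# Hypothetical worker pool: runs the given block once per URL using
# worker_count threads and returns a hash of url => result.
def process_in_parallel(urls, worker_count: 8, &work)
  queue = Queue.new
  urls.each { |url| queue << url }
  queue.close # once the queue is empty, pop returns nil and workers stop

  results = {}
  lock = Mutex.new

  workers = Array.new(worker_count) do
    Thread.new do
      while (url = queue.pop)
        outcome = begin
          work.call(url)
        rescue StandardError => e
          # One failing URL should not kill the whole run.
          "error: #{e.class}"
        end
        lock.synchronize { results[url] = outcome }
      end
    end
  end

  workers.each(&:join)
  results
end
```

Because every write to the shared hash goes through the mutex, the number of entries in it at any moment could also serve as a processed-URL count.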
A web interface allows the user to:
- upload a list of URLs to scrape (up to 500K URLs per list)
- add or remove keywords
- start the "mining" process
- display the real-time count of URLs processed
- download as .csv files the URL lists of active websites, positive-identified websites and negative ones
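As a rough illustration of the download feature, the three lists could be produced with Ruby's csv library along these lines. Everything here is assumed for the example: the results hash (url => status), the status labels :inactive, :positive and :negative, the csv_for method, and the idea that the "active" list is the positive and negative sites combined. Note that on Ruby 3.4 and later csv is a bundled gem and may need to be listed in the Gemfile.

```ruby
require "csv"

# Hypothetical mapping from a downloadable list to the statuses it contains.
STATUSES_FOR_LIST = {
  active:   %i[positive negative],
  positive: %i[positive],
  negative: %i[negative]
}.freeze

# Builds the CSV text for one list (:active, :positive or :negative)
# from a hash of url => status.
def csv_for(results, list)
  wanted = STATUSES_FOR_LIST.fetch(list)
  CSV.generate do |csv|
    csv << %w[url status]
    results.each do |url, status|
      csv << [url, status.to_s] if wanted.include?(status)
    end
  end
end
```

In a Rails controller the resulting string could be returned with send_data and a filename; for lists approaching 500K rows, writing to a file or streaming the response may be preferable to building the whole string in memory.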
This last feature has been developed, but it is not fully working, which makes the app unusable. This is what needs to be fixed.
Resolving the bugs may require some "fine-tuning" of the script.
PLEASE ONLY BID IF YOU ARE THE DEVELOPER. (NO AGENCIES PLEASE)