URGENT - Fix and optimization of Ruby-based web application
Developer needed to fix a bug in a Ruby / Ruby on Rails web application and to optimize the application's speed.
The application has already been developed, but it is not usable because a bug makes it impossible to generate .csv files (see below).
The application is an "intelligent" automated web scraping application that takes large lists of website URLs (over 500K) and separates business websites from non-business websites by checking each site's internal links against a set of keywords entered beforehand.
The application script works the following way:
1. It verifies whether each URL corresponds to an active website.
2. It loads each website's homepage (the homepage only) and identifies its "intra-site" (internal) links.
3. It determines whether the text of any of the homepage's internal links includes a keyword from a predetermined set (such as "about us", "services", "company", "clients"...).
For example: www.website.com/services.html gives a "positive" result because the word "services", which the user entered as a keyword beforehand, appears in the link.
The application is multi-threaded to optimize processing speed.
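To make steps 2 and 3 and the multi-threading concrete for bidders, here is a minimal sketch using only the Ruby standard library. It is not the application's actual code: every method name, the regex-based link extraction, the rule that "-", "_", "+" and "%20" count as spaces, and the worker count are assumptions made for illustration. It also assumes that URLs include a scheme such as http://, and it leaves out step 1 (the active-site check) because that needs network access.

```ruby
require "uri"

# Hypothetical keyword list; in the application the user enters these.
KEYWORDS = ["about us", "services", "company", "clients"].freeze

# Step 2: collect the links on a page that point to the page's own host.
# A regex scan keeps this sketch dependency-free; an HTML parser is more robust.
def internal_links(html, page_url)
  base_host = URI.parse(page_url).host
  hrefs = html.scan(/<a\s[^>]*href\s*=\s*["']([^"']+)["']/i).flatten
  links = hrefs.filter_map do |href|
    begin
      link = URI.join(page_url, href.strip)
      link.to_s if link.host == base_host
    rescue URI::Error
      nil # skip hrefs that are not valid URIs
    end
  end
  links.uniq
end

# Step 3: return the keywords found in the path of a link.
# Assumption: "-", "_", "+" and "%20" count as spaces, so "/about-us"
# matches the keyword "about us".
def matching_keywords(link, keywords = KEYWORDS)
  path = URI.parse(link).path.to_s.downcase.gsub(/%20|[-_+]/, " ")
  keywords.select { |keyword| path.include?(keyword.downcase) }
end

# A homepage is "positive" when at least one internal link matches a keyword.
def positive?(html, page_url, keywords = KEYWORDS)
  internal_links(html, page_url).any? { |link| matching_keywords(link, keywords).any? }
end

# Run a block over many items with a fixed number of worker threads.
def process_in_threads(items, worker_count: 8, &block)
  jobs = Queue.new
  items.each { |item| jobs << item }
  jobs.close          # pop returns nil once the closed queue is empty
  results = Queue.new # Queue is thread-safe, so workers can push concurrently
  workers = Array.new(worker_count) do
    Thread.new do
      while (item = jobs.pop)
        results << [item, block.call(item)]
      end
    end
  end
  workers.each(&:join)
  Array.new(results.size) { results.pop }.to_h
end
```

With a hash of already-downloaded homepages (here called pages, a hypothetical name mapping URL to HTML), a call such as process_in_threads(pages.keys) { |url| positive?(pages[url], url) } would return a hash mapping each URL to true or false. Matching is done on the link path only, so that a keyword appearing in the domain name itself does not count.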
A web interface allows the user to:
- upload a list of URLs to scrape (up to 500K URLs per list)
- add and remove keywords
- start the "mining" process
- view a real-time count of the URLs processed
- download, as .csv files, the URL lists of active websites, positively identified websites and negative ones
This last feature (the .csv download) has been developed, but it is not fully working, which makes the app unusable. This is what needs to be fixed.
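The cause of the bug is not known, so the following is only an illustration of the kind of export involved, not a diagnosis or the existing code. It is a minimal sketch that writes results one line at a time with Ruby's csv library (bundled with Ruby), so that a 500K-row list is never built as a single string in memory. The method name, the column names and the status labels are assumptions.

```ruby
require "csv"

# Write one CSV line per result to any IO-like object (File, StringIO, ...).
# rows is any Enumerable of [url, status] pairs; the "url" and "status"
# column names and the status labels are hypothetical.
def write_results_csv(io, rows)
  io << CSV.generate_line(%w[url status])
  rows.each { |url, status| io << CSV.generate_line([url, status]) }
  io
end

# Possible usage, assuming results holds the [url, status] pairs:
#   File.open("results.csv", "w") { |file| write_results_csv(file, results) }
```

CSV.generate_line takes care of quoting, so URLs that contain commas or quotation marks still produce a valid file.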
PLEASE ONLY BID IF YOU ARE THE DEVELOPER. (NO AGENCIES PLEASE)