An automated webpage-scanning application needs to gather information from a specific list of websites (about 10,000) and store it in a Java object for further processing. The aim of this project is to assign each programmer a group of 100 sites and have them implement the scanners for those sites, according to each site's HTML structure.
The required information is usually organized in a highly structured manner, so gathering it can easily be implemented as an iteration over the entries.
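As a minimal sketch of the "iteration over the entries" idea, the code below assumes (hypothetically) that each entry has already been reduced to a map from field name to extracted text, and turns each map into one Java object. `Listing`, `EntryIteration`, and the `title`/`link` fields are illustrative names only, not part of the provided library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical value object holding the data gathered for one entry. */
final class Listing {
    final String title;
    final String link;

    Listing(String title, String link) {
        this.title = title;
        this.link = link;
    }
}

/** Illustrative iteration over raw entries; names are assumptions, not the library's API. */
final class EntryIteration {
    private EntryIteration() {}

    /**
     * Converts raw entries (field name -> extracted text, one map per entry, in page order)
     * into Listing objects. Entries lacking a required field are skipped.
     */
    static List<Listing> toListings(List<Map<String, String>> rawEntries) {
        List<Listing> listings = new ArrayList<>();
        for (Map<String, String> entry : rawEntries) {
            String title = trimToNull(entry.get("title"));
            String link = trimToNull(entry.get("link"));
            if (title == null || link == null) {
                continue; // incomplete block, e.g. a malformed entry: not stored
            }
            listings.add(new Listing(title, link));
        }
        return listings;
    }

    /** Returns the trimmed string, or null if it is null or blank. */
    private static String trimToNull(String s) {
        if (s == null) {
            return null;
        }
        String t = s.trim();
        return t.isEmpty() ? null : t;
    }
}
```

Skipping incomplete entries, rather than storing half-filled objects, keeps further processing simple. Apache Commons Lang's `StringUtils.trimToNull` behaves like the private helper and could replace it.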
The programmer is given a class library with which the implemented scanners must comply. Moreover, the provided library already contains a high-level API that abstracts and automates the scanning process: if the site is well-structured, the implementor simply needs to specify, using jQuery-like selectors, where the required information is located.
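The provided library's API is not described here, so the following is only a hypothetical illustration of what "specifying where the information is located" can look like: one selector matching each entry, plus one selector per field, evaluated relative to that entry. `SelectorSpec` and its builder are assumed names; only the selector strings follow real jsoup/jQuery-style syntax.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, declarative description of where the data sits on a page.
 * Not the provided library's API; the names are illustrative only.
 */
final class SelectorSpec {
    private final String entrySelector;               // matches one element per entry
    private final Map<String, String> fieldSelectors; // field name -> selector relative to the entry

    private SelectorSpec(String entrySelector, Map<String, String> fieldSelectors) {
        this.entrySelector = entrySelector;
        // Defensive, order-preserving, read-only copy.
        this.fieldSelectors =
                Collections.unmodifiableMap(new LinkedHashMap<String, String>(fieldSelectors));
    }

    static Builder forEntries(String entrySelector) {
        return new Builder(entrySelector);
    }

    String entrySelector() {
        return entrySelector;
    }

    Map<String, String> fieldSelectors() {
        return fieldSelectors;
    }

    static final class Builder {
        private final String entrySelector;
        private final Map<String, String> fields = new LinkedHashMap<>();

        private Builder(String entrySelector) {
            if (entrySelector == null || entrySelector.trim().isEmpty()) {
                throw new IllegalArgumentException("entry selector must not be blank");
            }
            this.entrySelector = entrySelector;
        }

        /** Adds a field whose selector is evaluated relative to each matched entry. */
        Builder field(String name, String selector) {
            if (fields.containsKey(name)) {
                throw new IllegalArgumentException("duplicate field: " + name);
            }
            fields.put(name, selector);
            return this;
        }

        SelectorSpec build() {
            return new SelectorSpec(entrySelector, fields);
        }
    }
}
```

For example, `SelectorSpec.forEntries("div.listing").field("title", "h2 > a").field("price", "span.price").build()` would describe a hypothetical page where each `div.listing` is one entry. Keeping the description declarative means the iteration and parsing logic is written once, while each site only supplies its selectors.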
The application is written in Java 7, so JDK 7 is required to compile the scanners.
The application depends on two other libraries: jsoup 1.7.2 (to parse the HTML pages) and Apache Commons Lang 3.0.1 (general-purpose utilities). In most cases, the implementor will not need to use either of them directly.
To make it easier to inspect the structure of the HTML pages before implementing each scanner, using Firefox with the Firebug plugin is highly recommended.
An intermediate or good knowledge of Java is required to produce good-quality code.
Since jQuery-like selectors are used to navigate the HTML of the scanned page, the programmer must know how to write them; in any case, the jsoup API docs contain a list of the supported selectors.
The websites to be scanned are in Italian, but no particular knowledge of this language is needed.
The programmer will be provided with the API docs for the application library and some example scanners.
By “site” we mean a domain (e.g. www.freelancer.com). If a site contains more than one webpage with the information we are looking for (e.g. http://www.freelancer.com/sellers/, http://www.freelancer.com/jobs/), we call these pages “sub-sites”. If a sub-site with a long list of entries is divided into several numbered pages, those pages belong to the same sub-site and all of them must be scanned.
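The site / sub-site / page terminology maps naturally onto URLs. The sketch below groups sub-site URLs by host (the “site”) and expands a sub-site's numbered pages from a URL template. The `SubSites` class, the `{page}` placeholder, and the `?page=` query parameter used in the test are hypothetical, since every site paginates in its own way.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative helpers for the site / sub-site / page terminology (hypothetical names). */
final class SubSites {
    /** Hypothetical placeholder marking where the page number goes in a URL template. */
    static final String PAGE_PLACEHOLDER = "{page}";

    private SubSites() {}

    /** Groups sub-site URLs by host, i.e. by "site"; insertion order is preserved. */
    static Map<String, List<String>> groupBySite(List<String> subSiteUrls) {
        Map<String, List<String>> bySite = new LinkedHashMap<>();
        for (String url : subSiteUrls) {
            String host = URI.create(url).getHost();
            List<String> subSites = bySite.get(host);
            if (subSites == null) {
                subSites = new ArrayList<>();
                bySite.put(host, subSites);
            }
            subSites.add(url);
        }
        return bySite;
    }

    /** Expands a URL template into the URLs of pages 1..pageCount of one sub-site. */
    static List<String> pageUrls(String template, int pageCount) {
        if (!template.contains(PAGE_PLACEHOLDER)) {
            throw new IllegalArgumentException("template lacks " + PAGE_PLACEHOLDER);
        }
        if (pageCount < 1) {
            throw new IllegalArgumentException("pageCount must be >= 1");
        }
        List<String> urls = new ArrayList<>(pageCount);
        for (int page = 1; page <= pageCount; page++) {
            urls.add(template.replace(PAGE_PLACEHOLDER, Integer.toString(page)));
        }
        return urls;
    }
}
```

A literal placeholder is used instead of `String.format` because URLs often contain percent-encoded sequences such as `%20`, which `String.format` would try to interpret as format specifiers.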
Hence, the programmer has to implement scanners for a given list of sub-sites (coming from 100 sites, as said above), keeping in mind that sub-sites within the same site often share the same HTML structure.
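Because sub-sites of the same site usually share their HTML structure, one scanner per site can often serve all of its sub-sites. The hypothetical `SiteScanner` interface and `ScannerRegistry` class below (assumed names, not the provided library's API) resolve any sub-site URL to the scanner registered for its host.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-site scanner contract; the real one comes from the provided library. */
interface SiteScanner {
    /** Selector matching one entry, assumed valid on every sub-site of the site. */
    String entrySelector();
}

/** Resolves a sub-site URL to the single scanner registered for its site (host). */
final class ScannerRegistry {
    private final Map<String, SiteScanner> scannersByHost = new HashMap<>();

    void register(String host, SiteScanner scanner) {
        if (scannersByHost.containsKey(host)) {
            throw new IllegalStateException("scanner already registered for " + host);
        }
        scannersByHost.put(host, scanner);
    }

    SiteScanner scannerFor(String subSiteUrl) {
        String host = URI.create(subSiteUrl).getHost();
        SiteScanner scanner = scannersByHost.get(host);
        if (scanner == null) {
            throw new IllegalArgumentException("no scanner registered for host " + host);
        }
        return scanner;
    }
}
```

Hosts are compared literally, so `freelancer.com` and `www.freelancer.com` would need separate registrations. A sub-site whose structure differs from the rest of its site would need its own scanner, for instance keyed by URL path instead of host.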
As already said, the produced code must comply with the application library we provide. The programmer must deliver the source code of the implemented scanners and, optionally, the compiled class files.
We reserve the right to verify that the produced scanners actually work, and to pay only the amount corresponding to the working ones. However, the programmer is allowed to fix the non-working scanners to obtain full payment.