This is a two-part project to be written mainly in Python (though C may also be used for calculations that cannot be done in Python, or for libraries that do not exist for it).
1. Web scraper.
I need a web scraper that will regularly scrape data from a series of web pages. This part of the program should accept a list of proxies to fall back on if a request fails because access to a website is denied.
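As a rough illustration of the proxy fallback described above, here is a minimal sketch using only the standard library. The function names `fetch_via_urllib` and `fetch_with_fallback`, the choice to try a direct request first, and the proxy URL format are all assumptions made for the example, not requirements.

```python
import urllib.error
import urllib.request


def fetch_via_urllib(url, proxy=None, timeout=10):
    """Fetch url, optionally through a proxy given as 'http://host:port'."""
    handlers = []
    if proxy is not None:
        # Route both http and https traffic through the given proxy.
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(url, timeout=timeout) as response:
        return response.read()


def fetch_with_fallback(url, proxies, fetch=fetch_via_urllib):
    """Try a direct request first, then each proxy in the list, in order."""
    last_error = None
    for proxy in [None] + list(proxies):
        try:
            return fetch(url, proxy)
        except OSError as exc:
            # URLError and HTTPError (e.g. 403 Forbidden) are OSError subclasses.
            last_error = exc
    raise RuntimeError("all attempts failed for " + url) from last_error
```

The `fetch` parameter is there so the retry logic can be exercised with a stand-in function, without touching the network.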
2. Data analysis.
Statistical analysis will be performed on the data gathered by the web scraper. This will involve a regression analysis to predict sales using more than 20 independent variables.
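To give a feel for the regression step, the following is a minimal sketch of an ordinary least squares fit with several independent variables, written in plain Python. It solves the normal equations by Gaussian elimination, which is a technique chosen for the example only. The function names (`solve_linear_system`, `fit_ols`, `predict`) are hypothetical, and a real implementation with 20+ variables would more likely rely on a numerical library.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Fit targets ~ intercept + predictors; returns [intercept, b1, ..., bk]."""
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, row):
    """Predicted value (e.g. sales) for one row of predictor values."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))
```

Each inner list in `rows` holds the independent variables for one observation, and `targets` holds the matching sales figures.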
Overall, the program should be able to run in the background on my Windows 7 machine. The refresh rate for the raw data gathered by the scrape should be adjustable. When the program is opened, I would like a very simple presentation of the statistical analysis. This GUI does not need to look pretty at all. It just needs to be easy to read and functional (i.e. no colours, large enough text).
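One way the adjustable refresh rate could work is sketched below. The function name `run_periodically` and its parameters are assumptions for the example. The interval is passed as a callable so that it can be changed while the program is running, and a `threading.Event` lets the loop be stopped promptly.

```python
import threading


def run_periodically(task, interval_seconds, stop_event, max_runs=None):
    """Call task() repeatedly, waiting interval_seconds() between runs.

    interval_seconds is a callable, so the refresh rate can change at runtime.
    Returns the number of times task() was called.
    """
    runs = 0
    while not stop_event.is_set():
        task()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        # wait() returns early if stop_event is set, so shutdown is not delayed.
        stop_event.wait(interval_seconds())
    return runs
```

In a background program this loop would typically run in its own thread, for example `threading.Thread(target=run_periodically, args=(scrape, get_interval, stop), daemon=True).start()`, where `scrape`, `get_interval` and `stop` are placeholders.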
Finally, the winning bidder will need to sign an electronic non-disclosure agreement (all costs covered).
Thanks for taking the time to consider this project. I look forward to further discussion with the successful bidder.
The following is a very simple example of the sort of statistical analysis I refer to in this project's description:
The program will pull numerical information from various pages and store it in an SQLite database. For example, an item with comments on it:
The number of comments and the number of sales Item X has had would then be scraped from this site and saved under the particular item's ID number. The database logic for this would look like this:
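One possible shape for that logic, as a minimal sketch using Python's built-in `sqlite3` module. The table name `ItemStats`, the `ItemID` and `ScrapedAt` columns, and the helper function names are assumptions for illustration; only `NumberOfComments` and `NumberOfSales` come from this description.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS ItemStats (
               ItemID           INTEGER NOT NULL,
               ScrapedAt        TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
               NumberOfComments INTEGER NOT NULL,
               NumberOfSales    INTEGER NOT NULL
           )"""
    )
    return conn


def save_observation(conn, item_id, comments, sales):
    """Store one scraped reading under the item's ID number."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO ItemStats (ItemID, NumberOfComments, NumberOfSales) "
            "VALUES (?, ?, ?)",
            (item_id, comments, sales),
        )


def load_pairs(conn):
    """Return (NumberOfComments, NumberOfSales) pairs in insertion order."""
    cursor = conn.execute(
        "SELECT NumberOfComments, NumberOfSales FROM ItemStats ORDER BY rowid"
    )
    return cursor.fetchall()
```

Keeping one row per scrape, rather than overwriting a single row per item, preserves the history needed for the later analysis.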
When enough data has been gathered, a facility within the program will allow me to run a statistical analysis to see how well NumberOfComments predicts or correlates with NumberOfSales, and how significant this prediction is (p-value).
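The kind of analysis meant here can be sketched in plain Python as follows. This is an illustrative example only: the function names are hypothetical, and the p-value is obtained by numerically integrating the Student's t density with Simpson's rule, which is adequate for moderate t values but is not a substitute for a proper statistics library.

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t via Simpson integration of the pdf."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)


def simple_regression(xs, ys):
    """Return (slope, intercept, r, p_value) for ys regressed on xs."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in one of the variables")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    df = n - 2
    if abs(r) >= 1.0:
        p_value = 0.0  # perfect linear fit
    else:
        t = r * math.sqrt(df / (1 - r * r))
        p_value = t_two_sided_p(t, df)
    return slope, intercept, r, p_value
```

Here `xs` would be the NumberOfComments values and `ys` the NumberOfSales values. A small p-value suggests the relationship is unlikely to be due to chance alone.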
The winning bidder will be given a highly detailed directive that will be extremely straightforward. I am also available nearly all the time to answer any queries at all through Skype or IM.