We have a web-based reporting system that we'd like to scrape. The system is built and hosted by another company, so we'd like to build a Django application that does this.
So far I have built a quick proof-of-concept PHP script that uses the PHP cURL bindings to log in, run these reports and print out the results. In doing so I have already uncovered some of the non-standard things this web app does (such as actually returning a response body after sending a 301 header) and proven that we can automate pulling reports. I have also built some Django models for this project, both to get sign-off from my stakeholders and to communicate to the person doing the work what we want at the end. My proof of concept is just that: I built it to make sure that we could do what we needed to do.
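A minimal sketch of how the PHP cURL login might translate to pycurl (the Python bindings for libcurl). It assumes the login is a plain form POST; the `/login` path, the `username`/`password` field names and the `ReportingSession` class are hypothetical placeholders to be replaced with whatever the PHP script actually sends. Redirect following is switched off so that the body sent along with the non-standard 301 is kept rather than discarded, and an empty `COOKIEFILE` turns on libcurl's in-memory cookie store so the login survives between requests.

```python
from io import BytesIO
from urllib.parse import urlencode

import pycurl


class ReportingSession:
    """Hypothetical wrapper around one pycurl handle that keeps cookies between requests."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self.curl = pycurl.Curl()
        # An empty COOKIEFILE enables libcurl's in-memory cookie engine.
        self.curl.setopt(pycurl.COOKIEFILE, "")
        # Do not follow redirects: the server sends a useful body with its 301.
        self.curl.setopt(pycurl.FOLLOWLOCATION, False)

    def post(self, path, fields):
        """POST url-encoded form fields and return (status_code, body_bytes)."""
        buffer = BytesIO()
        self.curl.setopt(pycurl.URL, self.base_url + path)
        # Setting POSTFIELDS makes libcurl issue a POST request.
        self.curl.setopt(pycurl.POSTFIELDS, urlencode(fields))
        self.curl.setopt(pycurl.WRITEDATA, buffer)
        self.curl.perform()
        status = self.curl.getinfo(pycurl.RESPONSE_CODE)
        return status, buffer.getvalue()

    def login(self, username, password):
        # Hypothetical path and field names: copy the real ones from the PHP script.
        status, body = self.post("/login", {"username": username, "password": password})
        # A 301 here still carries page content, so the body is returned either way.
        return status, body

    def close(self):
        self.curl.close()
```

Reusing one handle per session keeps the cookies (and the connection) between the login and the report requests; later report runs could call `post()` with the report parameters.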
What you will need to do:
1. Use cURL for Python (pycurl) to do what I have done in PHP, and build a Python object structure that mirrors how my PHP script is structured.
2. Take the response from the current web reporting system, which is basically an HTML table, and store the results in a MySQL database. I have been advised that BeautifulSoup is probably a good tool for this; see http://www.crummy.com/software/BeautifulSoup/. The only change I'd make to how I currently do things is to break the reporting time frame into chunks of one month at a time. For example, if we run a report that spans 2 years, we actually run 24 reports against the reporting server, log each piece of data to the database, and then return the whole 24-month period to the user.
3. Once a report has run, it needs to be emailed to the report owner's email address with an Excel attachment.
4. Make these reports executable as Django admin tasks.
5. Make a Django admin command to import all customers from the reporting system into this system. There are probably fewer than 1,000 customers.
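The month-by-month chunking described in item 2 could look something like the following standard-library sketch. `month_chunks` is a hypothetical helper name, and it assumes the reporting server accepts inclusive start and end dates; a range that starts or ends mid-month produces a partial first or last chunk.

```python
from datetime import date, timedelta


def month_chunks(start, end):
    """Split the inclusive date range [start, end] into calendar-month pieces."""
    if end < start:
        raise ValueError("end must not be before start")
    chunks = []
    current = start
    while current <= end:
        # First day of the month after `current`.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # The chunk ends on the last day of this month, or at `end` if sooner.
        chunk_end = min(next_month - timedelta(days=1), end)
        chunks.append((current, chunk_end))
        current = next_month
    return chunks
```

Each `(start, end)` pair would then be sent to the reporting server as its own report request, with the results logged to the database before the combined period is returned to the user.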
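BeautifulSoup is the suggested tool for item 2. Purely so the sketch has no third-party dependency, the following uses the standard library's `html.parser` instead, to show the idea of turning the report's HTML table into rows. It assumes a single, non-nested table whose first row holds the column headings; `TableExtractor` and `parse_report_table` are hypothetical names. It tolerates omitted `</td>` and `</tr>` tags, but BeautifulSoup copes with messy real-world markup far better, so treat this as an illustration of the output shape rather than a replacement.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently open, if any
        self._cell = None  # text fragments of the cell currently open, if any

    def _flush_cell(self):
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _flush_row(self):
        self._flush_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new <tr> implicitly closes any row left open.
            if self._row is not None:
                self._flush_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            # A new cell implicitly closes any cell left open.
            self._flush_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None:
            self._flush_cell()
        elif tag in ("tr", "table") and self._row is not None:
            self._flush_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_report_table(html):
    """Return (header, data_rows), treating the first row as the header."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return [], []
    return parser.rows[0], parser.rows[1:]
```

Each data row could then be paired with the header (`dict(zip(header, row))`) and saved through the corresponding Django model.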
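For item 3, one way to assemble the message is with the standard library's `email.message.EmailMessage`; inside the project, Django's own `django.core.mail.EmailMessage` has an `attach()` method that serves the same purpose. This sketch assumes the spreadsheet has already been generated as `.xlsx` bytes by some spreadsheet library; `build_report_email` and the example addresses are hypothetical.

```python
from email.message import EmailMessage

# MIME type for .xlsx workbooks.
XLSX_MAINTYPE = "application"
XLSX_SUBTYPE = "vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def build_report_email(sender, owner_email, report_name, xlsx_bytes):
    """Build a message to the report owner with the spreadsheet attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = owner_email
    msg["Subject"] = f"Report: {report_name}"
    msg.set_content(f"The attached spreadsheet contains the results of {report_name}.")
    # add_attachment turns the message into multipart/mixed with a base64 part.
    msg.add_attachment(
        xlsx_bytes,
        maintype=XLSX_MAINTYPE,
        subtype=XLSX_SUBTYPE,
        filename=f"{report_name}.xlsx",
    )
    return msg
```

Assuming a local mail server, sending could then be done with `smtplib.SMTP("localhost")` as a context manager and its `send_message(msg)` method.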
Notes: this application must
Run under Apache 2.x with mod_wsgi.
Be built using buildout (the sample Django app already does this).
I have included the sample models.py that I have built. It is pretty much complete. I'm happy to drop the generic relations if they prove difficult to implement.
Brief description of the models:
reportingUserAccount: These are the user accounts that we use to connect to our reporting server.
These two tables are not complete; they contain only the fields that we pass up to the reporting server to get it to return data to us. I have a list of these fields, but I have not yet got around to making a Django model for them. There are currently two types of reports that are run: product lists and sales lists.
These tables specify the report jobs, where the data is sent and which user we run the reports as. On reflection, these might be better modelled as a single model.
customer: This model stores the name and customer id of the customers in the remote reporting system.
reportingRun: Every time a report is run there is an entry in this table.
These tables store the data returned from the remote reporting system. Because the data in the remote system changes over time, we may store the same data across multiple runs; this is not going to be an issue.