My goal is to develop and maintain a database of energy data from the U.S. Energy Information Administration (EIA). The EIA has an API allowing developers to easily access their data.
Here's a specific example I have in mind. The Python script would query EIA's website and get data for the following series:
* Monthly net generation for Colorado (ELEC.GEN.ALL-CO-99.M)
* Monthly net generation from coal for Colorado (ELEC.GEN.COW-CO-99.M)
* Monthly net generation from natural gas for Colorado (ELEC.GEN.NG-CO-99.M)
* Monthly net generation from hydroelectric for Colorado (ELEC.GEN.HYC-CO-99.M)
* Monthly net generation for all other renewables for Colorado (ELEC.GEN.AOR-CO-99.M)
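To make the request step concrete, here is a minimal sketch of how the five series might be fetched and parsed using only the standard library. The base URL, the `api_key` and `series_id` parameter names, and the response layout (a `series` list whose `data` entries are `[YYYYMM, value]` pairs) are all assumptions that should be checked against EIA's current API documentation. The function names are hypothetical.

```python
import json
from datetime import date
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: endpoint and parameter names; verify against EIA's API docs.
EIA_BASE_URL = "https://api.eia.gov/series/"

# Series IDs taken from the list above.
SERIES_IDS = [
    "ELEC.GEN.ALL-CO-99.M",
    "ELEC.GEN.COW-CO-99.M",
    "ELEC.GEN.NG-CO-99.M",
    "ELEC.GEN.HYC-CO-99.M",
    "ELEC.GEN.AOR-CO-99.M",
]


def build_series_url(series_id, api_key, base_url=EIA_BASE_URL):
    """Build the request URL for one series."""
    query = urlencode({"api_key": api_key, "series_id": series_id})
    return base_url + "?" + query


def parse_monthly_series(payload):
    """Convert an (assumed) response payload into dated rows.

    Each data point is assumed to look like ["201101", 3.0]; the period
    is mapped to the first day of that month.
    """
    series = payload["series"][0]
    rows = [
        (date(int(period[:4]), int(period[4:6]), 1), value)
        for period, value in series["data"]
    ]
    rows.sort(key=lambda row: row[0])  # oldest month first
    return {
        "series_id": series["series_id"],
        "updated": series.get("updated"),
        "rows": rows,
    }


def fetch_series(series_id, api_key):
    """Download and parse one series (performs a network request)."""
    with urlopen(build_series_url(series_id, api_key), timeout=30) as resp:
        return parse_monthly_series(json.load(resp))
```

Keeping the parsing separate from the download makes it possible to test the parsing against a saved response without calling the API.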
The script would compare the data from EIA with a local PostgreSQL database and update the local database if necessary. This could be done by comparing the data directly, or by checking the "last updated" timestamp EIA provides for each data series. (Using the SQLAlchemy library to interface with Postgres would be preferred.)
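Both comparison strategies can be sketched as plain functions that are independent of the database layer. This is only an illustration: the timestamp layout is an assumption, the function names are hypothetical, and reading from and writing to Postgres through SQLAlchemy is deliberately left out.

```python
from datetime import datetime

# Assumption: timestamps look like "2013-11-27T14:20:07-0500".
EIA_TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def needs_refresh(local_updated, remote_updated):
    """Return True if EIA's copy is newer than the one stored locally.

    local_updated is None when the series has never been stored.
    """
    if local_updated is None:
        return True
    local = datetime.strptime(local_updated, EIA_TIMESTAMP_FORMAT)
    remote = datetime.strptime(remote_updated, EIA_TIMESTAMP_FORMAT)
    return remote > local


def diff_series(local, remote):
    """Compare two {date: value} mappings directly.

    Returns (to_insert, to_update): months missing locally, and months
    whose stored value differs from EIA's. Values are compared exactly.
    """
    to_insert = sorted(
        (day, value) for day, value in remote.items() if day not in local
    )
    to_update = sorted(
        (day, value)
        for day, value in remote.items()
        if day in local and local[day] != value
    )
    return to_insert, to_update
```

The timestamp check is cheap and can decide whether a series is worth diffing at all. The direct diff then limits the writes to the rows that actually changed.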
After updating the local database, a dialog would pop up allowing the user to select which data series to chart and allowing the user to choose the time window for the data (e.g., Jan 2005 to Mar 2011).
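Whichever GUI toolkit ends up drawing the dialog, the time window the user enters has to be turned into dates before querying. A minimal sketch of that step follows, assuming the input arrives as text such as "Jan 2005" and that stored rows are dated to the first of each month. The helper names are hypothetical.

```python
from datetime import datetime


def parse_month(text):
    """Turn 'Jan 2005' into date(2005, 1, 1).

    Note: %b matches month abbreviations in the current locale, so this
    assumes an English (or C) locale.
    """
    return datetime.strptime(text.strip(), "%b %Y").date()


def parse_window(start_text, end_text):
    """Validate and return the (start, end) months chosen by the user."""
    start, end = parse_month(start_text), parse_month(end_text)
    if start > end:
        raise ValueError("start month must not be after end month")
    return start, end


def in_window(rows, start, end):
    """Keep the (date, value) rows that fall inside the window, inclusive."""
    return [(day, value) for day, value in rows if start <= day <= end]
```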
Based on the simple user input, the script would query the data from the PostgreSQL database, interface with R, and produce the charts using ggplot2. (This will likely involve the "pandas" Python library and a library like "PypeR" or "rpy2".) In the example above, I would like to see a time series chart for each of the series and then a stacked area chart for coal, natural gas, hydro, and all other renewables.
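For the stacked area chart, ggplot2 expects the data in "long" form, with one row per month and source. The sketch below shows that reshaping and a matching ggplot2 call. Note the substitution: to stay dependency-free, it writes a CSV and generates an R script to run with `Rscript`, instead of going through pandas and PypeR or rpy2 as suggested above. The column names, file paths, and function names are hypothetical.

```python
import csv
import subprocess

# R code for the stacked area chart; geom_area() stacks by default.
# Paths are inserted as-is, so use forward slashes even on Windows.
R_SCRIPT_TEMPLATE = """\
library(ggplot2)
df <- read.csv("{csv_path}")
df$date <- as.Date(df$date)
p <- ggplot(df, aes(x = date, y = value, fill = source)) +
  geom_area() +
  labs(x = NULL, y = "Net generation", fill = "Source")
ggsave("{png_path}", plot = p, width = 8, height = 5)
"""


def to_long_rows(series_by_source):
    """Flatten {source: [(date, value), ...]} into (date, source, value) rows."""
    rows = []
    for source, points in series_by_source.items():
        for day, value in points:
            rows.append((day.isoformat(), source, value))
    return rows


def write_long_csv(rows, csv_path):
    """Write the long-form rows with the header the R script expects."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["date", "source", "value"])
        writer.writerows(rows)


def build_r_script(csv_path, png_path):
    """Fill the file paths into the R template."""
    return R_SCRIPT_TEMPLATE.format(csv_path=csv_path, png_path=png_path)


def render_chart(script_path):
    """Run the generated script; requires R with ggplot2 installed."""
    subprocess.run(["Rscript", script_path], check=True)
```

The individual time series charts could reuse the same long-form CSV, swapping `geom_area()` for `geom_line()` and adding a facet per source.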
I would appreciate clearly-organized, well-documented code as I will have to maintain and modify the code to fit specific needs. Thanks!
I'm able to code this thing for you. The API seems straightforward. I have experience with graphing in R. I'm new here, so I'm offering a discount. I'm looking forward to working with you.