Project Description:
We need help with the task described below.
The application should be well commented and coded in an intuitive way so that if another developer wanted to add features to it later, it would be easy for them to understand how to do so. We have lots more tasks waiting for developers who can demonstrate their competence, so please take the time to do a thorough job.
If you are able to take this project on, please could you give me a rough estimate of the cost and the delivery timescale?
Please let me know if there's anything else you need.
Many thanks,
Ben Horton
Amazon Glacier Test
===================
We are considering using Amazon Glacier for disaster recovery server backups.
We would like to develop the tools that do the backup in Perl v5.8 or v5.10.
Where possible, existing modules should be used. We have found a couple of incomplete modules (Net::Amazon::Glacier and WebService::Amazon::Glacier) which may be a good starting point. If these modules are not complete enough to use as-is, or to serve as a starting point for development, the Glacier API is a RESTful interface and may be relatively easy to use directly.
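One detail to budget for when calling the REST interface directly: Glacier's upload requests carry a SHA-256 "tree hash" of the payload (the x-amz-sha256-tree-hash header), computed over 1 MiB chunks. The following is a minimal sketch of that calculation for data already held in memory. The helper name tree_hash_hex is hypothetical and not taken from either CPAN module. It assumes Digest::SHA is available (it ships with Perl 5.10 but would need installing from CPAN on 5.8).

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256 sha256_hex);

# Glacier's tree hash works on 1 MiB chunks of the payload.
my $CHUNK = 1024 * 1024;

# Hypothetical helper (illustration only): return the hex tree hash of $data.
sub tree_hash_hex {
    my ($data) = @_;

    # Leaf level: binary SHA-256 digest of each 1 MiB chunk.
    my @level;
    for (my $off = 0; $off < length($data); $off += $CHUNK) {
        push @level, sha256(substr($data, $off, $CHUNK));
    }
    @level = (sha256('')) unless @level;    # empty payload

    # Hash neighbouring digests pairwise until one remains. An odd
    # trailing digest is carried up to the next level unchanged.
    while (@level > 1) {
        my @next;
        while (@level) {
            my $left  = shift @level;
            my $right = shift @level;
            push @next, defined($right) ? sha256($left . $right) : $left;
        }
        @level = @next;
    }
    return unpack('H*', $level[0]);
}

print tree_hash_hex('hello'), "\n";
```

For a payload of 1 MiB or less the result is simply the SHA-256 of the data. A real script would read large files chunk by chunk rather than loading them whole.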
Test Outcomes
=============
The purpose of this task is to discover how easy and reliable Glacier is to use. We do not care about developing re-usable code at this stage.
We'd like to know:
* Does Glacier work (reliably, as documented)?
* Are there Perl modules which we can use as-is (or with minor changes)
to easily work with Glacier?
* If no modules exist, are we able to use the RESTful interface easily
(is the documentation accurate and does the API work as expected)?
* Are there limits we need to consider? For example, if restoring an entire
server we may have many archived objects in a vault, and Amazon may limit
the number of archives that are available for download at any given time.
Test Scripts
============
All configuration (AWS credentials, the Glacier vault) can be hard-coded at the start of the scripts. The following scripts should be written:
Script 1:
* Given a list of local file paths on stdin, store each file in
the vault and record (to stdout) the ID required to retrieve
the file from the vault.
(Store the original path with the data somehow.)
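One possible way to store the original path with the data is Glacier's optional archive description, which is documented as holding up to 1,024 printable ASCII characters. The sketch below percent-encodes the path so that spaces, control characters and non-ASCII bytes survive the round trip. The helper names and the "path=" prefix are illustrative assumptions, and paths are assumed to be byte strings as read from stdin.

```perl
use strict;
use warnings;

# Glacier's documented limit on the archive description length.
my $MAX_DESCRIPTION = 1024;

# Hypothetical helper: encode a file path (a byte string) as a description.
sub path_to_description {
    my ($path) = @_;
    # Percent-encode every byte outside a conservative safe set.
    (my $enc = $path) =~ s{([^A-Za-z0-9_./-])}{sprintf('%%%02X', ord($1))}ge;
    my $desc = "path=$enc";
    die "description too long for: $path\n"
        if length($desc) > $MAX_DESCRIPTION;
    return $desc;
}

# Hypothetical helper: recover the original path from a description.
sub description_to_path {
    my ($desc) = @_;
    my ($enc) = $desc =~ /^path=(.*)$/s
        or die "unrecognised description: $desc\n";
    $enc =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $enc;
}

my $desc = path_to_description('/usr/local/data/a b/c');
print "$desc\n";
print description_to_path($desc), "\n";
```

An alternative is for script 1 to write "ID, tab, path" pairs to stdout and let script 4 read the path from there, but that loses the mapping if the list file is lost along with the server.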
Script 2:
* Given the list of vault IDs (from script 1) on stdin,
request retrieval from the vault
Script 3:
* Given the list of vault IDs (from script 1) on stdin,
report on stdout whether they are available for download yet.
The report should be simple, so that it's possible to scan the
output and check whether all files are available for download yet
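A scannable report could look like the sketch below: one line per ID with a fixed-width status, followed by a single summary line. Glacier reports retrieval progress per job, not per archive, and its Describe Job response includes a StatusCode of InProgress, Succeeded or Failed. The sketch assumes the script has already looked up the job for each ID and collected those codes into a hash. The format_report helper and the READY/PENDING labels are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper. $ids is an array ref of IDs in input order and
# $status_for maps each ID to a Glacier job StatusCode, or undef when
# no retrieval job was found for that ID.
sub format_report {
    my ($ids, $status_for) = @_;
    my %label_for = (
        Succeeded  => 'READY',
        InProgress => 'PENDING',
        Failed     => 'FAILED',
    );
    my (@lines, %count);
    for my $id (@$ids) {
        my $code  = $status_for->{$id};
        my $label = (defined($code) && $label_for{$code}) || 'UNKNOWN';
        $count{$label}++;
        push @lines, sprintf('%-8s %s', $label, $id);
    }
    my $ready   = $count{READY} || 0;
    my $total   = scalar @$ids;
    my $verdict = $ready == $total ? 'ALL READY' : 'NOT READY';
    push @lines, sprintf('%s: %d of %d ready', $verdict, $ready, $total);
    return join("\n", @lines) . "\n";
}

print format_report(
    [ 'id-one', 'id-two' ],
    { 'id-one' => 'Succeeded', 'id-two' => 'InProgress' },
);
```

Checking the last line is then enough to tell whether script 4 can be run.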
Script 4:
* Given the list of vault IDs (from script 1) on stdin,
download the files.
They should be stored in a different location from the original
path (i.e. a restore directory, hard-coded in the script) while
preserving the path details. For example:
The original file: /usr/local/data/a/b/c
Should be restored to: /restore/path/usr/local/data/a/b/c
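The path mapping in the example above can be sketched as follows. The helper names are hypothetical. The original path is split into components and rejoined under the restore directory, because passing an absolute path straight to File::Spec->catfile can leave a doubled slash. Rejecting ".." components is an added precaution, not something the brief asks for.

```perl
use strict;
use warnings;
use File::Spec;
use File::Basename qw(dirname);
use File::Path qw(mkpath);

# Hard-coded restore directory, matching the example in the brief.
my $RESTORE_ROOT = '/restore/path';

# Hypothetical helper: map an original path to its restore location.
sub restore_path_for {
    my ($original, $root) = @_;
    # Drop empty components (leading or doubled slashes) and '.'.
    my @parts = grep { $_ ne '' && $_ ne '.' } split m{/+}, $original;
    die "empty path\n" unless @parts;
    die "refusing path containing '..': $original\n"
        if grep { $_ eq '..' } @parts;
    return File::Spec->catfile($root, @parts);
}

# Hypothetical helper: create the parent directory before writing a file.
# mkpath is used because it is available in the File::Path shipped with 5.8.
sub ensure_parent_dir {
    my ($target) = @_;
    my $dir = dirname($target);
    mkpath($dir) unless -d $dir;
    return $dir;
}

print restore_path_for('/usr/local/data/a/b/c', $RESTORE_ROOT), "\n";
```

Script 4 would call ensure_parent_dir on the result before writing the downloaded data to it.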
Intended Use
============
1. # Build the list of files to use as a test
find /path/to/test/files \
-type f > list
2. # Store files to vault
cat list | script1 > list.vaultids
3. # Schedule restore of all files
cat list.vaultids | script2
4. # Check if they are available
cat list.vaultids | script3
5. # Restore
cat list.vaultids | script4
6. # Validate
diff -ru /path/to/test/files \
/tmp/restore/path/to/test/files
References
==========
* Amazon Glacier
http://aws.amazon.com/glacier/
* Possible Perl modules
https://metacpan.org/module/Net::Amazon::Glacier
https://metacpan.org/module/WebService::Amazon::Glacier