We need an expert big data freelancer who can implement a data lake on AWS. The data lake must include loading, transformation, and cleaning capabilities.
The following tasks have to be completed:
-Configure a SQL data origin as an input to the data lake
-Configure an API data origin as an input to the data lake
-Configure a flat-file data origin as an input to the data lake
-Define a daily loading job that ingests from the origins defined above
-Develop and define a data transformation job
-Develop and define a data cleaning job
-Define a data destination as an output for the jobs
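To make the expected job structure concrete, here is a minimal local sketch of the load → transform → clean → destination flow described above. On AWS this would typically run as an AWS Glue (PySpark) job reading from the SQL, API, and flat-file origins and writing to S3; the inline CSV, function names, and record schema below are hypothetical illustrations, not part of the actual requirements.

```python
import csv
import io
import json

# Hypothetical flat-file input standing in for the real data origins.
RAW_CSV = """id,name,amount
1,alice,10.5
2, Bob ,
3,carol,7.25
"""

def load(raw: str) -> list[dict]:
    """Loading job: parse the flat-file origin into records."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(records: list[dict]) -> list[dict]:
    """Transformation job: normalize names and cast amounts to numbers."""
    out = []
    for r in records:
        out.append({
            "id": int(r["id"]),
            "name": r["name"].strip().lower(),
            "amount": float(r["amount"]) if r["amount"] else None,
        })
    return out

def clean(records: list[dict]) -> list[dict]:
    """Cleaning job: drop records with missing amounts."""
    return [r for r in records if r["amount"] is not None]

def run_daily_job() -> str:
    """Destination step: serialize the cleaned output (on AWS, write to S3).
    In production the daily schedule would come from EventBridge or a
    Glue trigger, not from calling this function by hand."""
    return json.dumps(clean(transform(load(RAW_CSV))))
```

Calling `run_daily_job()` here keeps records 1 and 3 and drops record 2, whose amount is missing; the same three-stage shape is what we expect each job in the list above to follow.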
We also need the freelancer to advise us on which services are relevant for this kind of implementation (AWS EMR, AWS Glue, or other services).
For this advisory work we need:
-An architecture of the AWS components and the rationale for selecting each service
-An estimated monthly budget for this implementation at a given quantity of data
-An explanation of the behaviour of the components and the scripts
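For orientation, one commonly seen serverless baseline for this kind of pipeline is sketched below as a stage-to-service mapping. These service choices are assumptions on our part, not a prescription; part of the advisory deliverable is to confirm or replace them (for example, AWS EMR instead of Glue when Spark jobs need finer cluster control).

```python
# Hypothetical stage-to-service mapping; a common serverless baseline,
# offered only as a starting point for the architecture discussion.
ARCHITECTURE = {
    "sql origin":       "AWS DMS or an AWS Glue JDBC connection",
    "api origin":       "AWS Lambda pulling into Amazon S3",
    "flat-file origin": "direct upload to an S3 landing bucket",
    "daily scheduling": "Amazon EventBridge rule triggering the jobs",
    "transform/clean":  "AWS Glue (PySpark) jobs, or AWS EMR for heavier loads",
    "catalog":          "AWS Glue Data Catalog",
    "destination":      "curated S3 bucket queryable via Amazon Athena",
}
```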
Hands-on experience implementing data lakes in production environments is a must. Feel free to ask for more details.