1) Perform ETL with S3 (HDFS files) as the source and Redshift as the target
Technical environment: EC2, EMR, PySpark & Hive
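A minimal sketch of item 1, assuming a PySpark job submitted on EMR that reads Parquet from S3 and loads Redshift over JDBC. The bucket, cluster endpoint, table, and credentials below are placeholders, not real values; in practice credentials would come from Secrets Manager and the Redshift JDBC driver must be on the classpath.

```python
def redshift_jdbc_url(host, port, database):
    """Build the Redshift JDBC connection URL."""
    return f"jdbc:redshift://{host}:{port}/{database}"

def run_etl(spark):
    # Extract: read source files from S3 (s3a:// is the HDFS-compatible scheme on EMR)
    df = spark.read.parquet("s3a://my-source-bucket/input/")  # placeholder bucket

    # Transform: a minimal example -- drop null rows and expose the data to Hive SQL
    df.dropna().createOrReplaceTempView("staging")
    result = spark.sql("SELECT * FROM staging")

    # Load: write to Redshift over JDBC (Redshift JDBC driver must be on the classpath)
    (result.write.format("jdbc")
        .option("url", redshift_jdbc_url(
            "my-cluster.example.us-east-1.redshift.amazonaws.com", 5439, "dev"))
        .option("dbtable", "public.target_table")  # placeholder table
        .option("user", "etl_user")                # placeholder; use Secrets Manager
        .option("password", "<password>")          # placeholder
        .option("driver", "com.amazon.redshift.jdbc42.Driver")
        .mode("append")
        .save())

if __name__ == "__main__":
    # pyspark import kept here so the helper above stays importable without Spark
    from pyspark.sql import SparkSession
    spark = (SparkSession.builder
             .appName("s3-to-redshift")
             .enableHiveSupport()   # lets the job read/write Hive tables on EMR
             .getOrCreate())
    run_etl(spark)
```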
2) Perform ETL across S3 and an AWS NoSQL database (e.g., DynamoDB) using Glue
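A sketch of item 2, assuming DynamoDB as the NoSQL target: a Glue job script that reads JSON from S3 as a DynamicFrame and writes it to a DynamoDB table. Bucket and table names are placeholders, and the `awsglue` imports only resolve inside the Glue runtime, so they are kept inside `main()`.

```python
def dynamodb_sink_options(table_name, write_percent="1.0"):
    """Connection options for Glue's DynamoDB connector."""
    return {
        "dynamodb.output.tableName": table_name,
        "dynamodb.throughput.write.percent": write_percent,
    }

def main():
    # Glue-runtime-only imports
    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Extract: read JSON records from S3 as a DynamicFrame
    source = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://my-source-bucket/input/"]},  # placeholder
        format="json",
    )

    # Load: write into DynamoDB (table name is a placeholder)
    glue_context.write_dynamic_frame.from_options(
        frame=source,
        connection_type="dynamodb",
        connection_options=dynamodb_sink_options("my_target_table"),
    )
    job.commit()

if __name__ == "__main__":
    main()
```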
3) A sample Python AWS Lambda function to run AWS Redshift SQL scripts
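For item 3, one common pattern is a Lambda handler that submits SQL through the Redshift Data API (`redshift-data`), which avoids managing a JDBC connection from inside the function. The cluster identifier, database, user, and default SQL below are placeholders.

```python
def build_statement(cluster_id, database, db_user, sql):
    """Parameters for a redshift-data execute_statement call."""
    return {
        "ClusterIdentifier": cluster_id,
        "Database": database,
        "DbUser": db_user,
        "Sql": sql,
    }

def lambda_handler(event, context):
    import boto3  # available in the Lambda Python runtime
    client = boto3.client("redshift-data")

    # The SQL could also arrive in the event payload or be read from S3
    sql = event.get("sql", "SELECT COUNT(*) FROM public.target_table;")  # placeholder
    params = build_statement("my-cluster", "dev", "etl_user", sql)       # placeholders
    response = client.execute_statement(**params)

    # execute_statement is asynchronous; poll describe_statement(Id=...) for status
    return {"statement_id": response["Id"]}
```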
4) A sample of real-time streaming (preferably Kafka as producer + PySpark as consumer)
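A sketch of item 4: a Kafka producer using the `kafka-python` library and a PySpark Structured Streaming consumer. The broker address, topic name, and sample event are assumptions; the consumer additionally needs the `spark-sql-kafka` package on the Spark classpath.

```python
import json

def serialize(record):
    """Encode a dict as UTF-8 JSON bytes for Kafka."""
    return json.dumps(record).encode("utf-8")

def produce(broker="localhost:9092", topic="events"):
    from kafka import KafkaProducer  # pip install kafka-python
    producer = KafkaProducer(bootstrap_servers=broker, value_serializer=serialize)
    producer.send(topic, {"user_id": 1, "action": "click"})  # sample event
    producer.flush()

def consume(broker="localhost:9092", topic="events"):
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("kafka-consumer").getOrCreate()
    stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", broker)
              .option("subscribe", topic)
              .load())
    # Kafka values arrive as bytes; cast to string before downstream parsing
    query = (stream.selectExpr("CAST(value AS STRING) AS value")
             .writeStream
             .format("console")   # print micro-batches; swap for a real sink
             .start())
    query.awaitTermination()
```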
5) Executing bash commands (jobs) via Airflow (or the AWS equivalent)
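A sketch of item 5: an Airflow DAG with `BashOperator` tasks. On AWS the managed equivalent is MWAA (Amazon Managed Workflows for Apache Airflow), where the same DAG file can be deployed. The schedule, script path, and commands are placeholders, and the Airflow imports are kept inside the builder so the helper stays importable without Airflow installed.

```python
def spark_submit_command(script_path):
    """Bash command a task will execute (script path is a placeholder)."""
    return f"spark-submit --deploy-mode cluster {script_path}"

def build_dag():
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="etl_bash_jobs",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",  # placeholder schedule
        catchup=False,
    ) as dag:
        run_job = BashOperator(
            task_id="run_spark_job",
            bash_command=spark_submit_command("s3://my-bucket/jobs/etl.py"),  # placeholder
        )
        notify = BashOperator(task_id="notify", bash_command="echo 'ETL finished'")
        run_job >> notify  # run the job, then notify
    return dag

# In a real DAG file, expose it at module scope so the scheduler discovers it:
# dag = build_dag()
```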
Need a walkthrough of the above on an urgent basis.
Even if you can solve only some of the points, that is fine.