We need to create a dockerizable ETL pipeline for our data using Python and Dagster. Our requirements and data flowchart are very clear.
The basic flow: read data from a CSV file and publish it to Kafka; apply multiple transforms implemented as functions (the function code will be provided by us), with an intermediate Kafka topic after each transform; write the final output to a Postgres database. Dagster handles the orchestration.
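To make the "intermediate topic after each transform" pattern concrete, here is a minimal sketch. Plain Python lists stand in for the Kafka topics, and the transform names are placeholders we invented; real code would use a Kafka client (e.g. confluent-kafka) and the team-supplied functions.

```python
# Sketch of the topic-per-transform chain. Lists stand in for Kafka topics;
# transform_a / transform_b are placeholder names, not the real transforms.

def transform_a(record):
    # Placeholder: team-supplied transform goes here.
    return {**record, "stage": "a"}

def transform_b(record):
    # Placeholder: team-supplied transform goes here.
    return {**record, "stage": "b"}

def run_stage(transform, in_topic, out_topic):
    # Consume every record from in_topic, apply the transform,
    # and publish the result to out_topic.
    for record in in_topic:
        out_topic.append(transform(record))

raw_topic = [{"id": 1}, {"id": 2}]  # rows read from the CSV
topic_after_a = []                  # intermediate topic after transform_a
topic_after_b = []                  # final topic, drained into Postgres

run_stage(transform_a, raw_topic, topic_after_a)
run_stage(transform_b, topic_after_a, topic_after_b)
```

Each stage only sees its input topic and output topic, so a transform can be replaced or re-run independently of the others.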
We just need it implemented in a robust manner. We need:
1. A docker-compose file to set up Kafka, Postgres, and the other services.
2. A Dagster job (DAG) orchestrating the pipeline, with placeholders for the transform functions which our team will add later.
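For item 1, a sketch of the compose stack. Image tags, ports, and credentials are assumptions to be adjusted; the Dagster service is built from the project's own Dockerfile.

```yaml
# Sketch only: image versions and credentials are placeholders.
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on: [zookeeper]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: etl
      POSTGRES_PASSWORD: etl
      POSTGRES_DB: etl
  dagster:
    build: .          # project Dockerfile with Dagster + pipeline code
    depends_on: [kafka, postgres]
```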