Data Engineering Zoomcamp

<p align="center"> <a href="https://airtable.com/shr6oVXeQvSI5HuWD"><img src="https://user-images.githubusercontent.com/875246/185755203-17945fd1-6b64-46f2-8377-1011dcb1a444.png" height="50" /></a> </p>

Taking the course

2025 Cohort

Self-paced mode

All the course materials are freely available, so you can take the course at your own pace.

Syllabus

We encourage Learning in Public.

Note: NYC TLC has changed the format of the data we use to Parquet. In the course, we still use the CSV files accessible here.

Module 1: Containerization and Infrastructure as Code

More details

Module 2: Workflow Orchestration

More details

Workshop 1: Data Ingestion

More details

Module 3: Data Warehouse

More details

Module 4: Analytics Engineering

More details

Module 5: Batch Processing

More details

Module 6: Streaming

More details

Project

Putting everything we learned into practice

More details

Overview

<img src="images/architecture/arch_v4_workshops.jpg" />

Prerequisites

To get the most out of this course, you should be comfortable with coding and the command line and know the basics of SQL. Prior experience with Python will be helpful, but if you have experience with other programming languages, you can pick up Python relatively quickly.

Prior experience with data engineering is not required.

Instructors

Past instructors:

Asking for help in Slack

The best way to get support is to use DataTalks.Club's Slack. Join the #course-data-engineering channel.

To make discussions in Slack more organized:

Supporters and partners

Thanks to the course sponsors for making this course possible.

<p align="center"> <a href="https://kestra.io/"> <img height="120" src="images/kestra.svg"> </a> </p> <p align="center"> <a href="https://dlthub.com/"> <img height="90" src="images/dlthub.png"> </a> </p>

Do you want to support our course and our community? Please reach out to alexey@datatalks.club