Data Engineering Roadmap

A study path for Data Engineering. In this document, I share websites, videos, courses, and newsletters that I find useful for training as a data engineer.

The Problem

The problem for those who want to enter the field of data engineering is deciding what to study and, more specifically, in what order. There is an ocean of tools and techniques, and a tsunami of videos about them. Beginners get lost.

This document is intended to reduce that confusion and, over time, to serve as a map to guide your studies. But remember: this is a suggestion based on my personal studies. Do not limit yourself to this list.

Videos

Courses

Looking at how courses are structured is a good way to get an idea of the order in which to study data engineering.

  • DataCamp: the site is a treasure trove for all things data. I started learning data science with them some years ago! For data engineering they have a sequence of career tracks:
    • Associate Data Engineer in SQL: database design and data warehousing, PostgreSQL, Snowflake (30 hours).
    • Data Engineer in Python: Python, Git, Software Engineering, ETL, ELT and Airflow (57 hours).
    • Professional Data Engineer in Python: advanced data eng skills like NoSQL, PySpark, Docker and Streaming.
    • What I like most about DataCamp courses is that all of them give you a certificate you can show on your LinkedIn.
    • Besides that, in some courses you can earn a certification after passing a test (like the one within the Associate Data Engineer in SQL course here).
    • And last but not least, DataCamp is an excellent way to step into a new area of knowledge in the data field. Do not expect deep lessons, but they are broad and well organized, giving you a base from which to go deeper later in the journey.
    • And all of this is not very expensive: today (20/07/2024) it costs me only R$ 34 (about US$7), billed annually. Really cheap.
  • Data Engineering Zoomcamp: a free 9-week course covering the fundamentals of data engineering. In the first 6 weeks, you learn the fundamentals, and in the last 3 weeks, you develop a pipeline from scratch. And there is homework at the end of each week! You can check previous classes here and the detailed course syllabus here.
  • Data Engineer Camp: I found the curriculum very complete and dense, but the price is very high for non-US residents ($2900). At the very least, use the curriculum as a study guide.
  • The Data Engineering Academy: goes from the fundamentals (computer science) to 10 hands-on projects on major cloud platforms. I like it because it's a one-stop shop for anyone wanting to learn data engineering (and it gives you a certificate for LinkedIn!). Right now it costs $220 for 1 year of access or $399 for lifetime access. I plan to take this course.
  • Bootcamp Engenharia de Dados - Construa um Pipeline - 2024: a guided practical project. I have just bought it.
  • Do Zero a Engenheiro de Dados Azure: it's on my list. I haven't taken it yet, but it's highly rated. Just be aware that it focuses on Azure.

Newsletters

  • Start Data Engineering: aimed especially at beginners, it teaches basic concepts and includes small projects, all through email. Even so, some beginners may find it hard to follow.

Books

Misc

  • How To Become a Data Engineer: a collection of tools, content and other resources for data engineers.
  • Is getting X certification going to help me get hired: short answer: no (in the US, at least). But it does not hurt to have one.
  • How can I transition into Data Engineering: useful posts from the Data Engineering subreddit.
  • Luciano Vasconcelo's Roadmap: he offers a live course called "Jornada de Dados," focused on data engineering. This repository has the course summary, which can help you organize your studies.
  • The Data Engineering Cookbook: in the author's own words: "This book is intended to be a starting point for you. It is not a training! I want to help you to identify the topics to look into to become an awesome data engineer in the process. It hinges on my Data Science Platform Blueprint. Check it out below. Once you understand it, you can find in the book tools that fit into each key area of a Data Science platform (Connect, Buffer, Processing Framework, Store, Visualize). Select a few tools you are interested in, then research and work with them." To me, this is the most awesome roadmap out there (even if you don't purchase his paid material).