Below is an overview of the contents of this repository.
This project contains all the notebooks, code and analysis I produced during the Azure Databricks & Spark For Data Engineers course on Udemy. I highly recommend the course to anyone reading this: it gives a broad overview of Azure, Databricks, PySpark, SQL and data analysis in general, and it also works well as a refresher before Azure or Databricks certifications. The project revolves around Formula 1, which is one of my favourite sports (Go Max!).
- Azure Databricks Clusters, Cluster Pools and Policies
- Storage Accounts and Authentication
- Azure Data Lake Storage - Access Patterns, Secrets, Mounting to Databricks
- PySpark - Spark Architecture, Data Ingestion, Transformations (Filters, Joins), Aggregations (GroupBy, Window Functions)
- Spark SQL - Temporary Views, Databases, Tables, SQL equivalents of the PySpark features above
- Incremental Loads
- Delta Lake - Architecture, History, Time Travel, Vacuum, Merge/Upsert
- Azure Data Factory - Debugging Pipelines, Linked Services, Triggers
All notebooks are Azure Databricks notebooks exported as Python source files. Consequently, they contain Databricks-specific marker comments: "COMMAND" comments that separate cells and "MAGIC" comments that carry magic commands such as %md or %sql.
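If you want to read the notebooks outside Databricks, the marker comments can be stripped with a few lines of Python. The snippet below is a minimal sketch, assuming the usual source-export layout (a "# Databricks notebook source" header, "# COMMAND ----------" lines between cells and a "# MAGIC" prefix on magic lines). The helper names `split_cells` and `strip_magic` and the example notebook text are made up for illustration and are not part of this repository.

```python
import re
from typing import List

# Assumed markers of the Databricks source-export format.
HEADER = "# Databricks notebook source"
CELL_SEPARATOR = re.compile(r"^# COMMAND -+[ \t]*$", re.MULTILINE)
MAGIC_PREFIX = "# MAGIC"


def split_cells(source: str) -> List[str]:
    """Split an exported notebook into cell bodies, dropping the header line."""
    if source.startswith(HEADER):
        source = source[len(HEADER):]
    cells = [cell.strip() for cell in CELL_SEPARATOR.split(source)]
    return [cell for cell in cells if cell]


def strip_magic(cell: str) -> str:
    """Remove the '# MAGIC' prefix so magic cells read as plain %md / %sql text."""
    lines = []
    for line in cell.splitlines():
        if line.startswith(MAGIC_PREFIX):
            line = line[len(MAGIC_PREFIX):]
            if line.startswith(" "):
                line = line[1:]
        lines.append(line)
    return "\n".join(lines)


# Hypothetical notebook content, not taken from this repository.
# It is only parsed as text here; nothing in it is executed.
example = """# Databricks notebook source
# MAGIC %md
# MAGIC ## Ingest circuits

# COMMAND ----------

circuits_df = spark.read.csv("/mnt/example/circuits.csv", header=True)

# COMMAND ----------

# MAGIC %sql
# MAGIC SELECT 1
"""

cells = [strip_magic(cell) for cell in split_cells(example)]
for number, cell in enumerate(cells, start=1):
    print(f"--- cell {number} ---")
    print(cell)
```

To try it on a real file, read the file's text and pass it to `split_cells` instead of `example`. Python cells come back unchanged, while markdown and SQL cells lose their comment prefix.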
Distributed under the MIT License. See LICENSE for more information.