Data Engineering with Apache Spark, Delta Lake, and Lakehouse
Discover the roadblocks you may face in data engineering and keep up with the latest trends, such as Delta Lake. This book will help you learn to build data pipelines that can auto-adjust to changes. Using practical examples, you will implement a solid data engineering platform that streamlines data science, ML, and AI tasks.
Difficulty Level
Intermediate
Completion Time
16h
Language
English
About Book
Who Is This Book For?
This book is for aspiring data engineers and data analysts who are new to the world of data engineering and are looking for a practical guide to building scalable data platforms. If you already work with PySpark and want to use Delta Lake for data engineering, you'll find this book useful. Basic knowledge of Python, Spark, and SQL is expected.
Book Content
12 chapters • 16h total length
The Story of Data Engineering and Analytics
Discovering Storage and Compute Data Lake Architectures
Data Engineering on Microsoft Azure
Understanding Data Pipelines
Data Collection Stage - The Bronze Layer
Understanding Delta Lake
Data Curation Stage - The Silver Layer
Data Aggregation Stage - The Gold Layer
Deploying and Monitoring Pipelines in Production
Solving Data Engineering Challenges
Infrastructure Provisioning
Continuous Integration and Deployment (CI/CD) of Data Pipelines