Book

Data Engineering with Apache Spark, Delta Lake, and Lakehouse

Discover the roadblocks you may face in data engineering and keep up with the latest trends such as Delta Lake. This book will help you learn how to build data pipelines that can auto-adjust to changes. Using practical examples, you will implement a solid data engineering platform that will streamline data science, ML, and AI tasks.

Offered by Packt

Difficulty Level

Intermediate

Completion Time

16h

Language

English

About the Book

Who Is This Book For?

This book is for aspiring data engineers and data analysts who are new to the world of data engineering and are looking for a practical guide to building scalable data platforms. If you already work with PySpark and want to use Delta Lake for data engineering, you'll find this book useful. Basic knowledge of Python, Spark, and SQL is expected.

Book content

12 chapters, 16h total length

The Story of Data Engineering and Analytics

Discovering Storage and Compute Data Lake Architectures

Data Engineering on Microsoft Azure

Understanding Data Pipelines

Data Collection Stage - The Bronze Layer

Understanding Delta Lake

Data Curation Stage - The Silver Layer

Data Aggregation Stage - The Gold Layer

Deploying and Monitoring Pipelines in Production

Solving Data Engineering Challenges

Infrastructure Provisioning

Continuous Integration and Deployment (CI/CD) of Data Pipelines
