Simplify Big Data Analytics with Amazon EMR

This book takes you through the Amazon EMR architecture, its features, and the common use cases and problems it solves. You'll learn how to configure it for production using best practices for scaling, monitoring, and security, and you'll see how to implement batch, real-time streaming, and interactive analytics workloads.

Offered by Packt

Difficulty Level

Intermediate

Completion Time

14h20m

Language

English

About the Book

Who Is This Book For?

This book is for data engineers, data analysts, data scientists, and solution architects who want to build data analytics solutions with Hadoop ecosystem services and Amazon EMR. Prior experience with Python, Scala, or Java, along with a basic understanding of Hadoop and AWS, will help you get the most out of this book.

Book Content

14 chapters, 14h20m total length

An Overview of Amazon EMR

Exploring the Architecture and Deployment Options

Common Use Cases and Architecture Patterns

Big Data Applications and Notebooks Available in Amazon EMR

Setting Up and Configuring EMR Clusters

Monitoring, Scaling, and High Availability

Understanding Security in Amazon EMR

Understanding Data Governance in Amazon EMR

Implementing Batch ETL Pipeline with Amazon EMR and Apache Spark

Implementing Real-Time Streaming with Amazon EMR and Spark Streaming

Implementing UPSERT on S3 Data Lake with Apache Spark and Apache Hudi

Orchestrating Amazon EMR Jobs with AWS Step Functions and Apache Airflow/MWAA

Migrating On-Premises Hadoop Workloads to Amazon EMR

Best Practices and Cost Optimization Techniques
