Simplify Big Data Analytics with Amazon EMR
This book takes you through the Amazon EMR architecture, its features, and the common use cases it addresses. You’ll learn how to configure EMR for production with scaling, monitoring, and security best practices, and how to implement batch, real-time streaming, and interactive analytics workloads.
Difficulty Level
Intermediate
Completion Time
14h20m
Language
English
About Book
Who Is This Book For?
This book is for data engineers, data analysts, data scientists, and solution architects who are interested in building data analytics solutions with Hadoop ecosystem services and Amazon EMR. Prior experience with Python, Scala, or Java and a basic understanding of Hadoop and AWS will help you get the most out of this book.
Book content
14 chapters • 14h20m total length
An Overview of Amazon EMR
Exploring the Architecture and Deployment Options
Common Use Cases and Architecture Patterns
Big Data Applications and Notebooks Available in Amazon EMR
Setting Up and Configuring EMR Clusters
Monitoring, Scaling, and High Availability
Understanding Security in Amazon EMR
Understanding Data Governance in Amazon EMR
Implementing Batch ETL Pipeline with Amazon EMR and Apache Spark
Implementing Real-Time Streaming with Amazon EMR and Spark Streaming
Implementing UPSERT on S3 Data Lake with Apache Spark and Apache Hudi
Orchestrating Amazon EMR Jobs with AWS Step Functions and Apache Airflow/MWAA
Migrating On-Premises Hadoop Workloads to Amazon EMR
Best Practices and Cost Optimization Techniques