Book Content
14 chapters • 14h20m total length
1. An Overview of Amazon EMR
2. Exploring the Architecture and Deployment Options
3. Common Use Cases and Architecture Patterns
4. Big Data Applications and Notebooks Available in Amazon EMR
5. Setting Up and Configuring EMR Clusters
6. Monitoring, Scaling, and High Availability
7. Understanding Security in Amazon EMR
8. Understanding Data Governance in Amazon EMR
9. Implementing Batch ETL Pipeline with Amazon EMR and Apache Spark
10. Implementing Real-Time Streaming with Amazon EMR and Spark Streaming
11. Implementing UPSERT on S3 Data Lake with Apache Spark and Apache Hudi
12. Orchestrating Amazon EMR Jobs with AWS Step Functions and Apache Airflow/MWAA
13. Migrating On-Premises Hadoop Workloads to Amazon EMR
14. Best Practices and Cost Optimization Techniques