Book Content
14 chapters • 10h44m total length
1. Distributed Computing Primer
2. Data Ingestion
3. Data Cleansing and Integration
4. Real-time Data Analytics
5. Scalable Machine Learning with PySpark
6. Feature Engineering – Extraction, Transformation, and Selection
7. Supervised Machine Learning
8. Unsupervised Machine Learning
9. Machine Learning Life Cycle Management
10. Scaling Out Single-Node Machine Learning Using PySpark
11. Data Visualization with PySpark
12. Spark SQL Primer
13. Integrating External Tools with Spark SQL
14. The Data Lakehouse