Book Content
13 chapters • 6h 4m total length
1. Installing PySpark and Setting Up Your Development Environment
2. Getting Your Big Data into the Spark Environment Using RDDs
3. Big Data Cleaning and Wrangling with Spark Notebooks
4. Aggregating and Summarizing Data into Useful Reports
5. Powerful Exploratory Data Analysis with MLlib
6. Putting Structure on Your Big Data with SparkSQL
7. Transformations and Actions
8. Immutable Design
9. Avoiding Shuffle and Reducing Operational Expenses
10. Saving Data in the Correct Format
11. Working with the Spark Key/Value API
12. Testing Apache Spark Jobs
13. Leveraging the Spark GraphX API