Book

Hands-On Big Data Analytics with PySpark

In this book, you'll learn practical, proven techniques for improving programming and administration in Apache Spark. The techniques are demonstrated through hands-on examples and best practices. You will also learn how to use Spark and its Python API to build performant analytics on large-scale data.

Offered by Packt

Difficulty Level: Intermediate
Completion Time: 6h4m approx.
Language: English
Certification: Not available

Book Content

13 chapters, 6h4m total length

1. Installing PySpark and Setting Up Your Development Environment
2. Getting Your Big Data into the Spark Environment Using RDDs
3. Big Data Cleaning and Wrangling with Spark Notebooks
4. Aggregating and Summarizing Data into Useful Reports
5. Powerful Exploratory Data Analysis with MLlib
6. Putting Structure on Your Big Data with SparkSQL
7. Transformations and Actions
8. Immutable Design
9. Avoiding Shuffle and Reducing Operational Expenses
10. Saving Data in the Correct Format
11. Working with the Spark Key/Value API
12. Testing Apache Spark Jobs
13. Leveraging the Spark GraphX API
