
Hands-On Big Data Analytics with PySpark

In this book, you'll learn practical, proven techniques for improving programming and administration in Apache Spark. Each technique is demonstrated with practical examples and best practices. You will also learn how to use Spark and its Python API to build performant analytics on large-scale data.

Offered by Packt

Difficulty Level

Intermediate

Completion Time

6h4m

Language

English

About the Book

Who Is This Book For?

This book is for developers, data scientists, business analysts, or anyone who needs to reliably analyze large amounts of real-world data. Whether you're tasked with creating your company's business intelligence function, building great data platforms for your machine learning models, or looking to use code to magnify the impact of your business, this book is for you.

Book content

Chapters: 6h4m total length

Installing PySpark and Setting up Your Development Environment

Getting Your Big Data into the Spark Environment Using RDDs

Big Data Cleaning and Wrangling with Spark Notebooks

Aggregating and Summarizing Data into Useful Reports

Powerful Exploratory Data Analysis with MLlib

Putting Structure on Your Big Data with Spark SQL

Transformations and Actions

Immutable Design

Avoiding Shuffle and Reducing Operational Expenses

Saving Data in the Correct Format

Working with the Spark Key/Value API

Testing Apache Spark Jobs

Leveraging the Spark GraphX API
