Hands-On Big Data Analytics with PySpark
In this book, you'll learn practical, proven techniques for improving how you program and administer Apache Spark, each demonstrated with worked examples and best practices. You will also learn how to use Spark and its Python API to build performant analytics on large-scale data.
Difficulty Level: Intermediate
Completion Time: 6h4m
Language: English
About Book
Who Is This Book For?
This book is for developers, data scientists, business analysts, or anyone who needs to reliably analyze large volumes of real-world data. Whether you're tasked with creating your company's business intelligence function, building data platforms for your machine learning models, or looking to use code to magnify the impact of your business, this book is for you.
Book Content
chapters • 6h4m total length
Installing PySpark and Setting Up Your Development Environment
Getting Your Big Data into the Spark Environment Using RDDs
Big Data Cleaning and Wrangling with Spark Notebooks
Aggregating and Summarizing Data into Useful Reports
Powerful Exploratory Data Analysis with MLlib
Putting Structure on Your Big Data with SparkSQL
Transformations and Actions
Immutable Design
Avoiding Shuffle and Reducing Operational Expenses
Saving Data in the Correct Format
Working with the Spark Key/Value API
Testing Apache Spark Jobs
Leveraging the Spark GraphX API