Book

Apache Spark 2.x Cookbook

Apache Spark has become the hottest platform and a sought-after skill set in the fields of Big Data, analytics, and data science. Apache Spark 2.x comes with a series of improvements in performance, scalability, and operational and production readiness for structured processing of massive datasets. This book offers a systematic, hands-on way to use its improved programming APIs and expanded SQL functionality, and to implement distributed machine learning applications with Spark ML. Over the course of the chapters, you will explore the power of Spark DataFrames and Datasets, harness MLlib for data mining, and analyze complex problems with iterative or multi-stage Spark scripts and associated toolsets such as Spark SQL, Spark Streaming, and GraphX.

Offered by Packt

Difficulty Level

Intermediate

Completion Time

9h48m

Language

English

About Book

Who Is This Book For?

This book is for data engineers, data scientists, and Big Data professionals who want to leverage the power of Apache Spark 2.x for real-time Big Data processing. If you’re looking for quick solutions to common problems while using Spark 2.x effectively, this book will also help you. The book assumes you have a basic knowledge of Scala as a programming language.

Book content

12 chapters, 9h48m total length

Getting Started with Apache Spark

Developing Applications with Spark

Spark SQL

Working with External Data Sources

Spark Streaming

Getting Started with Machine Learning

Supervised Learning with MLlib – Regression

Supervised Learning with MLlib – Classification

Unsupervised Learning

Recommendations Using Collaborative Filtering

Graph Processing Using GraphX and GraphFrames

Optimizations and Performance Tuning
