Book

Mastering Apache Spark 2.x

Apache Spark is an in-memory, cluster-based parallel processing system that provides a wide range of functionality, including graph processing, machine learning, and stream processing. This book familiarizes you with the newest features in Apache Spark 2.x and takes you through complex Big Data processing, analytics, streaming analytics, and advanced machine learning with Apache Spark. Over the course of the book, you will use Apache Spark modules such as Spark SQL, Spark MLlib, Spark Streaming, and SparkML to build efficient data processing solutions. By the end of this book, you will have the knowledge you need to use Apache Spark effectively in your day-to-day tasks.

Offered by Packt

Difficulty Level

Intermediate

Completion Time

11h48m

Language

English

About Book

Who Is This Book For?

If you are an intermediate-level Spark developer looking to master the advanced capabilities and use cases of Apache Spark 2.x, this book is for you. Big Data professionals who want to integrate and use the features of Apache Spark to build a strong Big Data pipeline will also find this book a useful resource. A fundamental knowledge of Apache Spark and the Scala programming language is assumed.

Book content

chapters 11h48m total length

A first taste and what’s new in Apache Spark V2

Apache Spark SQL

The Catalyst Optimizer

Project Tungsten

Apache Spark Streaming

Structured Streaming

Apache Spark MLlib

Apache SparkML

Apache SystemML

Deep Learning on Apache Spark with DeepLearning4J, Apache SystemML, and H2O

Apache Spark GraphX

Apache Spark GraphFrames

Apache Spark with Jupyter Notebooks on IBM DataScience Experience

Apache Spark on Kubernetes
