Course

Learning PySpark

In this tutorial, you will learn about different techniques for collecting data and how to distinguish between the techniques used to process it. Next, we provide an in-depth review of RDDs and contrast them with DataFrames. We provide examples showing how to read data from files and from HDFS, and how to specify DataFrame schemas either by reflection or programmatically. We describe the concept of lazy execution and outline the transformations and actions specific to RDDs and DataFrames. Finally, we show you how to use SQL to interact with DataFrames. By the end of this tutorial, you will have learned how to process data using Spark DataFrames and will have mastered data collection techniques for distributed data processing.

Difficulty Level

Intermediate

Completion Time

2h 28m

Language

English
