Learning PySpark
In this tutorial, you will learn about different techniques for collecting data and how to distinguish between techniques for processing it. Next, we provide an in-depth review of RDDs and contrast them with DataFrames. We work through examples showing how to read data from files and from HDFS, and how to specify schemas using reflection or programmatically (in the case of DataFrames). We describe the concept of lazy execution and outline the transformations and actions available on RDDs and DataFrames. Finally, we show you how to use SQL to interact with DataFrames. By the end of this tutorial, you will have learned how to process data using Spark DataFrames and mastered techniques for collecting and processing data in a distributed manner.
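As a brief taste of these techniques, here is a minimal sketch using the PySpark API. It assumes a local Spark 2.x installation; the file name people.json and the HDFS path shown in a comment are hypothetical placeholders, not files shipped with the course. The sketch shows lazy RDD transformations versus actions, a programmatically specified DataFrame schema, reading data from a file, and querying a DataFrame with SQL.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Start a local Spark session -- the Spark 2.x entry point for both RDDs and DataFrames.
spark = SparkSession.builder.appName("LearningPySpark").getOrCreate()
sc = spark.sparkContext

# RDDs: transformations (map, filter) are lazy; nothing executes until an action (collect, count).
rdd = sc.parallelize([1, 2, 3, 4, 5])
squares = rdd.map(lambda x: x * x)   # transformation -- only recorded in the lineage
print(squares.collect())             # action -- triggers the actual computation

# DataFrames: specify a schema programmatically instead of relying on reflection/inference.
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
])
df = spark.createDataFrame([("Alice", 34), ("Bob", 29)], schema=schema)

# Reading from a file looks like this (a hypothetical HDFS path such as
# "hdfs://namenode:8020/data/people.json" is read with the same call):
# df = spark.read.schema(schema).json("people.json")

# SQL on DataFrames: register a temporary view and query it.
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()

spark.stop()
```

Note that the same read call works for local files and HDFS alike; the difference is only the URI you pass in, which is why the tutorial treats the two sources together.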
Difficulty Level: Intermediate
Completion Time: 2h28m
Language: English
About Course
Who Is This Course For?
If you are a Python developer keen to master hands-on techniques for working with the Apache Spark 2.x ecosystem, this course is for you. A firm understanding of Python is expected to get the most out of the tutorial, and familiarity with Spark is also helpful.