Learning PySpark

This tutorial covers techniques for collecting and processing data with PySpark. You will learn to distinguish between the main data collection and data processing techniques, then work through an in-depth review of RDDs and how they contrast with DataFrames. Examples show how to read data from files and from HDFS, and how to specify a DataFrame schema either by reflection or programmatically. The tutorial explains lazy execution, outlines the transformations and actions specific to RDDs and DataFrames, and finishes by showing how to use SQL to interact with DataFrames. By the end, you will know how to process data with Spark DataFrames and be familiar with data collection techniques for distributed data processing.

Offered by Packt

Difficulty Level: Intermediate
Completion Time: approx. 2h 28m
Language: English
Certification: Not available

