Book

Essential PySpark for Scalable Data Analytics

Essential PySpark for Scalable Data Analytics is an introduction for anyone new to the distributed computing model. You'll learn to build end-to-end data processing pipelines, starting with data ingestion, cleansing, and integration, and continuing through data visualization to building and operationalizing predictive models.

Offered by Packt

Difficulty Level: Intermediate
Completion Time: 10h44m approx.
Language: English
Certification: Not available

Book Content

14 chapters, 10h44m total length

1. Distributed Computing Primer
2. Data Ingestion
3. Data Cleansing and Integration
4. Real-time Data Analytics
5. Scalable Machine Learning with PySpark
6. Feature Engineering – Extraction, Transformation, and Selection
7. Supervised Machine Learning
8. Unsupervised Machine Learning
9. Machine Learning Life Cycle Management
10. Scaling Out Single-Node Machine Learning Using PySpark
11. Data Visualization with PySpark
12. Spark SQL Primer
13. Integrating External Tools with Spark SQL
14. The Data Lakehouse
