Book

Reproducible Data Science with Pachyderm

Pachyderm enables you to create collaborative data science workflows and reproduce your experiments at scale. This book will help you leverage Pachyderm's data versioning and lineage features to build scalable end-to-end AI/ML pipelines and show you how to deploy Pachyderm on leading cloud platforms, use its SaaS offering PachHub, and much more.

Offered by Packt

Difficulty Level

Intermediate

Completion Time

12h8m

Language

English

About Book

Who Is This Book For?

This book is for new as well as experienced data scientists and machine learning engineers who want to build scalable infrastructures for their data science projects. Basic knowledge of Python programming and Kubernetes will be beneficial. Familiarity with Golang will be helpful.

Book content

11 chapters, 12h8m total length

The Problem of Data Reproducibility

Pachyderm Basics 

Pachyderm Pipeline Specification

Installing Pachyderm Locally

Installing Pachyderm on a Cloud Platform

Creating Your First Pipeline

Pachyderm Operations

Creating an End-to-End Machine Learning Workflow 

Distributed Hyperparameter Tuning with Pachyderm

Pachyderm Language Clients

Using Pachyderm Notebooks
