Reproducible Data Science with Pachyderm
Pachyderm enables you to create collaborative data science workflows and reproduce your experiments at scale. This book will help you use Pachyderm's data versioning and lineage features to build scalable end-to-end AI/ML pipelines. It also shows you how to deploy Pachyderm on leading cloud platforms, use its SaaS offering PachHub, and much more.
Difficulty Level
Intermediate
Completion Time
12h8m
Language
English
About Book
Who Is This Book For?
This book is for new as well as experienced data scientists and machine learning engineers who want to build scalable infrastructures for their data science projects. Basic knowledge of Python programming and Kubernetes will be beneficial. Familiarity with Golang will be helpful.
Book Content
chapters • 12h8m total length
The Problem of Data Reproducibility
Pachyderm Basics
Pachyderm Pipeline Specification
Installing Pachyderm Locally
Installing Pachyderm on a Cloud Platform
Creating Your First Pipeline
Pachyderm Operations
Creating an End-to-End Machine Learning Workflow
Distributed Hyperparameter Tuning with Pachyderm
Pachyderm Language Clients
Using Pachyderm Notebooks