Book

Data Cleaning and Exploration with Machine Learning

Data scientists are often said to spend 80% of their time cleaning and manipulating data and only 20% analyzing it. That effort is crucial, since analyzing dirty data can lead to inaccurate decisions. This timely book will help you identify, diagnose, and treat data cleaning problems in Python using advanced ML techniques.

Offered by Packt

Difficulty Level

Intermediate

Completion Time

18h4m

Language

English

About Book

Who Is This Book For?

This book is for professional data scientists, particularly those in the first few years of their career, or more experienced analysts who are relatively new to machine learning. Readers should have prior knowledge of concepts in statistics typically taught in an undergraduate introductory course as well as beginner-level experience in manipulating data programmatically.

Book content

Chapters: 18h4m total length

Examining the Distribution of Features and Targets

Examining Bivariate and Multivariate Relationships between Features and Targets

Identifying and Fixing Missing Values

Encoding, Transforming, and Scaling Features

Feature Selection

Preparing for Model Evaluation

Linear Regression Models

Support Vector Regression

K-Nearest Neighbor, Decision Tree, Random Forest and Gradient Boosted Regression

Logistic Regression

Decision Trees and Random Forest Classification

K-Nearest Neighbors for Classification

Support Vector Machine Classification

Naive Bayes Classification

Principal Component Analysis

K-Means and DBSCAN Clustering
