Good Books for All Things Data

Hilary Parker and Brian Coffey, San Francisco, CA

One of the greatest benefits of working among a diverse group of data scientists and data engineers at Stitch Fix is how much we can learn from our peers. Usually that means getting ad hoc help with specific questions from the resident expert(s). But it also means getting advice on how best to fill any gaps in our own skill sets or knowledge bases, or just what interesting data science materials to explore in our spare time. Our blog posts usually highlight the former; this post touches on the latter.

[Comic: xkcd]

We’ve asked our data science team for some of their favorite data science books. This list is by no means exhaustive, but it should keep any data scientist or data engineer, new or seasoned, learning and entertained for many an evening. Some suggestions came with comments from the person who recommended them, and for those we include that person’s name.


An Introduction to Statistical Learning
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
Topic: Machine Learning

A great introduction to the fundamentals of statistics and machine learning, with R-based tutorials. It works well as a stand-alone text for someone who wants to learn statistical modeling but not necessarily hypothesis testing (usually the first big concept covered in introductory statistics textbooks). It is also good preparation for the more advanced Elements of Statistical Learning.

Elements of Statistical Learning
Trevor Hastie, Robert Tibshirani and Jerome Friedman
Topic: Machine Learning

A classic among machine learning texts. It gives a concise but deep review of the basics of machine learning, written by giants in the field.

Machine Learning: A Probabilistic Perspective
Kevin P. Murphy
Topic: Machine Learning

The perfect reference for looking up the details of a particular probability distribution or algorithm, complete with examples. – Eli

Probability Theory: The Logic of Science
E. T. Jaynes
Topic: Probability

A great book on probability theory, and a must-have on any bookshelf. – Eli

Introduction to Probability
Dimitri P. Bertsekas and John N. Tsitsiklis
Topic: Probability

A great introduction to probability. Beautifully written and easy to read, with great examples. A classic! (I may be biased because my former advisor wrote it, but he’s an incredible teacher.) – Hoda

Doing Bayesian Data Analysis
John K. Kruschke
Topic: Bayesian Statistics

An introductory book that is light on the mathematics, but great for building a strong understanding of Bayesian statistics for data analysis.

Causality: Models, Reasoning and Inference
Judea Pearl
Topic: Statistics

Covers all the intricacies and terminology of causal models. It was used in Lise Getoor’s advanced machine learning class. – Natalia

Econometrics
Fumio Hayashi
Topic: Inferential Statistics

Treats econometric problems and time series with great depth and eloquence. – Alex

Mostly Harmless Econometrics
Joshua Angrist and Jörn-Steffen Pischke
Topic: Inferential Statistics

This is an approachable introduction to causal modeling through the econometrics lens. It sits somewhere between a textbook and a less technical read: it has formulas, but it also puts them in context with lots of intuitive examples. It’s a great introduction to the field! – John & Hilary

Data Analysis Using Regression and Multilevel/Hierarchical Models
Andrew Gelman and Jennifer Hill
Topic: Statistics

A must-have for multilevel modeling. Good examples and easy to follow. – Songya

Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets
Nassim Taleb
Topic: Statistics

It’s more conceptual (i.e., light on advanced topics in statistics), but it does an excellent job of describing our all-too-human tendency to misuse statistics. The Black Swan and Antifragile are also very good.

Mining of Massive Datasets
Jure Leskovec, Anand Rajaraman and Jeff Ullman
Topic: Data Mining

A reference for a wide range of data mining algorithms, with a focus on large-scale problems. – Sky

Mining the Social Web
Matthew A. Russell
Topic: Data Mining

Includes code and explains how to build applications using social network APIs. – Natalia

How To Speak
Patrick Winston
Topic: Speaking

I present to you: the legendary Patrick Winston. Enough said. – Hoda

How to Write a Lot
Paul J. Silvia
Topic: Writing

A really excellent (and funny!) book on writing and on academic success in general. Much of the advice carries over to any research role, where success is a Poisson process, so you just have to keep trying! – Hilary

The Art of Data Science
Roger D. Peng and Elizabeth Matsui
Topic: Coding / Analysis Style

This book is one of many produced by the professors who teach the Coursera Data Science track. It is a great resource for introductory-level material, with many examples in R.

Tidy Data (paper)
Hadley Wickham
Topic: Coding / Analysis Style

In the absence of a good “tidy data analysis” book, this paper is a great primer on tidy data, the foundation of the “Hadleyverse” in R (tidyr, dplyr, ggplot2, etc.).

Kaggle problems
Many!
Topic: General

Another good resource is simply to try your hand at some Kaggle problems and read other people’s solutions and thought processes. It’s good practice for real-world applications, of sorts. Not a book, but still. – Hoda

Unlocking the Clubhouse
Allan Fisher and Jane Margolis
Topic: General

An empirical look at the causes of the gender gap in Computer Science.

Nice Girls Just Don't Get It
Lois P. Frankel and Carol Frohlinger
Topic: General

This book was a very helpful read for me early in my career, when I was entering a male-dominated field. Not everyone will face the same issues or need the same advice, and any one book offers only one lens for viewing the myriad issues. However, I was grateful I read it, and I regularly recommend it to friends. – Hilary

Whistling Vivaldi: How Stereotypes Affect Us and What We Can Do (Issues of Our Time)
Claude M. Steele
Topic: General

Ideas in this book have stuck with me for years. It describes how anyone’s performance can suffer simply from being reminded of a negative stereotype about them that bears on the task at hand. It also offers individual and social strategies for reversing the impact.

Python Cookbook
David Beazley and Brian K. Jones
Topic: Python

An amazing cookbook-style resource for all things Python.

Python for Data Analysis
Wes McKinney
Topic: Python

An introduction to the pandas library, which enables programmatic data analysis in Python. Reading this text will save you a lot of time compared with fumbling around when you are first learning pandas. – Ceslee

The Art of R Programming
Norman Matloff
Topic: R

Great introduction for new R users. – Kyle

Advanced R
Hadley Wickham
Topic: R

A deep and informative look at the R language.

Interactive Data Visualization for the Web
Scott Murray
Topic: Visualization

One of the most fun books I’ve read, since you can try the code examples in real time (in the online version). A great primer on D3. – Ceslee

Multithreaded
