One of the greatest benefits of working among a diverse group of data scientists and data engineers at Stitch Fix is how much we can learn from our peers. Usually that means getting ad hoc help with specific questions from the resident expert(s). But it also means getting advice on how best to fill any gaps in our own skill sets or knowledge bases, or just what interesting data science materials to explore in our spare time. Our blog posts usually highlight the former; this post touches on the latter.
We’ve asked our data science team for some of their favorite data science books. This list is by no means exhaustive, but it should keep any data scientist or engineer, new or experienced, learning and entertained for many an evening. Some suggestions came with comments from the data scientist who recommended them, so for those we include that person’s name!
A great intro to the fundamentals of statistics and machine learning, with R-based tutorials. It’s an excellent stand-alone text for someone interested in learning statistical modeling, but not necessarily hypothesis tests (usually the first big concept covered in introductory statistics textbooks). It is also good prep for the more advanced Elements of Statistical Learning.
A classic among machine learning texts: a concise yet deep review of the basics of machine learning from giants in the field.
The perfect reference for finding details about that probability distribution or that algorithm, complete with examples. – Eli
Great book on probability theory: a must-have on any bookshelf. – Eli
Great intro to probability. Beautifully written, easy to read, with great examples. Classic! (I may be biased because my former advisor wrote it, but he’s an incredible teacher.) – Hoda
A basic book, light on the mathematics, but great for gaining a strong understanding of Bayesian statistics for data analysis.
Covers all the intricacies and terminology of causal models. Used in Lise Getoor’s class on advanced ML. – Natalia
Great depth and eloquence on econometric problems and time series. – Alex
This is an approachable introduction to causal modeling through the econometrics lens. It sits somewhere between a textbook and a less technical read: it has formulas, but it also provides context with lots of intuitive examples. It’s a great introduction to the field! – John & Hilary
A must-have for multilevel modeling. Good examples and easy to follow. – Songya
It’s more conceptual (i.e., light on advanced topics in statistics), but it articulates extremely well our all-too-human tendencies to misuse statistics. The Black Swan and Antifragile are also very good.
A reference for a wide range of data mining algorithms, with a focus on large-scale problems. – Sky
Includes code and shows how to build applications using social network APIs. – Natalia
I present to you: the infamous Patrick Winston. Enough said. – Hoda
A really excellent (and funny!) book on writing and general academic success. Much of the advice carries over to any research role, where success is a Poisson process, so you just have to keep trying! – Hilary
This book is one of many produced by the professors who teach the Coursera Data Science track. A great resource for introductory-level material, with many examples in R.
In the absence of a good “tidy data analysis” book, this is a great primer on the tidy data analysis that forms the foundation of the “Hadleyverse” in R (tidyr, dplyr, ggplot2, etc.).
Another good resource is simply trying your hand at some Kaggle problems and reading other people’s solutions and thought processes. It’s good practice for real-life applications, of sorts. Not a book, but still. – Hoda
An empirical look at the causes of the gender gap in Computer Science.
This book was a very helpful read for me early in my career, when I was entering a male-dominated field. Not everyone will face the same issues or need the same advice, and any one book presents only one lens through which to look at the myriad issues. However, I was grateful I read it and regularly recommend it to friends. – Hilary
Ideas in this book have stuck with me for years. It describes how anyone’s performance can be affected just by being reminded of a stereotype that exists against them in the task at hand. It also offers individual and social strategies to reverse the impact.
An amazing cookbook resource for all things Python-related.
An introduction to the pandas library, which enables programmatic data analysis in Python. Reading the text saves you a lot of time compared to fumbling around when you first learn pandas. – Ceslee
Great introduction for new R users. – Kyle
A deep and informative look at the R language.
One of the most fun books I’ve read, since you can try the code examples in real time (online version). A great primer on d3. – Ceslee