
All blog posts

So, You Need a Statistically Significant Sample?

Although the phrase is commonly used, there is no such thing as a “statistically significant sample”: it’s the result that can be statistically significant, not the sample. Word-mincing aside, for any study that requires sampling (e.g. surveys and A/B tests), collecting enough data to have confidence in the results is absolutely critical.
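One common way to answer "how much data is enough?" for an A/B test on conversion rates is a power calculation. The snippet below is a minimal sketch, assuming a two-sided, two-proportion z-test with the usual normal approximation; the function name, the 10% and 12% rates, α = 0.05 and 80% power are illustrative choices, not values from the post.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control: float, p_treatment: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided, two-proportion z-test.

    Normal-approximation formula:
        n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
    The result is rounded up to the next whole unit.
    """
    if p_control == p_treatment:
        raise ValueError("the two rates must differ to detect an effect")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = std_normal.inv_cdf(power)           # quantile matching the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical test: baseline conversion of 10%, hoping to detect a lift to 12%.
n = sample_size_per_group(0.10, 0.12)
print(f"roughly {n} users per group")
```

Smaller effects drive the required sample up quadratically, which is why detecting a subtle lift can take far more data than intuition suggests.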

Stitch Fix + Jupyter + GitHub = Awesome!

At Stitch Fix we are avid users of Jupyter for research at both the personal and team scales. At the personal level, Jupyter is a great interface for researching the question at hand. It captures the research workflow, letting us take detailed notes on the code and explain models with written content and mathematical equations.

Stitch Fix ❤ UNIX

I ❤ UNIX and using the command line; they help me solve problems at Stitch Fix. I’m not alone. Across the Data Science and Engineering teams, we’re constantly solving problems with UNIX and the command line.

Advice for Data Scientists on Where to Work

It’s a good time to be a data scientist. If you have the skills, experience, curiosity and passion, there is a vast and receptive market of companies to choose from. Yet there is much to consider when evaluating a prospective firm as a place to apply your talents. Even veterans may not have had the opportunity to experience different organizations, stages of maturity, cultures, technologies, or domains. Here we pool our experience to offer some advice: three things to look for in a company that could make it a great place to work.

Congratulations on Becoming a Software Engineer

Now you know all about strings and ints and how hash tables are ordered (or not). You know how to talk to a JSON API and build a Twitter clone in three different languages (or maybe three different JavaScript frameworks). Your aptitude, technical know-how and general pluckiness have helped you land your first real software development job. Congratulations. This is no easy feat. I know that you put in many hours of study, and many more of frustration, to get where you are today. By learning how to write software, you’ve opened up a whole world of possibilities for yourself.

The Grammar of Data Science

Python and R are popular programming languages used by data scientists. Until recently, I exclusively used Python for exploratory data analysis, relying on Pandas and Seaborn for data manipulation and visualization. However, after seeing my colleagues do some amazing work in R with dplyr and ggplot2, I decided to take the plunge and learn how the other side lives. I found that I could more easily translate my ideas into code and beautiful visualizations with R than with Python. In this post, I will elaborate on my experience switching teams by comparing and contrasting R and Python solutions to some simple data exploration exercises.
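To give a flavor of the comparison, here is one way a typical dplyr pipeline (filter, group, summarise, sort) can be written with pandas method chaining. This is a minimal sketch on a hypothetical toy DataFrame, not an example taken from the post; it assumes a pandas version that supports named aggregation in `groupby(...).agg(...)`.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b", "c", "c"],
    "weight": [3.1, 2.9, 5.0, 5.4, 1.2, 1.0],
})

# dplyr equivalent:
#   df %>% filter(weight > 1.1) %>% group_by(species) %>%
#     summarise(mean_weight = mean(weight), n = n()) %>% arrange(desc(mean_weight))
summary = (
    df.query("weight > 1.1")                         # filter()
      .groupby("species", as_index=False)            # group_by()
      .agg(mean_weight=("weight", "mean"),           # summarise()
           n=("weight", "size"))
      .sort_values("mean_weight", ascending=False)   # arrange(desc())
)
print(summary)
```

Each pandas step maps onto one dplyr verb, though the pandas version leans on method names and keyword arguments where dplyr uses the pipe and bare column names.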

A Word is Worth a Thousand Vectors

Standard natural language processing (NLP) is a messy and difficult affair. It requires teaching a computer about English-specific word ambiguities as well as the hierarchical, sparse nature of words in sentences. At Stitch Fix, word vectors help computers learn from the raw text in customer notes. Our systems, composed of machines and human experts, need to recommend the maternity line when a client says she’s in her ‘third trimester’, identify a medical professional when she writes that she ‘used to wear scrubs to work’, and distill ‘taking a trip’ into a Fix for vacation clothing.
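Word vectors place related words close together in a vector space, and closeness is commonly measured with cosine similarity. The sketch below uses tiny hand-made 3-dimensional vectors that are purely hypothetical (real word vectors are learned from text and typically have hundreds of dimensions) to show how a nearest-neighbor lookup could connect a phrase like ‘third trimester’ to ‘maternity’.

```python
import math

# Hypothetical, hand-made vectors for illustration; NOT learned embeddings.
vectors = {
    "third_trimester": [0.9, 0.1, 0.0],
    "maternity":       [0.8, 0.2, 0.1],
    "scrubs":          [0.1, 0.9, 0.0],
    "vacation":        [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word):
    """Return the other vocabulary word whose vector is most similar to `word`."""
    query = vectors[word]
    candidates = (w for w in vectors if w != word)
    return max(candidates, key=lambda w: cosine(query, vectors[w]))


print(nearest("third_trimester"))
```

With learned vectors the same lookup works unchanged; only the dictionary of vectors would come from a trained model instead of being written by hand.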

Multithreaded Data - John Myles White on Julia

Last week we kicked off our first Multithreaded Data event, where John Myles White gave a talk about Julia, a new programming language that some of us love. It’s the first of many exciting talks to come at Stitch Fix. Our next invited speaker is Hadley Wickham, who will be talking about how to get data into R. If you’re in the SF Bay Area and the topic excites you, keep an eye out for our upcoming events!

Elasticsearch and Denormalization in Rails

All these JOINs are killing me
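The general idea behind denormalizing for a search index is to run the joins once, at indexing time, and store each record as a self-contained document. The post itself concerns Rails; to keep the examples in a single language, this is a language-agnostic Python sketch of that idea. The tables, field names and the `denormalize` helper are hypothetical, and sending the document to a search engine is deliberately left out.

```python
import json

# Hypothetical normalized rows, as they might come from three relational tables.
clients = {1: {"id": 1, "name": "Ann"}}
orders = [
    {"id": 10, "client_id": 1, "total": 120.0},
    {"id": 11, "client_id": 1, "total": 80.0},
]
items = [
    {"order_id": 10, "sku": "DRESS-1"},
    {"order_id": 10, "sku": "TOP-7"},
    {"order_id": 11, "sku": "JEANS-3"},
]


def denormalize(client_id):
    """Fold a client's orders and order items into one nested document.

    The joins happen here, once, so a query against the stored document
    needs no JOINs at read time.
    """
    client = clients[client_id]
    client_orders = []
    for order in (o for o in orders if o["client_id"] == client_id):
        skus = [i["sku"] for i in items if i["order_id"] == order["id"]]
        client_orders.append({"id": order["id"], "total": order["total"], "skus": skus})
    return {
        **client,
        "orders": client_orders,
        "lifetime_total": sum(o["total"] for o in client_orders),
    }


doc = denormalize(1)
print(json.dumps(doc, indent=2))
```

The trade-off is that the document must be rebuilt whenever any of the underlying rows change.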

May Bayes Theorem Be with You

The frequentist paradigm enjoys the most widespread acceptance for statistical analysis. Frequentist concepts such as confidence intervals and p-values dominate introductory statistics curricula, from science departments to business schools, and frequentist methods are still the go-to tools for most practitioners.
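To make the contrast concrete, here is a minimal sketch estimating a single conversion rate both ways. The data (45 successes in 100 trials) are hypothetical; the frequentist side uses the Wald 95% confidence interval, one of several common intervals for a proportion, and the Bayesian side assumes a uniform Beta(1, 1) prior, which by conjugacy gives a Beta posterior that can be sampled directly.

```python
import math
import random
from statistics import NormalDist

# Hypothetical data: 45 conversions out of 100 trials.
successes, trials = 45, 100

# Frequentist: Wald 95% confidence interval for a proportion.
p_hat = successes / trials
z = NormalDist().inv_cdf(0.975)
margin = z * math.sqrt(p_hat * (1 - p_hat) / trials)
ci = (p_hat - margin, p_hat + margin)

# Bayesian: a uniform Beta(1, 1) prior (an assumption) yields the posterior
# Beta(1 + successes, 1 + failures).
alpha_post = 1 + successes
beta_post = 1 + trials - successes
posterior_mean = alpha_post / (alpha_post + beta_post)

# Monte Carlo estimate of P(rate > 0.5 | data) from posterior draws.
rng = random.Random(0)
draws = [rng.betavariate(alpha_post, beta_post) for _ in range(100_000)]
prob_above_half = sum(d > 0.5 for d in draws) / len(draws)

print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"posterior mean: {posterior_mean:.3f}, P(rate > 0.5): {prob_above_half:.3f}")
```

The two outputs answer different questions: the interval describes a procedure's long-run coverage, while the posterior supports direct probability statements about the rate itself, such as the chance it exceeds 50%.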