See who's out in the wild for the month of October
See who's out in the wild for the month of October
See who's out in the wild for the month of September
How Stitch Fix designed and built a scalable, centralized and self-service data integration platform tailored to the needs of their Data Scientists.
If we could assign sounds to items of clothing, what would a Fix sound like?
See who's out in the wild for the month of August
This post explores the use of matrix factorization not just for recommendations, but for understanding style preference more broadly.
This post is an introduction to constrained optimization aimed at data scientists and developers fluent in Python, but without any background in operations research or applied math. We'll demonstrate how optimization modeling can be applied to real problems at Stitch Fix. At the end of this article, you should be able to start modeling your own business problems.
See who's out in the wild for the month of June
Experimenter beware: Running tests with low power risks much more than missing the detection of true effects.
See who's out in the wild for May
There's a large group out and about for the month of April. Come see where you might find them.
As data scientists tasked with segmenting clients and products, we find ourselves in the same boat with species taxonomists, straddling the line between lumping individuals into broad groups and splitting into small segments. The approach for drawing the boundaries needs to take into account signals from the data while maintaining sharp focus on the project needs. A balance between lumping and splitting allows us to make the best data-driven decisions we can with the resources we have...
Come see who's out and about in the wild for March.
Data scientists are not always equipped with the requisite engineering skills to deploy robust code to a production job execution and scheduling system. Yet, forcing reliance on data platform engineers will impede the scientists autonomy. If only there was another way. So today, we're excited to introduce Flotilla, our latest open source project...
Wow! We are so honored to be ranked #13 on Fast Company’s Most Innovative Companies List. And, we’re thrilled to be ranked #1 on Fast Company’s Data Science List.
It's really gratifying to see Data Science becoming a primary means of strategic differentiation ...
See who's out in the wild for February
See who's out in the wild for January
On the Stitch Fix Algorithms team, we’ve always been in awe of what professional stylists are able to do, especially when it comes to knowing a customer’s size on sight. It’s a magical experience to walk into a suit shop, have the professional shopping assistant look you over and without taking a measurement say, “you’re probably a 38, let’s try this one,” and pull out a perfect-fitting jacket. While this sort of experience has been impossible with traditional eCommerce, at Stitch Fix we’re making it a reality.
See who's out in the wild for December
See who's out in the wild for November
Counting and tensor decompositions are elegant and straightforward techniques. But these methods are grossly underepresented in business contexts. In this post we factorized an example made up of word skipgrams occurring within documents to arrive at word and document vectors simultaneously. This kind of analysis is effective, simple, and yields powerful concepts.
When I started playing with word2vec four years ago I needed (and luckily had) tons of supercomputer time. But because of advances in our understanding of word2vec, computing word vectors now takes fifteen minutes on a single run-of-the-mill computer with standard numerical libraries1. Word vectors are awesome but you don't need a neural network -- and definitely don't need deep learning -- to find them. So if you're using word vectors and aren't gunning for state of the art or a paper publication then stop using word2vec.
Today is the start of the 2017-2018 NBA Season. Basketball statistics have become a rich and intriguing domain of study, bringing new insights and advantages to the teams that embrace such empiricism. Of course, the framing and analytic techniques used to study basketball are generalizations - they also give intuition to problems in business or other domains (and vice versa). So, for all the basketball statistics enthusiasts out there, as well as those that are looking for inspirations for their own analytic challenges, we thought we’d share a compendium of our past basketball-related posts.
See who's out in the wild for October
In this post we’ll take a look at how we can model classification prediction with non-constant, time-varying coefficients. There are many ways to deal with time dependence, including Bayesian dynamic models (aka "state space" models), and random effects models. Each type of model captures the time dependence from a different angle; we will keep things simple and look at a time-varying logistic regression that is defined within a regularization framework. We found it quite intuitive, easy to implement, and observed good performances using this model.
See who's out in the wild for September
Here at Stitch Fix, we work on many fun and interesting areas of Data Science. One of the more unusual ones is drawing maps, specifically internal layouts of warehouses. These maps are extremely useful for simulating and optimising operational processes. In this post, we'll explore how we are combining ideas from recommender systems and structural biology to automatically draw layouts and track when they change.
This summer our community included four interns, all graduate students who are passionate about applying their academic expertise to help us leverage our rich data to better understand our clients, their preferences, and new trends in the industry. In this blog post you’ll meet the interns, who will tell you a bit about the problems they worked on and the strategies they used to solve them.
See who's out in the wild for this last part of August, and vote for our SXSW proposals.
Stitch Fix is a Data Science company that aspires to help you to find the style that you love. Data Science helps us make most of our business and strategic decisions.
Announcing Diamond, an open-source project for solving mixed-effects models
Solving mixed-effects models efficiently: the math behind Diamond
Analysis should be reproducible. This isn’t controversial, and yet irreproducible analysis is everywhere. I’ve certainly created plenty of it. Why does this happen, despite good intentions? Because, in the short term, it is easier and more expedient not to worry about reproducibility. But this isn’t a moral failing so much as a failing of our tools. Tools can, and should, help make reproducible analysis the natural thing to do. As a step towards encouraging reproducibility, this post introduces Nodebook, an extension to Jupyter notebook.
As a proudly data-driven company dealing with physical goods, Stitch Fix has put lots of effort into inventory management. Tracer, the inventory history service, is a new project we have been building to enable more fine-grained analytics by providing precise inventory state at any given point of time...
In this post aimed at SQL practitioners who would rather spend their time writing Python, we'll show how a web development tool can help your ETL stay DRY.
How to organize an office so everyone working there can be comfortable and productive is the topic of much discussion. A common strategy is to seat people by their team or sub-team membership. Another strategy which we have been employing is to simply allocate people randomly. Building upon these experiences we've developed a new seating allocation tool "seetd", that allows us to frame this as an optimization problem. We're now free to combine these and other approaches objectively.
R is an awesome tool for doing data science interactively, but has some defaults that make us worry about using it in production pipelines.
We have an innate and uncontrollable urge to explain things - even when there is nothing to explain. This post explores why we are prone to narrative fallacies. We start at an epic moment in sports history, Steph Curry breaking the record for most 3-pointers in a game, and draw conclusions for better decision making in business.
See who's out in the wild for the month of June.
Dora helps data scientists at Stitch Fix visually explore their data. Powered by React and Elasticsearch, it provides an intuitive UI for data scientists to take advantage of Elasticsearch's powerful functionality.
In this last installment of our Making of the Tour series, we look at some of the fun and random.
See who's out in the wild for the month of May.
In this post, we'll talk about some simulation-powered animations, provide some cleaned up code that you can use, and discuss these animations' genesis and utility for visualizing abstract systems and algorithms or for visualizing real historical data and projected futures.
See who's out in the wild for the month of April.
Earlier this month, we released an interactive animation describing how data science is woven into the fabric of Stitch Fix: our Algorithms Tour. It was a lot of fun to make and even more fun to see people’s responses to it. For those interested in how we did it, we thought we’d give a quick tour of what lies under that Tour.
Last summer, we wrote about Stitch Fix’s early experiments in data-driven fashion design. Since then, we’ve been studying, developing, and testing new ways to create clothes that delight our clients. Some of this work was featured yesterday in an article in The Wall Street Journal. As a companion to that piece, we wanted to highlight a few avenues that we have explored recently.
How data science is woven into the fabric of Stitch Fix. In this interactive tour we share ten “stories” of how data science is is integral to our operations and product.
Come see who's going to be presenting or speaking out in the wild for the month of March!
At first sight the difference between planets outside our solar system (exoplanets from now on) and fashion trends seems enormous, but all of us math lovers know that entirely different phenomena can have an almost identical mathematical description. In this very peculiar case, exoplanetary systems and certain fashion trends can be characterized as having a periodic nature, with certain magnitudes repeating cyclically, and this will allow us to use very similar techniques to study them.
Time series modeling sits at the core of critical business operations such as supply and demand forecasting and quick-response algorithms like fraud and anomaly detection. Small errors can be costly, so it’s important to know what to expect of different error sources. In this post I’ll go through alternative strategies for understanding the sources and magnitude of error in time series.
With January complete, see who's out in the wild for the month of February.
For those who attended my talk at Data Day Texas in Austin last weekend, you heard me talk about how Stitch Fix has reduced contention on: Access to data & Access to ad-hoc compute resources; to help scale Data Science. As attendees requested, I have posted my slides here, which you can find a link to...
Happy New Year! As always, members of our algorithms and engineering teams are out and about this January!
The outcome of the presidential election clearly indicated that the model used by FiveThirtyEight was closer to the truth than that of the Princeton Election Consortium in terms of the level of uncertainty in the predictions---but not by as much as you might think. I consider the question quantitatively: what are the odds that one or the other model is right given the state-by-state results?
When Donald Trump won the 2016 presidential election, both sides of the political spectrum were surprised. The prediction models didn't see it coming and the Nate Silvers of the world took some heat for that (although Nate Silver himself got pretty close). After this, a lot of people would probably agree that the world doesn't need another statistical prediction model.
So, should we turn our backs on forecasting models? No, we just need to revise our expectations. George Box once reminded us that statistical models are, at best, useful *approximations* of the real world. With the recent hype around data science and "money balling" this point is often overlooked.
On this day in 1936, Alan Turing stood before the London Mathematical Society and delivered a paper entitled "On Computable Numbers, with an Application to the Entscheidungsproblem", wherein he described an abstract mathematical device that he called a "universal computing engine" and which would later become known as a Turing machine. As a Stitch Fix tribute, we’ve melded a Turing machine and a 1936 Singer sewing machine.
Update, Dec 12, 2016: There is a follow up post discussing the outcome of all of this after the election results were known.
Plaid is for fall; red on Valentine’s Day; no white after Labor Day. These are fashion adages we’ve all heard before -- that even John Oliver promotes. But how true are they? The Stitch Fix Algorithms team is in a unique position to quantitatively answer these questions for the first time. Given the season, we’ve decided to first take a look at the “No white after Labor Day” claim. How real is it?
As always, members of our algorithms and engineering teams are out and about this month!
Data science begins not with data but with questions. And sometimes getting the data necessary to answer those questions requires some ingenuity.
On the algorithms team at Stitch Fix, we aim to give everyone enough autonomy to own and deploy all the code that they write, when and how they want to. This is challenging because the breadth of who is writing micro-services for what, covers a wide spectrum of use cases - from writing services to integrate with engineering applications, e.g. serving styling recommendations, to writing dashboards that consume and display data, to writing internal services to help make all of this function. After looking at the many deployment pipeline options out there, we settled on implementing the immutable server pattern.