Thoughtfully writing a blog post

Genie in a Box: Making Spark Easy for Stitch Fix Data Scientists

Stitch Fix is a Data Science company that aspires to help you to find the style that you love. Data Science helps us make most of our business and strategic decisions.

Diamond Part II

Announcing Diamond, an open-source project for solving mixed-effects models

Diamond Part I

Solving mixed-effects models efficiently: the math behind Diamond

The Biology of Code

It seems like the endless stream of data in a terminal is the farthest thing from a living, breathing being. Looking through the Stitch Fix codebase, what comes to mind is structures, information systems, methods of exchange – not a group of finches on a faraway Galapagos island.


Analysis should be reproducible. This isn’t controversial, and yet irreproducible analysis is everywhere. I’ve certainly created plenty of it. Why does this happen, despite good intentions? Because, in the short term, it is easier and more expedient not to worry about reproducibility. But this isn’t a moral failing so much as a failing of our tools. Tools can, and should, help make reproducible analysis the natural thing to do. As a step towards encouraging reproducibility, this post introduces Nodebook, an extension to Jupyter notebook.

Inventory Time Machine

As a proudly data-driven company dealing with physical goods, Stitch Fix has put a lot of effort into inventory management. Tracer, the inventory history service, is a new project we have been building to enable more fine-grained analytics by providing the precise inventory state at any given point in time...

This one weird trick will simplify your ETL workflow

In this post aimed at SQL practitioners who would rather spend their time writing Python, we'll show how a web development tool can help your ETL stay DRY.

Internal Software: Why Build Internal Software?

The opinion that software will soon dominate and radically change every aspect of everyone’s lives has become so commonplace, and is repeated so frequently, that those of us in the tech industry treat it as a statement of fact. A less heralded but more concrete fact is that many of the productivity gains expected from the introduction of computer technologies have not been realized.

Be smarter. Be seetd.

How to organize an office so that everyone working there can be comfortable and productive is the topic of much discussion. A common strategy is to seat people by their team or sub-team membership. Another strategy, which we have been employing, is to simply allocate people randomly. Building upon these experiences, we've developed a new seating allocation tool, "seetd", that allows us to frame this as an optimization problem. We're now free to combine these and other approaches objectively.

Patterns of Service-oriented Architecture: Denormalized Cache

Next up in our “Patterns of Service-oriented Architecture” series, we’ll talk about dealing with highly normalized data that spans many tables and services, or that otherwise has a large object graph reaching beyond a single database, by caching a denormalized version of it.