Thoughtfully writing a blog post

The Skynet Salesman

Here at Stitch Fix we work on a wide and varied set of data science problems. One area that we are heavily involved with is operations. Operations covers a broad range of problems and can involve things like optimizing shipping, allocating items to warehouses, coordinating processes to ensure that our products arrive on time, or optimizing the internal workings of a warehouse.

Data-Driven Fashion Design

A core methodology at Stitch Fix is blending recommendations from machines with judgments of expert humans. Our machines produce recommendations via algorithms operating over structured data, while our human stylists curate and modify these recommendations on the basis of unstructured data and knowledge that isn’t yet reflected in our dataset (e.g., new fashion trends). This helps us choose the best 5 items to offer each client in each fix. The success of this strategy within our styling organization prompts consideration of how machines and humans might be brought together in the realm of fashion design. In this post we describe one implementation of such a system. In particular, we explore how the system could be implemented with respect to a target client segment and season.

Multithreaded in the Wild

Members of our Analytics & Algorithms team are out and about this month – come by and hear us speak!

More Human Humans

Machines are going to take over the world and leave us humans without jobs. This is the meme going around in mainstream business books on the topic of Artificial Intelligence (AI). This is understandable as the number of things that machines can do better than humans is increasing: diagnosing medical conditions, analyzing legal documents, making parole decisions, to name a few. But doing something better doesn’t necessarily make machines an alternative to humans. If machines and humans each contribute differently to a capability, then there is opportunity to combine their unique talents to produce an outcome that is better than either one could achieve on their own. This has real potential to change not only how we work, but also how we understand our experience of being human.

Good Books for All Things Data

One of the greatest benefits of working among a diverse group of data scientists and data engineers at Stitch Fix is how much we can learn from our peers. Usually that means getting ad hoc help with specific questions from the resident expert(s). But it also means getting advice on how best to fill any gaps in our own skill sets or knowledge bases, or just what interesting data science materials to explore in our spare time. Our blog posts usually highlight the former; this post touches on the latter.

Introducing our Hybrid lda2vec Algorithm

The goal of lda2vec is to make volumes of text useful to humans (not machines!) while still keeping the model simple to modify. It learns the powerful word representations in word2vec while jointly constructing human-interpretable LDA document representations.

Real-Time Event Visualization

Beautiful data visualizations reveal stories that mere numbers cannot tell. Using visualizations, we can get a sense of scale, speed, direction, and trend of the data. Additionally, we can draw the attention of the audience – the key to any successful presentation – in a way that’s impossible with dry tabulations. While a tabular view of new online signups is informative for tracking, a dynamic map would provide a more captivating view and reveal dimensions that a table cannot.

The Timeless Way of Building Software

In 1977, an important 20th-century architect and prolific author, Christopher Alexander, wrote a book, “A Pattern Language: Towns, Buildings, Construction” followed in 1979 by another book, “The Timeless Way of Building”. These were books about architectural thinking where patterns, a pattern language, and a “way” of building were discussed in depth.

Unsharing the Database

At Stitch Fix, we are currently tackling a pretty common problem among fast-growing startups in the process of scaling. Our applications are overdependent on a shared database, and in order for us to uncouple the various engineering teams from one another and to grow our applications to the next level, we need to unshare it. This blog post will talk about the problems we are trying to solve, and the stepwise approach we are taking to solve them.

Sorry ARIMA, but I’m Going Bayesian

When people think of “data science” they probably think of algorithms that scan large datasets to predict a customer’s next move or interpret unstructured text. But what about models that utilize small, time-stamped datasets to forecast dry metrics such as demand and sales? Yes, I’m talking about good old time series analysis, an ancient discipline that hasn’t received the cool “data science” rebranding enjoyed by many other areas of analytics.