Hello Stitch Fix followers! Check out where our fellow Stitch Fixers are speaking in April.
At Stitch Fix we have 130+ “Full Stack Data Scientists” who, in addition to doing data science work, are also expected to engineer and own data pipelines for their production models. One data science team, the Forecasting, Estimation, and Demand team, was in a bind. Their data generation process was causing them iteration & operational frustrations in delivering time-series forecasts for the business. In this talk I’ll present Hamilton, a novel Python micro-framework that solved their pain points by changing their working paradigm.
Specifically, Hamilton enables a simpler paradigm for a Data Science team to create, maintain, and execute code for generating wide dataframes, especially when there are lots of inter-column dependencies. Hamilton does this by building a DAG of dependencies directly from Python functions defined in a special manner, which also makes unit testing and documentation easy; tune in to the talk to find out how. I’ll also cover our experience migrating to it and using it in production for over a year, along with possible future directions.
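To give a flavor of the "DAG from functions" idea before the talk, here is a minimal sketch in plain Python. It is not Hamilton's actual API: the `execute` helper, the column names, and the use of lists instead of dataframe columns are all assumptions made for illustration. The convention assumed here is that a function's name is the column it produces and its parameter names are the columns it depends on.

```python
import inspect

# Assumed convention: function name = output column,
# parameter names = the columns it depends on.
# All column names below are hypothetical.
def spend_mean(spend: list) -> float:
    """Mean of the spend column."""
    return sum(spend) / len(spend)

def spend_zero_mean(spend: list, spend_mean: float) -> list:
    """Spend with its mean subtracted."""
    return [s - spend_mean for s in spend]

def spend_per_signup(spend: list, signups: list) -> list:
    """Spend divided by signups, row by row."""
    return [s / n for s, n in zip(spend, signups)]

def execute(functions, inputs, outputs):
    """Hypothetical helper: walk the dependency graph implied by the
    function signatures and compute only what the outputs need."""
    registry = {f.__name__: f for f in functions}
    computed = dict(inputs)  # raw inputs count as already computed

    def resolve(name, visiting=()):
        if name in computed:
            return computed[name]
        if name not in registry:
            raise KeyError(f"no input or function provides {name!r}")
        if name in visiting:
            raise ValueError(f"cycle detected at {name!r}")
        func = registry[name]
        deps = inspect.signature(func).parameters
        kwargs = {dep: resolve(dep, visiting + (name,)) for dep in deps}
        computed[name] = func(**kwargs)
        return computed[name]

    return {name: resolve(name) for name in outputs}

result = execute(
    functions=[spend_mean, spend_zero_mean, spend_per_signup],
    inputs={"spend": [10, 20, 30], "signups": [1, 2, 5]},
    outputs=["spend_zero_mean", "spend_per_signup"],
)
print(result)
```

Because each column is an ordinary function with a docstring, it can be unit tested on its own (for example, calling `spend_mean([1, 2, 3])` directly), which is the property the abstract alludes to.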
At Stitch Fix we have a dedicated Data Science organization called Algorithms. It has 130+ Full Stack Data Scientists who build & own a variety of models. These models range from your classic prediction & classification models through to time-series forecasts, simulations, and optimizations. Rather than hand off models for productionization to someone else, Data Scientists own and are on-call for that process; we love for our Data Scientists to have autonomy. That said, Data Scientists aren’t without engineering support, as there’s a Data Platform team dedicated to building tooling, services, and abstractions to increase their workflow velocity.
One data science task that we have been speeding up is getting models to production and increasing their usability and stability. This is a necessary task that can take a considerable chunk of a Data Scientist’s time, whether in developing or in debugging issues. Historically, everyone largely carved their own path in this endeavor, which meant many different approaches and implementations, and little to leverage across teams. In this talk I’ll cover how the Model Lifecycle team on Data Platform built a system dubbed the “Model Envelope” to enable “deployment for free”. That is, no code needs to be written by a data scientist to deploy any Python model to production, where production means either a micro-service or a batch Python/Spark job. With our approach, data scientists no longer need to worry about Python dependencies, instrumenting model monitoring, or other MLOps concerns, since we can take care of these for them.
Be sure to catch us at these events :)