Large Scale Data & ML Monitoring with whylogs
In the era of microservices, decentralized ML architectures, and complex data pipelines, data quality has become a bigger challenge than ever. When data is involved in complex business processes and decisions, bad data can, and will, affect the bottom line. Ensuring data quality across the entire ML pipeline is both costly and cumbersome, while data monitoring is often fragmented and performed ad hoc. whylogs, an open source library, was built to address these challenges. It is a lightweight data profiling library that enables end-to-end data monitoring across the entire software stack. The library implements a language- and platform-agnostic approach to data quality and data monitoring. It has been deployed in massive-scale data environments, on structured and unstructured data, and at a range of points in the ML lifecycle. In this talk, we will provide an overview of the whylogs architecture, including its lightweight statistical data collection approach, and we will show how users can apply this library to existing data and ML pipelines.
Date and Time:
The talk was held on Tuesday, October 11th at 1:00 PM PDT.
This talk was recorded live and is viewable below:
Alessya Visnjic is the CEO of WhyLabs, the AI Observability company building tools that power robust and responsible AI deployment. Prior to WhyLabs, Alessya was a CTO-in-residence at the Allen Institute for AI, where she evaluated commercial potential for the latest AI research. Earlier, Alessya spent 9 years at Amazon leading ML initiatives, including forecasting and data science platforms. Alessya is also the founder of Rsqrd AI, a global community of 1,000+ AI practitioners who are committed to making enterprise AI technology responsible.