Assistant Professor, Department of Computer Science, Stanford University
Snorkel: Ameliorating the Labeling Bottleneck in Machine Learning
Abstract: Machine learning is now an indispensable part of a wide range of applications, including voice recognition, image search, and natural language processing. The major bottleneck in deploying machine learning systems today, however, is that they require large training sets of hand-labeled data that provide examples from which the system extrapolates. Creating these training sets is often the most laborious development task in many real-world use cases. This talk will describe some new ideas for reducing this bottleneck. In particular, we describe several novel techniques for accepting weaker, noisier, but higher-level input from users to train machine learning models. Key to our approach are new methods for automatically denoising this noisy input. We hope that this direction will make building machine learning systems easier for an increasingly large set of people.
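The kind of "weaker, noisier, higher-level input" the abstract refers to can be illustrated with a minimal plain-Python sketch. Here, hypothetical user-written heuristics (often called labeling functions) each vote on a label or abstain, and their noisy votes are combined by simple majority. The function names, label scheme, and example texts below are illustrative assumptions, and majority vote stands in for Snorkel's actual approach, which learns a generative model of each source's accuracy:

```python
# Sketch of weak supervision: user heuristics emit noisy labels or abstain,
# and a combiner aggregates them into a single training label per example.

ABSTAIN, NEG, POS = -1, 0, 1

def lf_contains_refund(text):
    # Heuristic: mentioning a refund suggests a complaint.
    return POS if "refund" in text.lower() else ABSTAIN

def lf_contains_thanks(text):
    # Heuristic: thanking suggests a non-complaint.
    return NEG if "thanks" in text.lower() else ABSTAIN

def lf_exclamation(text):
    # Heuristic: repeated exclamation marks suggest a complaint.
    return POS if text.count("!") >= 2 else ABSTAIN

LFS = [lf_contains_refund, lf_contains_thanks, lf_exclamation]

def majority_vote(text):
    # Combine noisy votes; abstain on ties or when no heuristic fires.
    votes = [lf(text) for lf in LFS]
    pos, neg = votes.count(POS), votes.count(NEG)
    if pos > neg:
        return POS
    if neg > pos:
        return NEG
    return ABSTAIN

labels = [majority_vote(t) for t in [
    "I want a refund now!!",
    "Thanks for the quick help",
    "Where is my order",
]]
print(labels)  # [1, 0, -1]
```

The denoising step the abstract highlights replaces this naive vote with a learned model that weights each heuristic by its estimated accuracy, so a few unreliable sources do not dominate.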