Data, Free, Open Source

SNORKEL

Programmatically build and manage training data

Apache-2.0

ABOUT

Manual data labeling is slow, expensive, and does not scale to the large training sets required by modern machine learning. Snorkel solves this by letting users write labeling functions—heuristic rules, distant supervision, and other weak signals—to programmatically generate large labeled datasets. It then uses probabilistic modeling to denoise and combine these labels into high-quality training data without requiring exhaustive manual annotation.
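The labeling-function idea described above can be sketched in plain Python. This is an illustration only, not Snorkel's API: the function names, the keyword heuristics, and the sample messages are hypothetical, and a simple majority vote stands in for Snorkel's probabilistic label model (the snorkel package ships its own labeling_function decorator and LabelModel class for that step).

```python
from collections import Counter

# Label conventions assumed for this sketch: -1 means "no opinion".
ABSTAIN, HAM, SPAM = -1, 0, 1


# Each labeling function is a cheap heuristic that votes or abstains.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_message(text):
    return HAM if len(text.split()) <= 3 else ABSTAIN


LFS = [lf_contains_link, lf_mentions_prize, lf_short_message]


def apply_lfs(texts, lfs=LFS):
    """Build a label matrix: one row per text, one column per function."""
    return [[lf(t) for lf in lfs] for t in texts]


def majority_vote(row):
    """Combine one row of votes; abstain when there are no votes or a tie.

    Stand-in for a learned label model, which would weight each
    labeling function by its estimated accuracy instead.
    """
    votes = Counter(v for v in row if v != ABSTAIN)
    if not votes:
        return ABSTAIN
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


texts = [
    "Claim your prize at http://example.com now",
    "thanks, great video",
    "I really enjoyed the second half of this song",
]
label_matrix = apply_lfs(texts)
labels = [majority_vote(row) for row in label_matrix]
```

Rows where every function abstains stay unlabeled, so coverage grows by adding more heuristics rather than by annotating more examples by hand.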

INSTALL
pip install snorkel

INTEGRATION GUIDE

1. Spam classification using weak supervision and labeling functions
2. Data augmentation for NLP tasks with transformation functions
3. Monitoring critical data subsets via slicing functions
4. Medical text labeling and clinical NLP at scale
5. Enterprise knowledge extraction at scale without manual annotation
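As a rough illustration of the third use case, a slicing function can be thought of as a predicate that marks which examples belong to a subset worth monitoring. The sketch below uses plain Python rather than Snorkel's own slicing utilities (the snorkel package provides a slicing_function decorator for this); the slice names, example records, and predictions are hypothetical.

```python
# Each slice is a predicate over one example (a dict with "text" and "label").
def slice_short(example):
    return len(example["text"].split()) < 5


def slice_has_link(example):
    return "http" in example["text"].lower()


SLICES = {"short": slice_short, "has_link": slice_has_link}


def accuracy_by_slice(examples, predictions, slices=SLICES):
    """Report accuracy separately for each slice; None if a slice is empty."""
    report = {}
    for name, in_slice in slices.items():
        pairs = [
            (ex["label"], pred)
            for ex, pred in zip(examples, predictions)
            if in_slice(ex)
        ]
        if pairs:
            report[name] = sum(gold == pred for gold, pred in pairs) / len(pairs)
        else:
            report[name] = None
    return report


examples = [
    {"text": "win cash http://example.com", "label": 1},
    {"text": "nice one", "label": 0},
    {"text": "check http://example.org for the full lyrics of this song", "label": 0},
]
predictions = [1, 0, 1]  # hypothetical model output
report = accuracy_by_slice(examples, predictions)
```

Reporting per slice makes a weak subset visible even when overall accuracy looks acceptable; here the link-bearing messages score lower than the short ones.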

TAGS

machine-learning, ai, weak-supervision, data-labeling, training-data, python, nlp