snorkel vs skweak
Metric | snorkel | skweak
---|---|---
Mentions | 1 | 8
Stars | 5,109 | 909
Growth | - | 0.2%
Activity | 6.2 | 6.2
Latest commit | about 2 years ago | 6 months ago
Language | Python | Python
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
snorkel
-
[P] Programmatic: Powerful Weak Labeling
Code for https://arxiv.org/abs/1605.07723 found: https://github.com/HazyResearch/snorkel
skweak
-
Entity Extraction with Predefined List
Thanks for pointing me in the right direction. It seems like there are a few other approaches using weak supervision: https://github.com/NorskRegnesentral/skweak
-
[P] Programmatic: Powerful Weak Labeling
Code for https://arxiv.org/abs/2104.09683 found: https://github.com/NorskRegnesentral/skweak
-
Show HN: Programmatic – a REPL for creating labeled data
Hi, Raza here, one of the other co-founders.
I know that HN likes to nerd out over technical details, so I thought I’d share a bit more on how we aggregate the noisy labels to clean them up.
At the moment we use the great Skweak [1] open-source library to do this. Skweak uses an HMM to infer the most likely unobserved label given the evidence of the votes from each of the labelling functions.
This whole strategy of first training a label model and then training a neural net was pioneered by Snorkel. We’ve used this approach for now, but we actually think there are big opportunities for improvement.
We’re working on an end-to-end approach that denoises the labelling functions and trains the model at the same time. So far we’ve seen improvements on the standard benchmarks [2] and are planning to submit to NeurIPS.
R
[1]: Skweak package: https://github.com/NorskRegnesentral/skweak
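The aggregation step described in the comment above can be illustrated with a toy hidden Markov model. This is a simplified sketch, not skweak's implementation: it assumes two labels, treats the labelling functions as conditionally independent given the true label, ignores abstentions, and fixes every probability by hand, whereas a real label model would estimate those parameters from the unlabelled votes (for example with expectation-maximization). All names, votes, and numbers are hypothetical. The code runs the forward-backward algorithm to turn per-token votes into posterior label probabilities.

```python
from typing import List, Optional, Sequence


def _normalise(row: List[float]) -> List[float]:
    total = sum(row)
    return [x / total for x in row]


def forward_backward(
    votes: Sequence[Sequence[Optional[int]]],  # votes[t][j]: label index from LF j at token t, or None (abstain)
    init: List[float],                          # init[k] = P(z_0 = k)
    trans: List[List[float]],                   # trans[i][k] = P(z_{t+1} = k | z_t = i)
    emit: List[List[List[float]]],              # emit[j][k][v] = P(LF j votes v | true label k)
) -> List[List[float]]:
    """Posterior marginals P(z_t = k | all votes) for every token t."""
    T, K = len(votes), len(init)

    def likelihood(t: int, k: int) -> float:
        # Assumption of this sketch: LFs are conditionally independent given
        # the true label, and an abstention contributes a factor of 1.
        p = 1.0
        for j, v in enumerate(votes[t]):
            if v is not None:
                p *= emit[j][k][v]
        return p

    # Forward pass, rescaled at each step to avoid numerical underflow.
    alpha = [_normalise([init[k] * likelihood(0, k) for k in range(K)])]
    for t in range(1, T):
        alpha.append(_normalise([
            likelihood(t, k) * sum(alpha[t - 1][i] * trans[i][k] for i in range(K))
            for k in range(K)
        ]))

    # Backward pass, also rescaled; per-step constants cancel in the posterior.
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = _normalise([
            sum(trans[k][k2] * likelihood(t + 1, k2) * beta[t + 1][k2] for k2 in range(K))
            for k in range(K)
        ])

    # Posterior gamma_t(k) is proportional to alpha_t(k) * beta_t(k).
    return [_normalise([alpha[t][k] * beta[t][k] for k in range(K)]) for t in range(T)]


# Hypothetical toy example: labels 0 = O, 1 = PRODUCT; two labelling functions.
LABELS = ["O", "PRODUCT"]
tokens = ["I", "bought", "a", "Galaxy", "phone"]
# LF 0: a precise gazetteer that only votes PRODUCT (otherwise abstains).
# LF 1: a noisy capitalisation heuristic (PRODUCT if capitalised, else O).
votes = [[None, 1], [None, 0], [None, 0], [1, 1], [None, 0]]
init = [0.9, 0.1]
trans = [[0.9, 0.1], [0.5, 0.5]]
emit = [
    [[0.9, 0.1], [0.05, 0.95]],  # LF 0 confusion matrix (rows = true label)
    [[0.7, 0.3], [0.2, 0.8]],    # LF 1 confusion matrix
]
post = forward_backward(votes, init, trans, emit)
for tok, p in zip(tokens, post):
    print(f"{tok:8s} " + " ".join(f"{name}={x:.2f}" for name, x in zip(LABELS, p)))
```

In this toy run the posterior for "I" still favours O despite the heuristic's PRODUCT vote, because the transition prior and that function's assumed error rate outweigh a single noisy vote. Posteriors like these are the probabilistic labels that, in the two-stage strategy the comment attributes to Snorkel, would then be used to train the downstream neural net.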
-
The hand-picked selection of the best Python libraries released in 2021
skweak.
- Skweak: Weak Supervision for NLP
-
Inevitable Manual Work involved in NLP
For more advanced unsupervised labeling, you should check out skweak.
-
How to get Training data for NER?
I'm the main developer behind skweak, by the way; happy to hear you're interested in our toolkit :-) We do already have a small list of products (see https://github.com/NorskRegnesentral/skweak/blob/main/data/products.json) extracted from DBpedia and Wikidata, but it may not be exactly the type of products you're looking for.
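The product list mentioned above is the kind of resource a gazetteer labelling function relies on. skweak ships gazetteer-based labelling functions of its own; the standalone sketch below does not use them and only illustrates the idea with greedy, case-insensitive longest-match lookup of multi-token entries. The product names, the `PRODUCT` label, and the helpers `build_gazetteer` and `gazetteer_spans` are hypothetical, and loading `products.json` is omitted because its exact structure is not described here.

```python
from typing import Dict, Iterable, List, Tuple

Gazetteer = Dict[str, List[Tuple[str, ...]]]


def build_gazetteer(entries: Iterable[str]) -> Gazetteer:
    """Index lower-cased, whitespace-tokenised entries by their first token."""
    index: Gazetteer = {}
    for entry in entries:
        toks = tuple(entry.lower().split())
        if toks:
            index.setdefault(toks[0], []).append(toks)
    # Longest candidates first, so the scan below prefers the longest match.
    for cands in index.values():
        cands.sort(key=len, reverse=True)
    return index


def gazetteer_spans(tokens: List[str], index: Gazetteer, label: str) -> List[Tuple[int, int, str]]:
    """Return non-overlapping (start, end, label) spans (end exclusive) via greedy longest match."""
    lowered = [t.lower() for t in tokens]
    spans: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        match_len = 0
        for cand in index.get(lowered[i], []):
            if tuple(lowered[i:i + len(cand)]) == cand:
                match_len = len(cand)
                break  # candidates are sorted longest-first
        if match_len:
            spans.append((i, i + match_len, label))
            i += match_len  # skip past the matched entry
        else:
            i += 1
    return spans


# Hypothetical product list and sentence.
products = ["Galaxy S21", "Galaxy", "iPhone 12", "Kindle"]
tokens = "I compared the Galaxy S21 with an iPhone 12 and a Kindle".split()
spans = gazetteer_spans(tokens, build_gazetteer(products), "PRODUCT")
print(spans)
```

Preferring the longest match means "Galaxy S21" is tagged as one span rather than stopping at "Galaxy".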
What are some alternatives?
spaCy - 💫 Industrial-strength Natural Language Processing (NLP) in Python
snorkel - A system for quickly generating training data with weak supervision
catanatron - Settlers of Catan Bot Simulator and Strong AI Player
DearPy3D - Dear PyGui 3D Engine (prototyping)
compose - A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels for supervised learning.
argilla - Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency.
AugLy - A data augmentations library for audio, image, text, and video.
Text-Summarization-using-NLP - Text summarization using NLP: fetches BBC News articles and summarizes their text, and also supports summarizing custom articles
pytorch-partial-crf - CRF, Partial CRF and Marginal CRF in PyTorch
evidently - Evaluate and monitor ML models from validation to production.
jina - ☁️ Build multimodal AI applications with cloud-native stack
gradio - Build and share delightful machine learning apps, all in Python.