snorkel
skweak
| snorkel | skweak |
---|---|---|
Mentions | 5 | 8 |
Stars | 5,500 | 877 |
Growth | 0.5% | 0.7% |
Activity | 5.5 | 3.8 |
Latest commit | about 1 month ago | 7 months ago |
Language | Python | Python |
License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
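As a rough illustration of the Growth figure, the sketch below computes a month-over-month percentage change in stars. It assumes growth is a simple percentage change between two monthly star counts; the page does not state the exact formula, and the function name and star counts are hypothetical.

```python
def month_over_month_growth(previous_stars: int, current_stars: int) -> float:
    """Return the percentage change in stars from one month to the next.

    Assumption: growth is a plain percentage change. The tracking site
    may compute it differently.
    """
    if previous_stars <= 0:
        raise ValueError("previous_stars must be positive")
    return (current_stars - previous_stars) / previous_stars * 100.0


# Hypothetical star counts, chosen only to illustrate the arithmetic.
growth = month_over_month_growth(previous_stars=1000, current_stars=1005)
print(f"{growth:.1f}%")
```

Under this assumption, a project that goes from 1,000 to 1,005 stars in a month shows 0.5% growth.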
snorkel
- [P] We are building a curated list of open source tooling for data-centric AI workflows, looking for contributions.
The paid product came out of an open source tool: https://github.com/snorkel-team/snorkel
- [Discussion] - "data sourcing will be more important than model building in the era of foundational model fine-tuning"
- Can't use load_data from utils
Actually, I referenced it in my issue as well. There seem to be different utils.py files in different folders of the snorkel-tutorials repo, but the utils module we get after importing snorkel is a different [file](https://github.com/snorkel-team/snorkel/blob/master/snorkel/utils/core.py), i.e. the utils file in the main snorkel repo is not the same one.
- [D] A hand-picked selection of the best Python ML Libraries of 2021
skweak
- Entity Extraction with Predefined List
Thanks for pointing me in the right direction. Seems like there’s a few other approaches with weak supervision: https://github.com/NorskRegnesentral/skweak
- [P] Programmatic: Powerful Weak Labeling
Code for https://arxiv.org/abs/2104.09683 found: https://github.com/NorskRegnesentral/skweak
- The hand-picked selection of the best Python libraries released in 2021
skweak.
- How to get Training data for NER?
I found this framework: https://github.com/NorskRegnesentral/skweak and it looks great for automatically labeling data, but I would still need some kind of structured data in the form of gazetteers or another ML model to automatically annotate words.
I'm the main developer behind skweak by the way, happy to hear you're interested in our toolkit :-) We do already have a small list of products (see https://github.com/NorskRegnesentral/skweak/blob/main/data/products.json) extracted from DBPedia and Wikidata, but it may not be exactly the type of products you're looking for.
What are some alternatives?
argilla - ✨Argilla: the open-source data curation platform for LLMs
spaCy - 💫 Industrial-strength Natural Language Processing (NLP) in Python
weasel - Weakly Supervised End-to-End Learning (NeurIPS 2021)
caer - High-performance Vision library in Python. Scale your research, not boilerplate.
snorkel - A system for quickly generating training data with weak supervision [Moved to: https://github.com/snorkel-team/snorkel]
DearPy3D - Dear PyGui 3D Engine (prototyping)
snorkel-tutorials - A collection of tutorials for Snorkel
AugLy - A data augmentations library for audio, image, text, and video.
gradio - Create UIs for your machine learning model in Python in 3 minutes
evidently - Evaluate and monitor ML models from validation to production. Join our Discord: https://discord.com/invite/xZjKRaNp8b
pytorch-lightning - Build high-performance AI models with PyTorch Lightning (organized PyTorch). Deploy models with Lightning Apps (organized Python to build end-to-end ML systems). [Moved to: https://github.com/Lightning-AI/lightning]