argilla
skweak
Metric | argilla | skweak
---|---|---
Mentions | 15 | 8
Stars | 3,081 | 909
Growth | 4.7% | 0.2%
Activity | 9.8 | 6.2
Last commit | 6 days ago | 6 months ago
Language | Python | Python
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
argilla
- Open-Source Data Collection Platform for LLM Fine-Tuning and RLHF
I'm Dani, CEO and co-founder of Argilla.
Happy to answer any questions you might have and excited to hear your thoughts!
More about Argilla
- No training data, no problem! Few-shot NER with a practical example
Rubrix, the open-source tool for data-centric NLP: https://github.com/recognai/rubrix
- [P] Small-Text: Active Learning for Text Classification in Python
I have already thought about providing an example of how to integrate small-text with one of the existing labeling tools, such as rubrix, but that hasn't been started yet.
- [P] Open-source tool for building NLP training sets with weak supervision and search queries
- [P] Rubrix: Open-source Python framework for NLP data annotation, exploration, and monitoring
You can check the project and tutorials here: https://github.com/recognai/rubrix
skweak
- Entity Extraction with Predefined List
Thanks for pointing me in the right direction. Seems like there’s a few other approaches with weak supervision: https://github.com/NorskRegnesentral/skweak
- [P] Programmatic: Powerful Weak Labeling
Code for https://arxiv.org/abs/2104.09683 found: https://github.com/NorskRegnesentral/skweak
- The hand-picked selection of the best Python libraries released in 2021
skweak.
- How to get Training data for NER?
I found this framework: https://github.com/NorskRegnesentral/skweak and it looks great for automatically labeling data, but I would still need some kind of structured data in the form of gazetteers or another ML model to automatically annotate words.
I'm the main developer behind skweak by the way, happy to hear you're interested in our toolkit :-) We do already have a small list of products (see https://github.com/NorskRegnesentral/skweak/blob/main/data/products.json) extracted from DBPedia and Wikidata, but it may not be exactly the type of products you're looking for.
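To make the gazetteer idea from the thread above concrete, here is a minimal sketch of a gazetteer-based labeling function in plain Python. It does not use skweak's API; the product list, the function name `gazetteer_labeler`, and the `PRODUCT` label are all hypothetical and chosen only for illustration. In a real weak-supervision setup, the output of several such noisy labelers would typically be aggregated afterwards.

```python
import re

# Hypothetical gazetteer: a tiny hand-written product list, for illustration only.
PRODUCT_GAZETTEER = ["iPhone", "Galaxy S21", "ThinkPad"]


def gazetteer_labeler(text, entries, label="PRODUCT"):
    """Return sorted (start, end, label) character spans for gazetteer matches."""
    spans = []
    # Try longer entries first so multi-word names win over shorter overlaps.
    for entry in sorted(entries, key=len, reverse=True):
        pattern = r"\b" + re.escape(entry) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            start, end = match.span()
            # Skip matches that overlap a span we already accepted.
            if all(end <= s or start >= e for s, e, _ in spans):
                spans.append((start, end, label))
    return sorted(spans)


text = "She replaced her ThinkPad with a Galaxy S21 and an iphone."
spans = gazetteer_labeler(text, PRODUCT_GAZETTEER)
labeled = [(text[s:e], lab) for s, e, lab in spans]
print(labeled)
```

Matching is case-insensitive and limited to whole words, so the lowercase "iphone" is still labeled. This is a deliberately naive choice: it trades precision for recall, which is usually acceptable when the labels are treated as noisy.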
What are some alternatives?
snorkel - A system for quickly generating training data with weak supervision
doccano - Open source annotation tool for machine learning practitioners.
label-studio - Label Studio is a multi-type data labeling and annotation tool with standardized output format
cleanlab - The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
trankit - Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
data-centric-ai - Resources for Data Centric AI
dalle-flow - 🌊 A Human-in-the-Loop workflow for creating HD images from text
dataqa - Labelling platform for text using weak supervision.
DearPy3D - Dear PyGui 3D Engine (prototyping)
weasel - Weakly Supervised End-to-End Learning (NeurIPS 2021)
snorkel - A system for quickly generating training data with weak supervision [Moved to: https://github.com/snorkel-team/snorkel]