Metric | argilla | data-centric-ai
---|---|---
Mentions | 15 | 1
Stars | 3,122 | 1,070
Growth | 2.3% | 1.4%
Activity | 9.8 | 0.0
Last commit | 5 days ago | 5 months ago
Language | Python | TeX
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
argilla
- Open-Source Data Collection Platform for LLM Fine-Tuning and RLHF
  I'm Dani, CEO and co-founder of Argilla.
  Happy to answer any questions you might have and excited to hear your thoughts!
  More about Argilla
  GitHub: https://github.com/argilla-io/argilla
- Meet Argilla: An Open-Source Data Curation Platform for Large Language Models (LLMs) and MLOps for Natural Language Processing
  GitHub link: https://github.com/argilla-io/argilla
- Show HN: Argilla and AutoTrain – Train custom NLP models without code
- Rubrix release 0.17.0 with support for the spaCy training format
- No training data, no problem! Few-shot NER with a practical example
  Rubrix, the open-source tool for data-centric NLP: https://github.com/recognai/rubrix
- [D] Expert Advice is needed on designing a feedback Loop for a (Textual Classification + NER) task in Production.
- [D] How should a former Web Developer, pursue career in Machine Learning?
  E.g. https://github.com/recognai/rubrix
- [P] Small-Text: Active Learning for Text Classification in Python
  I have already thought about providing an example of how to integrate small-text with one of the existing labeling tools, such as Rubrix, but that hasn't been started yet.
- Finding and correcting text classification label errors with cleanlab and Rubrix | https://rubrix.readthedocs.io/en/master/tutorials/find_label_errors.html
- Rubrix: Open-source tool for building NLP training sets (now with weak supervision)
data-centric-ai
- [P] Rubrix: Open-source Python framework for NLP data annotation, exploration, and monitoring
  In line with initiatives like Data-centric AI (https://https-deeplearning-ai.github.io/data-centric-comp/, https://github.com/HazyResearch/data-centric-ai), we firmly believe that iterating on datasets (finding label errors, dataset slicing, QA, etc.) will become more and more important, and tools for making this easier and involving different roles are needed.
What are some alternatives?
snorkel - A system for quickly generating training data with weak supervision
pytorch-lightning - The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate. [Moved to: https://github.com/PyTorchLightning/pytorch-lightning]
label-studio - Label Studio is a multi-type data labeling and annotation tool with standardized output format
prometheus-spec - Cryptoeconomically-safe trustless high-load computing on top of Bitcoin
doccano - Open source annotation tool for machine learning practitioners.
data-centric-AI - A curated, but incomplete, list of data-centric AI resources.
cleanlab - The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
pytorch-lightning - Build high-performance AI models with PyTorch Lightning (organized PyTorch). Deploy models with Lightning Apps (organized Python to build end-to-end ML systems). [Moved to: https://github.com/Lightning-AI/lightning]
trankit - Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
spaCy - 💫 Industrial-strength Natural Language Processing (NLP) in Python
dalle-flow - 🌊 A Human-in-the-Loop workflow for creating HD images from text
DikeDataset - Dataset with labeled benign and malicious files 🗃️