QDrant-NLP
argilla
Metric | QDrant-NLP | argilla
---|---|---
Mentions | 1 | 15
Stars | 11 | 3,108
Growth | - | 1.9%
Activity | 10.0 | 9.8
Latest commit | over 1 year ago | 5 days ago
Language | Python | Python
License | - | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
QDrant-NLP
- Vector Databases for Data-Centric AI (Part 2)
Just clone the repo QDrant-NLP and run docker-compose up. I would like to increase the number of datasets this can be tried on, either with GPU-backed Lambda functions or by saving many example datasets to S3. So far I've only made a 6K subset of ag_news available (ag_news · Datasets at Hugging Face). This is the code snippet used to generate the embeddings via Hugging Face:
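The author's embedding snippet is not included in this excerpt. As a hedged, library-free illustration of what a vector database such as Qdrant does once embeddings exist, the sketch below ranks stored vectors by cosine similarity to a query vector. The toy 3-dimensional vectors, the payload labels, and the helper names `cosine_similarity` and `top_k` are hypothetical; real embeddings from a Hugging Face model have far more dimensions, and Qdrant answers this kind of query with an approximate index rather than the full scan shown here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def top_k(query: list[float], points: dict, k: int = 3) -> list:
    """Brute-force nearest neighbours: score every stored point, keep the best k.

    `points` maps a point id to a (vector, payload) pair (hypothetical layout).
    """
    scored = [
        (pid, cosine_similarity(query, vec), payload)
        for pid, (vec, payload) in points.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings" of ag_news-style headlines.
points = {
    1: ([0.9, 0.1, 0.0], {"label": "Sports"}),
    2: ([0.1, 0.9, 0.1], {"label": "Business"}),
    3: ([0.8, 0.2, 0.1], {"label": "Sports"}),
    4: ([0.0, 0.2, 0.9], {"label": "Sci/Tech"}),
}

for pid, score, payload in top_k([1.0, 0.0, 0.0], points, k=2):
    print(pid, round(score, 3), payload["label"])
```

Returning the payload next to the score mirrors how a vector store is typically used for data curation: the nearest neighbours of a mislabeled example often reveal its correct label.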
argilla
- Open-Source Data Collection Platform for LLM Fine-Tuning and RLHF
I'm Dani, CEO and co-founder of Argilla.
Happy to answer any questions you might have and excited to hear your thoughts!
More about Argilla
GitHub: https://github.com/argilla-io/argilla
- Meet Argilla: An Open-Source Data Curation Platform for Large Language Models (LLMs) and MLOps for Natural Language Processing
Github link: https://github.com/argilla-io/argilla
- Show HN: Argilla and AutoTrain - Train custom NLP models without code
- Rubrix release 0.17.0 with support for the spaCy training format
- No training data, no problem! Few-shot NER with a practical example
Rubrix, the open-source tool for data-centric NLP: https://github.com/recognai/rubrix
- [D] Expert Advice is needed on designing a feedback Loop for a (Textual Classification + NER) task in Production.
- [D] How should a former Web Developer, pursue career in Machine Learning?
E.g. https://github.com/recognai/rubrix
- [P] Small-Text: Active Learning for Text Classification in Python
I have already thought about providing an example of how to integrate small-text with one of the existing labeling tools, such as rubrix, but that hasn't been started yet.
- Finding and correcting text classification label errors with cleanlab and Rubrix | https://rubrix.readthedocs.io/en/master/tutorials/find_label_errors.html
- Rubrix: Open-source tool for building NLP training sets (now with weak supervision)
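Weak supervision, the feature announced in the Rubrix post above and the focus of snorkel listed below, generally means writing several noisy labeling rules and combining their votes into training labels. The sketch below shows only that general idea in plain Python; it is not the Rubrix/Argilla API. The keyword rules, the ag_news-style label names, and the `majority_vote` helper are hypothetical, and dedicated label models weigh rules more carefully than this raw vote.

```python
from collections import Counter
from typing import Callable, Optional

# A labeling function returns a label, or None to abstain.
LabelingFunction = Callable[[str], Optional[str]]


# Hypothetical keyword rules for ag_news-style labels.
def lf_sports(text: str) -> Optional[str]:
    return "Sports" if any(w in text.lower() for w in ("match", "goal", "league")) else None


def lf_business(text: str) -> Optional[str]:
    return "Business" if any(w in text.lower() for w in ("shares", "profit", "market")) else None


def lf_tech(text: str) -> Optional[str]:
    return "Sci/Tech" if any(w in text.lower() for w in ("software", "chip", "internet")) else None


def lf_tech_companies(text: str) -> Optional[str]:
    return "Sci/Tech" if any(w in text.lower() for w in ("intel", "microsoft")) else None


LABELING_FUNCTIONS: list[LabelingFunction] = [lf_sports, lf_business, lf_tech, lf_tech_companies]


def majority_vote(text: str, lfs: list[LabelingFunction] = LABELING_FUNCTIONS) -> Optional[str]:
    """Return the label with the most votes, or None if no rule fires or the top labels tie."""
    votes: Counter = Counter()
    for lf in lfs:
        label = lf(text)
        if label is not None:  # skip abstentions
            votes[label] += 1
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave the example for a human annotator
    return ranked[0][0]


for headline in ["Late goal wins the league match", "Intel chip shares climb", "Chip maker's shares jump"]:
    print(headline, "->", majority_vote(headline))
```

Returning None on ties and on "no rule fired" keeps uncertain examples out of the weakly labeled set so they can be routed to manual annotation instead.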
What are some alternatives?
refinery - The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.
snorkel - A system for quickly generating training data with weak supervision
NeoGPT - Your Local AI Assistant: Seamlessly Chat, Execute Commands, and Interpret Code with Local Models for Ultimate Privacy.
label-studio - Label Studio is a multi-type data labeling and annotation tool with standardized output format
fiftyone - The open-source tool for building high-quality datasets and computer vision models
doccano - Open source annotation tool for machine learning practitioners.
Resume-Matcher - Resume Matcher is an open source, free tool to improve your resume. It works by using language models to compare and rank resumes with job descriptions.
cleanlab - The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
small-text - Active Learning for Text Classification in Python
data-centric-ai - Resources for Data Centric AI
trankit - Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
dalle-flow - A Human-in-the-Loop workflow for creating HD images from text