Top 7 Python weak-supervision Projects
- cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
I recently published a blog post introducing Datalab, along with an open-source Python implementation that is easy to use for all data types (image, text, tabular, audio, etc.). For data scientists, I’ve made a quick Jupyter tutorial for running Datalab on your own data.
- snorkel
Project mention: [P] We are building a curated list of open source tooling for data-centric AI workflows, looking for contributions. | reddit.com/r/MachineLearning | 2023-03-03
The paid product came out of an open source tool: https://github.com/snorkel-team/snorkel
- argilla
Project mention: Meet Argilla: An Open-Source Data Curation Platform for Large Language Models (LLMs) and MLOps for Natural Language Processing | reddit.com/r/machinelearningnews | 2023-05-19
GitHub link: https://github.com/argilla-io/argilla
- skweak
Project mention: Entity Extraction with Predefined List | reddit.com/r/LanguageTechnology | 2023-01-07
Thanks for pointing me in the right direction. Seems like there are a few other approaches with weak supervision: https://github.com/NorskRegnesentral/skweak
Python weak-supervision related posts
- [P] Datalab: A Linter for ML Datasets
- [N] Fine-Tuning OpenAI Language Models with Noisily Labeled Data (37% error reduction)
- How can you determine the presence and level of noise a dataset has? How much is good?
- Don't let your model suck --- clean your training and testing data!
- Can model still reach high training accuracy if data are bad?
- Handling mislabeled tabular data with xgboost
- Cleanlab: the standard framework for Data-centric AI hits 5,000 GitHub stars!
Index
What are some of the best open-source weak-supervision projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | cleanlab | 5,935 |
2 | snorkel | 5,495 |
3 | argilla | 1,971 |
4 | skweak | 877 |
5 | wrench | 191 |
6 | weasel | 142 |
7 | zeroshot_topics | 59 |