awesome-open-data-centric-ai
awesome-synthetic-data
Metric | awesome-open-data-centric-ai | awesome-synthetic-data
---|---|---
Mentions | 1 | 1
Stars | 678 | 100
Growth | - | 10.0%
Activity | 5.8 | 10.0
Last commit | 6 months ago | almost 2 years ago
License | Creative Commons Attribution 4.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
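The exact formulas behind these numbers are not given here, so the following is only a minimal sketch of how such metrics could be computed. The function names, the exponential decay, and the 90-day half-life are all assumptions made for illustration; only the general ideas (month-over-month growth, recency-weighted commits, a score relative to other tracked projects) come from the description above.

```python
from datetime import date


def month_over_month_growth(stars_now: int, stars_last_month: int) -> float:
    """Percent change in stars compared with one month earlier."""
    if stars_last_month <= 0:
        raise ValueError("a positive baseline star count is required")
    return 100.0 * (stars_now - stars_last_month) / stars_last_month


def weighted_commit_score(commit_dates, today: date, half_life_days: float = 90.0) -> float:
    """Sum of per-commit weights; a commit's weight halves every half_life_days.

    The decay shape and half-life are hypothetical, chosen only so that
    recent commits count more than older ones.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_index(score: float, all_scores) -> float:
    """Map a raw score to a 0-10 scale by the share of tracked projects scoring lower.

    Under this assumed mapping, 9.0 means 90% of projects score lower,
    i.e. the project is in the top 10%.
    """
    below = sum(1 for s in all_scores if s < score)
    return round(10.0 * below / len(all_scores), 1)
```

For instance, a project that went from 100 to 110 stars in a month would show 10.0% growth under this definition, and the highest-scoring project among ten tracked projects would get an activity of 9.0.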
awesome-open-data-centric-ai
[P] We are building a curated list of open source tooling for data-centric AI workflows, looking for contributions.
Here is the link to the GitHub repo: https://github.com/Renumics/awesome-open-data-centric-ai. Do you think any tools are missing? Please let me know, or feel free to submit a pull request.
awesome-synthetic-data
Synthetic Data
Btw, there's this Awesome list here with some nice pointers to interesting material for synthetic data.
What are some alternatives?
internet-explorer - Internet Explorer explores the web in a self-supervised manner to progressively find relevant examples that improve performance on a desired target dataset.
spotlight - Interactively explore unstructured datasets from your dataframe.
Encord Active - Open source active learning toolkit to find failure modes in your computer vision models, prioritize data to label next, and drive data curation to improve model performance.
WhereIsAI - AI company, product, and tool collection.
Awesome-Learning-with-Label-Noise - A curated list of resources for Learning with Noisy Labels
cleanlab - The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
refinery - The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.