Top 4 Python training-data Projects
-
snorkel
Project mention: [P] We are building a curated list of open source tooling for data-centric AI workflows, looking for contributions. | reddit.com/r/MachineLearning | 2023-03-03
The paid product came out of an open source tool: https://github.com/snorkel-team/snorkel
-
diffgram
Integrate Human Supervision into your Platform. For all Training Data Types, Image, Video, 3D, Text, Geo, Audio, Compound, Grid, LLM, GPT, Conversational, and more.
Project mention: Open source tool scrapes git commit email addresses to send spam to. | reddit.com/r/programming | 2022-07-29
-
skweak
Project mention: Entity Extraction with Predefined List | reddit.com/r/LanguageTechnology | 2023-01-07
Thanks for pointing me in the right direction. Seems like there’s a few other approaches with weak supervision: https://github.com/NorskRegnesentral/skweak
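To make the weak-supervision idea in this mention concrete, here is a minimal plain-Python sketch of entity tagging driven by a predefined list. It does not use the skweak or snorkel APIs. The word list, the labeling functions, and the example sentence are all hypothetical, and a simple majority vote stands in for the statistical aggregation that dedicated libraries typically provide.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion

# Hypothetical predefined list (gazetteer) of organisation names.
KNOWN_ORGS = {"acme", "globex"}
COMPANY_SUFFIXES = {"inc", "corp", "ltd"}


def lf_gazetteer(token):
    """Vote ORG when the token appears in the predefined list."""
    return "ORG" if token.lower() in KNOWN_ORGS else ABSTAIN


def lf_company_suffix(token):
    """Vote ORG for common company suffixes such as 'Corp.'."""
    return "ORG" if token.lower().rstrip(".") in COMPANY_SUFFIXES else ABSTAIN


def lf_lowercase(token):
    """Vote O (outside any entity) for all-lowercase tokens."""
    return "O" if token.islower() else ABSTAIN


LABELING_FUNCTIONS = [lf_gazetteer, lf_company_suffix, lf_lowercase]


def weak_label(tokens):
    """Combine the noisy votes for each token by majority vote."""
    labelled = []
    for token in tokens:
        votes = [lf(token) for lf in LABELING_FUNCTIONS]
        votes = [v for v in votes if v is not ABSTAIN]
        # If every function abstains, the token stays unlabelled.
        label = Counter(votes).most_common(1)[0][0] if votes else ABSTAIN
        labelled.append((token, label))
    return labelled


result = weak_label("Acme Corp. hired engineers from Globex".split())
print(result)
```

The resulting token/label pairs can then serve as noisy training data for a conventional sequence tagger.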
-
compose
A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels for supervised learning. (by alteryx)
Compose targets labeling raw data: you define labeling functions for your data in Python to make the labeling process easier.
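The labeling-function idea described above can be sketched in plain Python. This is not Compose's actual API. The transaction data, the function names, and the 7-day window are assumptions chosen for illustration. The sketch slices each customer's history into fixed time windows and applies a user-defined labeling function to each window.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical raw data: (customer_id, timestamp, amount) transactions.
transactions = [
    ("a", datetime(2023, 1, 1), 20.0),
    ("a", datetime(2023, 1, 2), 90.0),
    ("a", datetime(2023, 1, 9), 15.0),
    ("b", datetime(2023, 1, 1), 5.0),
    ("b", datetime(2023, 1, 3), 10.0),
]


def total_spent_over_100(window):
    """Labeling function: True if the window's total spend exceeds 100."""
    return sum(amount for _, _, amount in window) > 100


def make_labels(rows, labeling_function, window_size):
    """Slice each customer's rows into fixed windows and label each one."""
    by_customer = defaultdict(list)
    for row in rows:
        by_customer[row[0]].append(row)

    labels = []
    for customer, history in sorted(by_customer.items()):
        history.sort(key=lambda r: r[1])  # order by timestamp
        start, end = history[0][1], history[-1][1]
        while start <= end:
            window = [r for r in history if start <= r[1] < start + window_size]
            if window:  # skip windows with no activity
                labels.append((customer, start, labeling_function(window)))
            start += window_size
    return labels


labels = make_labels(transactions, total_spent_over_100, timedelta(days=7))
for customer, window_start, label in labels:
    print(customer, window_start.date(), label)
```

Swapping in a different labeling function changes the prediction problem without touching the windowing logic, which is the separation the description above points to.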
Index
What are some of the best open-source training-data projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | snorkel | 5,445 |
2 | diffgram | 1,638 |
3 | skweak | 870 |
4 | compose | 410 |