Top 4 Python training-data Projects
A system for quickly generating training data with weak supervision
Project mention: [P] We are building a curated list of open source tooling for data-centric AI workflows, looking for contributions. | reddit.com/r/MachineLearning | 2023-03-03
The paid product came out of an open source tool: https://github.com/snorkel-team/snorkel
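Weak supervision replaces hand labeling with many cheap, noisy heuristics ("labeling functions") whose votes are combined into training labels. The sketch below is a plain-Python illustration, not Snorkel's API. The spam/ham task, the three heuristics, and the `majority_vote` helper are all hypothetical. Snorkel itself goes further and learns a label model that weights each function by its estimated accuracy. Simple majority vote is used here only as a stand-in.

```python
from collections import Counter
from typing import Callable, List, Optional

# Label constants; ABSTAIN means "this function has no opinion on this example".
ABSTAIN, HAM, SPAM = -1, 0, 1

# Hypothetical labeling functions: cheap, noisy heuristics over raw text.
def lf_contains_link(text: str) -> int:
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_mentions_free(text: str) -> int:
    return SPAM if "free" in text.lower() else ABSTAIN

def lf_short_reply(text: str) -> int:
    return HAM if len(text.split()) < 5 else ABSTAIN

LFS: List[Callable[[str], int]] = [lf_contains_link, lf_mentions_free, lf_short_reply]

def majority_vote(text: str, lfs: List[Callable[[str], int]] = LFS) -> Optional[int]:
    """Combine non-abstaining votes; return None if every function abstains or the vote ties."""
    votes = [lf(text) for lf in lfs]
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return None
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave the example unlabeled rather than guess
    return ranked[0][0]

if __name__ == "__main__":
    docs = [
        "Get free crypto now at http://example.com",
        "Thanks, see you soon",
        "Here is the report you asked for last week",
    ]
    for doc in docs:
        print(majority_vote(doc), doc)
```

Examples that stay unlabeled (all abstain, or a tie) are simply left out of the generated training set in this sketch.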
Integrate Human Supervision into your Platform. For all training data types: image, video, 3D, text, geo, audio, compound, grid, LLM, GPT, conversational, and more.
Project mention: Open source tool scrapes git commit email addresses to send spam to. | reddit.com/r/programming | 2022-07-29
skweak: A software toolkit for weak supervision applied to NLP tasks
Project mention: Entity Extraction with Predefined List | reddit.com/r/LanguageTechnology | 2023-01-07
Thanks for pointing me in the right direction. Seems like there’s a few other approaches with weak supervision: https://github.com/NorskRegnesentral/skweak
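For entity extraction from a predefined list, one typical weak labeling source is a gazetteer: look up known names in the text and tag the matching spans. The sketch below is plain Python and does not use skweak's API (skweak works on spaCy documents and aggregates several such sources). The function name `gazetteer_spans`, the example entries, and the greedy longest-match strategy are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start_token, end_token_exclusive, label)

def gazetteer_spans(tokens: List[str], gazetteer: Dict[str, str]) -> List[Span]:
    """Hypothetical gazetteer labeler: greedy, case-insensitive longest match."""
    # Pre-split entries into lowercase token tuples, e.g. "New York" -> ("new", "york").
    entries = {tuple(name.lower().split()): label for name, label in gazetteer.items()}
    max_len = max((len(e) for e in entries), default=0)
    lowered = [t.lower() for t in tokens]
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest window first so "New York City" wins over "New York".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = entries.get(tuple(lowered[i:i + n]))
            if label is not None:
                match = (i, i + n, label)
                break
        if match:
            spans.append(match)
            i = match[1]  # skip past the matched span
        else:
            i += 1
    return spans

if __name__ == "__main__":
    tokens = "I flew from New York City to Oslo for Acme Corp".split()
    gaz = {"New York": "LOC", "New York City": "LOC", "Oslo": "LOC", "Acme Corp": "ORG"}
    print(gazetteer_spans(tokens, gaz))
```

On its own a gazetteer misses unlisted names and can fire on ambiguous ones, which is why weak supervision toolkits combine it with other heuristics rather than trusting it alone.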
A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels for supervised learning. (by alteryx)
Project mention: 20+ Free Tools & Resources for Machine Learning | dev.to | 2022-03-31
Compose targets labeling raw data, allowing you to define labeling functions for your data in Python to make the labeling process easier.
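The idea behind prediction engineering with a labeling function can be sketched as: group raw events by entity, cut each entity's history into time windows, and apply a user-written function to each window to produce a label. This is plain Python and does not reflect Compose's actual classes or parameters. The names `make_labels` and `total_spent`, the fixed consecutive windows, and the toy transaction rows are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple

Transaction = Tuple[str, datetime, float]  # (customer_id, timestamp, amount)

def total_spent(amounts: List[float]) -> float:
    """Hypothetical labeling function: the label is total spend in the window."""
    return sum(amounts)

def make_labels(
    rows: List[Transaction],
    labeling_function: Callable[[List[float]], float],
    window: timedelta,
) -> List[Tuple[str, datetime, float]]:
    """Cut each customer's history into consecutive windows and label each window."""
    by_customer: Dict[str, List[Tuple[datetime, float]]] = defaultdict(list)
    for customer_id, ts, amount in rows:
        by_customer[customer_id].append((ts, amount))

    labels = []
    for customer_id, events in sorted(by_customer.items()):
        events.sort()
        cutoff = events[0][0]  # first window starts at the customer's first event
        last = events[-1][0]
        while cutoff <= last:
            in_window = [a for ts, a in events if cutoff <= ts < cutoff + window]
            labels.append((customer_id, cutoff, labeling_function(in_window)))
            cutoff += window
    return labels

if __name__ == "__main__":
    rows = [
        ("a", datetime(2023, 1, 1), 10.0),
        ("a", datetime(2023, 1, 3), 5.0),
        ("a", datetime(2023, 1, 9), 20.0),
        ("b", datetime(2023, 1, 2), 7.5),
    ]
    for row in make_labels(rows, total_spent, timedelta(days=7)):
        print(row)
```

Each output row pairs an entity and a window start time with its label, so features for a model can be computed from data before that time without leaking the label.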