Top 8 Python synthetic-dataset-generation Projects
-
bonito
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT. (by BatsResearch)
-
DoppelGANger
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
-
discus
A data-centric AI package for ML/AI. Get the best high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ
Project mention: PyGraft: Configurable Generation of Schemas and Knowledge Graphs | news.ycombinator.com | 2023-09-13
Project mention: Show HN: VQASynth – pipelines to synthesize VQA datasets | news.ycombinator.com | 2024-02-23
Project mention: an open source package helping developers generate data for LLMs | /r/mlops | 2023-08-02
Index
What are some of the best open-source synthetic-dataset-generation projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | AutoPrompt | 1,637 |
2 | pygraft | 639 |
3 | DataDreamer | 632 |
4 | bonito | 470 |
5 | DoppelGANger | 275 |
6 | DeFMO | 164 |
7 | VQASynth | 71 |
8 | discus | 62 |