Top 15 synthetic-dataset-generation Open-Source Projects
-
bonito
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT. (by BatsResearch)
-
DoppelGANger
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
-
REaLTabFormer
A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.
-
discus
A data-centric AI package for ML/AI. Get the best high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ
-
nist-crc-2023
NIST Collaborative Research Cycle on Synthetic Data. Learn about Synthetic Data week by week!
-
synthetic-dataset-object-detection
How to Create Synthetic Dataset for Computer Vision (Object Detection) (Article on Medium)
Project mention: PyGraft: Configurable Generation of Schemas and Knowledge Graphs | news.ycombinator.com | 2023-09-13
Project mention: Show HN: VQASynth – pipelines to synthesize VQA datasets | news.ycombinator.com | 2024-02-23
Project mention: an open source package helping developers generate data for LLMs | /r/mlops | 2023-08-02
Project mention: Assessing the Quality of Synthetic Data with Data-centric AI | /r/ArtificialInteligence | 2023-07-13
Data Quality is key for all applications and models, and LLMs are no exception :) I've been working on a small community project with synthetic data using ydata-synthetic, and it really shows! Underrepresentation (category imbalance) and missing data are two of the main issues!
synthetic-dataset-generation related posts
-
The cute demo if you want to generate test data for your DB
-
World Bank Researchers Open Source REaLTabFormer: A Tabular and Relational Synthetic Data Generation Model
-
REaLTabFormer: Generating realistic synthetic data using GPT in Python
-
Show HN: REaLTabFormer – GPT-based synthetic data generator
Index
What are some of the best open-source synthetic-dataset-generation projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | AutoPrompt | 1,716 |
2 | com.unity.perception | 878 |
3 | DataDreamer | 681 |
4 | pygraft | 641 |
5 | bonito | 527 |
6 | SynthDet | 352 |
7 | PeopleSansPeople | 295 |
8 | DoppelGANger | 277 |
9 | REaLTabFormer | 184 |
10 | DeFMO | 164 |
11 | VQASynth | 82 |
12 | discus | 60 |
13 | nist-crc-2023 | 27 |
14 | synthetic-dataset-object-detection | 20 |
15 | tdk-demo | 17 |