Top 7 Python synthetic-data-generation Projects
MOSTLY AI has open-sourced its powerful Synthetic Data SDK, enabling you to create privacy-preserving, AI-generated synthetic data directly from your existing datasets—all within your secure environments.
Key Features:
Broad Data Support: Handle mixed data types (categorical, numerical, geospatial, text), single/multi-table datasets & time-series data.
Multiple Model Types: Leverage TabularARGN (SOTA for tabular data), fine-tuned HuggingFace models, and efficient LSTM for text generation.
Advanced Training Options: CPU/GPU support, differential privacy, and real-time progress monitoring.
Automated Quality Assurance: Built-in fidelity & privacy metrics with detailed HTML reports for visual data analysis.
Flexible Sampling: Upsample data, generate conditionally, rebalance segments, impute context-aware values, ensure fairness, and control outputs via temperature adjustments.
Seamless Integration: Connect to external databases & cloud storage. The SDK is released under a fully permissive open-source license.
Check out the SDK on GitHub: https://github.com/mostly-ai/mostlyai
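To illustrate what "control outputs via temperature adjustments" generally means when sampling a categorical column, here is a minimal, generic sketch. It does not use the SDK and is not taken from its code; the function names, categories, and probabilities are all hypothetical and chosen only for illustration.

```python
import random


def apply_temperature(probs, temperature):
    """Rescale a categorical distribution as p_i ** (1 / T), then renormalize.

    T < 1 sharpens the distribution (frequent values become more likely),
    T > 1 flattens it (rare values become more likely), T == 1 leaves it as is.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_category(categories, probs, temperature=1.0, rng=random):
    """Draw one category from the temperature-adjusted distribution."""
    adjusted = apply_temperature(probs, temperature)
    return rng.choices(categories, weights=adjusted, k=1)[0]


# Hypothetical distribution for a "plan" column (illustrative values only).
categories = ["basic", "plus", "premium"]
probs = [0.7, 0.2, 0.1]

sharp = apply_temperature(probs, 0.5)  # concentrates mass on "basic"
flat = apply_temperature(probs, 2.0)   # moves mass toward "plus" and "premium"
```

Lower temperatures tend to produce more conservative, typical records, while higher temperatures produce more varied ones at the cost of fidelity to the original distribution.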
-
DoppelGANger
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
-
I made a tool because getting datasets for NLP or tabular data is tough. It uses an AI API to generate synthetic data. You define columns with names, types, and prompts, and set the number of rows: 50,000 or more, as many as you need. It’s written in Python with a basic interface. It’s on GitHub here: https://github.com/VoxDroid/Zylthra. I needed it for some work, and it does the job. If anyone tries it, let me know what’s off.
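The column-definition workflow described in the post can be sketched roughly as follows. This is not Zylthra's actual code: `ColumnSpec`, `build_generation_prompt`, and `batch_sizes` are hypothetical names, the prompt wording is invented for illustration, and the idea of splitting large row counts into batches is an assumption about how one might stay within an AI API's response limits.

```python
import json
from dataclasses import dataclass


@dataclass
class ColumnSpec:
    """One user-defined column: a name, a type, and a free-text prompt."""
    name: str
    dtype: str
    prompt: str


def build_generation_prompt(columns, n_rows):
    """Turn column definitions into a single text prompt for an AI API."""
    schema = [
        {"name": c.name, "type": c.dtype, "instructions": c.prompt}
        for c in columns
    ]
    return (
        f"Generate {n_rows} rows of synthetic data as a JSON array of objects.\n"
        f"Columns:\n{json.dumps(schema, indent=2)}"
    )


def batch_sizes(total_rows, batch_size):
    """Split a large row count into per-request batch sizes (an assumption)."""
    if total_rows < 0 or batch_size <= 0:
        raise ValueError("total_rows must be >= 0 and batch_size must be > 0")
    full, rest = divmod(total_rows, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


# Hypothetical column definitions for a small customer table.
columns = [
    ColumnSpec("age", "integer", "Adult ages between 18 and 90"),
    ColumnSpec("review", "text", "A one-sentence product review"),
]

prompt = build_generation_prompt(columns, n_rows=3)
batches = batch_sizes(2500, 1000)
```

Each batch size would then be sent as its own request, with the returned rows parsed and concatenated into the final dataset.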