Python synthetic-dataset-generation

Open-source Python projects categorized as synthetic-dataset-generation

Top 8 Python synthetic-dataset-generation Projects

  • AutoPrompt

    A framework for prompt tuning using Intent-based Prompt Calibration

    Project mention: FLaNK 04 March 2024 | dev.to | 2024-03-04

  • pygraft

    Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips

    Project mention: PyGraft: Configurable Generation of Schemas and Knowledge Graphs | news.ycombinator.com | 2023-09-13

  • DataDreamer

    DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models.   🤖💤

    Project mention: FLaNK AI - 01 April 2024 | dev.to | 2024-04-01

  • bonito

    A lightweight library for generating synthetic instruction tuning datasets for your data without GPT. (by BatsResearch)

    Project mention: FLaNK AI for 11 March 2024 | dev.to | 2024-03-11

  • DoppelGANger

    [IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

  • DeFMO

    [CVPR 2021] DeFMO: Deblurring and Shape Recovery of Fast Moving Objects

  • VQASynth

    Compose multimodal datasets 🎹

    Project mention: Show HN: VQASynth – pipelines to synthesize VQA datasets | news.ycombinator.com | 2024-02-23

  • discus

    A data-centric AI package for ML/AI. Get the best high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ

    Project mention: an open source package helping developers generate data for LLMs | /r/mlops | 2023-08-02

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source synthetic-dataset-generation projects in Python? This list will help you:

Rank  Project       Stars
1     AutoPrompt    1,637
2     pygraft       639
3     DataDreamer   632
4     bonito        470
5     DoppelGANger  275
6     DeFMO         164
7     VQASynth      71
8     discus        62
