data-generation

Top 19 data-generation Open-Source Projects

  • Grounded-Segment-Anything

    Grounded-SAM: Marrying Grounding-DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

  • Project mention: Tooling for bulk image data set manipulation? | /r/computervision | 2023-06-27
  • generatedata

    A powerful, feature-rich, random test data generator.

  • Project mention: Generate any sort of random data in any format | news.ycombinator.com | 2023-12-05
  • SDV

    Synthetic data generation for tabular data

  • Project mention: Synthetic data generation for tabular data | news.ycombinator.com | 2024-02-27

    Can someone help me understand the licensing of this?

    https://github.com/sdv-dev/SDV/blob/main/LICENSE

    It was MIT licensed up until 2022 where it was changed to what it is now, where they say that it will become MIT again 4 years after release... but is that from when the license was changed or the first release of the software in GitHub?

  • CTGAN

    Conditional GAN for generating synthetic tabular data.

  • Project mention: Ctgan: Generating synthetic data in Python using GANs | news.ycombinator.com | 2024-02-05
  • StreamData

    Data generation and property-based testing for Elixir. 🔮

  • Mockneat

    MockNeat - the modern faker lib.

  • regexp-examples

    Generate strings that match a given regular expression

  • Copulas

    A library to model multivariate data using copulas.

  • genalog

    Genalog is an open-source, cross-platform Python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.

  • REaLTabFormer

    A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.

  • rapiddweller-benerator-ce

    BENERATOR is a leading software solution to generate, obfuscate, pseudonymize and migrate data for development, testing, and training purposes with a model-driven approach.

  • awesome-synthetic-data

    📖 A curated list of resources dedicated to synthetic data (by gretelai)

  • DeepEcho

    Synthetic Data Generation for mixed-type, multivariate time series.

  • Project mention: DeepEcho: Synthetic Data Generation Library | news.ycombinator.com | 2024-02-05
  • mockingbird

    Mockingbird is a mock streaming data generator (by tinybirdco)

  • Project mention: Streaming analytics | /r/dataengineering | 2023-07-06

    You could generate synthetic data to build your dashboard, either with normal Python or something like https://github.com/tinybirdco/mockingbird. Or, get some old data and have a script push it row by row into Kafka to emulate a stream.

  • hypothesis-graphql

    Generate arbitrary queries matching your GraphQL schema, and use them to verify your backend implementation.

  • trainer

    Simple interface to synthesize complex and highly dimensional datasets using Gretel APIs. (by gretelai)

  • tdk-demo

    This is a collection of TDK demo projects that use different databases and options

  • data-caterer

    Data generation and validation tool for any data source

  • Project mention: Show HN: Data Caterer – Data generation and validation tool | news.ycombinator.com | 2024-03-22
  • dummPy

    An application that produces fake (dummy) data for Data analysis practice.
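
The suggestion in the "Streaming analytics" mention above (generate synthetic rows in plain Python, then push them one at a time to emulate a stream) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the API of mockingbird or any other project on this list. The function names, the event fields, and the `send` callback are all hypothetical; `send` stands in for whatever producer you actually use (for example, a Kafka client).

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, Iterator


def synthetic_events(n: int, seed: int = 0) -> Iterator[dict]:
    """Yield n fake page-view events with strictly increasing timestamps.

    The event schema here is an assumption made for illustration only.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same rows
    ts = datetime(2023, 7, 6, tzinfo=timezone.utc)
    for i in range(n):
        # Advance the clock by a random gap of 1 to 500 ms per event.
        ts += timedelta(milliseconds=rng.randint(1, 500))
        yield {
            "event_id": i,
            "timestamp": ts.isoformat(),
            "user_id": rng.randint(1, 100),
            "page": rng.choice(["/", "/pricing", "/docs"]),
        }


def replay(rows: Iterable[dict], send: Callable[[dict], None]) -> int:
    """Push rows one at a time to `send` and return how many were sent."""
    count = 0
    for row in rows:
        send(row)  # hypothetical sink; swap in a real producer call here
        count += 1
    return count


if __name__ == "__main__":
    # `print` stands in for a real producer in this sketch.
    replay(synthetic_events(5), print)
```

The same `replay` function would also cover the second option from that mention: pass it rows read from an old data file instead of generated ones.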

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source data-generation projects? This list will help you:

Rank Project Stars
1 Grounded-Segment-Anything 13,481
2 generatedata 2,177
3 SDV 2,141
4 CTGAN 1,140
5 StreamData 841
6 Mockneat 523
7 regexp-examples 520
8 Copulas 504
9 genalog 295
10 REaLTabFormer 183
11 rapiddweller-benerator-ce 128
12 awesome-synthetic-data 100
13 DeepEcho 88
14 mockingbird 73
15 hypothesis-graphql 40
16 trainer 28
17 tdk-demo 16
18 data-caterer 14
19 dummPy 0
