synthetic-data

Top 23 synthetic-data Open-Source Projects

  • machine-learning-for-trading

    Code for Machine Learning for Algorithmic Trading, 2nd edition.

  • Project mention: Machine Learning for Trading: Notebooks, resources and references accompanying the book Machine Learning for Algorithmic Trading. Courses - star count:10678.0 | /r/algoprojects | 2023-11-20
  • Mimesis

    Mimesis is a powerful Python library that empowers developers to generate massive amounts of synthetic data efficiently.
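
    As a rough illustration of seeded fake-record generation, here is a standard-library sketch. It is not Mimesis's API (Mimesis provides its own locale-aware providers). The name lists, the `FakePerson` record, and the `generate_people` helper are hypothetical.

```python
import random
from dataclasses import dataclass

# Hypothetical vocabularies; a real generator would use much richer data.
FIRST_NAMES = ["Ada", "Alan", "Grace", "Linus", "Margaret"]
LAST_NAMES = ["Lovelace", "Turing", "Hopper", "Torvalds", "Hamilton"]
DOMAINS = ["example.com", "example.org"]


@dataclass
class FakePerson:
    full_name: str
    email: str
    age: int


def generate_people(n: int, seed: int = 0) -> list[FakePerson]:
    """Generate n fake person records; the same seed yields the same records."""
    rng = random.Random(seed)  # local RNG, so results are reproducible
    people = []
    for i in range(n):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        # Appending the index keeps generated e-mail addresses unique.
        email = f"{first.lower()}.{last.lower()}{i}@{rng.choice(DOMAINS)}"
        people.append(FakePerson(f"{first} {last}", email, rng.randint(18, 90)))
    return people


if __name__ == "__main__":
    for person in generate_people(3, seed=42):
        print(person)
```

    Seeding a private `random.Random` instance, rather than the global RNG, is what makes large synthetic datasets reproducible across runs.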

  • BlenderProc

    A procedural Blender pipeline for photorealistic training image generation

  • SDV

    Synthetic data generation for tabular data

  • Project mention: Synthetic data generation for tabular data | news.ycombinator.com | 2024-02-27

    Can someone help me understand the licensing of this?

    https://github.com/sdv-dev/SDV/blob/main/LICENSE

    It was MIT licensed up until 2022 where it was changed to what it is now, where they say that it will become MIT again 4 years after release... but is that from when the license was changed or the first release of the software in GitHub?

  • synthea

    Synthetic Patient Population Simulator

  • Project mention: Survey on Synthea Use to Shape the Future of Open Source Medical Records | news.ycombinator.com | 2023-06-21
  • unrealcv

    UnrealCV: Connecting Computer Vision to Unreal Engine

  • ydata-synthetic

    Synthetic data generators for tabular and time-series data

  • Project mention: Coding Wonderland: Contribute to YData Profiling and YData Synthetic in this Advent of Code | dev.to | 2023-12-05

    Send us your North ⭐️: "On the first day of Christmas, my true contributor gave to me..." a star in my GitHub tree! 🎵 If you love these projects too, star ydata-profiling or ydata-synthetic and let your friends know why you love it so much!

  • CTGAN

    Conditional GAN for generating synthetic tabular data.
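
    The CTGAN paper (Xu et al., 2019) describes a "training-by-sampling" step that conditions the generator on a randomly chosen category of a randomly chosen discrete column, weighting categories by log-frequency so that rare ones still appear during training. The standard-library sketch below illustrates that idea only. It is not CTGAN's code, and the `sample_condition` helper and the log(count + 1) weighting are assumptions.

```python
import math
import random
from collections import Counter


def sample_condition(table, rng):
    """Return (column, category, cond_vector) for one training-by-sampling draw.

    `table` maps each discrete column name to its list of values.
    """
    columns = sorted(table)
    # Step 1: choose one discrete column uniformly at random.
    column = rng.choice(columns)
    counts = Counter(table[column])
    categories = sorted(counts)
    # Step 2: choose a category with probability proportional to log(count + 1),
    # which boosts rare categories relative to their raw frequency.
    weights = [math.log(counts[c] + 1) for c in categories]
    category = rng.choices(categories, weights=weights, k=1)[0]
    # Step 3: concatenate one-hot blocks for every discrete column; only the
    # chosen category of the chosen column is set to 1.
    cond = [
        1 if (col == column and value == category) else 0
        for col in columns
        for value in sorted(set(table[col]))
    ]
    return column, category, cond


if __name__ == "__main__":
    data = {"color": ["red"] * 98 + ["blue"] * 2, "size": ["S", "M", "L", "M"] * 25}
    print(sample_condition(data, random.Random(0)))
```

    With 98 "red" and 2 "blue" rows, raw frequency would pick "blue" 2% of the time; the log weighting raises that to roughly 19%.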

  • Project mention: Ctgan: Generating synthetic data in Python using GANs | news.ycombinator.com | 2024-02-05
  • SkinDeep

    Get Deinked!!

  • awesome-open-data-centric-ai

    Curated list of open source tooling for data-centric AI on unstructured data.

  • pygraft

    Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips
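
    To make "synthetic schema plus knowledge graph" concrete, here is a hypothetical standard-library sketch: a random acyclic class hierarchy, random `rdf:type` assignments, and random relation triples between distinct entities. It does not reflect how pygraft configures or generates graphs. `generate_kg`, the `ex:` relation names, and all parameters are made up for illustration.

```python
import random

# Hypothetical relation vocabulary.
RELATIONS = ["ex:relatedTo", "ex:partOf", "ex:locatedIn"]


def generate_kg(num_classes, num_entities, num_triples, seed=0):
    """Return (schema, typing, facts) as lists of (subject, predicate, object) triples."""
    max_facts = num_entities * (num_entities - 1) * len(RELATIONS)
    if num_triples > max_facts:
        raise ValueError("too many triples requested for this many entities")
    rng = random.Random(seed)
    classes = [f"Class{i}" for i in range(num_classes)]
    # Schema: each class except Class0 gets one earlier class as its parent,
    # which guarantees an acyclic subclass hierarchy rooted at Class0.
    schema = [
        (classes[i], "rdfs:subClassOf", classes[rng.randrange(i)])
        for i in range(1, num_classes)
    ]
    entities = [f"e{i}" for i in range(num_entities)]
    typing = [(e, "rdf:type", rng.choice(classes)) for e in entities]
    facts = set()
    while len(facts) < num_triples:
        head, tail = rng.sample(entities, 2)  # two distinct entities: no self-loops
        facts.add((head, rng.choice(RELATIONS), tail))
    return schema, typing, sorted(facts)


if __name__ == "__main__":
    schema, typing, facts = generate_kg(4, 6, 5, seed=1)
    print(schema, typing, facts, sep="\n")
```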

  • Project mention: PyGraft: Configurable Generation of Schemas and Knowledge Graphs | news.ycombinator.com | 2023-09-13
  • DataDreamer

    DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models.   🤖💤

  • Project mention: FLaNK AI - 01 April 2024 | dev.to | 2024-04-01
  • gretel-synthetics

    Synthetic data generators for structured and unstructured text, featuring differentially private learning.
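
    Differentially private learning is commonly implemented with DP-SGD (Abadi et al., 2016): clip each example's gradient to a fixed L2 norm, sum the clipped gradients, add Gaussian noise scaled to that bound, then average. The standard-library sketch below shows only that aggregation step. It is a generic illustration, not gretel-synthetics' implementation, and the function names are hypothetical.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale `grad` down so its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    return [g * max_norm / norm for g in grad]


def dp_average_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    total = [0.0] * len(per_example_grads[0])
    for grad in per_example_grads:
        for j, g in enumerate(clip_gradient(grad, max_norm)):
            total[j] += g
    # The noise std is proportional to the clipping bound, i.e. to the most
    # that any single example can change the sum.
    sigma = noise_multiplier * max_norm
    return [(t + rng.gauss(0.0, sigma)) / len(per_example_grads) for t in total]


if __name__ == "__main__":
    grads = [[3.0, 4.0], [0.0, 0.5]]
    print(dp_average_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=random.Random(0)))
```

    Clipping bounds each example's influence, which is what lets the added noise mask any individual training record.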

  • Project mention: Ask HN: If we train an LLM with “data” instead of “language” tokens | news.ycombinator.com | 2023-08-16

    Hey there! Co-founder of Gretel.ai here, and I think I can provide some insights on this topic.

    Firstly, the concept you're hinting at is not purely traditional ML. In traditional machine learning, we often prioritize feature extraction and engineering specific to a given problem space before training.

    What you're describing and what we've been working on at Gretel.ai, is leveraging the power of models like Large Language Models (LLMs) to understand and extrapolate from vast amounts of diverse data without the need for time-consuming feature engineering. Here's a link to our open-source library https://github.com/gretelai/gretel-synthetics for synthetic data generation (currently supporting GAN and RNN-based language models), and also our recent announcement around a Tabular LLM we're training to help people build with data https://gretel.ai/tabular-llm

    A few areas where we've found tabular or Large Data Models to be really useful are:

  • Copulas

    A library to model multivariate data using copulas.
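
    A Gaussian copula separates each column's marginal distribution from the dependence between columns. The bivariate, standard-library sketch below rank-transforms each column to normal scores, estimates their correlation, samples correlated normals, and maps them back through each column's empirical quantiles. It is not the Copulas library's API; every function here is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def normal_scores(values):
    """Rank-transform values to pseudo-uniforms, then to standard-normal scores."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = NormalDist().inv_cdf(rank / (n + 1))  # rank/(n+1) stays in (0, 1)
    return scores


def empirical_quantile(sorted_values, u):
    index = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


def gaussian_copula_sample(xs, ys, n_samples, seed=0):
    """Fit a bivariate Gaussian copula to (xs, ys) and draw n_samples synthetic pairs."""
    rho = pearson(normal_scores(xs), normal_scores(ys))
    rho = max(-1.0, min(1.0, rho))  # guard against rounding just outside [-1, 1]
    rng = random.Random(seed)
    std = NormalDist()
    sx, sy = sorted(xs), sorted(ys)
    samples = []
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Correlated normals -> uniforms -> original marginals.
        samples.append((empirical_quantile(sx, std.cdf(z1)), empirical_quantile(sy, std.cdf(z2))))
    return rho, samples
```

    Because values are read back from the observed data, this sketch can only reproduce values already present in each column; fitting a parametric marginal instead would avoid that limitation.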

  • bonito

    A lightweight library for generating synthetic instruction tuning datasets for your data without GPT. (by BatsResearch)

  • Project mention: FLaNK AI for 11 March 2024 | dev.to | 2024-03-11
  • synthcity

    A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.
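
    One common privacy check for synthetic tabular data is the distance to closest record (DCR): how far each synthetic row lies from its nearest real row. The sketch below, which assumes numeric, comparably scaled columns, computes DCR and an exact-copy rate with the standard library. It is not necessarily how synthcity defines or computes its metrics.

```python
import math


def distance_to_closest_record(synthetic, real):
    """Euclidean distance from each synthetic row to its nearest real row."""
    return [min(math.dist(s, r) for r in real) for s in synthetic]


def exact_match_rate(synthetic, real):
    """Fraction of synthetic rows that are verbatim copies of a real row."""
    real_rows = {tuple(r) for r in real}
    return sum(tuple(s) in real_rows for s in synthetic) / len(synthetic)


if __name__ == "__main__":
    real = [(0.0, 0.0), (3.0, 4.0)]
    synthetic = [(0.0, 0.0), (3.0, 0.0)]
    print(distance_to_closest_record(synthetic, real), exact_match_rate(synthetic, real))
```

    Distances near zero, or a high exact-copy rate, can indicate that a generator is memorizing training rows rather than generalizing.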

  • SynthDet

    SynthDet - An end-to-end object detection pipeline using synthetic data

  • awesome-data-centric-ai

    Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖

  • Project mention: Thoughts: Continue current degree with one year left, or start anew with degree apprenticeship | /r/cscareerquestionsuk | 2023-07-13

    I would finish the degree anyway. It's only one year left. If teachers miss classes, I would disregard that and try to learn on my own, and then yes, I would move on to an internship (or even do it at the same time if it's possible). If you like, come and meet us at the Data-Centric AI Community and we can do some projects together :)

  • genalog

    Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.
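
    Synthetic document images are often made more realistic by applying degradations to clean renders. As a hypothetical example of one such degradation, the sketch below applies salt-and-pepper noise to a grayscale image stored as nested lists of 0-255 values. It does not use or mirror genalog's API.

```python
import random


def salt_and_pepper(image, amount, seed=0):
    """Set a random fraction `amount` of pixels to pure black (0) or white (255)."""
    rng = random.Random(seed)
    noisy = [row[:] for row in image]  # copy so the input image is untouched
    height, width = len(image), len(image[0])
    n_pixels = int(amount * height * width)
    # Sample distinct pixel positions, then map each flat index to (row, col).
    for idx in rng.sample(range(height * width), n_pixels):
        y, x = divmod(idx, width)
        noisy[y][x] = rng.choice((0, 255))
    return noisy


if __name__ == "__main__":
    page = [[200] * 8 for _ in range(4)]  # a blank light-gray "page"
    for row in salt_and_pepper(page, 0.25, seed=1):
        print(row)
```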

  • PeopleSansPeople

    Unity's privacy-preserving human-centric synthetic data generator

  • zpy

    Synthetic data for computer vision. An open source toolkit using Blender and Python.

  • DoppelGANger

    [IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

  • Robotics-Object-Pose-Estimation

    A complete end-to-end demonstration in which we collect training data in Unity and use that data to train a deep neural network to predict the pose of a cube. This model is then deployed in a simulated robotic pick-and-place task.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

What are some of the best open-source synthetic-data projects? This list will help you:

Rank  Project                          Stars
1     machine-learning-for-trading     11,797
2     Mimesis                          4,304
3     BlenderProc                      2,544
4     SDV                              2,141
5     synthea                          2,002
6     unrealcv                         1,829
7     ydata-synthetic                  1,292
8     CTGAN                            1,140
9     SkinDeep                         930
10    awesome-open-data-centric-ai     677
11    pygraft                          639
12    DataDreamer                      646
13    gretel-synthetics                533
14    Copulas                          504
15    bonito                           482
16    synthcity                        354
17    SynthDet                         350
18    awesome-data-centric-ai          302
19    genalog                          295
20    PeopleSansPeople                 293
21    zpy                              288
22    DoppelGANger                     275
23    Robotics-Object-Pose-Estimation  263
