Top 13 Python dataset-generation Projects
-
bpycv
Computer vision utils for Blender (generate instance annotations, depth, and 6D pose with one line of code)
-
DoppelGANger
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
-
stopes
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.
-
Bamboo
Bamboo: 4 times larger than ImageNet; 2 times larger than Object365; built by active learning. (by ZhangYuanhan-AI)
-
pyreports
pyreports is a Python library that lets you create complex reports from various sources
-
crypto-trading-strategy-backtester
Easy-to-use cryptocurrency trading strategy simulator and backtester
-
docker-packing-box
Docker image gathering packers and tools for making datasets of packed executables and training machine learning models for packing detection
-
Rackfocus
Schedulable command-line utility to download and compile IMDb datasets into a highly browsable SQLite database file
Project mention: Bpycv: Computer Vision and Deep Learning Utils for Blender | news.ycombinator.com | 2023-09-03
Project mention: Show HN: VQASynth – pipelines to synthesize VQA datasets | news.ycombinator.com | 2024-02-23
Python dataset-generation related posts
- [R] [P] I generated a 30K-utterance dataset by making GPT-4 prompt two ChatGPT instances to converse.
- datasetGPT is a command-line interface and a Python library for inferencing Large Language Models to generate textual datasets. (Regenerative feedback loops)
- [P] two copies of gpt-3.5 (one playing as the oracle, and another as the guesser) performs poorly on the game of 20 Questions (68/1823).
- DatasetGPT - A command-line interface to generate textual and conversational datasets with LLMs.
- DatasetGPT – an open-source command line tool for generating datasets with LLMs
- Web scraper ideas
- DoppelGANger: NEW Data - star count:187.0
A note from our sponsor - SaaSHub
www.saashub.com | 25 Apr 2024
Index
What are some of the best open-source dataset-generation projects in Python? This list will help you:
Rank | Project | Stars
---|---|---
1 | nfstream | 1,042
2 | MaskTheFace | 552
3 | bpycv | 453
4 | DoppelGANger | 275
5 | datasetGPT | 272
6 | stopes | 238
7 | Bamboo | 161
8 | pyreports | 97
9 | VQASynth | 71
10 | crypto-trading-strategy-backtester | 66
11 | docker-packing-box | 42
12 | clean-discord | 22
13 | Rackfocus | 8