Top 10 data-versioning Open-Source Projects
- wandb
🔥 A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API.
- awesome-open-data-centric-ai
Curated list of open source tooling for data-centric AI on unstructured data.
Project mention: A MySQL compatible database engine written in pure Go | news.ycombinator.com | 2024-04-09
Hi, this is my project :)
For us this package is most important as the query engine that powers Dolt:
https://github.com/dolthub/dolt
We aren't the original authors but have contributed the vast majority of its code at this point. Here's the origin story if you're interested:
https://www.dolthub.com/blog/2020-05-04-adopting-go-mysql-se...
Project mention: A list of SaaS, PaaS and IaaS offerings that have free tiers of interest to devops and infradev | dev.to | 2024-02-05
Weights & Biases: The developer-first MLOps platform. Build better models faster with experiment tracking, dataset versioning, and model management. Free tier for personal projects only, with 100 GB of storage included.
# Download the LakeFS binary
wget https://github.com/treeverse/lakeFS/releases/latest/download/lakefs
# Make the binary executable
chmod +x lakefs
# Initialize LakeFS with S3 as the storage backend
./lakefs init --backend s3 --s3-gateway-endpoint --s3-region --s3-force-path-style --s3-access-key --s3-secret-key
Hey I'm one of the maintainers of https://github.com/BemiHQ/bemi that was recommended in the article. Please feel free to ask me any questions!
Project mention: Feedback needed: building Git for data that commits only diffs (for storage efficiency on large repositories), even without full checkouts of the datasets | /r/datascience | 2023-05-27
This was attempted in an R package called gittargets
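To make the "commits only diffs" idea from the post title concrete, here is a minimal Python sketch. It is not how gittargets or any project in this list works; the row-keyed dictionary model and the names `make_diff`, `apply_diff`, and `checkout` are assumptions made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple                      # one record, e.g. ("alice", 1)
Dataset = Dict[str, Row]         # row key -> record
Diff = Dict[str, Optional[Row]]  # row key -> new record, or None for a deletion


def make_diff(old: Dataset, new: Dataset) -> Diff:
    """Return only the rows that were added, changed, or deleted."""
    diff: Diff = {}
    for key, row in new.items():
        if key not in old or old[key] != row:
            diff[key] = row      # added or changed row
    for key in old:
        if key not in new:
            diff[key] = None     # deleted row
    return diff


def apply_diff(base: Dataset, diff: Diff) -> Dataset:
    """Apply one diff to a dataset and return the resulting copy."""
    result = dict(base)
    for key, row in diff.items():
        if row is None:
            result.pop(key, None)
        else:
            result[key] = row
    return result


def checkout(commits: List[Diff], version: int) -> Dataset:
    """Rebuild a version by replaying diffs 0..version onto an empty dataset."""
    data: Dataset = {}
    for diff in commits[: version + 1]:
        data = apply_diff(data, diff)
    return data
```

Each commit stores only the rows that differ from the previous version, so unchanged rows are never copied. The trade-off in this sketch is that checking out a late version means replaying every earlier diff.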
Project mention: Implementing system-versioned tables in Postgres | news.ycombinator.com | 2024-02-07
This is actually what we're doing @ Bemi https://bemi.io/
We're hoping to make it so that this becomes automatic with Postgres vs moving to specialized databases.
We recently open sourced the core tech that implements system versioned tables https://github.com/BemiHQ/bemi, check it out if interested :)
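For readers unfamiliar with the term, a system-versioned table keeps every past version of a row along with the period during which it was current. The sketch below illustrates the general idea only; it does not show how Bemi implements it. It uses Python's built-in `sqlite3` as a stand-in for Postgres so that it runs anywhere, and the `accounts` and `accounts_history` tables, the trigger names, and `create_versioned_db` are all hypothetical.

```python
import sqlite3

NOW = "strftime('%Y-%m-%d %H:%M:%f', 'now')"  # SQLite expression for the current UTC time

SCHEMA = f"""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, plan TEXT NOT NULL);

-- Hypothetical history table: one row per past or current version.
CREATE TABLE accounts_history (
    id         INTEGER NOT NULL,
    plan       TEXT    NOT NULL,
    valid_from TEXT    NOT NULL,
    valid_to   TEXT              -- NULL means "still current"
);

-- A new row opens its first version.
CREATE TRIGGER accounts_insert AFTER INSERT ON accounts
BEGIN
    INSERT INTO accounts_history (id, plan, valid_from, valid_to)
    VALUES (NEW.id, NEW.plan, {NOW}, NULL);
END;

-- An update closes the current version and opens a new one.
CREATE TRIGGER accounts_update AFTER UPDATE ON accounts
BEGIN
    UPDATE accounts_history SET valid_to = {NOW}
    WHERE id = OLD.id AND valid_to IS NULL;
    INSERT INTO accounts_history (id, plan, valid_from, valid_to)
    VALUES (NEW.id, NEW.plan, {NOW}, NULL);
END;

-- A delete closes the current version without opening another.
CREATE TRIGGER accounts_delete AFTER DELETE ON accounts
BEGIN
    UPDATE accounts_history SET valid_to = {NOW}
    WHERE id = OLD.id AND valid_to IS NULL;
END;
"""


def create_versioned_db() -> sqlite3.Connection:
    """Create an in-memory database whose accounts table records its own history."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

With this in place, application code writes to `accounts` as usual and the history accumulates automatically; querying `accounts_history` for rows whose validity period contains a given timestamp returns the state as of that moment.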
Project mention: Show HN: Demo of using DVC and MLFlow for ML experiments | news.ycombinator.com | 2024-01-29
data-versioning related posts
- The Great Migration from MongoDB to PostgreSQL
- Implementing system-versioned tables in Postgres
- Dolt – Git for Data
- Dolt: A version-controlled SQL database
- A Step-by-Step Guide to Implementing Data Version Control
- Transactions in Spark / Delta lake?
- Database branching: three-way merge for schema changes
Index
What are some of the best open-source data-versioning projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | dolt | 16,971 |
2 | wandb | 8,190 |
3 | lakeFS | 4,066 |
4 | quilt | 1,313 |
5 | awesome-open-data-centric-ai | 677 |
6 | bemi | 135 |
7 | awesome-data-temporality | 96 |
8 | gittargets | 81 |
9 | bemi-prisma | 64 |
10 | bunny-party | 10 |