Top 10 data-versioning Open-Source Projects
-
Project mention: What I Talk About When I Talk About Query Optimizer (Part 1): IR Design | news.ycombinator.com | 2024-01-29
We implemented a query optimizer with a flexible intermediate representation in pure Go:
https://github.com/dolthub/go-mysql-server
Getting the IR correct so that it's both easy to use and flexible enough to be useful is a really interesting design challenge. Our primary abstraction in the query plan is called a Node, and is way more general than the IR type described in the article from OP. This has probably hurt us: we only recently separated the responsibility to fetch rows into its own part of the runtime, out of the IR -- originally row fetching was coupled to the Node type directly.
This is also the query engine that Dolt uses:
https://github.com/dolthub/dolt
But it has a plug-in architecture, so you can use the engine on any data source that implements a handful of Go interfaces.
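The two ideas in this comment (plan nodes that only describe the query, and a small contract that any data source can implement) can be sketched as follows. This is a minimal illustration in Python, not Go, and it is not the go-mysql-server API: `RowSource`, `ListSource`, `Scan`, `Filter`, and `execute` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Protocol

Row = dict


class RowSource(Protocol):
    """Hypothetical plug-in contract: anything that can yield rows."""

    def rows(self) -> Iterator[Row]: ...


@dataclass
class ListSource:
    """A toy data source backed by an in-memory list."""
    data: list

    def rows(self) -> Iterator[Row]:
        return iter(self.data)


# Plan nodes only describe the query; they hold no row-fetching logic.
@dataclass
class Scan:
    source: RowSource


@dataclass
class Filter:
    child: object
    predicate: Callable[[Row], bool]


def execute(node) -> Iterator[Row]:
    """Row fetching lives here, separate from the plan nodes."""
    if isinstance(node, Scan):
        yield from node.source.rows()
    elif isinstance(node, Filter):
        for row in execute(node.child):
            if node.predicate(row):
                yield row
    else:
        raise TypeError(f"unknown plan node: {node!r}")


plan = Filter(Scan(ListSource([{"id": 1}, {"id": 2}, {"id": 3}])),
              lambda r: r["id"] > 1)
result = list(execute(plan))
```

Because `execute` is the only place that touches rows, swapping `ListSource` for another object with a `rows()` method requires no change to the plan nodes.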
-
wandb
🔥 A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API.
Project mention: A list of SaaS, PaaS and IaaS offerings that have free tiers of interest to devops and infradev | dev.to | 2024-02-05
Weights & Biases: The developer-first MLOps platform. Build better models faster with experiment tracking, dataset versioning, and model management. Free tier for personal projects only, with 100 GB of storage included.
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
    # Download the LakeFS binary
    wget https://github.com/treeverse/lakeFS/releases/latest/download/lakefs
    # Make the binary executable
    chmod +x lakefs
    # Initialize LakeFS with S3 as the storage backend
    ./lakefs init --backend s3 --s3-gateway-endpoint --s3-region --s3-force-path-style --s3-access-key --s3-secret-key
-
awesome-open-data-centric-ai
Curated list of open source tooling for data-centric AI on unstructured data.
-
Hey I'm one of the maintainers of https://github.com/BemiHQ/bemi that was recommended in the article. Please feel free to ask me any questions!
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
Project mention: Feedback needed: building Git for data that commits only diffs (for storage efficiency on large repositories), even without full checkouts of the datasets | /r/datascience | 2023-05-27
This was attempted in an R package called gittargets
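The "commit only diffs" idea from the thread title can be sketched roughly like this. It is a toy illustration over dictionaries, not how gittargets or any listed project works; `diff`, `apply_delta`, and `checkout` are hypothetical helpers.

```python
def diff(old: dict, new: dict) -> dict:
    """Record only what changed between two snapshots."""
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    removed = [k for k in old if k not in new]
    return {"changed": changed, "removed": removed}


def apply_delta(base: dict, delta: dict) -> dict:
    """Rebuild a snapshot from its parent plus a stored delta."""
    out = {k: v for k, v in base.items() if k not in delta["removed"]}
    out.update(delta["changed"])
    return out


def checkout(commits: list, n: int) -> dict:
    """Replay the first n deltas, starting from an empty dataset."""
    state = {}
    for delta in commits[:n]:
        state = apply_delta(state, delta)
    return state


v1 = {"a": 1, "b": 2}
v2 = {"a": 1, "b": 3, "c": 4}
v3 = {"b": 3, "c": 4}

# Each commit stores a delta against its parent, never a full copy.
commits = [diff({}, v1), diff(v1, v2), diff(v2, v3)]
```

Storage grows with the size of each change rather than the size of the dataset, at the cost of replaying deltas on checkout.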
-
Project mention: Implementing system-versioned tables in Postgres | news.ycombinator.com | 2024-02-07
This is actually what we're doing at Bemi: https://bemi.io/
We're hoping to make it so that this becomes automatic with Postgres vs moving to specialized databases.
We recently open-sourced the core tech that implements system-versioned tables at https://github.com/BemiHQ/bemi, check it out if interested :)
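For readers new to the term, a system-versioned table keeps every past version of a row together with the period during which it was current, so the table can be queried "as of" an earlier time. A minimal in-memory sketch, assuming integer timestamps; `Version` and `VersionedTable` are hypothetical names, and this is not how Bemi or Postgres implement the feature:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Version:
    value: Any
    valid_from: int
    valid_to: Optional[int] = None  # None means "still current"


class VersionedTable:
    def __init__(self):
        self._history = {}  # key -> list of Version, oldest first

    def upsert(self, key, value, at):
        versions = self._history.setdefault(key, [])
        if versions and versions[-1].valid_to is None:
            versions[-1].valid_to = at  # close the current version
        versions.append(Version(value, valid_from=at))

    def as_of(self, key, at):
        """Return the value that was current at time `at`, or None."""
        for v in self._history.get(key, []):
            if v.valid_from <= at and (v.valid_to is None or at < v.valid_to):
                return v.value
        return None


table = VersionedTable()
table.upsert("user:1", {"plan": "free"}, at=10)
table.upsert("user:1", {"plan": "pro"}, at=20)
```

Here `table.as_of("user:1", 15)` looks up the row as it stood between the two writes, while a lookup before time 10 finds nothing.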
-
Project mention: Show HN: Demo of using DVC and MLFlow for ML experiments | news.ycombinator.com | 2024-01-29
data-versioning related posts
- Implementing system-versioned tables in Postgres
- Dolt – Git for Data
- Dolt: A version-controlled SQL database
- A Step-by-Step Guide to Implementing Data Version Control
- Transactions in Spark / Delta lake?
- Database branching: three-way merge for schema changes
- Tell me you suck at SQL without telling me you suck at SQL
-
A note from our sponsor - WorkOS
workos.com | 28 Mar 2024
Index
What are some of the best open-source data-versioning projects? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | dolt | 16,676 |
2 | wandb | 8,036 |
3 | lakeFS | 4,022 |
4 | quilt | 1,310 |
5 | awesome-open-data-centric-ai | 664 |
6 | bemi | 101 |
7 | awesome-data-temporality | 96 |
8 | gittargets | 80 |
9 | bemi-prisma | 61 |
10 | bunny-party | 10 |