Top 9 data-version-control Open-Source Projects
deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
Project mention: A MySQL compatible database engine written in pure Go | news.ycombinator.com | 2024-04-09
Hi, this is my project :)
For us this package is most important as the query engine that powers Dolt:
https://github.com/dolthub/dolt
We aren't the original authors but have contributed the vast majority of its code at this point. Here's the origin story if you're interested:
https://www.dolthub.com/blog/2020-05-04-adopting-go-mysql-se...
Collaboration and version control are crucial in AI/ML development projects due to the iterative nature of model development and the need for reproducibility. GitHub is the leading platform for source code management, allowing teams to collaborate on code, track issues, and manage project milestones. DVC (Data Version Control) complements Git by handling large data files, data sets, and machine learning models that Git can't manage effectively, enabling version control for the data and model files used in AI projects.
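DVC's approach can be sketched as content addressing. The large file is copied into a cache keyed by its hash, and only a small pointer file, which is cheap to diff and commit, goes into Git. DVC's own `.dvc` files record an MD5 hash, but the script below is not DVC's implementation. The `.cache` layout, the `.ptr` pointer format, and the `track`/`restore` functions are hypothetical, and bash plus GNU coreutils (`md5sum`) are assumed.

```shell
#!/usr/bin/env bash
# Minimal sketch of content-addressed data tracking (hypothetical layout, not DVC's real one).
set -euo pipefail

cache_dir=".cache"   # hypothetical cache location, kept out of Git

# track FILE: copy FILE into the cache under its MD5 hash and write FILE.ptr,
# a tiny text file that is small enough to commit to Git.
track() {
  local file="$1" digest
  digest=$(md5sum "$file" | cut -d' ' -f1)
  mkdir -p "$cache_dir/${digest:0:2}"
  cp "$file" "$cache_dir/${digest:0:2}/${digest:2}"
  printf 'md5: %s\npath: %s\n' "$digest" "$file" > "$file.ptr"
}

# restore FILE: read the hash from FILE.ptr and copy the cached object back.
restore() {
  local file="$1" digest
  digest=$(sed -n 's/^md5: //p' "$file.ptr")
  cp "$cache_dir/${digest:0:2}/${digest:2}" "$file"
}

# Usage: track a dataset, delete the working copy, then restore it from the cache.
printf 'id,label\n1,cat\n2,dog\n' > data.csv
track data.csv
rm data.csv
restore data.csv
cat data.csv.ptr
```

In this scheme only `data.csv.ptr` would be committed, while the cache directory would be ignored by Git and synced to shared storage separately. Because the cache key is the content hash, tracking an unchanged file again maps to the same cache object.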
```shell
# Download the LakeFS binary
wget https://github.com/treeverse/lakeFS/releases/latest/download/lakefs
# Make the binary executable
chmod +x lakefs
# Initialize LakeFS with S3 as the storage backend
./lakefs init --backend s3 \
  --s3-gateway-endpoint \
  --s3-region \
  --s3-force-path-style \
  --s3-access-key \
  --s3-secret-key
```
Project mention: Show HN: Loofi – Our AI-Powered SQL Query Builder | news.ycombinator.com | 2023-05-21
Project mention: Feedback needed: building Git for data that commits only diffs (for storage efficiency on large repositories), even without full checkouts of the datasets | /r/datascience | 2023-05-27
This was attempted in an R package called gittargets.
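The idea of committing only diffs can be illustrated with delta storage. One full snapshot is kept, every later version is saved as a patch against its predecessor, and checking out a version replays the patches in order. This is a minimal sketch, not the poster's design or gittargets'. It assumes bash, line-oriented CSV data, and the standard `diff` and `patch` tools, and the `commits/` layout and the `commit`/`checkout` functions are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of diff-based versioning for a line-oriented dataset (hypothetical layout).
set -euo pipefail

mkdir -p commits

# commit N: store only the diff between the previous version (prev.csv) and data.csv.
# diff -u also keeps a few unchanged context lines around each change.
commit() {
  local n="$1"
  # diff exits with 1 when the files differ, which is the expected case here.
  diff -u prev.csv data.csv > "commits/$n.patch" || [ $? -eq 1 ]
}

# checkout N: rebuild version N by replaying patches 1..N onto the base snapshot.
checkout() {
  local n="$1" i
  cp commits/base.csv out.csv
  for ((i = 1; i <= n; i++)); do
    patch -s out.csv < "commits/$i.patch"
  done
  cat out.csv
}

# Version 0 is the only full copy.
printf 'id,label\n1,cat\n2,dog\n' > data.csv
cp data.csv commits/base.csv

# Version 1: relabel one row.
cp data.csv prev.csv
printf 'id,label\n1,cat\n2,wolf\n' > data.csv
commit 1

# Version 2: append a row.
cp data.csv prev.csv
printf '3,bird\n' >> data.csv
commit 2

checkout 1
```

Checking out a late version replays every earlier patch, so the cost grows with history length. A common mitigation is to store a full snapshot every few commits.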
data-version-control related posts
- The Great Migration from MongoDB to PostgreSQL
- Dolt – Git for Data
- Dolt: A version-controlled SQL database
- Git Version Controlled Datasets in S3
- Ask HN: How do your ML teams version datasets and models?
- Show HN: Loofi – Our AI-Powered SQL Query Builder
- How do you sync dev databases across multiple devices?
A note from our sponsor - SaaSHub
www.saashub.com | 27 Apr 2024
Index
What are some of the best open-source data-version-control projects? This list will help you:
Rank | Project | Stars
---|---|---
1 | dolt | 16,971
2 | dvc | 13,116
3 | deeplake | 7,708
4 | lakeFS | 4,066
5 | quilt | 1,313
6 | sgr | 326
7 | awesome-data-temporality | 96
8 | gittargets | 81
9 | ZnTrack | 41