|about 20 hours ago||6 days ago|
|Apache License 2.0||Apache License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
I created an in-memory SQL database called MemSQL as a learning project
3 projects | /r/golang | 30 Mar 2023
Might be interested in https://github.com/dolthub/go-mysql-server, which also does this
Implementing the MySQL server protocol for fun and profit
2 projects | news.ycombinator.com | 22 Dec 2022
One item under "Scope of this project":
Provide a runnable server speaking the MySQL wire protocol, connected to data sources of your choice.
MySQL-mimic - Python implementation of the MySQL server wire protocol.
4 projects | /r/Python | 31 Oct 2022
7 projects | news.ycombinator.com | 22 Aug 2022
Litetree – SQLite with Branches
3 projects | news.ycombinator.com | 22 Jul 2022
I just wanted to say thanks for https://github.com/dolthub/go-mysql-server
This is incredibly useful for anyone who wants to build their own DB or wrap another datasource so it's queryable via MySQL protocol.
Dolt Is Git for Data
a very cool project they also maintain is a MySQL server framework for arbitrary backends (in Go): https://github.com/dolthub/go-mysql-server
You can define a "virtual" table (schema, how to retrieve rows/columns) and then a MySQL client can connect and execute arbitrary queries on your table (which could just be an API or other source)
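The "virtual table" idea in the comment above can be sketched in a few lines. This is a minimal, hypothetical illustration in Python and does not use the go-mysql-server API: `VirtualTable`, `select`, and `fetch_repos` are names invented here, and the tiny `select` function merely stands in for the SQL engine and wire protocol that the real framework provides.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]


@dataclass
class VirtualTable:
    """Hypothetical backend-defined table: a schema plus a way to fetch rows."""
    name: str
    schema: Dict[str, type]                  # column name -> Python type
    fetch_rows: Callable[[], Iterable[Row]]  # could wrap an API call or a file scan

    def scan(self) -> Iterator[Row]:
        # Check each row against the declared schema as it streams out.
        for row in self.fetch_rows():
            for column, column_type in self.schema.items():
                if not isinstance(row[column], column_type):
                    raise TypeError(
                        f"{self.name}.{column}: expected {column_type.__name__}"
                    )
            yield row


def select(
    table: VirtualTable,
    columns: List[str],
    where: Callable[[Row], bool] = lambda row: True,
) -> List[Tuple[Any, ...]]:
    """Stand-in for the query engine: filter rows, then project columns."""
    unknown = [c for c in columns if c not in table.schema]
    if unknown:
        raise KeyError(f"unknown columns: {unknown}")
    return [tuple(row[c] for c in columns) for row in table.scan() if where(row)]


def fetch_repos() -> List[Row]:
    # Placeholder data; a real backend might call a REST API here.
    return [
        {"name": "alpha", "stars": 120},
        {"name": "beta", "stars": 45},
    ]


repos = VirtualTable("repos", {"name": str, "stars": int}, fetch_repos)

# Roughly: SELECT name FROM repos WHERE stars > 100
result = select(repos, ["name"], where=lambda row: row["stars"] > 100)
print(result)
```

The point of the split is that the backend author only supplies the schema and the row source; parsing, filtering, and projection belong to the framework.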
The world of PostgreSQL wire compatibility
3 projects | news.ycombinator.com | 10 Feb 2022
Thanks for this write up! I've been really interested in postgres compatibility in the context of a tool I maintain (https://github.com/mergestat/mergestat) that uses SQLite. I've been looking for a way to expose the SQLite capabilities over a more commonly used wire-protocol like postgres (or mysql) so that existing BI and visualization tools can access the data.
This project is an interesting one: https://github.com/dolthub/go-mysql-server that provides a MySQL interface (wire and SQL) to arbitrary "backends" implemented in go.
It's really interesting how compatibility with existing protocols has become an important feature of new databases - there's so much existing tooling that already speaks postgres (or mysql), being able to leverage that is a huge advantage IMO
calling Format() on a time struct in a golang program changes the default Location's timezone information in the rest of the program
4 projects | /r/programming | 3 Sep 2021
Let's write a compiler, part 5: A code generator
14 projects | news.ycombinator.com | 19 Aug 2021
Jujutsu: A Git-compatible DVCS that is both simple and powerful
11 projects | news.ycombinator.com | 31 Jul 2023
Might want to look at purpose built tools for that such as lakeFS (https://github.com/treeverse/lakeFS/)
* Disclaimer: I'm one of the creators/maintainers of the project.
Data diffs: Algorithms for explaining what changed in a dataset (2022)
8 projects | news.ycombinator.com | 26 Jul 2023
Might want to check out lakeFS: https://github.com/treeverse/lakeFS
(full disclosure: I'm one of the creators)
Dolt Is Git for Data
Also in the same vein, check out https://lakefs.io/
[P] ArtiV: Version control system for large files
2 projects | /r/MachineLearning | 8 Mar 2022
Data Science Workflows — Notebook to Production
7 projects | dev.to | 8 Feb 2022
Git was designed for managing software development projects and for versioning text/code files, so it doesn’t handle large files well. Git LFS (Large File Storage) was released to address large-file versioning; it copes better than plain Git but struggles at scale. Also, neither Git nor Git LFS is optimized for data science workflows. To overcome this challenge, many powerful tools have emerged in recent years, such as DVC, Delta Lake, lakeFS, and more.
Unstructured Data Governance for ML
4 projects | /r/dataengineering | 31 Dec 2021
LakeFS Turns 1 and Raises 15M in a Week: (Enable Git for Large-Scale Data Lakes)
2 projects | news.ycombinator.com | 8 Aug 2021
We're Oz and Einat, co-founders of lakeFS (https://lakefs.io/), an open-source project that allows the creation of performant git-like repositories over an object store (i.e. S3).
Prior to starting lakeFS we were VP of R&D and CTO at SimilarWeb, a (now-public) Israeli web analytics company whose business model is based on the collection and analysis of the internet's activity.
Recovering from a pernicious error in a million S3 files shouldn't require a full day or even week of work to fix… instead let's make it an instantaneous revert operation to a previous commit.
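Why such a revert can be near-instant is easier to see with a toy model. The sketch below is an assumption-laden illustration, not lakeFS's actual data model or API: `Repo`, `Commit`, and `revert_to` are hypothetical names, a plain dict stands in for the object store, and each commit is simply an immutable map from logical paths to stored objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    """Immutable snapshot: logical path -> key of an object in the store."""
    id: int
    parent: Optional[int]
    entries: Dict[str, str]


@dataclass
class Repo:
    objects: Dict[str, bytes] = field(default_factory=dict)  # stand-in for S3
    commits: List[Commit] = field(default_factory=list)
    head: Optional[int] = None                                # branch pointer

    def commit(self, changes: Dict[str, bytes]) -> int:
        # Start from the parent's mapping; unchanged paths keep their old keys.
        entries = dict(self.commits[self.head].entries) if self.head is not None else {}
        commit_id = len(self.commits)
        for path, data in changes.items():
            key = f"{commit_id}/{path}"  # always write a new object, never overwrite
            self.objects[key] = data
            entries[path] = key
        self.commits.append(Commit(commit_id, self.head, entries))
        self.head = commit_id
        return commit_id

    def read(self, path: str) -> bytes:
        if self.head is None:
            raise LookupError("repository has no commits")
        return self.objects[self.commits[self.head].entries[path]]

    def revert_to(self, commit_id: int) -> None:
        # Only the pointer moves; no object data is copied or rewritten.
        if not 0 <= commit_id < len(self.commits):
            raise ValueError(f"unknown commit: {commit_id}")
        self.head = commit_id


repo = Repo()
good = repo.commit({"data/part-0.csv": b"ok"})
bad = repo.commit({"data/part-0.csv": b"corrupted"})
repo.revert_to(good)
print(repo.read("data/part-0.csv"))
```

Because old objects are never overwritten, going back costs one pointer update regardless of how many files the bad commit touched. Note that this toy `revert_to` moves the branch pointer, which is closer to a Git reset than to a Git revert commit.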
Implementing this type of functionality is a technical challenge, and one we took upon ourselves to solve. It's been 1 year since the first public commit on lakeFS and we've now raised a $15M Series A to continue building and improving our vision.
We've evolved a ton in the past year, completely refactoring the data model to remove the dependency on Postgres. Fittingly, we now use RocksDB on the object store to persist the metadata lakeFS manages (with the added benefit of simplifying the installation process). Check out the roadmap to follow our progress on building out native integrations with other important technologies in the open data stack such as Spark, Hive Metastore, and Delta Lake.
We encourage you to check out our GitHub repo (https://github.com/treeverse/lakeFS) and documentation pages (https://docs.lakefs.io/).
We're proud of how far we've come, but know there's lots more work to do. Please do let us know your thoughts!
Gopher Gold #14 - Wed Oct 07 2020
22 projects | dev.to | 7 Oct 2020
treeverse/lakeFS (Go): An open source platform that delivers resilience and manageability to object-storage based data lakes
What are some alternatives?
dvc - 🦉 ML Experiments Management with Git
vitess-sqlparser - simply SQL Parser for Go ( powered by vitess and TiDB )
delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
git-lfs - Git extension for versioning large files
Ory Kratos - Next-gen identity server (think Auth0, Okta, Firebase) with Ory-hardened authentication, PassKeys, MFA, FIDO2, TOTP, WebAuthn, profile management, identity schemas, social sign in, registration, account recovery, passwordless. Golang, headless, API-only - without templating or theming headaches. Available as a cloud service.
MLflow - Open source platform for the machine learning lifecycle
duf - Disk Usage/Free Utility - a better 'df' alternative
helm-operator - Successor: https://github.com/fluxcd/helm-controller — The Flux Helm Operator, once upon a time a solution for declarative Helming.
spark-on-k8s-operator - Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Hey - HTTP load generator, ApacheBench (ab) replacement
quilt - Quilt is a data mesh for connecting people with actionable data