Metric | mergestat-lite | dolt
---|---|---
Mentions | 10 | 93
Stars | 3,419 | 16,971
Growth | 0.3% | 1.7%
Activity | 6.3 | 10.0
Last commit | 3 days ago | 7 days ago
Language | Go | Go
License | MIT License | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
mergestat-lite
- SQLite Doesn't Use Git
You can query git with this: https://github.com/mergestat/mergestat if you like the idea.
- A SQLite extension for reading large files line-by-line
Hey, author here, happy to answer any questions! Also check out this notebook for a deeper dive into sqlite-lines, along with a slick WASM demonstration and more thoughts on the codebase itself: https://observablehq.com/@asg017/introducing-sqlite-lines
I really dig SQLite, and I believe SQLite extensions will push it to another level. I rarely reach for Pandas or other "traditional" tools and query languages, and instead opt for plain ol' SQLite and other extensions. As a shameless plug, I recently started a blog series on SQLite and related tools and extensions if you want to learn more! Next week I'll be publishing more SQLite extensions for parsing HTML + making HTTP requests https://observablehq.com/@asg017/a-new-sqlite-blog-series
A few other SQLite extensions:
- xlite, for reading Excel files, in Rust https://github.com/x2bool/xlite
- sqlean, several small SQLite extensions in C https://github.com/nalgeon/sqlean
- mergestat, several SQLite extensions for developers (mainly GitHub's API) in Go https://github.com/mergestat/mergestat
- Show HN: Contribution Graph as a Git Command
- Exploring Git Repos With MergeStat 🔬
mergestat is an open-source tool that allows users to run SQL queries on the contents and history of git repositories.
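To give a feel for what "SQL over git history" can look like, here is a minimal sketch using Python's built-in `sqlite3`. It does not call mergestat; the `commits` table, its columns, and the sample rows are hypothetical stand-ins invented for illustration.

```python
import sqlite3

# In-memory database standing in for a queryable view of a repository.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per commit.
conn.execute(
    "CREATE TABLE commits (hash TEXT, author_name TEXT, author_when TEXT, message TEXT)"
)

# Hypothetical sample history.
conn.executemany(
    "INSERT INTO commits VALUES (?, ?, ?, ?)",
    [
        ("a1", "alice", "2023-01-02", "fix: handle empty rows"),
        ("b2", "bob", "2023-01-03", "feat: add CSV export"),
        ("c3", "alice", "2023-01-05", "refactor: split planner"),
    ],
)

# A typical "slice and dice" aggregation: commits per author.
rows = conn.execute(
    """
    SELECT author_name, COUNT(*) AS n
    FROM commits
    GROUP BY author_name
    ORDER BY n DESC, author_name
    """
).fetchall()

for author, n in rows:
    print(author, n)
```

With a real tool the table would be populated from the repository itself rather than from hand-written rows.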
- The world of PostgreSQL wire compatibility
Thanks for this write up! I've been really interested in postgres compatibility in the context of a tool I maintain (https://github.com/mergestat/mergestat) that uses SQLite. I've been looking for a way to expose the SQLite capabilities over a more commonly used wire-protocol like postgres (or mysql) so that existing BI and visualization tools can access the data.
This project is an interesting one: https://github.com/dolthub/go-mysql-server. It provides a MySQL interface (wire and SQL) to arbitrary "backends" implemented in Go.
It's really interesting how compatibility with existing protocols has become an important feature of new databases - there's so much existing tooling that already speaks postgres (or mysql), being able to leverage that is a huge advantage IMO
- Go library for printing human readable, relative time differences 🕰️
timediff is a Go package for printing human readable, relative time differences. Output is based on ranges defined in the Day.js JavaScript library, and can be customized if needed. It's currently used by the mergestat command-line interface.
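The range-based approach described above can be sketched in a few lines of Python. This is not timediff's code or API: the function name `time_diff` and the cutoffs in `_RANGES` are assumptions chosen for illustration, not values copied from timediff or Day.js.

```python
from datetime import datetime, timedelta

# (upper bound in seconds, label template). Illustrative cutoffs only.
_RANGES = [
    (45, "a few seconds"),
    (90, "a minute"),
    (45 * 60, "{m} minutes"),
    (90 * 60, "an hour"),
    (22 * 3600, "{h} hours"),
    (36 * 3600, "a day"),
]


def time_diff(then: datetime, now: datetime) -> str:
    """Describe `then` relative to `now` using the first matching range."""
    delta = (then - now).total_seconds()
    secs = abs(delta)
    label = None
    for upper, template in _RANGES:
        if secs < upper:
            label = template.format(m=round(secs / 60), h=round(secs / 3600))
            break
    if label is None:
        # Anything past the last range falls back to whole days.
        label = f"{round(secs / 86400)} days"
    return f"in {label}" if delta > 0 else f"{label} ago"


now = datetime(2024, 1, 1, 12, 0, 0)
print(time_diff(now - timedelta(minutes=5), now))
print(time_diff(now + timedelta(hours=3), now))
```

Keeping the ranges in a table is what makes the output customizable: swapping the list changes the wording without touching the lookup logic.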
- Askgit: Command-line tool for running SQL queries on Git repositories
- Semantic Git Commit Messages
Assuming committers adhere to it, there could be some interesting use cases when combined with a tool like AskGit (https://github.com/askgitdev/askgit) for understanding what "categories" of work are being done in a codebase.
Maybe even which directories/files tend to see `fix` or `refactor` more frequently (signs of a poorly designed or "hot" area?)
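As a rough illustration of that idea, the sketch below tallies commit-type prefixes per top-level directory. It works on plain Python data rather than on AskGit output; the `history` list, the helper names, and the prefix pattern are all assumptions made up for this example.

```python
import re
from collections import Counter, defaultdict

# Matches prefixes such as "fix: ...", "feat(parser): ...", "refactor!: ...".
PREFIX = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?!?:\s")


def commit_type(message: str) -> str:
    """Return the semantic prefix of a commit message, or "other"."""
    m = PREFIX.match(message)
    return m.group("type") if m else "other"


def types_by_directory(commits):
    """commits: iterable of (message, [changed file paths])."""
    counts = defaultdict(Counter)
    for message, paths in commits:
        kind = commit_type(message)
        # Count each top-level directory once per commit; "." means repo root.
        dirs = {p.split("/", 1)[0] if "/" in p else "." for p in paths}
        for d in dirs:
            counts[d][kind] += 1
    return counts


# Hypothetical history, for illustration only.
history = [
    ("fix: handle empty rows", ["engine/scan.go", "engine/scan_test.go"]),
    ("fix(parser): reject trailing comma", ["parser/lex.go"]),
    ("refactor: split planner", ["engine/plan.go"]),
    ("Update README", ["README.md"]),
]

counts = types_by_directory(history)
for directory, kinds in sorted(counts.items()):
    print(directory, dict(kinds))
```

A directory with an unusually high share of `fix` commits would be a candidate "hot" area worth a closer look.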
- Git as a NoSql Database
I've been very curious to explore this type of use case with askgit (https://github.com/augmentable-dev/askgit), which was designed for running simple "slice and dice" queries and aggregations on git history (and change stats) for basic analytical purposes. I've been curious about how this could be applied to a small text+git based "db". Say, for regular JSON or CSV dumps.
This also reminds me of Dolt: https://github.com/dolthub/dolt which I believe has been on HN a couple times
dolt
- A MySQL compatible database engine written in pure Go
Hi, this is my project :)
For us this package is most important as the query engine that powers Dolt:
https://github.com/dolthub/dolt
We aren't the original authors but have contributed the vast majority of its code at this point. Here's the origin story if you're interested:
https://www.dolthub.com/blog/2020-05-04-adopting-go-mysql-se...
- The Great Migration from MongoDB to PostgreSQL
It's a pretty good default stance, yeah.
We have been trying to convince people to use our new database [1] for several years and it's an uphill battle, because Postgres really is the best choice for most people. They really have to need our unique feature (version control) to even consider it over Postgres, and I don't blame them.
[1] https://github.com/dolthub/dolt
- What I Talk About When I Talk About Query Optimizer (Part 1): IR Design
We implemented a query optimizer with a flexible intermediate representation in pure Go:
https://github.com/dolthub/go-mysql-server
Getting the IR correct so that it's both easy to use and flexible enough to be useful is a really interesting design challenge. Our primary abstraction in the query plan is called a Node, and is way more general than the IR type described in the article from OP. This has probably hurt us: we only recently separated the responsibility to fetch rows into its own part of the runtime, out of the IR -- originally row fetching was coupled to the Node type directly.
This is also the query engine that Dolt uses:
https://github.com/dolthub/dolt
But it has a plug-in architecture, so you can use the engine on any data source that implements a handful of Go interfaces.
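The design point about keeping row fetching out of the plan nodes can be illustrated with a toy example. This is a Python sketch of the general idea only, not go-mysql-server's actual types: `Table`, `Filter`, `Project`, and `execute` are hypothetical names, and the `tables` mapping stands in for a pluggable data source.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]


# Plan nodes are plain data: they describe what to compute, not how to fetch rows.
@dataclass
class Table:
    name: str


@dataclass
class Filter:
    predicate: Callable[[Row], bool]
    child: Any


@dataclass
class Project:
    columns: List[str]
    child: Any


def execute(node: Any, tables: Dict[str, List[Row]]) -> Iterator[Row]:
    """Row fetching lives here, in a separate executor, not on the nodes."""
    if isinstance(node, Table):
        yield from tables[node.name]
    elif isinstance(node, Filter):
        yield from (r for r in execute(node.child, tables) if node.predicate(r))
    elif isinstance(node, Project):
        for r in execute(node.child, tables):
            yield {c: r[c] for c in node.columns}
    else:
        raise TypeError(f"unknown plan node: {node!r}")


# Hypothetical data source and plan.
tables = {"t": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]}
plan = Project(["name"], Filter(lambda r: r["id"] > 1, Table("t")))
result = list(execute(plan, tables))
print(result)
```

Because the nodes carry no execution logic, an optimizer can rewrite the tree freely, and a different executor or data source can be swapped in without touching the node definitions.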
- Dolt – Git for Data
- Dolt: A version-controlled SQL database
- Show HN: DoltgreSQL – Version-Controlled Database, Like Git and PostgreSQL
Just want to point out that we're announcing development on the project. It's absolutely not ready for mainstream use yet! We have Dolt (https://github.com/dolthub/dolt) which is production-ready and widely in use, but it uses MySQL's syntax and wire protocol. We are building the Dolt equivalent for PostgreSQL, which is DoltgreSQL, but it's only pre-alpha.
- Pg_branch: Pre-alpha Postgres extension brings Neon-like branching
Interesting that branching is now better supported and almost free. I wonder if merging can be simplified or whether it already is as simple and as fast as it can be?
I guess I am inspired by Dolt’s ability to branch and merge: https://github.com/dolthub/dolt
- SQLedge: Replicate Postgres to SQLite on the Edge
#. SQLite WAL mode
From https://www.sqlite.org/isolation.html https://news.ycombinator.com/item?id=32247085 :
> [sqlite] WAL mode permits simultaneous readers and writers. It can do this because changes do not overwrite the original database file, but rather go into the separate write-ahead log file. That means that readers can continue to read the old, original, unaltered content from the original database file at the same time that the writer is appending to the write-ahead log
#. superfly/litefs: a FUSE-based file system for replicating SQLite https://github.com/superfly/litefs
#. sqldiff: https://www.sqlite.org/sqldiff.html https://news.ycombinator.com/item?id=31265005
#. dolthub/dolt: https://github.com/dolthub/dolt
> Dolt can be set up as a replica of your existing MySQL or MariaDB database using standard MySQL binlog replication. Every write becomes a Dolt commit. This is a great way to get the version control benefits of Dolt and keep an existing MySQL or MariaDB database.
#. pganalyze/libpg_query: https://github.com/pganalyze/libpg_query :
> C library for accessing the PostgreSQL parser outside of the server environment
#. Ibis + Substrait [ + DuckDB ]
> ibis strives to provide a consistent interface for interacting with a multitude of different analytical execution engines, most of which (but not all) speak some dialect of SQL.
> Today, Ibis accomplishes this with a lot of help from `sqlalchemy` and `sqlglot` to handle differences in dialect, or we interact directly with available Python bindings (for instance with the pandas, datafusion, and polars backends).
> [...] `Substrait` is a new cross-language serialization format for communicating (among other things) query plans. It's still in its early days, but there is already nascent support for Substrait in Apache Arrow, DuckDB, and Velox.
#. benbjohnson/postlite: https://github.com/benbjohnson/postlite
> postlite is a network proxy to allow access to remote SQLite databases over the Postgres wire protocol. This allows GUI tools to be used on remote SQLite databases which can make administration easier.
> The proxy works by translating Postgres frontend wire messages into SQLite transactions and converting results back into Postgres response wire messages. Many Postgres clients also inspect the pg_catalog to determine system information so Postlite mirrors this catalog by using an attached in-memory database with virtual tables. The proxy also performs minor rewriting on these system queries to convert them to usable SQLite syntax.
> Note: This software is in alpha. Please report bugs. Postlite doesn't alter your database unless you issue INSERT, UPDATE, DELETE commands so it's probably safe. If anything, the Postlite process may die but it shouldn't affect your database.
#. > "Hosting SQLite Databases on GitHub Pages" (2021) re: sql.js-httpvfs, DuckDB https://news.ycombinator.com/item?id=28021766
#. awesome-db-tools https://github.com/mgramin/awesome-db-tools
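The WAL behavior quoted from the SQLite documentation in the list above can be observed with Python's built-in `sqlite3` module. This is a minimal sketch, assuming a local filesystem that supports WAL; the file name `demo.db` and table `t` are arbitrary choices for the demonstration.

```python
import os
import sqlite3
import tempfile

# WAL needs a real file; it does not apply to in-memory databases.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None means transactions are controlled with explicit BEGIN/COMMIT.
writer = sqlite3.connect(path, isolation_level=None)
mode = writer.execute("PRAGMA journal_mode=WAL").fetchall()[0][0]
writer.execute("CREATE TABLE t (v INTEGER)")
writer.execute("INSERT INTO t VALUES (1)")

reader = sqlite3.connect(path, isolation_level=None)

# Open a write transaction and leave it uncommitted for now.
writer.execute("BEGIN IMMEDIATE")
writer.execute("INSERT INTO t VALUES (2)")

# The reader is not blocked and sees only the last committed state.
before = reader.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]

writer.execute("COMMIT")

# A read that starts after the commit sees the new row.
after = reader.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]

print(mode, before, after)

reader.close()
writer.close()
```

The uncommitted insert lives in the write-ahead log, which is why the concurrent read can proceed against the unchanged main database file.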
- How do you sync dev databases across multiple devices?
- Ask HN: Data Management for AI Training
If you are just looking for data versioning there is Dolt:
https://github.com/dolthub/dolt
And that has a user-friendly UI in DoltHub:
https://www.dolthub.com/
You wouldn't store the images themselves in Dolt; those would likely be links to S3. But all the labels and surrounding metadata could be stored in Dolt?
DISCLAIMER: I'm the CEO of DoltHub so this is self-promotion.
What are some alternatives?
git-xargs - git-xargs is a command-line tool (CLI) for making updates across multiple GitHub repositories with a single command.
liquibase - Main Liquibase Source
crux - General purpose bitemporal database for SQL, Datalog & graph queries. Backed by @juxt [Moved to: https://github.com/xtdb/xtdb]
absurd-sql - sqlite3 in ur indexeddb (hopefully a better backend soon)
flan - A tasty tool that lets you save, load and share postgres snapshots with ease
noms - The versioned, forkable, syncable database
sqlite-plus - The ultimate set of SQLite extensions
TimescaleDB - An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
csv-sql - Command-line tool to load csv and excel (xlsx) files and run sql commands
vitess - Vitess is a database clustering system for horizontal scaling of MySQL.
datasette-lite - Datasette running in your browser using WebAssembly and Pyodide
temporal_tables - Temporal Tables PostgreSQL Extension