mergestat-lite
datasette-lite
 | mergestat-lite | datasette-lite |
---|---|---|
Mentions | 10 | 10 |
Stars | 3,419 | 308 |
Growth | 0.3% | - |
Activity | 6.3 | 5.4 |
Last commit | 3 days ago | about 1 month ago |
Language | Go | HTML |
License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
mergestat-lite
-
SQLite Doesn't Use Git
You can query git with this: https://github.com/mergestat/mergestat if you like the idea.
-
A SQLite extension for reading large files line-by-line
Hey, author here, happy to answer any questions! Also check out this notebook for a deeper dive into sqlite-lines, along with a slick WASM demonstration and more thoughts on the codebase itself https://observablehq.com/@asg017/introducing-sqlite-lines
I really dig SQLite, and I believe SQLite extensions will push it to another level. I rarely reach for Pandas or other "traditional" tools and query languages, and instead opt for plain ol' SQLite and other extensions. As a shameless plug, I recently started a blog series on SQLite and related tools and extensions if you want to learn more! Next week I'll be publishing more SQLite extensions for parsing HTML + making HTTP requests https://observablehq.com/@asg017/a-new-sqlite-blog-series
A few other SQLite extensions:
- xlite, for reading Excel files, in Rust https://github.com/x2bool/xlite
- sqlean, several small SQLite extensions in C https://github.com/nalgeon/sqlean
- mergestat, several SQLite extensions for developers (mainly GitHub's API) in Go https://github.com/mergestat/mergestat
- Show HN: Contribution Graph as a Git Command
-
Exploring Git Repos With MergeStat
mergestat is an open-source tool that allows users to run SQL queries on the contents and history of git repositories.
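To give a feel for what "SQL over git history" looks like, here is a minimal sketch using Python's built-in `sqlite3` module. The `commits` table, its columns, and the sample rows are stand-ins invented for illustration; they are not taken from mergestat's actual schema, and the sketch does not read a real repository.

```python
import sqlite3

# Hypothetical stand-in for a table of commits; the column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commits (hash TEXT, author_name TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO commits VALUES (?, ?, ?)",
    [
        ("a1", "alice", "fix: handle empty repo"),
        ("b2", "bob", "docs: update readme"),
        ("c3", "alice", "refactor: split parser"),
    ],
)

# The kind of "slice and dice" aggregation one might run over history:
# number of commits per author, most active first.
rows = conn.execute(
    "SELECT author_name, COUNT(*) AS n "
    "FROM commits GROUP BY author_name ORDER BY n DESC, author_name"
).fetchall()

for author, n in rows:
    print(author, n)
```

With a tool that exposes real git data as tables, the same style of query would run against actual history instead of sample rows.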
-
The world of PostgreSQL wire compatibility
Thanks for this write up! I've been really interested in postgres compatibility in the context of a tool I maintain (https://github.com/mergestat/mergestat) that uses SQLite. I've been looking for a way to expose the SQLite capabilities over a more commonly used wire-protocol like postgres (or mysql) so that existing BI and visualization tools can access the data.
This project is an interesting one: https://github.com/dolthub/go-mysql-server. It provides a MySQL interface (wire and SQL) to arbitrary "backends" implemented in Go.
It's really interesting how compatibility with existing protocols has become an important feature of new databases. There's so much existing tooling that already speaks postgres (or mysql), and being able to leverage that is a huge advantage IMO
-
Go library for printing human readable, relative time differences
timediff is a Go package for printing human readable, relative time differences. Output is based on ranges defined in the Day.js JavaScript library, and can be customized if needed. It's currently used by the mergestat command-line interface.
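The range-based idea can be sketched in a few lines of Python. This is not timediff's API or its actual thresholds: the function name `humanize` and the cut-off values below are illustrative assumptions, chosen only to show how a duration is matched against an ordered list of ranges.

```python
from datetime import timedelta

# Illustrative ranges only (assumed for this sketch, not copied from timediff
# or Day.js). Each entry: (exclusive upper bound, phrase template, unit or None).
_RANGES = [
    (timedelta(seconds=45), "a few seconds", None),
    (timedelta(seconds=90), "a minute", None),
    (timedelta(minutes=45), "{n} minutes", timedelta(minutes=1)),
    (timedelta(minutes=90), "an hour", None),
    (timedelta(hours=22), "{n} hours", timedelta(hours=1)),
    (timedelta(hours=36), "a day", None),
]


def humanize(delta):
    """Return a relative phrase for a timedelta (negative means the past)."""
    past = delta < timedelta(0)
    magnitude = abs(delta)
    phrase = None
    for upper, template, unit in _RANGES:
        if magnitude < upper:
            # Only templates with a unit carry a count.
            n = round(magnitude / unit) if unit else 0
            phrase = template.format(n=n)
            break
    if phrase is None:
        # Anything beyond the last range falls back to a day count.
        phrase = f"{round(magnitude / timedelta(days=1))} days"
    return f"{phrase} ago" if past else f"in {phrase}"


print(humanize(timedelta(minutes=-10)))
print(humanize(timedelta(hours=3)))
```

Keeping the ranges in a plain list is what makes the output customizable: swapping the list changes the wording without touching the lookup logic.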
- Askgit: Command-line tool for running SQL queries on Git repositories
-
Semantic Git Commit Messages
Assuming committers adhere to it, there could be some interesting use cases when combined with a tool like AskGit (https://github.com/askgitdev/askgit) for understanding what "categories" of work are being done in a codebase.
Maybe even what directories/files tend to see `fix` or `refactor` more frequently (signs of a poorly designed or "hot" area?)
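As a rough sketch of that idea, the snippet below tallies commit "types" per top-level directory. It assumes messages follow a `type(scope): subject` convention; the regular expression, the helper names, and the sample commits are all made up for illustration and do not come from AskGit.

```python
import re
from collections import Counter

# Matches a leading "type:", "type(scope):" or "type!:" prefix (assumed convention).
PREFIX = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?!?:\s")


def commit_type(message):
    """Return the semantic prefix of a commit message, or 'other'."""
    match = PREFIX.match(message)
    return match.group("type") if match else "other"


def type_counts_by_dir(commits):
    """Count commit types per top-level directory touched by each commit."""
    counts = {}
    for message, paths in commits:
        kind = commit_type(message)
        # A commit counts once per top-level directory, however many files it touches.
        for top in {path.split("/", 1)[0] for path in paths}:
            counts.setdefault(top, Counter())[kind] += 1
    return counts


# Hypothetical sample data: (message, files changed).
commits = [
    ("fix: handle empty repo", ["cmd/main.go"]),
    ("refactor(parser): split lexer", ["pkg/parser/lex.go", "pkg/parser/parse.go"]),
    ("fix(parser): off-by-one", ["pkg/parser/parse.go"]),
    ("update readme", ["README.md"]),
]

counts = type_counts_by_dir(commits)
for directory, tally in sorted(counts.items()):
    print(directory, dict(tally))
```

A directory whose tally is dominated by `fix` or `refactor` would be a candidate "hot" area in the sense described above.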
-
Git as a NoSql Database
I've been very curious to explore this type of use case with askgit (https://github.com/augmentable-dev/askgit), which was designed for running simple "slice and dice" queries and aggregations on git history (and change stats) for basic analytical purposes. I've been curious about how this could be applied to a small text+git based "db". Say, for regular JSON or CSV dumps.
This also reminds me of Dolt: https://github.com/dolthub/dolt which I believe has been on HN a couple times
datasette-lite
-
Sqlime: Online SQLite Playground
Also see: https://github.com/simonw/datasette-lite
- Use SQL Without Databases
-
GitHub - GSA/code-gov: An informative repo for all Code.gov repos
https://github.com/simonw/datasette-lite :
> You can use this tool to open any SQLite database file that is hosted online and served with a `access-control-allow-origin: *` CORS header. Files served by GitHub Pages automatically include this header, as do database files that have been published online using `datasette publish`.
> [...] You can paste in the "raw" URL to a file, but Datasette Lite also has a shortcut: if you paste in the URL to a page on GitHub or a Gist it will automatically convert it to the "raw" URL for you
> To load a Parquet file, pass a URL to `?parquet=`
> [...] https://lite.datasette.io/?parquet=https://github.com/Terada...
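A link of this kind can be assembled programmatically. The sketch below is only an illustration: the helper name `lite_url` and the `example.com` addresses are hypothetical, and percent-encoding the target is a choice made here for safety (the quoted example passes the URL unencoded).

```python
from urllib.parse import urlencode

BASE = "https://lite.datasette.io/"


def lite_url(source_url, parquet=False):
    """Build a Datasette Lite link for a remote SQLite or Parquet file."""
    # ?parquet= is described in the quote above; ?url= is assumed here for SQLite files.
    key = "parquet" if parquet else "url"
    return BASE + "?" + urlencode({key: source_url})


# Hypothetical file locations, for illustration only.
print(lite_url("https://example.com/data.db"))
print(lite_url("https://example.com/data.parquet", parquet=True))
```

The remote file still has to be served with a permissive CORS header, as the quoted passage notes; building the link does not change that requirement.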
There are various *-to-sqlite utilities that load data into a SQLite database for use with e.g. datasette. E.g. Pandas with `dtype_backend='pyarrow'` saves to Parquet.
datasette plugins are written in Python and/or JS w/ pluggy:
-
[SQLlite] Is there any online SQL editor I can host on my website? Maybe something in JS or php
Datasette Lite might be even better for this - you can construct URLs that link directly to examples: https://github.com/simonw/datasette-lite
-
SQLite WASM Official
There are some amazing things for SQLite in the browser especially if you're looking for ways to host queryable data for cheap.
I have a hacked-up, experimental proof-of-concept version of datasette-lite that can look at multi-GB databases at https://github.com/simonw/datasette-lite/pull/49. It uses a hacked-up, chunked lazyFile implementation from Emscripten and others to grab pages from Cloudflare R2.
It's a test using a 28GB database built from California's unclaimed property records (https://www.sco.ca.gov/upd_download_property_records.html), looking up that guy who owns Twitter: https://datasette-lite-lab.mindflakes.com/index.html?url=htt...
I think there may be a space for super-large multi-GB files served from static storage being accessible from SQLite as well. Another example is this full-text search over a 43GB SQLite database of Wikipedia: http://static.wiki/ . Hearing there's official support for this is awesome, and I hope they might also add some provisions for those sticking with POSIX/Emscripten.
-
Hosting SQLite Databases on GitHub Pages
I grafted the enhanced lazyFile implementation of this onto datasette-lite relatively recently. I threw an 18GB CSV from
https://www.sco.ca.gov/upd_download_property_records.html
into an FTS5 SQLite database, which came out to about 28GB after processing:
POC, non-merging Draft PR for the hack:
https://github.com/simonw/datasette-lite/pull/49
You can run queries through it if you URL-hack your way to the query dialog. Browsing is kind of a dud at the moment, since datasette runs a count(*) which downloads everything.
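The general idea behind chunked lazy loading can be sketched as follows: a read of one database page is mapped to the chunk that contains it, and only that chunk is requested with an HTTP `Range` header. This is not the code from the PR; the function names and the 1 MiB chunk size are assumptions made for illustration.

```python
def page_offset(page_number, page_size=4096):
    """Byte offset of a SQLite page; pages are numbered starting at 1."""
    return (page_number - 1) * page_size


def chunk_range(offset, length, chunk_size=1024 * 1024):
    """Expand a read to the chunk boundaries that cover it (inclusive byte range)."""
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    start = first_chunk * chunk_size
    end = (last_chunk + 1) * chunk_size - 1
    return start, end


def range_header(start, end):
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"


# Reading page 300 of a database with 4096-byte pages touches a single 1 MiB chunk.
start, end = chunk_range(page_offset(300), 4096)
print(range_header(start, end))
```

An indexed lookup touches only a handful of pages and therefore a few chunks, whereas a full `count(*)` has to visit every page of the table, which is why browsing ends up downloading everything.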
- Learn Postgres at the Playground
-
A SQLite extension for reading large files line-by-line
Oh wow! I wonder how hard it would be to load that module into https://github.com/simonw/datasette-lite
-
This Week in Python
datasette-lite - Datasette running in your browser using WebAssembly and Pyodide
-
Datasette Lite: a server-side Python web application running in a browser
I have an open issue for that here: https://github.com/simonw/datasette-lite/issues/28
My initial hunch is that this will be really difficult. It would probably require a fork of something like https://github.com/coleifer/pysqlite3, then compiling that for WebAssembly.
I'm confident it's feasible, but I don't have the skills to figure it out myself.
What are some alternatives?
git-xargs - git-xargs is a command-line tool (CLI) for making updates across multiple GitHub repositories with a single command.
pyscript - Try PyScript: https://pyscript.com Examples: https://tinyurl.com/pyscript-examples Community: https://discord.gg/HxvBtukrg2
crux - General purpose bitemporal database for SQL, Datalog & graph queries. Backed by @juxt [Moved to: https://github.com/xtdb/xtdb]
sqlite-plus - The ultimate set of SQLite extensions
flan - A tasty tool that lets you save, load and share postgres snapshots with ease
file-system-access - Expose the file system on the user's device, so Web apps can interoperate with the user's native applications.
datastation - App to easily query, script, and visualize data from every database, file, and API.
csv-sql - Command-line tool to load csv and excel (xlsx) files and run sql commands
pyodide - Pyodide is a Python distribution for the browser and Node.js based on WebAssembly
xlite - Query Excel spreadsheets (.xlsx, .xls, .ods) using SQLite
pysqlite3 - SQLite3 DB-API 2.0 driver from Python 3, packaged separately, with improvements