seafowl vs litefs

| | seafowl | litefs |
|---|---|---|
| Mentions | 11 | 38 |
| Stars | 355 | 3,636 |
| Growth | 2.5% | 2.5% |
| Activity | 9.3 | 8.0 |
| Latest commit | 6 days ago | 3 months ago |
| Language | Rust | Go |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
seafowl
-
Gcsfuse: A user-space file system for interacting with Google Cloud Storage
In case you're interested in scale-to-zero database hosting, a few months ago I paired gcsfuse with Seafowl [0][1], an early stage open source database written in Rust. Was a lot of fun balancing tradeoffs that are usually not possible with classical databases e.g. Postgres. Thank you gcsfuse contributors.
[0] https://seafowl.io
-
DuckDB 0.8.0
> why someone would start something in a memory unsafe language these days
You might like what we (Splitgraph) are building with Seafowl [0], a new database which is written in Rust and based on Datafusion and delta-rs [1]. It's optimized for running at the edge and responding to queries via HTTP with cache-friendly semantics.
[0] https://seafowl.io
[1] https://www.splitgraph.com/blog/seafowl-delta-storage-layer
-
We made a newsfeed for tracking new and deleted datasets across 200+ open data portals (and they're all queryable with SQL)
For example, here's the IPInfo dataset, and here's some commodities data from Trase, which is proxying to their live Postgres database and powering their interactive dashboard. Also, here's the repository of Socrata metadata powering the newsfeed - we scrape it nightly and then push it to Seafowl, our new open-source database optimized for running cache-friendly queries "at the edge." The code for Open Data Monitor is on GitHub, if you're curious.
-
Quicker Serverless Postgres Connections
This is basically how we do authentication in the Splitgraph DDN [0], which is kind of like a multi-tenant serverless Postgres.
We implement the Postgres frontend with a forked version of PgBouncer, and we changed the authentication method such that when the user authenticates, we issue them a JWT which we store as a session variable. That session variable has the same security properties as a cookie in a web browser (the user can change/manipulate it, but if it's signed by us we can trust its claims).
That's the simple explanation that skips over the multi-tenant part. I don't want to derail from the thread - Neon is very cool, and we are actually experimenting with it right now, for storing the Seafowl [1] catalog when deploying to "scale to zero" services like Google Cloud Run or AWS Lambda, which don't have persistent storage.
[0] https://www.splitgraph.com/connect/query
[1] https://seafowl.io
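The signed-session-variable scheme described above (a token the user can see and pass around, but whose claims are only trusted if the signature checks out) can be sketched with a minimal HMAC-signed token. This is an illustrative sketch only - the key and helper names are hypothetical, and a real deployment would use a proper JWT library:

```python
import base64, hashlib, hmac, json

SECRET = b"server-side-signing-key"  # hypothetical key, held only by the server

def sign_token(claims):
    """Serialize claims and attach an HMAC signature the server can verify."""
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{sig}"

def verify_token(token):
    """Return the claims if the signature is valid, else None."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered token: reject the claims
    return json.loads(base64.urlsafe_b64decode(payload))
```

Like a signed cookie, the client can read or even modify the token, but any modification invalidates the signature, so the server only acts on claims it signed itself.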
-
Show HN: Free IP to Country and ASN Downloads from Ipinfo.io
This is really cool! I've always found IP data to be a compelling example of a data product, especially when talking about Splitgraph, a company of which I'm a co-founder (and btw - I also met my co-founder on HN!).
So, I exported the CSV files for country and asn data, and then uploaded them to Splitgraph. You can see some sample queries in the readme of the repository [0]. Since Splitgraph is built on Postgres, it's possible to use all the `inet` and `cidr` tools available from Postgres, so you can make range queries easily. One sample query also demonstrates a join between the two tables, resulting in the equivalent of your combined country_asn.csv.
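For a sense of what those `inet`/`cidr` range queries compute, here is the same containment lookup in plain Python with the stdlib `ipaddress` module. The rows are made-up toy values standing in for the country CSV, not real IPinfo data:

```python
import ipaddress

# Toy stand-ins for rows of the country CSV (values illustrative only).
country_ranges = [
    ("1.0.0.0/24", "AU"),
    ("8.8.8.0/24", "US"),
]

def country_for_ip(ip):
    """Rough equivalent of a Postgres `ip << cidr` containment query."""
    addr = ipaddress.ip_address(ip)
    for cidr, country in country_ranges:
        if addr in ipaddress.ip_network(cidr):
            return country
    return None
```

Postgres with the `inet` operators does this with an index instead of a linear scan, which is the point of loading the CSVs into a database in the first place.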
Another idea: We have a newer project called Seafowl [1], which is an open-source analytical database optimized for running "at the edge," with cache-friendly semantics making it ideal for querying from Web applications. We don't have a self-hosted version of this yet, but perhaps the next thing to try would be loading this data into Seafowl and querying it "at the edge" - I've been thinking about ways we could package Seafowl as an OpenResty module, which could allow for true "at the edge" use cases like querying IP data in your reverse proxy. (Although the .mmdb format already solves this particular problem pretty efficiently and interoperably, I'd be curious to measure the difference.)
[0] https://www.splitgraph.com/miles/ipinfo-country-asn
[1] https://seafowl.io/
-
I Migrated from a Postgres Cluster to Distributed SQLite with LiteFS
You can indeed run LiteFS by yourself, without Consul, as a sidecar / wrapper around your application. We do it in our project and have a Docker Compose example at [0]. In this case, you specify a specific known leader node. We haven't tried getting it running independently with Consul to do leader election / failover.
[0] https://github.com/splitgraph/seafowl/blob/main/examples/lit...
-
Ask HN: Serverless SQLite or Closest DX to Cloudflare D1?
This is the vision of what we're building at Splitgraph. [0] You might be most interested in our recent project Seafowl [1] which is an open-source analytical database optimized for running "at the edge," with cache-friendly semantics making it ideal for querying from Web applications. It's built in Rust using DataFusion and incorporates many of the lessons we've learned building the Data Delivery Network [2] for Splitgraph.
[0] https://www.splitgraph.com
[1] https://seafowl.io
[2] https://www.splitgraph.com/connect
-
PostgREST – Serve a RESTful API from Any Postgres Database
> why not just accept SQL and cut out all the unnecessary mapping?
You might be interested in what we're building: Seafowl, a database designed for running analytical SQL queries straight from the user's browser, with HTTP CDN-friendly caching [0]. It's a second iteration of the Splitgraph DDN [1] which we built on top of PostgreSQL (Seafowl is much faster for this use case, since it's based on Apache DataFusion + Parquet).
The tradeoff for allowing the client to run any SQL vs a limited API is that PostgREST-style queries have a fairly predictable and low overhead, but aren't as powerful as fully-fledged SQL with aggregations, joins, window functions and CTEs, which have their uses in interactive dashboards to reduce the amount of data that has to be processed on the client.
There's also ROAPI [2], a read-only SQL API that you can deploy in front of a database or other data source (though when a database is the data source, it only works for tables that fit in memory).
[0] https://seafowl.io/
[1] https://www.splitgraph.com/connect
[2] https://github.com/roapi/roapi
-
Show HN: Socrata Roulette – run random SQL on a random government dataset
It's possible! Currently this is running GROUP BY queries using Socrata's query API on the original government data portal. We're adding the ability to import data from these sources into a columnar format in the future, either into Splitgraph itself or syncing the data out into Seafowl (https://seafowl.io/) which uses Parquet and is much faster.
Technically, the ability is already there (you can add a dataset to Splitgraph and select Socrata as a source if you know the dataset ID), but it's not as turnkey as landing on a dataset page and clicking a button. More to come!
-
Welcome to InfluxDB IOx: InfluxData’s New Storage Engine
Just wanted to give a shout out to Apache DataFusion[0] that IOx relies on a lot (and contributes to as well!).
It's a framework for writing query engines in Rust that takes care of a lot of heavy lifting around parsing SQL, type casting, constructing and transforming query plans and optimizing them. It's pluggable, making it easy to write custom data sources, optimizer rules, query nodes etc.
It has very good single-node performance (there's even a way to compile it with SIMD support), and Ballista [1] builds on it to create a distributed query engine.
Plenty of other projects use it besides IOx, including VegaFusion, ROAPI, Cube.js's preaggregation store. We're heavily using it to build Seafowl [2], an analytical database that's optimized for running SQL queries directly from the user's browser (caching, CDNs, low latency, some WASM support, all that fun stuff).
[0] https://github.com/apache/arrow-datafusion
[1] https://github.com/apache/arrow-ballista
[2] https://github.com/splitgraph/seafowl
litefs
-
Handle Incoming Webhooks with LiteJob for Ruby on Rails
Firstly, LiteJob's reliance on SQLite inherently restricts its horizontal scaling capabilities. Unlike other databases, SQLite is designed for single-machine use, making it challenging to distribute workload across multiple servers. This can certainly be done using novel technologies like LiteFS, but it is far from intuitive.
-
Experimenting on the Edge with Turso (and Go)
I'm curious to know if others have tried out Turso or LiteFS or any of the newer edge DB providers that are popping up in 'real world' applications, and what your experiences have been?
-
Skip the API, Ship Your Database
Author here. I think we could have set better expectations with our Postgres docs. It wasn't meant to be a managed service but rather some tooling to help streamline setting up a database and replicas. I'm sorry about the troubles you've had and that it's come off as us being disingenuous. We blog about things that we're working on and find interesting. It's not meant to say that we've figured everything out, but rather that this is what we've tried.
As for this post, it's not managed SQLite but rather an open source project called LiteFS [1]. You can run it anywhere that runs Linux. We use it in a few places in our infrastructure and found that sharing the underlying database for internal tooling was really helpful for that use case.
[1]: https://github.com/superfly/litefs
-
SQLedge: Replicate Postgres to SQLite on the Edge
#. SQLite WAL mode
From https://www.sqlite.org/isolation.html https://news.ycombinator.com/item?id=32247085 :
> [sqlite] WAL mode permits simultaneous readers and writers. It can do this because changes do not overwrite the original database file, but rather go into the separate write-ahead log file. That means that readers can continue to read the old, original, unaltered content from the original database file at the same time that the writer is appending to the write-ahead log
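The quoted behavior is easy to see first-hand. A minimal sketch with Python's built-in sqlite3 module (the file path is a throwaway temp location): an open reader keeps seeing its original snapshot while a writer commits new rows to the WAL.

```python
import os, sqlite3, tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None puts the connections in autocommit mode so we
# control transaction boundaries explicitly.
writer = sqlite3.connect(path, isolation_level=None)
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE t (v INTEGER)")
writer.execute("INSERT INTO t VALUES (1)")

reader = sqlite3.connect(path, isolation_level=None)
reader.execute("BEGIN")
# The first read pins the reader's snapshot of the database file.
assert reader.execute("SELECT count(*) FROM t").fetchone()[0] == 1

# The writer appends to the write-ahead log without blocking the open reader...
writer.execute("INSERT INTO t VALUES (2)")

# ...and the reader still sees the unaltered snapshot it started with.
assert reader.execute("SELECT count(*) FROM t").fetchone()[0] == 1
reader.execute("COMMIT")
```

In rollback-journal mode the second insert would instead block until the reader finished; WAL is what makes the concurrent reader/writer pairing work.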
#. superfly/litefs: a FUSE-based file system for replicating SQLite https://github.com/superfly/litefs
#. sqldiff: https://www.sqlite.org/sqldiff.html https://news.ycombinator.com/item?id=31265005
#. dolthub/dolt: https://github.com/dolthub/dolt
> Dolt can be set up as a replica of your existing MySQL or MariaDB database using standard MySQL binlog replication. Every write becomes a Dolt commit. This is a great way to get the version control benefits of Dolt and keep an existing MySQL or MariaDB database.
#. pganalyze/libpg_query: https://github.com/pganalyze/libpg_query :
> C library for accessing the PostgreSQL parser outside of the server environment
#. Ibis + Substrait [ + DuckDB ]
> ibis strives to provide a consistent interface for interacting with a multitude of different analytical execution engines, most of which (but not all) speak some dialect of SQL.
> Today, Ibis accomplishes this with a lot of help from `sqlalchemy` and `sqlglot` to handle differences in dialect, or we interact directly with available Python bindings (for instance with the pandas, datafusion, and polars backends).
> [...] `Substrait` is a new cross-language serialization format for communicating (among other things) query plans. It's still in its early days, but there is already nascent support for Substrait in Apache Arrow, DuckDB, and Velox.
#. benbjohnson/postlite: https://github.com/benbjohnson/postlite
> postlite is a network proxy to allow access to remote SQLite databases over the Postgres wire protocol. This allows GUI tools to be used on remote SQLite databases which can make administration easier.
> The proxy works by translating Postgres frontend wire messages into SQLite transactions and converting results back into Postgres response wire messages. Many Postgres clients also inspect the pg_catalog to determine system information so Postlite mirrors this catalog by using an attached in-memory database with virtual tables. The proxy also performs minor rewriting on these system queries to convert them to usable SQLite syntax.
> Note: This software is in alpha. Please report bugs. Postlite doesn't alter your database unless you issue INSERT, UPDATE, DELETE commands so it's probably safe. If anything, the Postlite process may die but it shouldn't affect your database.
#. > "Hosting SQLite Databases on GitHub Pages" (2021) re: sql.js-httpvfs, DuckDB https://news.ycombinator.com/item?id=28021766
#. awesome-db-tools https://github.com/mgramin/awesome-db-tools
- Fly.io Postgres cluster went down for 3 days, no word from them about it
-
LiteFS Cloud: Distributed SQLite with Managed Backups
LiteFS works sorta like that. It provides read replicas on all your application servers so you can use it just like vanilla SQLite for queries.
Write transactions have to occur on the primary node but that's mostly because of latency. SQLite operates in serializable isolation so it only allows one transaction at a time. If you wanted to have all nodes write then you'd need to acquire a lock on one node and then update it and then release the lock. We actually allow this on LiteFS using something called "write forwarding" but it's pretty slow so I wouldn't suggest it for regular use.
We're adding an optional query API over HTTP [1] soon as well. It's inspired by Turso's approach. That'll let you issue one or more queries in a batch over HTTP and they'll be run in a single transaction.
[1]: https://github.com/superfly/litefs/issues/326
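The "batch of queries in a single transaction" idea can be sketched in a few lines against plain SQLite. This is a hypothetical shape for illustration only - the actual LiteFS HTTP API is specified in the linked issue:

```python
import sqlite3

def run_batch(conn, statements):
    """Run a list of (sql, params) pairs in one transaction: all apply or none do."""
    results = []
    try:
        conn.execute("BEGIN")
        for sql, params in statements:
            results.append(conn.execute(sql, params).fetchall())
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # any failure undoes the whole batch
        raise
    return results

# Autocommit mode so BEGIN/COMMIT above own the transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
run_batch(conn, [("CREATE TABLE t (v INTEGER)", ()),
                 ("INSERT INTO t VALUES (?)", (1,))])
```

The all-or-nothing property is the useful part over HTTP: a client can't be left with half a batch applied because its connection dropped mid-sequence.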
-
We Raised a Bunch of Money
Basically, LiteFS: https://github.com/superfly/litefs
And then some load balancer cleverness that reroutes writes to a specific VM: https://fly.io/blog/globally-distributed-postgres/
- Mycelite: SQLite extension to synchronize changes across SQLite instances
- Database suggestion to store and retrieve data
-
Key-value store has been added to Deno API
But my guess is they'll have an alternate implementation or something like LiteFS in Deno Deploy that will make this substantially more interesting when running in the Cloud.
What are some alternatives?
marmot - A distributed SQLite replicator built on top of NATS
litestream - Streaming replication for SQLite.
datafusion-ballista - Apache Arrow Ballista Distributed Query Engine
sqlite-s3vfs - Python writable virtual filesystem for SQLite on S3
azurefs - Mount Microsoft Azure Blob Storage as local filesystem in Linux (inactive)
dqlite - Embeddable, replicated and fault-tolerant SQL engine.
annuaire-entreprises-sirene-api
mvsqlite - Distributed, MVCC SQLite that runs on FoundationDB.
mindcastle.io - Massively scalable, cloud-backed distributed block device for Linux and VMs
Bedrock - Rock solid distributed database specializing in active/active automatic failover and WAN replication
Prisma - Next-generation ORM for Node.js & TypeScript | PostgreSQL, MySQL, MariaDB, SQL Server, SQLite, MongoDB and CockroachDB