seafowl vs sgr


Project         seafowl               sgr
Mentions        11                    22
Stars           355                   326
Growth          2.5%                  0.6%
Activity        9.3                   1.5
Latest Commit   6 days ago            5 days ago
Language        Rust                  Python
License         Apache License 2.0    GNU General Public License v3.0 or later

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

seafowl

Posts with mentions or reviews of seafowl. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-09-06.

Gcsfuse: A user-space file system for interacting with Google Cloud Storage
15 projects | news.ycombinator.com | 6 Sep 2023

In case you're interested in scale-to-zero database hosting, a few months ago I paired gcsfuse with Seafowl [0][1], an early stage open source database written in Rust. Was a lot of fun balancing tradeoffs that are usually not possible with classical databases e.g. Postgres. Thank you gcsfuse contributors.
[0] https://seafowl.io
DuckDB 0.8.0
5 projects | news.ycombinator.com | 17 May 2023

> why someone would start something in a memory unsafe language these days
You might like what we (Splitgraph) are building with Seafowl [0], a new database which is written in Rust and based on Datafusion and delta-rs [1]. It's optimized for running at the edge and responding to queries via HTTP with cache-friendly semantics.
[0] https://seafowl.io
[1] https://www.splitgraph.com/blog/seafowl-delta-storage-layer
We made a newsfeed for tracking new and deleted datasets across 200+ open data portals (and they're all queryable with SQL)
2 projects | /r/datasets | 13 Apr 2023

For example, here's the IPInfo dataset, and here's some commodities data from Trase, which is proxying to their live Postgres database and powering their interactive dashboard. Also, here's the repository of Socrata metadata powering the newsfeed - we scrape it nightly and then push it to Seafowl, our new open-source database optimized for running cache-friendly queries "at the edge." The code for Open Data Monitor is on GitHub, if you're curious.
Quicker Serverless Postgres Connections
1 project | news.ycombinator.com | 28 Mar 2023

This is basically how we do authentication in the Splitgraph DDN [0], which is kind of like a multi-tenant serverless Postgres.
We implement the Postgres frontend with a forked version of PgBouncer, and we changed the authentication method such that when the user authenticates, we issue them a JWT which we store as a session variable. That session variable has the same security properties as a cookie in a web browser (the user can change/manipulate it, but if it's signed by us we can trust its claims).
That's the simple explanation that skips over the multi-tenant part. I don't want to derail from the thread - Neon is very cool, and we are actually experimenting with it right now, for storing the Seafowl [1] catalog when deploying to "scale to zero" services like Google Cloud Run or AWS Lambda, which don't have persistent storage.
[0] https://www.splitgraph.com/connect/query
[1] https://seafowl.io
Show HN: Free IP to Country and ASN Downloads from Ipinfo.io
1 project | news.ycombinator.com | 1 Mar 2023

This is really cool! I've always found IP data to be a compelling example of a data product, especially when talking about Splitgraph, a company of which I'm a co-founder (and btw - I also met my co-founder on HN!).
So, I exported the CSV files for country and asn data, and then uploaded them to Splitgraph. You can see some sample queries in the readme of the repository [0]. Since Splitgraph is built on Postgres, it's possible to use all the `inet` and `cidr` tools available from Postgres, so you can make range queries easily. One sample query also demonstrates a join between the two tables, resulting in the equivalent of your combined country_asn.csv.
Another idea: We have a newer project called Seafowl [1], which is an open-source analytical database optimized for running "at the edge," with cache-friendly semantics making it ideal for querying from Web applications. We don't have a self-hosted version of this yet, but perhaps the next thing to try would be loading this data into Seafowl and querying it "at the edge" - I've been thinking about ways that we could package Seafowl as an OpenResty module, which could allow for true "at the edge" use cases like querying IP data in your reverse proxy. (The .mmdb format already solves this particular problem pretty efficiently and interoperably, although I'd be curious to measure the difference.)
[0] https://www.splitgraph.com/miles/ipinfo-country-asn
[1] https://seafowl.io/
I Migrated from a Postgres Cluster to Distributed SQLite with LiteFS
4 projects | news.ycombinator.com | 5 Jan 2023

You can indeed run LiteFS by yourself, without Consul, as a sidecar / wrapper around your application. We do it in our project and have a Docker Compose example at [0]. In this case, you specify a specific known leader node. We haven't tried getting it running independently with Consul to do leader election / failover.
[0] https://github.com/splitgraph/seafowl/blob/main/examples/lit...
Ask HN: Serverless SQLite or Closest DX to Cloudflare D1?
2 projects | news.ycombinator.com | 2 Jan 2023

This is the vision of what we're building at Splitgraph. [0] You might be most interested in our recent project Seafowl [1] which is an open-source analytical database optimized for running "at the edge," with cache-friendly semantics making it ideal for querying from Web applications. It's built in Rust using DataFusion and incorporates many of the lessons we've learned building the Data Delivery Network [2] for Splitgraph.
[0] https://www.splitgraph.com
[1] https://seafowl.io
[2] https://www.splitgraph.com/connect
PostgREST – Serve a RESTful API from Any Postgres Database
22 projects | news.ycombinator.com | 29 Dec 2022

> why not just accept SQL and cut out all the unnecessary mapping?
You might be interested in what we're building: Seafowl, a database designed for running analytical SQL queries straight from the user's browser, with HTTP CDN-friendly caching [0]. It's a second iteration of the Splitgraph DDN [1] which we built on top of PostgreSQL (Seafowl is much faster for this use case, since it's based on Apache DataFusion + Parquet).
The tradeoff for allowing the client to run any SQL vs a limited API is that PostgREST-style queries have a fairly predictable and low overhead, but aren't as powerful as fully-fledged SQL with aggregations, joins, window functions and CTEs, which have their uses in interactive dashboards to reduce the amount of data that has to be processed on the client.
There's also ROAPI [2] which is a read-only SQL API that you can deploy in front of a database / other data source (though in case of using databases as a data source, it's only for tables that fit in memory).
[0] https://seafowl.io/
[1] https://www.splitgraph.com/connect
[2] https://github.com/roapi/roapi
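
One way to picture "HTTP CDN-friendly caching" for arbitrary SQL is to derive the request URL from a hash of the query text, so identical queries map to the same GET URL and can be served from a cache. The sketch below assumes a hypothetical `/q/<hash>` path and a hypothetical `X-Query` header; neither is taken from the post, and the real request format should be checked in the Seafowl documentation.

```python
import hashlib
from urllib.parse import quote


def cacheable_request(base_url: str, sql: str):
    """Build a GET URL and headers whose URL depends only on the query text.

    The '/q/<sha256>' layout and the 'X-Query' header are illustrative
    assumptions, not a documented API.
    """
    query = sql.strip()
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    url = f"{base_url.rstrip('/')}/q/{digest}"
    # The query itself travels in a header, percent-encoded to stay header-safe.
    headers = {"X-Query": quote(query)}
    return url, headers
```

Because the URL is a pure function of the query, a CDN or browser cache can treat each distinct query as a distinct cacheable resource.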
Show HN: Socrata Roulette – run random SQL on a random government dataset
1 project | news.ycombinator.com | 9 Dec 2022

It's possible! Currently this is running GROUP BY queries using Socrata's query API on the original government data portal. We're adding the ability to import data from these sources into a columnar format in the future, either into Splitgraph itself or syncing the data out into Seafowl (https://seafowl.io/) which uses Parquet and is much faster.
Technically, the ability is already there (you can add a dataset to Splitgraph and select Socrata as a source if you know the dataset ID), but it's not as turnkey as landing on a dataset page and clicking a button. More to come!
Welcome to InfluxDB IOx: InfluxData’s New Storage Engine
5 projects | news.ycombinator.com | 26 Oct 2022

Just wanted to give a shout out to Apache DataFusion[0] that IOx relies on a lot (and contributes to as well!).
It's a framework for writing query engines in Rust that takes care of a lot of heavy lifting around parsing SQL, type casting, constructing and transforming query plans and optimizing them. It's pluggable, making it easy to write custom data sources, optimizer rules, query nodes etc.
It has very good single-node performance (there's even a way to compile it with SIMD support) and Ballista [1] extends that to build it into a distributed query engine.
Plenty of other projects use it besides IOx, including VegaFusion, ROAPI, and Cube.js's preaggregation store. We're heavily using it to build Seafowl [2], an analytical database that's optimized for running SQL queries directly from the user's browser (caching, CDNs, low latency, some WASM support, all that fun stuff).
[0] https://github.com/apache/arrow-datafusion
[1] https://github.com/apache/arrow-ballista
[2] https://github.com/splitgraph/seafowl

sgr

Posts with mentions or reviews of sgr. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-02-01.

Show HN: Loofi – Our AI-Powered SQL Query Builder
1 project | news.ycombinator.com | 21 May 2023
Release engineering is exhausting so here's cargo-dist
12 projects | news.ycombinator.com | 1 Feb 2023

I wrote up the details of this in a PR [0] where I last dealt with it.
[0] https://github.com/splitgraph/sgr/pull/656
Postgres Auditing in 150 lines of SQL
10 projects | news.ycombinator.com | 9 Mar 2022

You might like what we're doing with Splitgraph. Our command line tool (sgr) installs an audit log into Postgres to track changes [0]. Then `sgr commit` can write these changes to delta-compressed objects [1], where each object is a columnar fragment of data, addressable by the LTHash of rows added/deleted by the fragment, and attached to metadata describing its index [2].
I haven't explored sirix before, but at first glance it looks like we have some similar ideas — thanks for sharing, I'm excited to learn more, especially about its application of ZFS.
[0] https://www.splitgraph.com/docs/working-with-data/tracking-c...
[1] https://www.splitgraph.com/docs/concepts/objects
[2] https://github.com/splitgraph/splitgraph/blob/master/splitgr...
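
The post above addresses each fragment by a hash of the rows it adds and deletes. The sketch below shows the general idea of such a homomorphic hash in a deliberately simplified form: each row is hashed to a 256-bit integer, added rows are summed and deleted rows subtracted, all modulo 2^256. This is an illustrative stand-in, not LTHash itself and not Splitgraph's implementation; the function names are hypothetical.

```python
import hashlib

MODULUS = 2 ** 256


def row_digest(row: tuple) -> int:
    """Hash one row to a 256-bit integer (repr() is a stand-in serialization)."""
    return int.from_bytes(hashlib.sha256(repr(row).encode("utf-8")).digest(), "big")


def fragment_hash(added=(), deleted=()) -> int:
    """Sum digests of added rows and subtract digests of deleted rows."""
    total = 0
    for row in added:
        total = (total + row_digest(row)) % MODULUS
    for row in deleted:
        total = (total - row_digest(row)) % MODULUS
    return total


def combine(h1: int, h2: int) -> int:
    """Hash of two fragments applied together, computed from their hashes alone."""
    return (h1 + h2) % MODULUS
```

Two properties follow from the arithmetic: the hash does not depend on row order, and adding a row in one fragment and deleting it in another cancels out when the hashes are combined.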
The world of PostgreSQL wire compatibility
3 projects | news.ycombinator.com | 10 Feb 2022

Shameless plug, but your list is missing Splitgraph [0] :)
We’ve been based on Postgres from the beginning, and although the backend is a bit more complex at this point, we’ve kept the wire protocol intact. We’re also heavily invested in FDWs, not only for federated queries (e.g. querying data at Snowflake – btw, you might enjoy our blog post on achieving a 100x speedup with aggregation pushdown [1]), but also for queries on warehoused data stored as Splitgraph images. By keeping Postgres compatibility as our guiding constraint, we’ve been able to build a lot of functionality on top of just a few simple abstractions. The result is something akin to a magic Postgres database – you can connect dozens of live sources to it using FDW plugins, or you can ingest from hundreds of data sources using Airbyte connectors, ultimately storing the data as immutable Splitgraph images in object storage.
As for the wire protocol, our implementation is heavily reliant on (a forked version of) PgBouncer. Basically, a query arrives, we parse it for references to tables (which look like Docker image tags), and the proxy layer performs whatever orchestration is necessary to satisfy the query. That could mean instantiating a foreign server to a saved connection, loading some data from object storage, or even lazily loading only the requisite data (we call this “layered querying” since it’s implemented similarly to AUFS). In the future, it could also mean delegating the query to a more specialized engine like Presto.
Point is, by keeping the frontend intact, we’re able to retain compatibility with all Postgres clients, but we’re free to implement the backend in more scalable or domain specific ways. For example, we’re able to horizontally scale our query capacity by simply adding more “cache nodes” that perform the layered querying.
We are definitely all-in on the Postgres wire protocol, and all the ecosystem compatibility that comes along with it. You can read our blog for more in depth discussions of this, but I don’t want to spam too many links here. :)
[0] https://www.splitgraph.com
[1] https://www.splitgraph.com/blog/postgresql-fdw-aggregation-p...
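
The step where a query is parsed "for references to tables (which look like Docker image tags)" can be sketched with a regular expression. The `"namespace/repository:tag".table` shape, the pattern and the sample names below are assumptions for illustration; the actual proxy may well use a proper SQL parser rather than a regex.

```python
import re

# Hypothetical reference shape: "namespace/repository[:tag]".table
TABLE_REF = re.compile(
    r'"(?P<namespace>[\w.-]+)/(?P<repository>[\w.-]+)'
    r'(?::(?P<tag>[\w.-]+))?"\."?(?P<table>\w+)"?'
)


def find_image_references(sql: str) -> list:
    """Return one dict per image-style table reference found in the query."""
    return [match.groupdict() for match in TABLE_REF.finditer(sql)]


if __name__ == "__main__":
    query = (
        'SELECT * FROM "acme/sales:v2".orders o '
        'JOIN "acme/crm".customers c ON o.customer_id = c.id'
    )
    for ref in find_image_references(query):
        print(ref)
```

A routing layer could then look up each `(namespace, repository, tag)` triple and decide whether to mount a live source or load stored data before running the query.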
Scalable PostgreSQL Connection Pooler
11 projects | news.ycombinator.com | 12 Nov 2021

We are building a solution for this problem at Splitgraph [0] – it sounds like we could probably help with your use case. You can get it to work yourself with our open source code [1], but our (private beta, upcoming public) SaaS will put all your schemas on a more scalable “data delivery network,” which, incidentally, happens to be implemented with PgBouncer + rewriting + ephemeral instances. In a local engine (just a Postgres DB managed by the Splitgraph client to add extra stuff), there is no PgBouncer, but we use Foreign Data Wrappers to accomplish the same.
On Splitgraph, every dataset – and every version of every dataset – has an address. Think of it like tagged Docker images. The address either points to an immutable “data image” (in which case we can optionally download objects required to resolve a query on-the-fly, although loading up-front is possible too) or to a live data source (in which case we proxy directly to it via FDW translation). This simple idea of _addressable data products_ goes a long way – for example, it means that computing a diff is now as simple as joining across two tables (one with the previous version, one with the new).
Please excuse the Frankenstein marketing site – we’re in the midst of a redesign / rework of the info architecture while we build out our SaaS product.
Feel free to reach out if you’ve got questions. And if you have a business case, we have spots available in our private pilot. My email is in my profile – mention HN :)
[0] https://www.splitgraph.com/connect
[1] examples: https://github.com/splitgraph/splitgraph/tree/master/example...
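
The claim that a diff becomes "as simple as joining across two tables" can be shown with a small sketch, using SQLite in place of Postgres so that it runs anywhere. The table names `v1` and `v2`, the two-column schema and the helper function are assumptions for illustration, not Splitgraph's schema or API.

```python
import sqlite3

# Rows present in only one version show up as 'removed' or 'added'.
DIFF_SQL = """
SELECT 'removed' AS change, a.id, a.value
FROM v1 AS a LEFT JOIN v2 AS b ON a.id = b.id AND a.value = b.value
WHERE b.id IS NULL
UNION ALL
SELECT 'added' AS change, b.id, b.value
FROM v2 AS b LEFT JOIN v1 AS a ON a.id = b.id AND a.value = b.value
WHERE a.id IS NULL
ORDER BY 1, 2
"""


def diff_versions(old_rows, new_rows):
    """Load two versions of a table into SQLite and diff them with joins."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE v1 (id INTEGER NOT NULL, value TEXT NOT NULL)")
    conn.execute("CREATE TABLE v2 (id INTEGER NOT NULL, value TEXT NOT NULL)")
    conn.executemany("INSERT INTO v1 VALUES (?, ?)", old_rows)
    conn.executemany("INSERT INTO v2 VALUES (?, ?)", new_rows)
    result = conn.execute(DIFF_SQL).fetchall()
    conn.close()
    return result
```

A changed row appears twice, once as `removed` with its old value and once as `added` with its new value, which is enough to reconstruct an update.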
Ask HN: How to get competitors to use our open source interop-protocol?
4 projects | news.ycombinator.com | 4 Oct 2021

Federated data sharing is the core use case of the magic Postgres database we’re building at Splitgraph [0]. We’d love to help you solve these problems! The ideas you’re describing are exactly what we want to achieve – data sharing should be as easy as changing a connection string in a SQL client. It sounds like your use case would be a good fit for what we’re building. If you’d like to learn more, please send me a note – email in profile.
[0] https://www.splitgraph.com
Cloudera taken private for $5.3b, acquires Datacoral and Cazena
2 projects | news.ycombinator.com | 1 Jun 2021

The data industry continues to hype this idea of “multi-cloud,” but then the “modern data stack” is centralized around a single warehouse and nobody sees any irony in that.
The big bet we’re making at Splitgraph [0] is that the next wave of data engineering will take a more decentralized, “data mesh” type approach to enterprise architecture. “Data gravity” really does exist: data is expensive to move, in terms of both cost and operational complexity. So instead of bringing the data to the query, why not bring the query to the data? All we need for that is a set of read only credentials.
Cloudera mentions they bought DataCoral to help with data integration and connectors. They’ve correctly identified the problem - data sprawl and fragmentation will inevitably grow - but I’m not sure they have the right solution.
Data integration is important, but it’s a moving target, which is why it calls for a collaborative open source solution. This is why so many new startups, like Airbyte most recently, are coalescing around the Singer taps that Stitch left behind after its acquisition by Talend.
We also support using Singer taps to ingest data into versioned Splitgraph images [1], so we’re excited to see more collaboration on maintenance of taps. For us it’s a useful feature, but it should be just that — a feature. Is there really a need to replicate all of your data before you can even query it? Or would you rather experiment by directly querying its source?
[0] https://www.splitgraph.com
[1] unreleased and undocumented atm, but it does work. We’re hiring, especially on the frontend if you want to help build the web UI. See profile.
Google Dataset Search
1 project | news.ycombinator.com | 6 May 2021

On the public DDN (data.splitgraph.com:5432), we enforce a (currently arbitrary) 10k row limit on responses. You can construct multiple queries using LIMIT and OFFSET, or you can run a local Splitgraph engine without a limit. We also have a private beta program if you want a managed or self-hosted deployment. And we are planning to ship some features for "export to csv" type use cases (potentially other output formats too).
For live/external data, we proxy the query to the data source, so there is no theoretical data size limit except for any defined by the upstream.
For snapshotted data, we store the data as fragments in object storage. Any size limit depends on the machine where Splitgraph's Postgres engine is running, and how you choose to materialize the data when downloading it from object storage. You can "check out" an entire image to materialize it locally, at which point it will be like any other Postgres schema. Or you can use "layered querying" which will return a result set while only materializing the fragments necessary to answer the query.
Regarding ClickHouse, you could watch this presentation [0] my co-founder Artjoms gave at a recent ClickHouse meet-up on the topic of your question. We also have specific documentation for using the ClickHouse ODBC client with the DDN [1], as well as an example reference implementation. [2]
[0] https://www.youtube.com/watch?v=44CDs7hJTho
[1] https://www.splitgraph.com/connect
[2] https://github.com/splitgraph/splitgraph/tree/master/example...
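
Working around a per-response row cap with LIMIT and OFFSET, as the post above suggests, might look like the sketch below. It runs against an in-memory SQLite database so that it is self-contained; the helper name and page size are hypothetical, and against the real endpoint the same loop would go through a Postgres client instead.

```python
import sqlite3


def paged_rows(conn, base_query: str, page_size: int):
    """Yield all rows of base_query, fetching at most page_size rows per request.

    base_query should include an ORDER BY so that pages do not overlap or
    skip rows between requests.
    """
    offset = 0
    while True:
        page = conn.execute(
            f"{base_query} LIMIT ? OFFSET ?", (page_size, offset)
        ).fetchall()
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means the result set is exhausted
        offset += page_size


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(25)])
    rows = list(paged_rows(conn, "SELECT id FROM t ORDER BY id", page_size=10))
    print(len(rows))
```

OFFSET-based paging rescans skipped rows on each request, so for very large results a keyset condition (for example `WHERE id > last_seen_id`) is usually the cheaper design.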
Ask HN: Who is hiring? (April 2021)
21 projects | news.ycombinator.com | 1 Apr 2021

Splitgraph (https://www.splitgraph.com) | Remote | Full-time
Splitgraph is reshaping how organizations interact with data. We provide a unified interface to discover and query data. In practice, this means we're building a data catalog (a web app) and query layer (implemented with the Postgres wire protocol).
We're a seed-stage, venture-funded startup hiring our initial team. The two co-founders are looking to grow the team by adding multiple engineers across the stack. This is an opportunity to make a big impact on an agile team while working closely with the founders.
Splitgraph is a remote-first organization. The founders are based in the UK, and the company is incorporated in both the USA and the UK. Candidates are welcome to apply from any geography. We want to work with the most talented, thoughtful and productive engineers in the world.
Open positions:
* Senior Software Engineer - Frontend. Responsible for the web stack, mainly involving Typescript, React, Next.js, Postgraphile, etc.
* Senior Software Engineer - Backend. Responsible for a variety of core services, using Python, Poetry, Postgres, C, Lua, and a ton of other technologies.
Learn more & apply: https://www.notion.so/splitgraph/Splitgraph-is-Hiring-25b421...

What are some alternatives?

When comparing seafowl and sgr you can also consider the following projects:

marmot - A distributed SQLite replicator built on top of NATS

haystack - :mag: LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

datafusion-ballista - Apache Arrow Ballista Distributed Query Engine

dremio-oss - Dremio - the missing link in modern data

azurefs - Mount Microsoft Azure Blob Storage as local filesystem in Linux (inactive)

parabol - Free online agile retrospective meeting tool

annuaire-entreprises-sirene-api

Baserow - Open source no-code database and Airtable alternative. Create your own online database without technical experience. Performant with high volumes of data, can be self hosted and supports plugins

mindcastle.io - Massively scalable, cloud-backed distributed block device for Linux and VMs

django-pgviews - Fork of django-postgres that focuses on maintaining and improving support for Postgres SQL Views.

Prisma - Next-generation ORM for Node.js & TypeScript | PostgreSQL, MySQL, MariaDB, SQL Server, SQLite, MongoDB and CockroachDB

pgbouncer-fast-switchover - Adds query routing and rewriting extensions to pgbouncer
