sgr VS sirix

Compare sgr vs sirix and see what are their differences.

sgr

sgr (command-line client for Splitgraph) and the splitgraph Python library (by splitgraph)

sirix

SirixDB is an embeddable, bitemporal, append-only database system and event store, storing immutable lightweight snapshots. It keeps the full history of each resource. Every commit stores a space-efficient snapshot through structural sharing. It is log-structured and never overwrites data. SirixDB uses a novel page-level versioning approach. (by sirixdb)
                sgr                                        sirix
Mentions        22                                         44
Stars           326                                        1,086
Growth          0.6%                                       1.2%
Activity        1.5                                        9.1
Last commit     22 days ago                                9 days ago
Language        Python                                     Java
License         GNU General Public License v3.0 or later   BSD 3-clause "New" or "Revised" License
The number of mentions indicates the total number of mentions we've tracked, plus the number of user-suggested alternatives.
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed; recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects we track.

sgr

Posts with mentions or reviews of sgr. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-05-21.
  • Show HN: Loofi – Our AI-Powered SQL Query Builder
    1 project | news.ycombinator.com | 21 May 2023
  • Release engineering is exhausting so here's cargo-dist
    12 projects | news.ycombinator.com | 1 Feb 2023
    I wrote up the details of this in a PR [0] where I last dealt with it.

    [0] https://github.com/splitgraph/sgr/pull/656

  • Ask HN: Serverless SQLite or Closest DX to Cloudflare D1?
    2 projects | news.ycombinator.com | 2 Jan 2023
    This is the vision of what we're building at Splitgraph. [0] You might be most interested in our recent project Seafowl [1] which is an open-source analytical database optimized for running "at the edge," with cache-friendly semantics making it ideal for querying from Web applications. It's built in Rust using DataFusion and incorporates many of the lessons we've learned building the Data Delivery Network [2] for Splitgraph.

    [0] https://www.splitgraph.com

    [1] https://seafowl.io

    [2] https://www.splitgraph.com/connect

  • Postgres Auditing in 150 lines of SQL
    10 projects | news.ycombinator.com | 9 Mar 2022
    You might like what we're doing with Splitgraph. Our command line tool (sgr) installs an audit log into Postgres to track changes [0]. Then `sgr commit` can write these changes to delta-compressed objects [1], where each object is a columnar fragment of data, addressable by the LTHash of rows added/deleted by the fragment, and attached to metadata describing its index [2].

    I haven't explored sirix before, but at first glance it looks like we have some similar ideas — thanks for sharing, I'm excited to learn more, especially about its application of ZFS.

    [0] https://www.splitgraph.com/docs/working-with-data/tracking-c...

    [1] https://www.splitgraph.com/docs/concepts/objects

    [2] https://github.com/splitgraph/splitgraph/blob/master/splitgr...
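
    To make the LTHash idea above concrete, here's a minimal, hypothetical sketch of a homomorphic set hash over rows (illustrative only; sgr's real construction uses a much wider state):

    ```python
    import hashlib

    WORDS = 16  # illustrative state width; the real LtHash is much wider

    def row_hash(row: bytes) -> list[int]:
        # Expand one row into a vector of 16-bit words.
        digest = hashlib.blake2b(row, digest_size=2 * WORDS).digest()
        return [int.from_bytes(digest[i:i + 2], "little") for i in range(0, 2 * WORDS, 2)]

    def fragment_hash(added: list[bytes], deleted: list[bytes]) -> list[int]:
        # Add hashes of inserted rows and subtract hashes of deleted rows,
        # component-wise mod 2**16: order-independent and incrementally updatable.
        state = [0] * WORDS
        for row in added:
            state = [(s + w) % 65536 for s, w in zip(state, row_hash(row))]
        for row in deleted:
            state = [(s - w) % 65536 for s, w in zip(state, row_hash(row))]
        return state

    # Two fragments with the same net row changes hash identically:
    assert fragment_hash([b"a", b"b"], []) == fragment_hash([b"b", b"a"], [])
    ```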

  • The world of PostgreSQL wire compatibility
    3 projects | news.ycombinator.com | 10 Feb 2022
    Shameless plug, but your list is missing Splitgraph [0] :)

    We’ve been based on Postgres from the beginning, and although the backend is a bit more complex at this point, we’ve kept the wire protocol intact. We’re also heavily invested in FDWs, not only for federated queries (e.g. querying data at Snowflake – btw, you might enjoy our blog post on achieving a 100x speedup with aggregation pushdown [1]), but also for queries on warehoused data stored as Splitgraph images. By keeping Postgres compatibility as our guiding constraint, we’ve been able to build a lot of functionality on top of just a few simple abstractions. The result is something akin to a magic Postgres database – you can connect dozens of live sources to it using FDW plugins, or you can ingest from hundreds of data sources using Airbyte connectors, ultimately storing the data as immutable Splitgraph images in object storage.

    As for the wire protocol, our implementation is heavily reliant on (a forked version of) PgBouncer. Basically, a query arrives, we parse it for references to tables (which look like Docker image tags), and the proxy layer performs whatever orchestration is necessary to satisfy the query. That could mean instantiating a foreign server to a saved connection, loading some data from object storage, or even lazily loading only the requisite data (we call this “layered querying” since it’s implemented similarly to AUFS). In the future, it could also mean delegating the query to a more specialized engine like Presto.

    Point is, by keeping the frontend intact, we’re able to retain compatibility with all Postgres clients, but we’re free to implement the backend in more scalable or domain specific ways. For example, we’re able to horizontally scale our query capacity by simply adding more “cache nodes” that perform the layered querying.

    We are definitely all-in on the Postgres wire protocol, and all the ecosystem compatibility that comes along with it. You can read our blog for more in depth discussions of this, but I don’t want to spam too many links here. :)

    [0] https://www.splitgraph.com

    [1] https://www.splitgraph.com/blog/postgresql-fdw-aggregation-p...
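
    A toy sketch of the first proxy step described above (scanning a query for image-tag-style table references); the regex and function are invented for illustration:

    ```python
    import re

    # Match schema references that look like Docker image tags:
    # "namespace/repository:tag", as in Splitgraph's query syntax.
    IMAGE_REF = re.compile(r'"(?P<ns>[\w-]+)/(?P<repo>[\w-]+):(?P<tag>[\w.-]+)"')

    def find_image_refs(query: str) -> list[tuple[str, str, str]]:
        return [(m["ns"], m["repo"], m["tag"]) for m in IMAGE_REF.finditer(query)]

    query = 'SELECT * FROM "splitgraph/socrata:latest".dataset LIMIT 10'
    print(find_image_refs(query))  # [('splitgraph', 'socrata', 'latest')]
    ```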

  • Scalable PostgreSQL Connection Pooler
    11 projects | news.ycombinator.com | 12 Nov 2021
    We are building a solution for this problem at Splitgraph [0] – it sounds like we could probably help with your use case. You can get it to work yourself with our open source code [1], but our (private beta, upcoming public) SaaS service will put all your schemas on a more scalable “data delivery network,” which, incidentally, happens to be implemented with PgBouncer + rewriting + ephemeral instances. In a local engine (just a Postgres DB managed by the Splitgraph client to add extra functionality), there is no PgBouncer, but we use Foreign Data Wrappers to accomplish the same.

    On Splitgraph, every dataset – and every version of every dataset – has an address. Think of it like tagged Docker images. The address either points to an immutable “data image” (in which case we can optionally download objects required to resolve a query on-the-fly, although loading up-front is possible too) or to a live data source (in which case we proxy directly to it via FDW translation). This simple idea of _addressable data products_ goes a long way – for example, it means that computing a diff is now as simple as joining across two tables (one with the previous version, one with the new).

    Please excuse the Frankenstein marketing site – we’re in the midst of redesign / rework of info architecture while we build out our SaaS product.

    Feel free to reach out if you’ve got questions. And if you have a business case, we have spots available in our private pilot. My email is in my profile – mention HN :)

    [0] https://www.splitgraph.com/connect

    [1] examples: https://github.com/splitgraph/splitgraph/tree/master/example...
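
    To illustrate "a diff is just a join": a toy example (not sgr's implementation) treating two versions of a dataset as row sets:

    ```python
    # Previous and new versions of the same (addressable) table.
    old_version = {("alice", 30), ("bob", 25)}
    new_version = {("alice", 31), ("bob", 25), ("carol", 40)}

    added = new_version - old_version    # rows only in the new image
    deleted = old_version - new_version  # rows only in the old image
    print(sorted(added))    # [('alice', 31), ('carol', 40)]
    print(sorted(deleted))  # [('alice', 30)]
    ```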

  • Ask HN: How to get competitors to use our open source interop-protocol?
    4 projects | news.ycombinator.com | 4 Oct 2021
    Federated data sharing is the core use case of the magic Postgres database we’re building at Splitgraph [0]. We’d love to help you solve these problems! The ideas you’re describing are exactly what we want to achieve – data sharing should be as easy as changing a connection string in a SQL client. It sounds like your use case would be a good fit for what we’re building. If you’d like to learn more, please send me a note – email in profile.

    [0] https://www.splitgraph.com

  • Cloudera taken private for $5.3b, acquires Datacoral and Cazena
    2 projects | news.ycombinator.com | 1 Jun 2021
    The data industry continues to hype this idea of “multi-cloud,” but then the “modern data stack” is centralized around a single warehouse and nobody sees any irony in that.

    The big bet we’re making at Splitgraph [0] is that the next wave of data engineering will take a more decentralized, “data mesh” type approach to enterprise architecture. “Data gravity” really does exist – data is expensive to move, in terms of both cost and operational complexity. So instead of bringing the data to the query, why not bring the query to the data? All we need for that is a set of read-only credentials.

    Cloudera mentions they bought DataCoral to help with data integration and connectors. They’ve correctly identified the problem - data sprawl and fragmentation will inevitably grow - but I’m not sure they have the right solution.

    Data integration is important, but it’s a moving target, which is why it calls for a collaborative open source solution. This is why so many new startups, like Airbyte most recently, are coalescing around the Singer taps that Stitch left behind after its acquisition by Talend.

    We also support using Singer taps to ingest data into versioned Splitgraph images [1], so we’re excited to see more collaboration on maintenance of taps. For us it’s a useful feature, but it should be just that — a feature. Is there really a need to replicate all of your data before you can even query it? Or would you rather experiment by directly querying its source?

    [0] https://www.splitgraph.com

    [1] unreleased and undocumented atm, but it does work. We’re hiring, especially on the frontend if you want to help build the web UI. See profile.

  • Google Dataset Search
    1 project | news.ycombinator.com | 6 May 2021
    On the public DDN (data.splitgraph.com:5432), we enforce a (currently arbitrary) 10k row limit on responses. You can construct multiple queries using LIMIT and OFFSET, or you can run a local Splitgraph engine without a limit. We also have a private beta program if you want a managed or self-hosted deployment. And we are planning to ship some features for "export to csv" type use cases (potentially other output formats too).

    For live/external data, we proxy the query to the data source, so there is no theoretical data size limit except for any defined by the upstream.

    For snapshotted data, we store the data as fragments in object storage. Any size limit depends on the machine where Splitgraph's Postgres engine is running, and how you choose to materialize the data when downloading it from object storage. You can "check out" an entire image to materialize it locally, at which point it will be like any other Postgres schema. Or you can use "layered querying" which will return a result set while only materializing the fragments necessary to answer the query.

    Regarding ClickHouse, you could watch this presentation [0] my co-founder Artjoms gave at a recent ClickHouse meet-up on the topic of your question. We also have specific documentation for using the ClickHouse ODBC client with the DDN [1], as well as an example reference implementation. [2]

    [0] https://www.youtube.com/watch?v=44CDs7hJTho

    [1] https://www.splitgraph.com/connect

    [2] https://github.com/splitgraph/splitgraph/tree/master/example...
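
    A sketch of the LIMIT/OFFSET paging suggested above, using plain psycopg2 against the public DDN endpoint; the credentials, database name, and example table are placeholders:

    ```python
    import psycopg2

    conn = psycopg2.connect(
        host="data.splitgraph.com", port=5432,      # endpoint from the post
        user="<api-key>", password="<api-secret>",  # placeholder credentials
        dbname="ddn",                               # assumed database name
    )
    PAGE = 10_000  # the DDN's per-response row cap mentioned above
    offset = 0
    with conn, conn.cursor() as cur:
        while True:
            cur.execute(
                'SELECT * FROM "splitgraph/socrata:latest".dataset '  # placeholder table
                "ORDER BY 1 LIMIT %s OFFSET %s",
                (PAGE, offset),
            )
            rows = cur.fetchall()
            # ... process rows ...
            if len(rows) < PAGE:
                break
            offset += PAGE
    ```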

  • Ask HN: Who is hiring? (April 2021)
    21 projects | news.ycombinator.com | 1 Apr 2021
    Splitgraph (https://www.splitgraph.com) | Remote | Full-time

    Splitgraph is reshaping how organizations interact with data. We provide a unified interface to discover and query data. In practice, this means we're building a data catalog (a web app) and query layer (implemented with the Postgres wire protocol).

    We're a seed-stage, venture-funded startup hiring our initial team. The two co-founders are looking to grow the team by adding multiple engineers across the stack. This is an opportunity to make a big impact on an agile team while working closely with the founders.

    Splitgraph is a remote-first organization. The founders are based in the UK, and the company is incorporated in both the USA and the UK. Candidates are welcome to apply from any geography. We want to work with the most talented, thoughtful, and productive engineers in the world.

    Open positions:

    * Senior Software Engineer - Frontend. Responsible for the web stack, mainly involving Typescript, React, Next.js, Postgraphile, etc.

    * Senior Software Engineer - Backend. Responsible for a variety of core services, using Python, Poetry, Postgres, C, Lua, and a ton of other technologies.

    Learn more & apply: https://www.notion.so/splitgraph/Splitgraph-is-Hiring-25b421...

sirix

Posts with mentions or reviews of sirix. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-01-24.
  • Show HN: Integer Map Data Structure
    3 projects | news.ycombinator.com | 24 Jan 2024
    We're using a similar trie structure as the main document (node) index in SirixDB[1]. Lately, I've taken some inspiration from ART and HAMT for using different page sizes, mainly for the rightmost inner pages: the node-IDs are generated by a simple sequence generator, so all inner pages (we call them IndirectPages) except the rightmost are fully occupied, and the tree height adapts dynamically to the size of the stored data. Currently, each IndirectPage always stores 1024 references to child pages, but I'll experiment with smaller sizes, as the inner pages are simply copied for each new revision, whereas the leaf pages storing the actual data are versioned themselves with a novel sliding snapshot algorithm.

    From the unique 64-bit nodeId assigned to each data record, you can simply compute the page and the reference to traverse at each level of the trie through some bit shifting.

    [1] https://github.com/sirixdb/sirix
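
    A small sketch of that bit-shifting lookup, assuming 1024 references per IndirectPage (10 bits per level); names and the height handling are illustrative, not SirixDB's actual code:

    ```python
    FANOUT_BITS = 10              # log2(1024) references per inner page
    MASK = (1 << FANOUT_BITS) - 1

    def page_offsets(node_id: int, height: int) -> list[int]:
        """Reference offsets to follow, from the root level down to the leaf page."""
        return [(node_id >> (FANOUT_BITS * level)) & MASK
                for level in reversed(range(height))]

    print(page_offsets(5_000_000, height=3))  # [4, 786, 832]
    ```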

  • Endatabas: A SQLite-inspired, SQL document database with full history
    3 projects | news.ycombinator.com | 1 Dec 2023
    I'm working on something similar for the JVM, though without document semantics and at a much more fine-grained level.

    JSON is shredded during an initial import into a tree structure with fine-granular nodes. Thus, an import can be done with very low memory consumption (provided that auto-commit issues a sync to disk before RAM space is exceeded). Furthermore, it doesn't require a WAL for consistency. Instead, the indexes are stored in a log-structure using a persistent tree (as in: every commit creates a new tree root). A sliding snapshot algorithm makes sure that only a fragment of a page has to be copied on a write.

    Thus, it's also a perfect candidate for an event store, storing the (lightweight) snapshots and optionally tracking the changes.

    https://github.com/sirixdb/sirix

    The architecture is described over here:

    https://sirix.io/docs/concepts.html

    Furthermore, I'm working on a tutorial for local client usage (work in progress):

    https://sirix.io/docs/jsoniq-tutorial.html

    Kind regards
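
    A minimal sketch of the "persistent tree, new root per commit" idea described above, using path copying on a toy binary search tree (SirixDB's actual index is a trie):

    ```python
    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class Node:
        key: int
        value: object
        left: Optional["Node"] = None
        right: Optional["Node"] = None

    def insert(root: Optional[Node], key: int, value: object) -> Node:
        # Copy only the nodes on the path to the change; share the rest.
        if root is None:
            return Node(key, value)
        if key < root.key:
            return Node(root.key, root.value, insert(root.left, key, value), root.right)
        if key > root.key:
            return Node(root.key, root.value, root.left, insert(root.right, key, value))
        return Node(key, value, root.left, root.right)

    rev1 = insert(insert(insert(None, 2, "b"), 1, "a"), 3, "c")
    rev2 = insert(rev1, 3, "c'")    # the commit yields a new tree root
    assert rev1.right.value == "c"  # the old revision is untouched
    assert rev2.left is rev1.left   # the unchanged subtree is shared
    ```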

  • Show HN: Bitemporal, Binary JSON Based DBS and Event Store
    6 projects | news.ycombinator.com | 13 Nov 2023
    If anyone is up to building a new frontend, that would be awesome (of course, work could also be split between interested people) :-)

    https://github.com/sirixdb/sirix/issues/627

  • Show HN: Light implementation of Event Sourcing using PostgreSQL as event store
    9 projects | news.ycombinator.com | 31 Oct 2023
    I'm working on an append-only (immutable), (bi)temporal DBS[1] in my spare time, which turns CRUD operations into an event store, automatically providing an audit log for each stored node; the nodes are stored with immutable node-IDs, which never change. As the stored contents are based on a custom binary JSON format, a rolling hash can optionally be built to check whether a whole subtree has changed or not.

    The system uses persistent index data structures to share unchanged pages between revisions.

    Intermittent full snapshots are omitted. Rather, the snapshot is spread over several revisions by applying a sliding snapshot algorithm to the data pages (thus avoiding write peaks, while at most a predefined number of page fragments has to be read in parallel to reconstruct a page in memory).

    [1] https://sirix.io | https://sirix.io/docs/concepts.html
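
    A toy illustration of the optional rolling hash: comparing one hash tells you whether an entire subtree changed. Here the hash is recomputed recursively, whereas SirixDB adapts the ancestor hashes incrementally during inserts:

    ```python
    import hashlib

    def subtree_hash(node: dict) -> bytes:
        # Fold each child's hash into the parent's, Merkle-style.
        h = hashlib.sha256(repr(node.get("value")).encode())
        for child in node.get("children", []):
            h.update(subtree_hash(child))
        return h.digest()

    doc_v1 = {"value": "root", "children": [{"value": 1}, {"value": 2}]}
    doc_v2 = {"value": "root", "children": [{"value": 1}, {"value": 99}]}
    assert subtree_hash(doc_v1) != subtree_hash(doc_v2)  # subtree changed
    ```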

  • Show HN: Evolutionary (binary) JSON data store (full immutable revision history)
    3 projects | news.ycombinator.com | 21 Oct 2023
    I already posted the project a couple of years ago and it gained some interest, but a lot has been done since then: performance work, a completely new JSON store, a REST API, various refactored internals, an improved JSONiq-based query engine allowing updates, a (now already dated) web UI, a new Kotlin-based CLI, and Python and TypeScript clients to ease the use of SirixDB...

    The first prototypes of a precursor date back to 2005.

    So, what is it all about?

    I'm working on an evolutionary data store in my spare time[1]. It is based on the idea of getting rid of the need for a second transaction log (the WAL) by using a persistent tree-of-tries index (preserving the previous revision through copy-on-write and path copying to the root) as the log itself. Only a single read/write transaction is permitted at a time, running concurrently and in parallel to N read-only transactions, each bound to a specific revision at its start. The single writer is scoped per resource (comparable to a table/relation in a relational DB) within a database; reads do not involve any locks at all.

    The idea is that the system atomically swaps the tree root to the new version (replicated). If something fails, the log can simply be truncated to the former tree root.

    Thus, the system has many similarities with Git (structural sharing of unchanged nodes/pages) and ZFS snapshots (the keyed trie was inspired by ZFS, as was storing checksums for child pages in the parent pages' references to them)[2].

    You can of course simply execute time travel queries on the whole revision history, add commit comments and the author to answer questions such as who committed what at which point in time and why...

    Rather than simply copying full data pages, the system applies a sliding snapshot versioning algorithm to keep storage space to a minimum.

    Thus, it's best suited for fast flash drives with fast random reads and sequential writes. Data is never overwritten, so audit trails come for free.

    The system stores fine-granular JSON nodes, so the structure and size of an object have almost no limits. A path summary (an unordered set of all paths to leaf nodes in the tree) is built, which enables various optimizations. Furthermore, a rolling hash is optionally built, whereby during inserts all ancestor node hashes are adapted.

    Furthermore, it optionally keeps track of update operations and the context nodes involved during transaction commits. Thus, you can easily get the changes between revisions, check the full history of nodes, and navigate in time to the first revision, the last revision, and the next and previous revisions of a node...

    You can also open a revision at a specific system time, revert to a revision, and commit a new version while preserving all revisions in between.

    As said, one feature is that objects can be arbitrarily nested, so there are almost no limits on their number, and updates are cheap.

    A dated Jupyter notebook with some examples can be found in [3] and overall documentation in [4].

    The query engine[5], Brackit, is retargetable (a couple of interfaces and rewrite rules have to be implemented per DB system); notably, it detects implicit joins and applies well-known algorithms from the relational world to optimize joins and aggregate functions, thanks to the set-oriented processing of its operators.[6]

    I've given an interview in [7], but I'm usually very nervous, so don't judge too harshly.

    Give it a try and happy coding!

    Kind regards

    Johannes

    [1] https://sirix.io | https://github.com/sirixdb/sirix

    [2] https://sirix.io/docs/concepts.html

    [3] https://colab.research.google.com/drive/1NNn1nwSbK6hAekzo1YbED52RI3NMqqbG#scrollTo=CBWQIvc0Ov3P

    [4] https://sirix.io/docs/

    [5] http://brackit.io

    [6] https://colab.research.google.com/drive/19eC-UfJVm_gCjY--koOWN50sgiFa5hSC

    [7] https://youtu.be/Ee-5ruydgqo?si=Ift73d49w84RJWb2
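
    A rough, hypothetical sketch of the sliding snapshot idea: each commit writes a fragment holding the changed records plus any record about to slide out of the window, so a page can always be rebuilt from at most WINDOW fragments (the data layout is invented for illustration):

    ```python
    WINDOW = 4  # max fragments ever needed to reconstruct a page

    def write_fragment(history: list[dict], changed: dict, full_page: dict) -> None:
        # Records written within the last WINDOW - 1 fragments are still "fresh".
        recent = set().union(*(f.keys() for f in history[-(WINDOW - 1):]))
        fragment = dict(changed)
        # Re-write records that would otherwise fall out of the window.
        fragment.update({k: v for k, v in full_page.items()
                         if k not in recent and k not in changed})
        history.append(fragment)

    def read_page(history: list[dict]) -> dict:
        page: dict = {}
        for fragment in history[-WINDOW:]:  # oldest to newest; newest wins
            page.update(fragment)
        return page

    history: list[dict] = []
    page: dict = {}
    for rev in range(6):                    # one new record per revision
        page = dict(page)
        page[f"k{rev}"] = rev
        write_fragment(history, {f"k{rev}": rev}, page)
    assert read_page(history) == page       # rebuilt from at most 4 fragments
    ```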

  • Evolutionary, JSON data store (keeping the full revision history)
    3 projects | news.ycombinator.com | 20 Oct 2023
  • Immutable Data
    2 projects | news.ycombinator.com | 26 Jun 2023
    You can use Datomic, for instance (mentioned already in your article, IIRC!?), or SirixDB[1], which I'm working on in my spare time.

    The idea is an indexed, append-only log structure using a functional tree (sharing unchanged nodes between revisions), plus a novel algorithm that balances incremental and full dumps of database pages using a sliding window instead.

    [1] https://sirix.io | https://github.com/sirixdb/sirix

  • Java opensource projects that need help from community.
    13 projects | /r/java | 20 May 2023
    Append-only database system (based on a persistent index structure): https://github.com/sirixdb/sirix or a retargetable query compiler: https://github.com/sirixdb/brackit
  • Looking to help out on some open source projects
    4 projects | /r/opensource | 17 Apr 2023
    You can work on a temporal data store called SirixDB: https://github.com/sirixdb/sirix
  • SirixDB - an embeddable, evolutionary database system
    2 projects | /r/java | 3 Apr 2023

What are some alternatives?

When comparing sgr and sirix you can also consider the following projects:

haystack - LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

CXXGraph - Header-Only C++ Library for Graph Representation and Algorithms

dremio-oss - Dremio - the missing link in modern data

keycloak-kafka - Keycloak module to produce events to kafka

parabol - Free online agile retrospective meeting tool

zed - A novel data lake based on super-structured data

Baserow - Open source no-code database and Airtable alternative. Create your own online database without technical experience. Performant with high volumes of data, can be self hosted and supports plugins

hash4j - Dynatrace hash library for Java

django-pgviews - Fork of django-postgres that focuses on maintaining and improving support for Postgres SQL Views.

sqlglot - Python SQL Parser and Transpiler

pgbouncer-fast-switchover - Adds query routing and rewriting extensions to pgbouncer

Sinatra - Classy web-development dressed in a DSL (official / canonical repo)