Brackit: A retargetable JSONiq query engine

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • brackit

    Query processor with proven optimizations, ready to use for your JSON store to query semi-structured data with JSONiq. Can also be used as an ad-hoc in-memory query processor.

  • Hi all,

    Sebastian and his students did a tremendous job creating Brackit[1] in the first place as a retargetable query engine for different data stores. They worked hard to optimize aggregations and joins. Despite its clear database query engine roots, it is also usable as a standalone ad-hoc in-memory query engine.

    Sebastian did his Ph.D. research at TU Kaiserslautern in the database systems group of Theo Härder. Together with Andreas Reuter, Theo Härder coined the well-known acronym ACID for the desired properties of transactions.

    As he's no longer maintaining the project, I stepped up and forked it a couple of years ago. I'm using it for my evolutionary, immutable data store SirixDB[2], which stores the entire history of your JSON data in small snapshots in an append-only file (a tailored binary format similar to BSON). It's exceptionally well suited for audits, undo operations, and sophisticated analytical time travel queries.

    I've changed a lot, so that Brackit is becoming more and more compatible with the JSONiq query language standard. I've added JSONiq update primitives and array slices as known from Python, and fixed several bugs. Furthermore, I've added interfaces for temporal data stores, temporal XPath axes to navigate not only in space but also in time, temporal extension functions in SirixDB, index rewrite rules, and more.

    As Brackit can query XML, you're of course able to transform XML data to JSON and vice versa.

    Moshe and I are working on a Jupyter Notebook / Tutorial[3] for interactive queries.

    We're looking forward to your bug reports, issues, and questions. Contributions are, of course, highly welcome. Maybe even implementations for other data stores or common query optimizations.

    Furthermore, we'd gladly see further (university-based?) research.

    It should, for instance, be possible to add SIMD vector instructions in the future, as the query engine is already set-oriented and processes sets of tuples for the so-called FLWOR expressions (see JSONiq). Brackit rewrites FLWOR expression trees in the AST into a pipeline of operations in order to port optimizations from relational query engines for efficient join processing and aggregate expressions. Furthermore, certain parts of the queries are parallelizable, as detailed in Sebastian's thesis. We also envision a compiler stage for distributed processing (the first research used MapReduce, but we can now use better-suited approaches, of course).

    Kind regards

    Johannes

    [1] https://github.com/sirixdb/brackit

    [2] https://sirix.io | https://github.com/sirixdb/sirix

    [3] https://colab.research.google.com/drive/19eC-UfJVm_gCjY--koO...
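
To make the idea of rewriting a FLWOR expression into a set-oriented operator pipeline more concrete, here is a minimal Python sketch. It is not Brackit's code or API: the operator names (`for_bind`, `select`, `group_by`), the tuple representation, and the sample query are all assumptions chosen for illustration. Each operator consumes and produces a stream of tuples (variable bindings), which is roughly the shape that lets relational-style optimizations for joins and aggregates apply.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List

# A tuple in the pipeline is a set of variable bindings (hypothetical layout).
Bindings = Dict[str, object]


def for_bind(tuples: Iterable[Bindings], var: str,
             source: Callable[[Bindings], Iterable[object]]) -> Iterator[Bindings]:
    """'for $var in source': one output tuple per item, per input tuple."""
    for t in tuples:
        for item in source(t):
            yield {**t, var: item}


def select(tuples: Iterable[Bindings],
           predicate: Callable[[Bindings], bool]) -> Iterator[Bindings]:
    """'where': keep only the tuples that satisfy the predicate."""
    for t in tuples:
        if predicate(t):
            yield t


def group_by(tuples: Iterable[Bindings],
             key: Callable[[Bindings], object],
             value: Callable[[Bindings], object]) -> Iterator[Bindings]:
    """'group by': collect one list of values per distinct key."""
    groups: Dict[object, List[object]] = defaultdict(list)
    for t in tuples:
        groups[key(t)].append(value(t))
    for k, values in groups.items():
        yield {"key": k, "values": values}


def total_per_customer(orders: List[dict]) -> List[dict]:
    # Roughly the JSONiq query (illustrative only):
    #   for $o in $orders
    #   where $o.amount gt 10
    #   group by $c := $o.customer
    #   return { "customer": $c, "total": sum($o.amount) }
    start: List[Bindings] = [{}]  # a single empty tuple starts the pipeline
    bound = for_bind(start, "o", lambda t: orders)
    filtered = select(bound, lambda t: t["o"]["amount"] > 10)
    grouped = group_by(filtered,
                       key=lambda t: t["o"]["customer"],
                       value=lambda t: t["o"]["amount"])
    # 'return': build one result object per group.
    return [{"customer": g["key"], "total": sum(g["values"])} for g in grouped]
```

Because every stage works on whole tuple streams rather than on one item at a time, a stage such as `select` or `group_by` could in principle be swapped for a vectorized or parallel implementation without touching the others.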

  • sirix

    SirixDB is an embeddable, bitemporal, append-only database system and event store, storing immutable lightweight snapshots. It keeps the full history of each resource. Every commit stores a space-efficient snapshot through structural sharing. It is log-structured and never overwrites data. SirixDB uses a novel page-level versioning approach.
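
The following Python sketch illustrates the general idea of append-only snapshots with structural sharing. It is a simplification under stated assumptions, not SirixDB's implementation: it copies tree nodes along the modified path (SirixDB versions at the page level), it keeps revisions in an in-memory list instead of an append-only file, and the names `Node`, `set_path`, and `RevisionLog` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Immutable tree node; children are (key, node) pairs."""
    value: Optional[object] = None
    children: Tuple[Tuple[str, "Node"], ...] = ()

    def child(self, key: str) -> Optional["Node"]:
        for k, n in self.children:
            if k == key:
                return n
        return None


def set_path(node: Optional[Node], path: Tuple[str, ...], value: object) -> Node:
    """Return a new root with `value` at `path`; only nodes on the path are copied."""
    base = node if node is not None else Node()
    if not path:
        return Node(value=value, children=base.children)
    head, rest = path[0], path[1:]
    new_child = set_path(base.child(head), rest, value)
    # Untouched siblings are reused as-is: this is the structural sharing.
    kept = tuple((k, n) for k, n in base.children if k != head)
    return Node(value=base.value, children=kept + ((head, new_child),))


def get_path(node: Optional[Node], path: Tuple[str, ...]) -> Optional[object]:
    for key in path:
        if node is None:
            return None
        node = node.child(key)
    return None if node is None else node.value


class RevisionLog:
    """Append-only list of revision roots; old revisions are never overwritten."""

    def __init__(self) -> None:
        self._roots: List[Node] = [Node()]  # revision 0 is the empty document

    def commit(self, path: Tuple[str, ...], value: object) -> int:
        self._roots.append(set_path(self._roots[-1], path, value))
        return len(self._roots) - 1

    def read(self, revision: int, path: Tuple[str, ...]) -> Optional[object]:
        return get_path(self._roots[revision], path)

    def root(self, revision: int) -> Node:
        return self._roots[revision]
```

Reading an older revision is then just a lookup from that revision's root, which is what makes audits, undo, and time travel queries cheap in this model; each commit only adds the nodes on the changed path.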

Related posts

  • Implementing a Merkle Tree for an Immutable Verifiable Log

    2 projects | news.ycombinator.com | 6 May 2022
  • Show HN: Brackit – a retargetable JSONiq based query engine for JSON

    3 projects | news.ycombinator.com | 1 Mar 2022
  • Show HN: Bitemporal, Binary JSON Based DBS and Event Store

    6 projects | news.ycombinator.com | 13 Nov 2023
  • Show HN: Evolutionary (binary) JSON data store (full immutable revision history)

    3 projects | news.ycombinator.com | 21 Oct 2023
  • Evolutionary, JSON data store (keeping the full revision history)

    3 projects | news.ycombinator.com | 20 Oct 2023