Trustfall: How to Query (Almost) Everything

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • trustfall

    A query engine for any combination of data sources. Query your files and APIs as if they were databases!

  • As I wrote in reply to your other comment, the join algorithms are the adapter's choice. I wouldn't necessarily recommend using it as a SQL replacement, though.

    There's no measurable MoM growth yet. This is the first major look the broader community has had at the system outside of a relatively small conference where I gave a talk, and demos I've given to friends and colleagues. I wasn't planning on submitting to HN for another few months -- but someone else beat me to it today :)

    There are production-grade adapters for the HackerNews APIs and for Rust's rustdoc JSON format, used by the Playground (https://play.predr.ag/hackernews) and the Rust semver-checking linter cargo-semver-checks, respectively. There are also a few more demo adapters in the Trustfall repo itself (https://github.com/obi1kenobi/trustfall). I expect the number of adapters to grow significantly in the coming months, so please stay tuned and let me know if you have datasets you'd like to try it with!

    The "largest dataset" is a bit of a trick question: how big is the dataset of "all HackerNews data available via its Firebase and Algolia APIs"? Because that's what the Playground queries.

    The Rust semver linter has been used by dozens if not hundreds of crates, and the JSON payloads in question there are in the 100-400MB range. The example in this blog post runs 40 quite complex Trustfall queries (they express semver rules!) over 400MB across two JSON files in 8 seconds: https://predr.ag/blog/speeding-up-rust-semver-checking-by-ov...

    You can also see more real-world use cases in this talk I gave last year: https://www.hytradboi.com/2022/how-to-query-almost-everythin...

    Probably around 200-300 people have written queries in the language, with experience levels ranging from seasoned engineers to Excel and SQL analysts with no programming experience outside those tools. The language itself is a refinement of an earlier project I developed and open-sourced at my previous job. That project was used to query everything from a multi-TB SQL cluster to APIs to ML models, and with Trustfall I've taken the opportunity to revisit design decisions that turned out to be regrettable in the previous project.
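
The point above that "the join algorithms are the adapter's choice" can be illustrated with a toy sketch. This is not Trustfall's actual interface (Trustfall itself is written in Rust). The class names, the `resolve_neighbors` method shape, and the sample data are all hypothetical, chosen only to show an engine that asks for related records while each adapter decides how to find them.

```python
from collections import defaultdict

# Hypothetical in-memory data standing in for two data sources.
STORIES = [
    {"id": 1, "title": "Story A"},
    {"id": 2, "title": "Story B"},
]
COMMENTS = [
    {"id": 10, "story_id": 1, "text": "Nice talk"},
    {"id": 11, "story_id": 1, "text": "How do joins work?"},
    {"id": 12, "story_id": 2, "text": "First"},
]


class NestedLoopAdapter:
    """Resolves the story -> comments edge by rescanning all comments per story."""

    def resolve_neighbors(self, stories):
        for story in stories:
            yield story, [c for c in COMMENTS if c["story_id"] == story["id"]]


class HashJoinAdapter:
    """Resolves the same edge by building an index once, then probing it."""

    def __init__(self):
        self._by_story = defaultdict(list)
        for comment in COMMENTS:
            self._by_story[comment["story_id"]].append(comment)

    def resolve_neighbors(self, stories):
        for story in stories:
            # .get() avoids inserting empty entries into the index.
            yield story, list(self._by_story.get(story["id"], []))


def run_query(adapter):
    """Toy 'engine': asks the adapter for neighbors and never picks the join strategy."""
    return [
        (story["title"], comment["text"])
        for story, comments in adapter.resolve_neighbors(STORIES)
        for comment in comments
    ]


if __name__ == "__main__":
    # Both adapters answer the same query; only the join strategy differs.
    print(run_query(NestedLoopAdapter()))
    print(run_query(HashJoinAdapter()))
```

Both adapters return the same rows for the same query, so in this sketch the choice between rescanning and indexing is purely an adapter-side performance decision.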

  • steampipe

    Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.

  • Readers who like SQL may also enjoy Steampipe [1], an open source tool to live query 99+ services with SQL (e.g. AWS, GitHub, CSV, Kubernetes). It uses Postgres Foreign Data Wrappers under the hood and supports joins and other SQL operations across the services. (Disclaimer: I'm a lead on the project.)

    1 - https://github.com/turbot/steampipe

  • cloudquery

    The open source high performance ELT framework powered by Apache Arrow

  • Also relevant - High Performance Open Source ELT Framework - https://github.com/cloudquery/cloudquery

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
