Building a Distributed Data Warehouse Without Data Lakes

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • bacalhau

    Compute over Data framework for public, transparent, and optionally verifiable computation

  • I think it's more of a general-purpose distributed compute framework, with OLAP being one of the use cases. There are a lot of videos on YouTube if you search "Bacalhau cluster" or similar, including what appears to be their legacy channel [0] (which I watched a year ago and found very impressive), a newer one [1], and conference talks on other channels.

    I haven't been following the project, but I remember finding the original demo very impressive; I just don't remember the details offhand. It seems a lot of work has gone into it since then. The docs are here [2].

    [0] https://www.youtube.com/@bacalhau3295

    [1] https://www.youtube.com/@bacalhauproject

    [2] https://docs.bacalhau.org/

  • duckdb

    DuckDB is an in-process SQL OLAP Database Management System

  • It's an interesting question!

    The problem is that the data is spread everywhere, and there's no choice about that. So how do you query it? Today, the assumption is that you HAVE to move it into a central location. With tools like Bacalhau [1] and DuckDB [2], you no longer have to: a single query can be sharded across all your data where it lives, EFFECTIVELY giving you much of what you want from a data lake.

    It's not a replacement for a data lake, but if you can run even some of these workloads WITHOUT moving the data, you will see really significant cost and time savings.

    [1] https://github.com/bacalhau-project/bacalhau

    [2] https://github.com/duckdb/duckdb
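
The "shard one query across all your data" idea from the comment above can be illustrated with a minimal scatter-gather sketch. It assumes each site can run SQL locally, in-process, against its own rows and ship back only small partial aggregates. Python's built-in sqlite3 stands in for DuckDB here so the example needs no extra packages; this is not Bacalhau's or DuckDB's API, and the node names, table, and data are hypothetical.

```python
import sqlite3

# Hypothetical per-site data: each "node" keeps its own rows where they are.
NODE_DATA = {
    "node-a": [("eu", 120.0), ("eu", 80.0), ("us", 200.0)],
    "node-b": [("us", 50.0), ("apac", 30.0)],
    "node-c": [("eu", 100.0), ("apac", 70.0), ("apac", 20.0)],
}


def local_partial_aggregate(rows):
    """Run the query on one node's data in-process (sqlite3 as a stand-in
    for DuckDB) and return (region, sum, count) partials, not raw rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cur = con.execute(
            "SELECT region, SUM(amount), COUNT(*) FROM sales GROUP BY region"
        )
        return cur.fetchall()
    finally:
        con.close()


def merge_partials(partials):
    """Coordinator step: add up per-node sums and counts, then divide once
    to get the global average per region (sorted by region name)."""
    totals = {}
    for node_result in partials:
        for region, subtotal, count in node_result:
            acc = totals.setdefault(region, [0.0, 0])
            acc[0] += subtotal
            acc[1] += count
    return {region: s / c for region, (s, c) in sorted(totals.items())}


if __name__ == "__main__":
    # "Scatter": each node answers locally; "gather": merge small results.
    partials = [local_partial_aggregate(rows) for rows in NODE_DATA.values()]
    print(merge_partials(partials))
```

Only a few (region, sum, count) tuples cross the network instead of every row. Shipping sums and counts rather than per-node averages is deliberate: averaging the nodes' averages would weight a node with one row the same as a node with millions.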
