Trying Delta Lake at home

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering.

  • delta-docs

    Delta Lake Documentation

  • trino-getting-started

  • https://github.com/bitsondatadev/trino-getting-started/tree/main/delta-lake => Trino (Presto "equivalent") + Delta Lake format + MinIO (S3 equivalent)

  • docker-spark-deltalake

    Discontinued Docker image for running SparkSQL Thrift server

  • dbt-spark

    dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks

  • Spark + dbt => https://github.com/dbt-labs/dbt-spark/blob/main/docker-compose.yml

  • delta

    An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs (by delta-io)

  • There is an open PR for delta docker => https://github.com/delta-io/delta/pull/922

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
