lakehouse

Open-source projects categorized as lakehouse

Top 12 lakehouse Open-Source Projects

  • Presto

    The official home of the Presto distributed SQL query engine for big data

  • Project mention: Multi-Database Support in DuckDB | news.ycombinator.com | 2024-01-28

    We have some of this functionality in Presto (https://github.com/prestodb/presto), but it takes a fair bit of work to implement it for all the different backends.

  • doris

    Apache Doris is an easy-to-use, high performance and unified analytics database.

  • Project mention: Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis | dev.to | 2024-03-27

    As an open-source real-time data warehouse, Apache Doris provides semi-structured data processing capabilities, and the newly-released version 2.1.0 makes a stride in this direction. Before V2.1, Apache Doris stored semi-structured data as JSON files. However, during query execution, the real-time parsing of JSON data leads to high CPU and I/O consumption in addition to high query latency, especially when the dataset is huge and complicated. Moreover, the lack of a pre-defined schema means there is no handle for query optimization.

  • starrocks

    StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. InfoWorld’s 2023 BOSSIE Award for best open source software.

  • Project mention: A MySQL compatible database engine written in pure Go | news.ycombinator.com | 2024-04-09

    tidb has been around for a while, it is distributed, written in Go and Rust, and MySQL compatible. https://github.com/pingcap/tidb

    Somewhat relatedly, StarRocks is also MySQL compatible, written in Java and C++, but it's tackling OLAP use-cases. https://github.com/StarRocks/starrocks

  • LakeSoul

    LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.

  • ytsaurus

    YTsaurus is a scalable and fault-tolerant open-source big data platform.

  • cuelake

    Use SQL to build ELT pipelines on a data lakehouse.

  • dataall

    A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.

  • terraform-databricks-examples

    Examples of using Terraform to deploy Databricks resources

  • Project mention: I can’t terraform my company’s Databricks environment and I’m going insane. | /r/dataengineering | 2023-06-20

    Use the Databricks Terraform examples; the external credentials and external locations in UC should help.

  • space

    Unified storage framework for the entire machine learning lifecycle (by google)

  • Project mention: Unified storage framework for the entire machine learning lifecycle | news.ycombinator.com | 2024-02-28
  • awesome-data-temporality

    A curated list to help you manage temporal data across many modalities 🚀.

  • Project mention: FLaNK Stack Weekly for 14 Aug 2023 | dev.to | 2023-08-14
  • Local-Data-LakeHouse

    Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.

  • FLiPStackWeekly

    FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more...

  • Project mention: FLaNK AI-April 22, 2024 | dev.to | 2024-04-22
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source lakehouse projects? This list will help you:

#   Project                        Stars
1   Presto                         15,591
2   doris                          11,314
3   starrocks                      7,764
4   LakeSoul                       2,301
5   ytsaurus                       1,765
6   cuelake                        284
7   dataall                        209
8   terraform-databricks-examples  177
9   space                          135
10  awesome-data-temporality       96
11  Local-Data-LakeHouse           43
12  FLiPStackWeekly                14
