delta-lake

Open-source projects categorized as delta-lake

Top 15 delta-lake Open-Source Projects

  • doris

    Apache Doris is an easy-to-use, high performance and unified analytics database.

  • Project mention: Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis | dev.to | 2024-03-27

    As an open-source real-time data warehouse, Apache Doris provides semi-structured data processing capabilities, and the newly released version 2.1.0 makes a stride in this direction. Before V2.1, Apache Doris stored semi-structured data as JSON files. However, parsing JSON data in real time during query execution leads to high CPU and I/O consumption as well as high query latency, especially when the dataset is huge and complicated. Moreover, the lack of a pre-defined schema leaves no basis for query optimization.

  • Trino

    Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

  • Project mention: Trino: Fast distributed SQL query engine for big data analytics | news.ycombinator.com | 2024-03-19
  • starrocks

    StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. It received InfoWorld’s 2023 BOSSIE Award for best open source software.

  • Project mention: A MySQL compatible database engine written in pure Go | news.ycombinator.com | 2024-04-09

    tidb has been around for a while, it is distributed, written in Go and Rust, and MySQL compatible. https://github.com/pingcap/tidb

    Somewhat relatedly, StarRocks is also MySQL compatible, written in Java and C++, but it's tackling OLAP use-cases. https://github.com/StarRocks/starrocks

  • delta

    An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs (by delta-io)

  • Project mention: Delta Lake vs. Parquet: A Comparison | news.ycombinator.com | 2024-01-19

    Delta is pretty great; it lets you do upserts into tables in Databricks much more easily than without it.

    I think the website is here: https://delta.io

  • roapi

    Create full-fledged APIs for slowly moving datasets without writing a single line of code.

  • Project mention: Full-fledged APIs for slowly moving datasets without writing code | news.ycombinator.com | 2023-10-25
  • delta-rs

    A native Rust library for Delta Lake, with bindings into Python

  • Project mention: Delta-rs – a Rust-based implementation of deltalake | news.ycombinator.com | 2024-04-08
  • LearningSparkV2

    This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

  • delta-sharing

    An open protocol for secure data sharing

  • Project mention: Azure data lake - Data Share | /r/dataengineering | 2023-06-29
  • incubator-xtable

    Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.

  • Project mention: FLaNK AI-April 22, 2024 | dev.to | 2024-04-22
  • seafowl

    Analytical database for data-driven Web applications 🪶

  • Project mention: Gcsfuse: A user-space file system for interacting with Google Cloud Storage | news.ycombinator.com | 2023-09-06

    In case you're interested in scale-to-zero database hosting, a few months ago I paired gcsfuse with Seafowl [0][1], an early stage open source database written in Rust. Was a lot of fun balancing tradeoffs that are usually not possible with classical databases e.g. Postgres. Thank you gcsfuse contributors.

    [0] https://seafowl.io

  • amazon-sagemaker-local-mode

    Amazon SageMaker Local Mode Examples

  • Project mention: Debugging Python Code in Amazon SageMaker Locally Using Visual Studio Code and PyCharm: A Step-by-Step Guide | dev.to | 2023-11-15

    git clone https://github.com/aws-samples/amazon-sagemaker-local-mode/
    cd amazon-sagemaker-local-mode/general_pipeline_local_debug
    python3 -m venv .venv
    source .venv/bin/activate
    pip install jupyter
    jupyter lab

  • delta-sharing-rs

    A Minimalistic Rust Implementation of Delta Sharing Server.

  • delta-go

    Native Delta Lake Implementation in Go

  • Project mention: Delta-go supports Azure Blob now | /r/golang | 2023-05-26
  • delta-buddy

    Introducing Delta-Buddy: Your ultimate Delta Lake companion! 🚀 Streamline your data journey with an AI-powered chatbot. Ask Delta-Buddy anything about your Delta Lake.

  • Project mention: A ChatBot with open source LLM to ask questions on your Delta Lake | news.ycombinator.com | 2023-06-18
  • delta-fetch

    HTTP API on Delta Lake tables

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

delta-lake related posts

  • Delta-rs – a Rust-based implementation of deltalake

    1 project | news.ycombinator.com | 8 Apr 2024
  • Delta Lake vs. Parquet: A Comparison

    2 projects | news.ycombinator.com | 19 Jan 2024
  • [D] Is there other better data format for LLM to generate structured data?

    1 project | /r/MachineLearning | 10 Dec 2023
  • OneTable is now live | Table format interoperability is not a dream anymore

    1 project | /r/dataengineering | 19 Nov 2023
  • Delta vs Iceberg: make love not war

    1 project | /r/MicrosoftFabric | 30 Jun 2023
  • Azure data lake - Data Share

    1 project | /r/dataengineering | 29 Jun 2023
  • Databricks Strikes $1.3B Deal for Generative AI Startup MosaicML

    4 projects | news.ycombinator.com | 26 Jun 2023

Index

What are some of the best open-source delta-lake projects? This list will help you:

 #  Project                       Stars
 1  doris                        11,389
 2  Trino                         9,597
 3  starrocks                     7,789
 4  delta                         6,919
 5  roapi                         3,087
 6  delta-rs                      1,833
 7  LearningSparkV2               1,095
 8  delta-sharing                   676
 9  incubator-xtable                692
10  seafowl                         358
11  amazon-sagemaker-local-mode     230
12  delta-sharing-rs                 70
13  delta-go                         34
14  delta-buddy                       9
15  delta-fetch                       1
