OLAP

Open-source projects categorized as OLAP

Top 23 OLAP Open-Source Projects

  • ClickHouse

    ClickHouse® is a free analytics DBMS for big data

  • Project mention: We Built a 19 PiB Logging Platform with ClickHouse and Saved Millions | news.ycombinator.com | 2024-04-02

    Yes, we are working on it! :) Taking some of the learnings from current experimental JSON Object datatype, we are now working on what will become the production-ready implementation. Details here: https://github.com/ClickHouse/ClickHouse/issues/54864

    Variant datatype is already available as experimental in 24.1, Dynamic datatype is WIP (PR almost ready), and JSON datatype is next up. Check out the latest comment on that issue with how the Dynamic datatype will work: https://github.com/ClickHouse/ClickHouse/issues/54864#issuec...

  • duckdb

    DuckDB is an in-process SQL OLAP Database Management System

  • Project mention: 🪄 DuckDB sql hack : get things SORTED w/ constraint CHECK | dev.to | 2024-04-04
  • doris

    Apache Doris is an easy-to-use, high performance and unified analytics database.

  • Project mention: Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis | dev.to | 2024-03-27

    As an open-source real-time data warehouse, Apache Doris provides semi-structured data processing capabilities, and the newly released version 2.1.0 makes a stride in this direction. Before V2.1, Apache Doris stored semi-structured data as JSON files. However, during query execution, the real-time parsing of JSON data leads to high CPU and I/O consumption in addition to high query latency, especially when the dataset is huge and complicated. Moreover, the lack of a pre-defined schema means the query optimizer has nothing to work with.

  • starrocks

    StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. InfoWorld's 2023 BOSSIE Award for best open source software.

  • Project mention: A MySQL compatible database engine written in pure Go | news.ycombinator.com | 2024-04-09

    tidb has been around for a while, it is distributed, written in Go and Rust, and MySQL compatible. https://github.com/pingcap/tidb

    Somewhat relatedly, StarRocks is also MySQL compatible, written in Java and C++, but it's tackling OLAP use-cases. https://github.com/StarRocks/starrocks

  • databend

    Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com

  • Project mention: Solutions to manage runaway Snowflake costs? | news.ycombinator.com | 2024-01-16

    Databend vs. Snowflake: https://github.com/datafuselabs/databend/issues/13059

  • datafusion

    Apache DataFusion SQL Query Engine

  • Project mention: Velox: Meta's Unified Execution Engine [pdf] | news.ycombinator.com | 2024-03-25

    Python's Substrait seems like the biggest/most-used competitor-ish out there. I'd love some compare & contrast; my sense is that Substrait has a smaller ambition, and more wants to be a language for talking about execution rather than a full on execution engine. https://github.com/substrait-io/substrait

    We can also see from the DataFusion discussion that they too see themselves as a bit of a Velox competitor. https://github.com/apache/arrow-datafusion/discussions/6441

  • Crate

    CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.

  • Project mention: FLaNK AI - 01 April 2024 | dev.to | 2024-04-01
  • heavydb

    HeavyDB (formerly OmniSciDB)

  • chdb

    chDB is an embedded OLAP SQL Engine 🚀 powered by ClickHouse

  • Project mention: FLaNK Stack Weekly 06 Nov 2023 | dev.to | 2023-11-06
  • matrixone

    Hyperconverged cloud-edge native database

  • Project mention: Push or Pull, is this a question? | dev.to | 2023-08-09

    Source code: matrixorigin/matrixone: Hyperconverged cloud-edge native database (github.com)

  • risinglight

    An educational OLAP database system.

  • Cubes

    [NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis

  • datafusion-ballista

    Apache Arrow Ballista Distributed Query Engine

  • Project mention: Polars | news.ycombinator.com | 2024-01-08

    Not super on topic because this is all immature and not integrated with one another yet, but there is a scaled-out rust data-frames-on-arrow implementation called ballista that could maybe? form the backend of a polars scale out approach: https://github.com/apache/arrow-ballista

  • mondrian

    Mondrian is an Online Analytical Processing (OLAP) server that enables business users to analyze large quantities of data in real-time.

  • scratchdata

    Scratch is a swiss army knife for big data.

  • Project mention: Debugging a Golang Bug with Non-Blocking Reads | news.ycombinator.com | 2024-03-12

    Go team does acknowledge [1] it as a bug, so there is some point here

    However, that said, I wonder if OP (duckdb) could have written their solution [2] differently. Shouldn't they be able to select from a Pipe as well as an Error channel simultaneously? (similar to how they are doing it inside here [3]). If not, I would have created a goroutine that does a blocking read on the Pipe and then passes the data on to another channel to select on.

    [1] https://github.com/golang/go/issues/66239

    [2] https://github.com/scratchdata/scratchdata/blob/7c1a0fcd0e20...

    [3] https://github.com/scratchdata/scratchdata/blob/7c1a0fcd0e20...

  • kuzu

    Embeddable property graph database management system built for query speed and scalability. Implements Cypher.

  • Project mention: Unum: Vector Search engine in a single file | news.ycombinator.com | 2023-07-31
  • duckdb-wasm

    WebAssembly version of DuckDB

  • Project mention: Parquet-WASM: Rust-based WebAssembly bindings to read and write Parquet data | news.ycombinator.com | 2024-04-22

    i think duckdb-wasm is closer to 6MB over wire, but ~36MB once decompressed. (see net panel when loading https://shell.duckdb.org/)

    the decompressed size should be okay since it's not the same as parsing and JITing 36MB of JS.

  • stonedb

    StoneDB is an Open-Source MySQL HTAP and MySQL-Native DataBase for OLTP, Real-Time Analytics, a counterpart of MySQLHeatWave. (https://stonedb.io)

  • stanchion

    A SQLite extension that brings column-oriented tables to SQLite

  • Project mention: Show HN: Stanchion – Column-oriented tables in SQLite | news.ycombinator.com | 2024-01-31

    The "Data Storage Internals" section[1] of the README sounds to me like it has its own column-oriented format for these tables, at least that's how I'm reading the part about segments. Is that the case? If so, have you tried using Apache Arrow or Parquet to see how they compare?

    [1] https://github.com/dgllghr/stanchion#data-storage-internals

  • ClickBench

    ClickBench: a Benchmark For Analytical Databases

  • Project mention: Umbra: A Disk-Based System with In-Memory Performance [pdf] | news.ycombinator.com | 2024-05-02

    Benchmarks: https://benchmark.clickhouse.com

    So definitely compared against PostgreSQL, MariaDB it is significantly faster.

    On par with lower-end Snowflake.

  • inline-sql

    🪄 Inline SQL in any Python program

  • duckdb-rs

    Ergonomic bindings to duckdb for Rust

  • KuiBaDB

    Another OLAP database

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

OLAP-related posts

  • Umbra: A Disk-Based System with In-Memory Performance [pdf]

    3 projects | news.ycombinator.com | 2 May 2024
  • 🪄 DuckDB sql hack : get things SORTED w/ constraint CHECK

    1 project | dev.to | 4 Apr 2024
  • We Built a 19 PiB Logging Platform with ClickHouse and Saved Millions

    1 project | news.ycombinator.com | 2 Apr 2024
  • Build time is a collective responsibility

    2 projects | news.ycombinator.com | 24 Mar 2024
  • Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis

    2 projects | dev.to | 27 Mar 2024
  • 42.parquet – A Zip Bomb for the Big Data Age

    1 project | news.ycombinator.com | 26 Mar 2024
  • DuckDB: Move to push-based execution model (2021)

    1 project | news.ycombinator.com | 15 Mar 2024

Index

What are some of the best open-source OLAP projects? This list will help you:

 #  Project              Stars
 1  ClickHouse           34,269
 2  duckdb               16,749
 3  doris                11,363
 4  starrocks             7,789
 5  databend              7,214
 6  datafusion            5,086
 7  Crate                 3,965
 8  heavydb               2,903
 9  chdb                  1,702
10  matrixone             1,679
11  risinglight           1,538
12  Cubes                 1,490
13  datafusion-ballista   1,288
14  mondrian              1,120
15  scratchdata           1,034
16  kuzu                  1,031
17  duckdb-wasm             924
18  stonedb                 851
19  stanchion               618
20  ClickBench              571
21  inline-sql              412
22  duckdb-rs               365
23  KuiBaDB                 311
