C++ Big Data

Open-source C++ projects categorized as Big Data

Top 10 C++ Big Data Projects

  • ClickHouse

    ClickHouse® is a free analytics DBMS for big data

    Project mention: Postgresql on NVME SSD slow performance tuning/monitoring | reddit.com/r/btrfs | 2023-03-27

    The problem with ClickHouse is that some algorithms have no implementation for data that doesn't fit in memory, so queries just fail with OOM: https://github.com/ClickHouse/ClickHouse/issues/47521

  • NebulaGraph Database

    A distributed, fast open-source graph database featuring horizontal scalability and high availability (by vesoft-inc)

    Project mention: What is a NoSQL Graph Database? | dev.to | 2023-01-09

    A NoSQL graph database is a type of non-relational, distributed database which employs a graph model. NoSQL stands for “Not only SQL” and refers to a new breed of databases that differ from traditional relational databases in their data model and performance. Graph databases are especially useful for data associated with relationships, from friendships on social networks to equipment supply chains or business processes. They can quickly traverse vast amounts of linked data points to discover insights and hidden connections between entities, making them ideal for network analysis, such as financial fraud detection, recommendation engines, and many other use cases, all while performing at scale.

  • kudu

    Mirror of Apache Kudu (by apache)

    Project mention: Tencent Data Engineer: Why We Went from ClickHouse to Apache Doris? | reddit.com/r/dataengineering | 2023-03-10

    Really interested in partial updates, but haven't found any information on how the merges/upserts physically happen. It would be great if a doc like https://github.com/apache/kudu/blob/master/docs/design-docs/tablet.md existed for Apache Doris.

  • ytsaurus

    YTsaurus is a scalable and fault-tolerant open-source big data platform.

    Project mention: Yandex open-sources its exabyte-scale big data platform | news.ycombinator.com | 2023-03-22

  • PGM-index

    🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes

    Project mention: Piecewise Geometric Model Index | news.ycombinator.com | 2022-07-05

  • oneDAL

    oneAPI Data Analytics Library (oneDAL)

    Project mention: Is there a no-compromise (presumably C/C++) platform similar to Apache Spark? | reddit.com/r/dataengineering | 2022-07-27

  • ukv

    Replacing MongoDB, Neo4J, and Elastic with 1 transactional database. Features: zero-copy semantics, swappable backends, bindings for C, C++, Python, Java, GoLang

    Project mention: Up to 100x Faster FastAPI with simdjson and io_uring on Linux 5.19+ | reddit.com/r/programming | 2023-03-06

    Just to clarify, I meant in other projects, like the UKV.

  • nebula

    A distributed block-based data storage and compute engine (by varchar-io)

    Project mention: Show HN: Turn any data into a fast analytical API | news.ycombinator.com | 2022-04-10

    We use our in-house engine, open-sourced here: https://github.com/varchar-io/nebula

    Yeah, Tinybird has lots of similarities, I will do more research on it, thanks for the reference.

  • GraphAr

    An open source, standard data file format for graph data storage and retrieval

    Project mention: Show HN: GraphAr – Open-source file format for archiving/exchanging graph data | news.ycombinator.com | 2023-03-06

  • ReductStore

    A time series database for storing and managing large amounts of blob data

    Project mention: CLI Client for ReductStore v0.8.0 has been released | dev.to | 2023-03-09

    Hey, I've released version 0.8.0 of Reduct CLI, the Python package for managing data stored in ReductStore. This release includes two new features that will be particularly helpful for our public datasets hosted on ReductStore, where metadata can be used to provide important context for the data.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-03-27.

What are some of the best open-source Big Data projects in C++? This list will help you:

#   Project               Stars
1   ClickHouse            27,646
2   NebulaGraph Database  8,891
3   kudu                  1,698
4   ytsaurus              1,273
5   PGM-index             674
6   oneDAL                549
7   ukv                   257
8   nebula                132
9   GraphAr               54
10  ReductStore           35