C++ Big Data

Open-source C++ projects categorized as Big Data

Top 13 C++ Big Data Projects

  1. ClickHouse

    ClickHouse® is a real-time analytics database management system

    Project mention: Strategies for Fast Lexers | news.ycombinator.com | 2025-07-14
  2. NebulaGraph Database

    A distributed, fast open-source graph database featuring horizontal scalability and high availability (by vesoft-inc)

  3. catboost

    A fast, scalable, high-performance gradient boosting on decision trees library, used for ranking, classification, regression, and other machine learning tasks in Python, R, Java, and C++. Supports computation on CPU and GPU.

    Project mention: 🚀 Why Your ML Service Needs Rust + CatBoost: A Setup Guide That Actually Works | dev.to | 2025-01-19

    [package]
    name = "MLApp"
    version = "0.1.0"
    edition = "2021"

    [dependencies]
    catboost = { git = "https://github.com/catboost/catboost", rev = "0bfdc35"}

  4. GraphScope

    GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba

  5. ytsaurus

    YTsaurus is a scalable and fault-tolerant open-source big data platform.

  6. ArcticDB

    ArcticDB is a high-performance, serverless DataFrame database built for the Python data science ecosystem.

    Project mention: All Data and AI Weekly #203: 18-Aug-2025 | dev.to | 2025-08-18

    ArcticDB: A high-performance, serverless database for Python.

  7. kudu

    Mirror of Apache Kudu (by apache)

  8. MyScaleDB

    A @ClickHouse fork that supports high-performance vector search and full-text search.

  9. PGM-index

    🏅 State-of-the-art learned data structure that enables fast lookup, predecessor, and range searches, as well as updates, in arrays of billions of items, using orders of magnitude less space than traditional indexes

    Project mention: PGM-index learned data structure with lookup, range, updates with OOM less space | news.ycombinator.com | 2025-07-22
  10. oneDAL

    oneAPI Data Analytics Library (oneDAL)

  11. ustore

    Multi-Modal Database replacing MongoDB, Neo4J, and Elastic with 1 faster ACID solution, with NetworkX and Pandas interfaces, and bindings for C 99, C++ 17, Python 3, Java, GoLang 🗄️

  12. incubator-graphar

    An open source, standard data file format for graph data storage and retrieval.

  13. nebula

    A distributed block-based data storage and compute engine (by varchar-io)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Big Data projects in C++? This list will help you:

#   Project               Stars
1   ClickHouse            42,570
2   NebulaGraph Database  11,627
3   catboost               8,545
4   GraphScope             3,477
5   ytsaurus               2,072
6   ArcticDB               2,033
7   kudu                   1,888
8   MyScaleDB                984
9   PGM-index                826
10  oneDAL                   639
11  ustore                   603
12  incubator-graphar        293
13  nebula                   154

Did you know that C++ is the 7th most popular programming language based on number of references?