Top 13 C++ Big Data Projects
NebulaGraph Database
A distributed, fast open-source graph database featuring horizontal scalability and high availability (by vesoft-inc)
-
catboost
A fast, scalable, high-performance library for gradient boosting on decision trees, used for ranking, classification, regression, and other machine learning tasks, with interfaces for Python, R, Java, and C++. Supports computation on CPU and GPU.
Project mention: 🚀 Why Your ML Service Needs Rust + CatBoost: A Setup Guide That Actually Works | dev.to | 2025-01-19

```toml
[package]
name = "MLApp"
version = "0.1.0"
edition = "2021"

[dependencies]
catboost = { git = "https://github.com/catboost/catboost", rev = "0bfdc35" }
```
-
GraphScope
🔨 🍇 💻 🚀 GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba
-
ArcticDB
ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
-
PGM-index
🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes
Project mention: PGM-index learned data structure with lookup, range, updates with OOM less space | news.ycombinator.com | 2025-07-22
-
ustore
Multi-modal database replacing MongoDB, Neo4j, and Elastic with one faster ACID solution, with NetworkX and Pandas interfaces and bindings for C99, C++17, Python 3, Java, and Go 🗄️
-
C++ Big Data discussion
C++ Big Data related posts
- ClickHouse raises $350M Series C
- Apache Iceberg
- ArcticDB: High performance, serverless DataFrame database
- ArcticDB: Why a Hedge Fund Built Its Own Database
- Garage: Open-Source Distributed Object Storage
- Fair Benchmarking Considered Difficult (2018) [pdf]
- Ask HN: Where to Store Logs?
Index
What are some of the best open-source Big Data projects in C++? This list will help you:
| Rank | Project | GitHub stars |
|---|---|---|
| 1 | ClickHouse | 42,570 |
| 2 | NebulaGraph Database | 11,627 |
| 3 | catboost | 8,545 |
| 4 | GraphScope | 3,477 |
| 5 | ytsaurus | 2,072 |
| 6 | ArcticDB | 2,033 |
| 7 | kudu | 1,888 |
| 8 | MyScaleDB | 984 |
| 9 | PGM-index | 826 |
| 10 | oneDAL | 639 |
| 11 | ustore | 603 |
| 12 | incubator-graphar | 293 |
| 13 | nebula | 154 |