Top 23 C++ Data Science Projects
-
catboost
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
Project mention: 🚀 Why Your ML Service Needs Rust + CatBoost: A Setup Guide That Actually Works | dev.to | 2025-01-19
[package]
name = "MLApp"
version = "0.1.0"
edition = "2021"

[dependencies]
catboost = { git = "https://github.com/catboost/catboost", rev = "0bfdc35"}
-
GraphScope
🔨 🍇 💻 🚀 GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba
-
Project mention: ClickHouse gets lazier (and faster): Introducing lazy materialization | news.ycombinator.com | 2025-04-22
https://github.com/chdb-io/chdb/issues/101#issuecomment-2824...
Ps. I work for ClickHouse
-
ArcticDB
ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
-
TileDB, Inc. | Full-time | REMOTE | USA, Greece | https://tiledb.com/
TileDB is the database designed for discovery, built to organize, structure, and analyze any data. Our solutions for single-cell and population genomics are used by major pharmaceutical companies and research institutes, and power large public data collections such as the Cellxgene Discover Census. We are actively hiring for several roles building our unified data catalog, scalable computation, and interactive analysis platform.
- Infrastructure Engineer: Kubernetes, Terraform, Argo, Grafana, Prometheus, CloudWatch, GitOps; Golang, Python, C++, or Rust (GMT -8/+4).
- Frontend/UI Developer: TypeScript, React; experience with high-performance/high-volume data and visualization applications (GMT -8/+1).
We are fully-remote, with optional co-working hubs in Cambridge, MA, New York, NY, and Athens, Greece. Apply today at https://ats.rippling.com/tiledb-careers/jobs or reach out directly (email in profile).
-
turbodbc
Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with the Python Database API Specification 2.0.
-
desbordante-core
Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also lets you run data-cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
Project mention: Show HN: Desbordante 2.3.0 is out, now supports macOS | news.ycombinator.com | 2025-02-04
Desbordante, an open-source, high-performance data profiler that discovers and validates complex patterns in data, has released version 2.3.0. This update introduces two new patterns and adds support for macOS. Users can now install the Desbordante-core pip package on macOS via PyPI, compatible with CPython versions 3.8 through 3.13 and PyPy versions 3.7 through 3.10.
Release notes are here: https://github.com/Desbordante/desbordante-core/releases/tag...
-
labplot
LabPlot is a FREE, open source and cross-platform Data Visualization and Analysis software accessible to everyone.
Project mention: LabPlot: Free, open source and cross-platform Data Visualization and Analysis | news.ycombinator.com | 2025-08-22
I think that's just a GitHub mirror; the actual development is happening over at the KDE GitLab:
https://invent.kde.org/education/labplot
-
Tiger
C++ Matrix -- High performance and accurate (e.g. edge cases) matrix math library with expression template arithmetic operators (by hosseinmoein)
-
TileDB-VCF
Efficient variant-call data storage and retrieval library using the TileDB storage library.
C++ Data Science discussion
C++ Data Science related posts
- chDB: An In-Process OLAP SQL Engine Powered by ClickHouse
- ChDB 3.0 released, 12% faster than DuckDB
- Show HN: SQLite like API of ClickHouse engine in Python
- Tell HN: Causal Got Acquired
- Kotlin DataFrame ❤️ Arrow
- ClickHouse Based Duck-Db
- ChDB: In-Process SQL OLAP Engine Powered by ClickHouse
Index
What are some of the best open-source Data Science projects in C++? This list will help you:
# | Project | Stars |
---|---|---|
1 | cudf | 9,141 |
2 | catboost | 8,545 |
3 | matplotplusplus | 4,658 |
4 | GraphScope | 3,477 |
5 | SHOGUN | 3,045 |
6 | DataFrame | 2,786 |
7 | chdb | 2,455 |
8 | ArcticDB | 2,033 |
9 | TileDB | 1,978 |
10 | MLPP | 1,105 |
11 | vectordb | 860 |
12 | turbodbc | 642 |
13 | oneDAL | 639 |
14 | GPBoost | 628 |
15 | desbordante-core | 417 |
16 | labplot | 353 |
17 | Graphia | 253 |
18 | Tiger | 121 |
19 | nelson | 110 |
20 | secure-xgboost | 105 |
21 | TileDB-VCF | 97 |
22 | MachineLearning | 25 |
23 | twinning | 24 |