Top 23 C++ Data Science Projects
catboost
A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
Project mention: 🚀 Why Your ML Service Needs Rust + CatBoost: A Setup Guide That Actually Works | dev.to | 2025-01-19
```toml
[package]
name = "MLApp"
version = "0.1.0"
edition = "2021"

[dependencies]
catboost = { git = "https://github.com/catboost/catboost", rev = "0bfdc35" }
```
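CatBoost's description centers on gradient boosting over decision trees. As a rough illustration of that general idea only, here is a minimal C++ sketch of plain gradient boosting with depth-1 trees (stumps) on a single numeric feature, assuming squared-error loss. It is not CatBoost's algorithm, which uses oblivious (symmetric) trees and ordered boosting, and `Stump`, `fit_stump`, `BoostedStumps`, and `fit_boosted` are hypothetical names, not part of CatBoost's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical depth-1 regression tree ("stump"): one split on one feature.
struct Stump {
    double threshold = 0.0;
    double left = 0.0;   // prediction when x <= threshold
    double right = 0.0;  // prediction when x > threshold
    double predict(double x) const { return x <= threshold ? left : right; }
};

// Tries every observed x as a threshold and keeps the split with the
// lowest sum of squared errors against the targets r.
Stump fit_stump(const std::vector<double>& x, const std::vector<double>& r) {
    Stump best;
    double best_sse = std::numeric_limits<double>::infinity();
    for (double t : x) {
        double sum_l = 0.0, sum_r = 0.0;
        std::size_t n_l = 0, n_r = 0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            if (x[i] <= t) { sum_l += r[i]; ++n_l; }
            else           { sum_r += r[i]; ++n_r; }
        }
        const double mean_l = n_l ? sum_l / n_l : 0.0;
        const double mean_r = n_r ? sum_r / n_r : 0.0;
        double sse = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double err = r[i] - (x[i] <= t ? mean_l : mean_r);
            sse += err * err;
        }
        if (sse < best_sse) {
            best_sse = sse;
            best.threshold = t;
            best.left = mean_l;
            best.right = mean_r;
        }
    }
    return best;
}

// Ensemble: constant base prediction plus shrunken stump outputs.
struct BoostedStumps {
    double base = 0.0;
    double learning_rate = 0.1;
    std::vector<Stump> stumps;

    double predict(double x) const {
        double y = base;
        for (const Stump& s : stumps) y += learning_rate * s.predict(x);
        return y;
    }
};

// Squared-error gradient boosting: the negative gradient of 0.5*(y - f)^2
// with respect to f is the residual y - f, so each round fits a stump to
// the current residuals.
BoostedStumps fit_boosted(const std::vector<double>& x, const std::vector<double>& y,
                          int rounds, double learning_rate) {
    assert(x.size() == y.size() && !y.empty());
    BoostedStumps model;
    model.learning_rate = learning_rate;
    model.base = std::accumulate(y.begin(), y.end(), 0.0) / y.size();
    std::vector<double> residual(y.size());
    for (int m = 0; m < rounds; ++m) {
        for (std::size_t i = 0; i < y.size(); ++i) residual[i] = y[i] - model.predict(x[i]);
        model.stumps.push_back(fit_stump(x, residual));
    }
    return model;
}
```

For example, `fit_boosted({1.0, 2.0, 3.0, 4.0}, {1.0, 1.0, 5.0, 5.0}, 20, 0.5)` starts from the mean 3.0, and because each round removes half of the remaining residual, its predictions approach 1.0 and 5.0. A smaller learning rate needs more rounds to reach the same fit.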
GraphScope
GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba
DataFrame
C++ DataFrame for statistical, financial, and ML analysis in modern C++, using native types and contiguous memory storage
Project mention: ClickHouse gets lazier (and faster): Introducing lazy materialization | news.ycombinator.com | 2025-04-22
https://github.com/chdb-io/chdb/issues/101#issuecomment-2824...
P.S. I work for ClickHouse.
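The DataFrame entry mentions native types and contiguous memory storage. A minimal C++17 sketch of that kind of layout, assuming a column-oriented design, keeps each column in its own `std::vector` and uses `std::variant` so one table can hold columns of different element types. `MiniFrame`, `add_column`, `column`, and `mean` are hypothetical names chosen for illustration and are not meant to reproduce the DataFrame library's actual API.

```cpp
#include <cassert>
#include <map>
#include <numeric>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Each column is one contiguous std::vector of a native element type;
// std::variant lets a single table hold columns of different types.
using Column = std::variant<std::vector<int>, std::vector<double>, std::vector<std::string>>;

// Hypothetical column-oriented table (C++17), for illustration only.
class MiniFrame {
public:
    template <typename T>
    void add_column(const std::string& name, std::vector<T> values) {
        columns_[name] = std::move(values);
    }

    // Throws std::out_of_range for an unknown name and std::bad_variant_access
    // if the column holds a different element type than T.
    template <typename T>
    const std::vector<T>& column(const std::string& name) const {
        return std::get<std::vector<T>>(columns_.at(name));
    }

    // One linear pass over the column's contiguous buffer.
    template <typename T>
    double mean(const std::string& name) const {
        const std::vector<T>& col = column<T>(name);
        if (col.empty()) throw std::invalid_argument("mean of empty column");
        return std::accumulate(col.begin(), col.end(), 0.0) / col.size();
    }

private:
    std::map<std::string, Column> columns_;
};
```

Because each column is a single contiguous buffer of a native type, a reduction such as `mean` walks memory sequentially instead of hopping between row objects, which is the usual argument for columnar storage in analytics code.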
ArcticDB
ArcticDB is a high-performance, serverless DataFrame database built for the Python Data Science ecosystem.
Project mention: ArcticDB: High performance, serverless DataFrame database | news.ycombinator.com | 2024-09-06
TileDB
TileDB, Inc. | Full-time | REMOTE | USA, Greece | https://tiledb.com/
TileDB is the database designed for discovery, built to organize, structure, and analyze any data. Our solutions for single-cell and population genomics are used by major pharmaceutical companies and research institutes, and power large public data collections such as the Cellxgene Discover Census. We are actively hiring for several roles building our unified data catalog, scalable computation, and interactive analysis platform.
- Infrastructure Engineer: Kubernetes, Terraform, Argo, Grafana, Prometheus, CloudWatch, GitOps; Golang, Python, C++, or Rust (GMT -8/+4).
- Frontend/UI developer: TypeScript, React; experience with high-performance/high-volume data and visualization applications (GMT -8/+1).
We are fully-remote, with optional co-working hubs in Cambridge, MA, New York, NY, and Athens, Greece. Apply today at https://ats.rippling.com/tiledb-careers/jobs or reach out directly (email in profile).
turbodbc
Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with the Python Database API Specification 2.0.
desbordante-core
Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also lets users run data cleaning scenarios based on these algorithms. Desbordante has a console version and an easy-to-use web application.
Project mention: Show HN: Desbordante 2.3.0 is out, now supports macOS | news.ycombinator.com | 2025-02-04
Desbordante, an open-source, high-performance data profiler that discovers and validates complex patterns in data, has released version 2.3.0. This update introduces two new patterns and adds support for macOS. Users can now install the Desbordante-core pip package on macOS via PyPI; it is compatible with CPython versions 3.8 through 3.13 and PyPy versions 3.7 through 3.10.
Release notes are here: https://github.com/Desbordante/desbordante-core/releases/tag...
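Functional dependencies (FDs) are one kind of pattern a data profiler such as Desbordante can discover. The sketch below, assuming a small in-memory table with every cell stored as text, only checks whether a single candidate FD `lhs -> rhs` holds, using a hash map from left-hand-side values to the first right-hand-side value seen. Discovery algorithms search over many candidates far more cleverly; `holds_fd` and `Row` are hypothetical names, not Desbordante's API. It needs C++17 for the structured binding.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using Row = std::vector<std::string>;  // hypothetical: every cell stored as text

// Returns true if the functional dependency lhs -> rhs holds in `rows`, i.e.
// any two rows that agree on all lhs columns also agree on the rhs column.
// Assumes cell values never contain the '\x1f' separator used to build keys.
bool holds_fd(const std::vector<Row>& rows,
              const std::vector<std::size_t>& lhs, std::size_t rhs) {
    std::unordered_map<std::string, std::string> seen;  // lhs key -> first rhs value
    for (const Row& row : rows) {
        std::string key;
        for (std::size_t c : lhs) {
            key += row[c];
            key += '\x1f';  // separator keeps ("ab","c") distinct from ("a","bc")
        }
        auto [it, inserted] = seen.emplace(key, row[rhs]);
        if (!inserted && it->second != row[rhs]) return false;  // same lhs, different rhs
    }
    return true;
}
```

The check is a single pass over the rows with hash lookups. The hard part of discovery is that the number of candidate left-hand sides grows exponentially with the number of columns, so profilers avoid running a check like this for every candidate.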
Tiger
C++ Matrix: a high-performance and accurate (e.g., in edge cases) matrix math library with expression-template arithmetic operators (by hosseinmoein)
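Tiger's description mentions expression-template arithmetic operators. The idea, shown here as a minimal C++ sketch that is not Tiger's actual code, is that `a + b + a` builds a lightweight tree of `Sum` nodes instead of temporary vectors, and the elements are computed in a single loop when the result is assigned to a `Vec`. `Expr`, `Vec`, and `Sum` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// CRTP base: marks a type as an expression and gives access to the derived type.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Concrete vector that owns its data.
class Vec : public Expr<Vec> {
public:
    explicit Vec(std::size_t n, double v = 0.0) : data_(n, v) {}
    Vec(std::initializer_list<double> init) : data_(init) {}

    // Materializes any expression in a single loop; no temporary Vec is built
    // for intermediate results such as (a + b).
    template <typename E>
    Vec(const Expr<E>& e) : data_(e.self().size()) {
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e.self()[i];
    }

    double operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<double> data_;
};

// Lazy node for l + r: stores references, computes element i on demand.
template <typename L, typename R>
class Sum : public Expr<Sum<L, R>> {
public:
    Sum(const L& l, const R& r) : l_(l), r_(r) { assert(l.size() == r.size()); }
    double operator[](std::size_t i) const { return l_[i] + r_[i]; }
    std::size_t size() const { return l_.size(); }

private:
    const L& l_;
    const R& r_;
};

template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}
```

Because `Sum` holds references to its operands, an expression should be materialized into a `Vec` within the same statement (`Vec c = a + b + a;`); keeping it in an `auto` variable would leave dangling references to temporary nodes. The sketch covers only element-wise addition.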
TileDB-VCF
Efficient variant-call data storage and retrieval library using the TileDB storage library.
lesser_pandas
Project mention: Show HN: Lesser Pandas – Data Analysis Library in C++ | news.ycombinator.com | 2025-05-22
C++ Data Science related posts
- chDB: An In-Process OLAP SQL Engine Powered by ClickHouse
- ChDB 3.0 released, 12% faster than DuckDB
- Show HN: SQLite like API of ClickHouse engine in Python
- Tell HN: Causal Got Acquired
- Kotlin DataFrame ❤️ Arrow
- ClickHouse Based Duck-Db
- ChDB: In-Process SQL OLAP Engine Powered by ClickHouse
A note from our sponsor - SaaSHub
www.saashub.com | 15 Jul 2025
Index
What are some of the best open-source Data Science projects in C++? This list will help you:
# | Project | GitHub stars |
---|---|---|
1 | cudf | 9,033 |
2 | catboost | 8,464 |
3 | matplotplusplus | 4,622 |
4 | GraphScope | 3,457 |
5 | SHOGUN | 3,045 |
6 | DataFrame | 2,742 |
7 | chdb | 2,409 |
8 | ArcticDB | 1,977 |
9 | TileDB | 1,961 |
10 | MLPP | 1,097 |
11 | vectordb | 861 |
12 | turbodbc | 637 |
13 | oneDAL | 636 |
14 | GPBoost | 619 |
15 | desbordante-core | 407 |
16 | Graphia | 251 |
17 | Tiger | 121 |
18 | nelson | 107 |
19 | secure-xgboost | 105 |
20 | TileDB-VCF | 95 |
21 | MachineLearning | 25 |
22 | twinning | 24 |
23 | lesser_pandas | 8 |