C++ Data Science

Open-source C++ projects categorized as Data Science

Top 23 C++ Data Science Projects

Data Science
  1. cudf

    cuDF - GPU DataFrame Library

  3. catboost

    A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

    Project mention: 🚀 Why Your ML Service Needs Rust + CatBoost: A Setup Guide That Actually Works | dev.to | 2025-01-19

    [package]
    name = "MLApp"
    version = "0.1.0"
    edition = "2021"

    [dependencies]
    catboost = { git = "https://github.com/catboost/catboost", rev = "0bfdc35"}

  4. matplotplusplus

    Matplot++: A C++ Graphics Library for Data Visualization 📊🗾

  5. GraphScope

    🔨 🍇 💻 🚀 GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba

  6. SHOGUN

    Shōgun

  7. DataFrame

    C++ DataFrame for statistical, financial, and ML analysis in modern C++

  8. chdb

    chDB is an in-process OLAP SQL Engine 🚀 powered by ClickHouse

    Project mention: ClickHouse gets lazier (and faster): Introducing lazy materialization | news.ycombinator.com | 2025-04-22

    https://github.com/chdb-io/chdb/issues/101#issuecomment-2824...

    P.S. I work for ClickHouse.

  10. ArcticDB

    ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.

    Project mention: All Data and AI Weekly #203: 18-Aug-2025 | dev.to | 2025-08-18

    ArcticDB: A high-performance, serverless database for Python.

  11. TileDB

    The Universal Storage Engine

    Project mention: Ask HN: Who is hiring? (February 2025) | news.ycombinator.com | 2025-02-03

    TileDB, Inc. | Full-time | REMOTE | USA, Greece | https://tiledb.com/

    TileDB is the database designed for discovery, built to organize, structure, and analyze any data. Our solutions for single-cell and population genomics are used by major pharmaceutical companies and research institutes, and power large public data collections such as the Cellxgene Discover Census. We are actively hiring for several roles building our unified data catalog, scalable computation, and interactive analysis platform.

    - Infrastructure Engineer: Kubernetes, Terraform, Argo, Grafana, Prometheus, CloudWatch, GitOps; Golang, Python, C++, or Rust (GMT -8/+4).

    - Frontend/UI developer: TypeScript, React; experience with high-performance/high-volume data and visualization applications (GMT -8/+1).

    We are fully-remote, with optional co-working hubs in Cambridge, MA, New York, NY, and Athens, Greece. Apply today at https://ats.rippling.com/tiledb-careers/jobs or reach out directly (email in profile).

  12. MLPP

    A library created to revitalize C++ as a machine learning front end. Per aspera ad astra.

  13. vectordb

    Epsilla is a high performance Vector Database Management System

  14. turbodbc

    Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with the Python Database API Specification 2.0.

  15. oneDAL

    oneAPI Data Analytics Library (oneDAL)

  16. GPBoost

    Combining tree-boosting with Gaussian process and mixed effects models

  17. desbordante-core

    Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also lets you run data-cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.

    Project mention: Show HN: Desbordante 2.3.0 is out, now supports macOS | news.ycombinator.com | 2025-02-04

    Desbordante, an open-source, high-performance data profiler that discovers and validates complex patterns in data, has released version 2.3.0. This update introduces two new patterns and adds support for macOS. Users can now install the desbordante-core pip package on macOS via PyPI, compatible with CPython versions 3.8 through 3.13 and PyPy versions 3.7 through 3.10.

    Release notes are here: https://github.com/Desbordante/desbordante-core/releases/tag...

  18. labplot

    LabPlot is a FREE, open source and cross-platform Data Visualization and Analysis software accessible to everyone.

    Project mention: LabPlot: Free, open source and cross-platform Data Visualization and Analysis | news.ycombinator.com | 2025-08-22

    I think that's just a GitHub mirror; the actual development is happening over at the KDE GitLab:

    https://invent.kde.org/education/labplot

  19. Graphia

    A visualisation tool for the creation and analysis of graphs

  20. Tiger

    C++ Matrix -- High performance and accurate (e.g. edge cases) matrix math library with expression template arithmetic operators (by hosseinmoein)

  21. nelson

    The Nelson Programming Language (by nelson-lang)

  22. secure-xgboost

    Secure collaborative training and inference for XGBoost.

  23. TileDB-VCF

    Efficient variant-call data storage and retrieval library using the TileDB storage library.

  24. MachineLearning

    From linear regression towards neural networks... (by aromanro)

  25. twinning

    Data Twinning

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Data Science projects in C++? This list will help you:

#   Project           Stars
1   cudf              9,141
2   catboost          8,545
3   matplotplusplus   4,658
4   GraphScope        3,477
5   SHOGUN            3,045
6   DataFrame         2,786
7   chdb              2,455
8   ArcticDB          2,033
9   TileDB            1,978
10  MLPP              1,105
11  vectordb          860
12  turbodbc          642
13  oneDAL            639
14  GPBoost           628
15  desbordante-core  417
16  labplot           353
17  Graphia           253
18  Tiger             121
19  nelson            110
20  secure-xgboost    105
21  TileDB-VCF        97
22  MachineLearning   25
23  twinning          24

Did you know that C++ is the 7th most popular programming language based on number of references?