C++ Data Science

Open-source C++ projects categorized as Data Science

Top 10 C++ Data Science Projects

  • GitHub repo SHOGUN


  • GitHub repo matplotplusplus

    Matplot++: A C++ Graphics Library for Data Visualization 📊🗾

    Project mention: How can I create animation of mathematical function that changes over time in c++ and save it as video | reddit.com/r/cpp_questions | 2021-12-27
  • GitHub repo TileDB

    The Universal Storage Engine

    Project mention: TileDB VS Activeloop hub - a user suggested alternative | libhunt.com/r/TileDB | 2021-10-20
  • GitHub repo turbodbc

    Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with the Python Database API Specification 2.0.

    Project mention: arrow-odbc: Fetch arrow arrays from an ODBC data source in a pip installable environment | reddit.com/r/Python | 2021-11-23

    turbodbc is great, but a pain to build, at least without conda. arrow-odbc-py uses cffi (rather than PyO3) to talk to a Rust backend and then uses the Arrow C Data Interface to provide the user with pyarrow-compatible Arrow arrays. Using a dedicated C interface in both places avoids linking directly against the Python C interpreter as well as the specific C++ Arrow libraries your pyarrow version depends on, which spares some of the pain of dependency hell.

  • GitHub repo GPBoost

    Combining tree-boosting with Gaussian process and mixed effects models

    Project mention: fabsig/GPBoost: Combining tree-boosting with Gaussian process and mixed effects models | reddit.com/r/learnmachinelearning | 2021-06-25
  • GitHub repo Graphia

    A visualisation tool for the creation and analysis of graphs

    Project mention: Handbook of Graph Drawing and Visualization | news.ycombinator.com | 2021-12-30
  • GitHub repo secure-xgboost

    Secure collaborative training and inference for XGBoost.

    Project mention: Announcing MC²: Securely perform analytics and machine learning on confidential data | dev.to | 2021-06-17

    The MC2 Compute Services: MC2 offers several compute services, including Spark SQL, distributed XGBoost, and secure aggregation for federated learning. All are intended to run in a primarily untrusted environment, such as a cluster of machines hosted on a public cloud, that supports trusted execution environments (hardware enclaves). Data is encrypted in transit using a client key and only ever decrypted inside hardware enclaves, providing the previously mentioned security guarantees for data-in-use. For all compute services, MC2 leverages the Open Enclave SDK, a project intended to provide a consistent API for a variety of different enclave architectures.

  • GitHub repo Matrix

    C++ Matrix -- High performance and accurate (e.g. edge cases) matrix math library with expression template arithmetic operators (by hosseinmoein)

    Project mention: Introducing The C++ DataFrame for data analysis | reddit.com/r/datascience | 2021-11-01

    Just to be clear, DataFrame is not a matrix. It is a data-frame. If you want a fast C++ matrix you can take a look at https://github.com/hosseinmoein/Matrix

  • GitHub repo TileDB-VCF

    Efficient variant-call data storage and retrieval library using the TileDB storage library.

    Project mention: [TileDB webinar] Population genomics is a data management problem | reddit.com/r/bioinformatics | 2021-10-20

    Here are the docs to the open-source TileDB-VCF storage engine: https://docs.tiledb.com/main/integrations-and-extensions/population-genomics

  • GitHub repo nelson

    Nelson numerical interpreter

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-12-30.

What are some of the best open-source Data Science projects in C++? This list will help you:

#   Project          Stars
1   SHOGUN           2,863
2   matplotplusplus  2,474
3   TileDB           1,257
4   turbodbc           495
5   GPBoost            282
6   Graphia            123
7   secure-xgboost      74
8   Matrix              58
9   TileDB-VCF          38
10  nelson              38