C++ Data Analysis

Open-source C++ projects categorized as Data Analysis

Top 11 C++ Data Analysis Projects

  • GitHub repo cudf

    cuDF - GPU DataFrame Library

    Project mention: Dask – a flexible library for parallel computing in Python | news.ycombinator.com | 2021-11-17

    You can probably use https://github.com/rapidsai/cudf/tree/main/python/dask_cudf, a Dask wrapper around cuDF.

  • GitHub repo matplotplusplus

    Matplot++: A C++ Graphics Library for Data Visualization 📊🗾

    Project mention: How can I create animation of mathematical function that changes over time in c++ and save it as video | reddit.com/r/cpp_questions | 2021-12-27

  • GitHub repo root

    The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

    Project mention: Double Pendulum, written in Python and visualized with matplotlib (github code in comments) | reddit.com/r/Physics | 2022-01-16

    I actually just use matplotlib, except for CERN data, where I use CERN's Python front-end to their own framework, ROOT. It's just easier to keep things in that format than it is to convert them to the usual Python data types (arrays, dataframes, etc.).

  • GitHub repo datatable

    A Python package for manipulating 2-dimensional tabular data structures

    Project mention: Scikit-Learn Version 1.0 | news.ycombinator.com | 2021-09-14

    > For me I had with pandas the most issues using its multiindex.

    Yessss. I loathe indices, and have never been in a situation where I was better off with them than without them.

    > Regarding fast you have something like Vaex on the Python side

    I've never used Vaex, but I've used datatable (https://github.com/h2oai/datatable) and polars (https://github.com/pola-rs/polars). Polars is my favorite API, but datatable was faster at reading data (Polars was faster in execution). I'll have to give Vaex a try at some point.

  • GitHub repo DataFrame

    C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage

    Project mention: DataFrame: NEW Data - star count:1313.0 | reddit.com/r/algoprojects | 2021-12-25

  • GitHub repo TileDB

    The Universal Storage Engine

    Project mention: TileDB VS Activeloop hub - a user suggested alternative | libhunt.com/r/TileDB | 2021-10-20

  • GitHub repo volbx

    Graphical tool for data manipulation written in C++/Qt

  • GitHub repo Graphia

    A visualisation tool for the creation and analysis of graphs

    Project mention: Graphviz: Open-source graph visualization software | news.ycombinator.com | 2022-01-17

    A fast 3D alternative for visualizing large graphs is Graphia: https://github.com/graphia-app/graphia https://graphia.app However, it's currently suffering from the Qt switch from 5 to 6.

    Regarding Graphviz itself, I wonder why there is no special layout logic for planar graphs. They can be recognized and embedded in the plane in linear time without intersecting edges, so it would be very nice if some of the Graphviz tools actually did that.

    A recent set of minimal graph coloring Graphviz visualizations of mine: https://gitlab.com/nsajko/example_optimally_colored_graphs
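The comment above notes that planar graphs can be recognized in linear time. The sketch below does not implement such an algorithm. It only shows a much weaker, well-known necessary condition that follows from Euler's formula: a simple planar graph with V >= 3 vertices has at most 3V - 6 edges. The function name `passes_planar_edge_bound` and the sample graphs are illustrative assumptions, not part of Graphviz or Graphia.

```python
from itertools import combinations


def passes_planar_edge_bound(num_vertices, edges):
    """Return False if the graph is certainly non-planar by edge count.

    This is only a necessary condition (E <= 3V - 6 for simple graphs
    with V >= 3). A True result does NOT prove that the graph is planar.
    """
    # Treat the graph as simple: drop self-loops and duplicate edges.
    simple_edges = {frozenset(e) for e in edges if e[0] != e[1]}
    if num_vertices < 3:
        return True
    return len(simple_edges) <= 3 * num_vertices - 6


# Complete graph on 5 vertices: 10 edges, bound is 9, so it is rejected.
k5 = list(combinations(range(5), 2))
# Complete graph on 4 vertices: 6 edges, bound is 6, so it passes.
k4 = list(combinations(range(4), 2))
# Complete bipartite graph K3,3: 9 edges, bound is 12, so it passes
# even though K3,3 is not planar. The check is not a planarity test.
k33 = [(a, b) for a in range(3) for b in range(3, 6)]

print(passes_planar_edge_bound(5, k5))
print(passes_planar_edge_bound(4, k4))
print(passes_planar_edge_bound(6, k33))
```

A check like this can cheaply rule out dense graphs before attempting a planar layout, but a full planarity test is still needed to confirm that a planar embedding exists.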

  • GitHub repo nebula

    A distributed block-based data storage and compute engine (by varchar-io)

    Project mention: Streaming multi-file SQL and CSV/TSV/etc., native/WASM and fastest CSV parser | news.ycombinator.com | 2022-01-14

    Cool. I also hand-crafted a CSV parser following RFC 4180 a while ago. Do you have a repeatable way to benchmark the performance difference?
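For readers unfamiliar with what "following RFC 4180" involves, here is a minimal sketch of splitting a single record into fields: commas separate fields, a field may be wrapped in double quotes, and a doubled quote inside a quoted field stands for one literal quote. This is not the commenter's parser or anything from nebula. The function name `parse_csv_record` is hypothetical, and the sketch is deliberately lenient about malformed input.

```python
def parse_csv_record(record):
    """Split one CSV record (without its trailing line break) into fields.

    Follows the quoting rules of RFC 4180: fields may be enclosed in
    double quotes, and "" inside a quoted field is an escaped quote.
    Quoted fields may contain commas and line breaks.
    """
    fields = []
    buf = []
    in_quotes = False
    i = 0
    while i < len(record):
        ch = record[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(record) and record[i + 1] == '"':
                    buf.append('"')   # escaped quote: consume both characters
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                buf.append(ch)
        elif ch == '"' and not buf:
            in_quotes = True           # opening quote at the start of a field
        elif ch == ',':
            fields.append(''.join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted field")
    fields.append(''.join(buf))
    return fields


print(parse_csv_record('a,"b,c","say ""hi""",'))
```

For a repeatable comparison between parsers, one option is to time each of them on the same fixed input with Python's standard `timeit` module.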


  • GitHub repo vinum

    Vinum is a SQL processor for Python, designed for data analysis workflows and in-memory analytics.

    Project mention: Practical SQL for Data Analysis (what you can do without Pandas) | news.ycombinator.com | 2021-05-03

    Following similar observations, I was wondering whether one can actually execute SQL queries inside a Python process with access to native Python functions and NumPy as UDFs. Thanks to Apache Arrow, one can mix C++ and Python operators without copying the data and essentially combine a DataFrame API with SQL, all within the confines of the same Python process.

    Vinum allows users to write queries that may invoke any NumPy or Python functions available to the interpreter as UDFs.
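The general idea of calling a Python function from SQL as a UDF can be illustrated without Vinum. The sketch below uses the standard library's `sqlite3` module instead, so it does not show Vinum's API. SQLite invokes the function one row at a time, unlike the Arrow-based approach described above. The table `trips`, the column `fare`, and the function name `log1p_py` are made up for the example.

```python
import math
import sqlite3


def log1p_udf(x):
    """Python function exposed to SQL; passes NULL through unchanged."""
    return None if x is None else math.log1p(x)


conn = sqlite3.connect(":memory:")
# Register the Python callable as a one-argument SQL function.
conn.create_function("log1p_py", 1, log1p_udf)

conn.execute("CREATE TABLE trips (fare REAL)")
conn.executemany("INSERT INTO trips VALUES (?)", [(0.0,), (9.0,), (99.0,)])

# The SQL query calls back into the Python interpreter for every row.
rows = conn.execute(
    "SELECT fare, log1p_py(fare) FROM trips ORDER BY fare"
).fetchall()

for fare, transformed in rows:
    print(fare, transformed)
```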

  • GitHub repo vif

    Easy, robust, and fast numerics in C++. (by cschreib)

    Project mention: Hardcore metaprogramming in the wild | reddit.com/r/cpp | 2021-10-26

    I wrote a C++11 n-dimensional array library for data analysis during my PhD https://github.com/cschreib/vif

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-01-17.

What are some of the best open-source Data Analysis projects in C++? This list will help you:

 #  Project          Stars
 1  cudf             4,423
 2  matplotplusplus  2,474
 3  root             1,640
 4  datatable        1,435
 5  DataFrame        1,350
 6  TileDB           1,259
 7  volbx              205
 8  Graphia            126
 9  nebula              97
10  vinum               53
11  vif                  7