C++ Data Analysis

Open-source C++ projects categorized as Data Analysis

Top 15 C++ Data Analysis Projects

  • cudf

    cuDF - GPU DataFrame Library

    Project mention: Introducing TeaScript C++ Library | reddit.com/r/cpp | 2023-02-16

    Yes, sure, that is how OpenMP does it; but on the other hand, you seem to already do some basic type inference and build an AST, no? Then you also know the size and type of your vectors, and can execute actions in parallel if there is enough data to be worth parallelizing. Is there anyone who doesn't want their code to execute faster when that is possible? Those who work in the big data domain use threads and vectorized instructions without the user having to type in any directive; they just import a different library. Examples: numpy, or numpy with a CUDA backend, or similar GPU-accelerated libraries like cudf.
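    The idea in the comment above, parallelizing only when there is enough data to be worth it, could look roughly like the following. This is a minimal sketch using only the C++ standard library, not how cudf, numpy, or TeaScript do it; the function name `adaptive_sum` and the cutoff `kParallelThreshold` are hypothetical, and a sensible threshold would need to be measured on the target machine.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical cutoff: below this many elements, thread startup cost
// is assumed to outweigh any gain from splitting the work.
constexpr std::size_t kParallelThreshold = 100000;

// Sums `data`, using several threads only when the input is large enough.
double adaptive_sum(const std::vector<double>& data) {
    const std::size_t n = data.size();
    unsigned workers = std::thread::hardware_concurrency();
    if (workers == 0) workers = 2;  // the call may return 0 when the count is unknown

    // Small input (or a single core): plain sequential sum.
    if (n < kParallelThreshold || workers < 2) {
        return std::accumulate(data.begin(), data.end(), 0.0);
    }

    // Large input: give each worker one contiguous chunk.
    const std::size_t chunk = (n + workers - 1) / workers;
    std::vector<std::future<double>> parts;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, n);
        parts.push_back(std::async(std::launch::async, [&data, begin, end] {
            return std::accumulate(data.data() + begin, data.data() + end, 0.0);
        }));
    }

    // Combine the partial sums on the calling thread.
    double total = 0.0;
    for (auto& part : parts) total += part.get();
    return total;
}
```

    The caller writes `adaptive_sum(values)` either way and never types a directive; the size check decides the execution strategy, which is the convenience the comment is asking for.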

  • matplotplusplus

    Matplot++: A C++ Graphics Library for Data Visualization 📊🗾

    Project mention: Best Library to Visualize Mathematical Concepts | reddit.com/r/cpp_questions | 2023-03-02

    The best way to visualize most mathematical concepts is by plotting a 2D graph. To do that, you can use, e.g., Matplot++.

  • root

    The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

    Project mention: Root: Analyzing Petabytes of Scientific Data | news.ycombinator.com | 2023-02-01
  • DataFrame

    C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage

    Project mention: DataFrame: NEW Data - star count:1772.0 | reddit.com/r/algoprojects | 2023-03-11
  • datatable

    A Python package for manipulating 2-dimensional tabular data structures

    Project mention: Any advice on using Pandas as a data analyst? | reddit.com/r/datascience | 2023-01-17
  • TileDB

    The Universal Storage Engine

    Project mention: Ask HN: Who is hiring? (December 2022) | news.ycombinator.com | 2022-12-01

    TileDB, Inc. | Full-Time | REMOTE | USA | Greece | https://tiledb.com

    TileDB transforms the lives of analytics professionals and data scientists with a universal database, allowing them to access, analyze, and share any data with any tool at global scale. TileDB unifies the way we think about data, delivering superior performance and foundational data management capabilities. All data — tables, genomics, images, videos, location, time-series — across multiple domains is captured as multi-dimensional arrays. TileDB offers extreme interoperability via numerous APIs and tool integrations across the data science ecosystem, eliminating the hassles and inefficiencies of data conversion. TileDB Cloud implements a totally serverless infrastructure and delivers access control, easier data and code sharing and distributed computing at global scale, eliminating cluster management, minimizing TCO and promoting scientific collaboration and reproducibility.

    TileDB, Inc. was spun out of MIT and Intel Labs in May 2017 and is backed by Two Bear Capital, Nexus Venture Partners, Uncorrelated Ventures, Intel Capital and Big Pi.

    Recent HN article: https://news.ycombinator.com/item?id=23896131

    Website: https://tiledb.com

    GitHub: https://github.com/TileDB-Inc/TileDB

    Docs: https://docs.tiledb.com

    Blog: https://tiledb.com/blog

    Our headquarters are located in Cambridge, MA and we have a subsidiary in Athens, Greece. We offer the ability to work remotely. If you are located outside of the USA and Greece, we have options to accommodate this, so don't hesitate to apply!

    We have several open positions aimed at increasing TileDB’s feature set, growth and adoption. You will have the opportunity to work on innovative technology that creates impact on challenging and exciting problems in Genomics, Geospatial, Time Series, and more. Immediate features on the roadmap for TileDB Cloud include advanced distributed computations, advanced computation pushdown, improved multi-cloud deployments and more.

    We are actively seeking:

    - Senior Golang Engineer

    - Senior Python Engineer

    - Site Reliability Engineer

    - React Frontend Engineer

    Apply today at https://tiledb.workable.com !

  • oneDAL

    oneAPI Data Analytics Library (oneDAL)

    Project mention: Is there a no-compromise (presumably C/C++) platform similar to Apache Spark? | reddit.com/r/dataengineering | 2022-07-27
  • gdl

    GDL - GNU Data Language

    Project mention: GDL: GNU Data Language | reddit.com/r/programming | 2022-05-15
  • volbx

    Graphical tool for data manipulation written in C++/Qt

  • AlphaPlot

    📈 Application for statistical analysis and data visualization which can generate different types of publication-quality 2D and 3D plots with extensive visual customization.

  • Graphia

    A visualisation tool for the creation and analysis of graphs

  • nebula

    A distributed block-based data storage and compute engine (by varchar-io)

    Project mention: Show HN: Turn any data into a fast analytical API | news.ycombinator.com | 2022-04-10

    We use our in-house engine, open-sourced here: https://github.com/varchar-io/nebula

    Yeah, Tinybird has lots of similarities, I will do more research on it, thanks for the reference.

  • vinum

    Vinum is a SQL processor for Python, designed for data analysis workflows and in-memory analytics.

  • vif

    Easy, robust, and fast numerics in C++. (by cschreib)

  • MachineLearning

    From linear regression towards neural networks... (by aromanro)

    Project mention: Learn how ChatGPT and neural networks work | reddit.com/r/programare | 2023-02-06

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-03-11.

What are some of the best open-source Data Analysis projects in C++? This list will help you:

#    Project            Stars
1    cudf               5,407
2    matplotplusplus    3,165
3    root               2,067
4    DataFrame          1,790
5    datatable          1,674
6    TileDB             1,481
7    oneDAL               549
8    gdl                  235
9    volbx                226
10   AlphaPlot            173
11   Graphia              168
12   nebula               131
13   vinum                 63
14   vif                    9
15   MachineLearning        4