Top 17 C++ Data Science Projects
-
Yes, sure, that is how OpenMP does it; but on the other hand, you seem to already do some basic type inference and build an AST, no? Then you also know the size and type of your vectors, and can execute operations in parallel when there is enough data to make parallelizing worthwhile. Is there anyone who doesn't want their code to execute faster if possible? People who work in the big data domain do use threads and vectorized instructions without having to type in any directive; they just import a different library, for example numpy, numpy with a CUDA backend, or similar GPU-accelerated libraries like cudf.
-
Project mention: Best Library to Visualize Mathematical Concepts | reddit.com/r/cpp_questions | 2023-03-02
The best way to visualize most mathematical concepts is by plotting a 2D graph. To do that, you can use, e.g., Matplot++.
-
The function is trying to get the median, which is not defined for an empty set. With this particular implementation, there is an assert for that:
https://github.com/shogun-toolbox/shogun/blob/9b8d85/src/sho...
Unrelatedly, but from the same section:
> Fixes are trivial, access the nth element only after the call being made. Be careful.
Wouldn't the proper fix be to call nth_element for the larger element first (in the cases that don't already do so) and then set the end of the second nth_element call to begin + larger_n? Otherwise the second call will scan [begin + larger_n, end) again for no reason at all.
-
DataFrame
C++ DataFrame for statistical, financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
-
TileDB, Inc. | Full-Time | REMOTE | USA | Greece | https://tiledb.com
TileDB transforms the lives of analytics professionals and data scientists with a universal database, allowing them to access, analyze, and share any data with any tool at global scale. TileDB unifies the way we think about data, delivering superior performance and foundational data management capabilities. All data — tables, genomics, images, videos, location, time-series — across multiple domains is captured as multi-dimensional arrays. TileDB offers extreme interoperability via numerous APIs and tool integrations across the data science ecosystem, eliminating the hassles and inefficiencies of data conversion. TileDB Cloud implements a totally serverless infrastructure and delivers access control, easier data and code sharing and distributed computing at global scale, eliminating cluster management, minimizing TCO and promoting scientific collaboration and reproducibility.
TileDB, Inc. was spun out of MIT and Intel Labs in May 2017 and is backed by Two Bear Capital, Nexus Venture Partners, Uncorrelated Ventures, Intel Capital and Big Pi.
Recent HN article: https://news.ycombinator.com/item?id=23896131
Website: https://tiledb.com
GitHub: https://github.com/TileDB-Inc/TileDB
Docs: https://docs.tiledb.com
Blog: https://tiledb.com/blog
Our headquarters are located in Cambridge, MA and we have a subsidiary in Athens, Greece. We offer the ability to work remotely. If you are located outside the USA and Greece, we have options to accommodate this, so don't hesitate to apply!
We have several open positions aimed at increasing TileDB’s feature set, growth and adoption. You will have the opportunity to work on innovative technology that creates impact on challenging and exciting problems in Genomics, Geospatial, Time Series, and more. Immediate features on the roadmap for TileDB Cloud include advanced distributed computations, advanced computation pushdown, improved multi-cloud deployments, and more.
We are actively seeking:
- Senior Golang Engineer
- Senior Python Engineer
- Site Reliability Engineer
- React Frontend Engineer
Apply today at https://tiledb.workable.com !
-
turbodbc
Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with the Python Database API Specification 2.0.
It supports reading from and writing to ODBC-compliant databases with performance likely similar to turbodbc's, and it does not require conda to install.
-
Project mention: Is there a no-compromise (presumably C/C++) platform similar to Apache Spark? | reddit.com/r/dataengineering | 2022-07-27
-
Matrix
C++ Matrix -- High performance and accurate (e.g. edge cases) matrix math library with expression template arithmetic operators (by hosseinmoein)
Project mention: Update on C++ Algo Trading/ Data Analysis tool | reddit.com/r/algotrading | 2023-02-05
Yes, I have. As a matter of fact, I have another open-source project (https://github.com/hosseinmoein/Matrix) that uses this technique.
-
TileDB-VCF
Efficient variant-call data storage and retrieval library using the TileDB storage library.
Project mention: Has anyone stored/queried VCFs and their variant records in a relational database? | reddit.com/r/bioinformatics | 2022-11-12
Perhaps of interest https://github.com/TileDB-Inc/TileDB-VCF
-
Desbordante
Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It also lets users run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
Project mention: Desbordante – an open-source data profiling tool | news.ycombinator.com | 2023-02-20
-
Project mention: Learn how ChatGPT and neural networks work | reddit.com/r/programare | 2023-02-06
What are some of the best open-source Data Science projects in C++? This list will help you:
Rank | Project | Stars
---|---|---
1 | cudf | 5,386
2 | matplotplusplus | 3,165
3 | SHOGUN | 2,921
4 | DataFrame | 1,786
5 | TileDB | 1,475
6 | MLPP | 1,032
7 | turbodbc | 563
8 | oneDAL | 545
9 | GPBoost | 391
10 | Graphia | 168
11 | secure-xgboost | 93
12 | Matrix | 77
13 | TileDB-VCF | 62
14 | nelson | 60
15 | Desbordante | 39
16 | twinning | 23
17 | MachineLearning | 4