C++ Machine Learning

Open-source C++ projects categorized as Machine Learning

Top 23 C++ Machine Learning Projects

  • GitHub repo tensorflow

    An Open Source Machine Learning Framework for Everyone

    Project mention: Need help with tensorflow library installed in mac m1 | reddit.com/r/MacOS | 2021-10-17

    I think this subreddit is probably the last place you'll find decent support for Tensorflow. I recommend opening up an issue on Github.

  • GitHub repo Pytorch

    Tensors and Dynamic neural networks in Python with strong GPU acceleration

    Project mention: Making complex neural network models | reddit.com/r/learnpython | 2021-10-02

    I don't know what you mean by "ensure connections don't cascade", but there are several libraries that implement various types of neural networks. Look into PyTorch. I used it for a project once and found it intuitive to use.

  • GitHub repo tesseract-ocr

    Tesseract Open Source OCR Engine (main repository)

    Project mention: Trying to get a deeper understanding of PDFs | reddit.com/r/pdf | 2021-09-27

    In this case I would go the low effort route: Use tesseract (https://github.com/tesseract-ocr/tesseract) to do the OCR part and "pdftotext" from the poppler utils to convert all PDFs to text. The quality should be fine. Works on Linux and most probably also natively on Windows.
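
The two-step pipeline described above can be sketched in a few lines. This is a minimal Python sketch, assuming `pdftotext` (from poppler-utils) and `tesseract` are installed and on PATH; the file names are purely illustrative:

```python
import subprocess
from pathlib import Path

def build_pdftotext_cmd(pdf_path):
    # pdftotext <in.pdf> <out.txt>: extracts the PDF's embedded text layer.
    return ["pdftotext", pdf_path, str(Path(pdf_path).with_suffix(".txt"))]

def build_tesseract_cmd(image_path):
    # tesseract <image> <output-base> writes <output-base>.txt with the OCR result.
    return ["tesseract", image_path, str(Path(image_path).with_suffix(""))]

def pdf_to_text(pdf_path):
    # Cheap path first: pull the embedded text layer. Scanned PDFs come back
    # (near) empty and need their pages rendered to images and OCR'd with tesseract.
    subprocess.run(build_pdftotext_cmd(pdf_path), check=True)
```

`pdftotext` only recovers text that is already embedded in the PDF; for scanned documents the OCR fallback with tesseract is what does the real work.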

  • GitHub repo Caffe

    Caffe: a fast open framework for deep learning.

    Project mention: A short intro to Artificial Neural Networks | dev.to | 2021-09-22

    Caffe from BAIR

  • GitHub repo openpose

    OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

    Project mention: Help finding an appropriate model for human pose estimation | reddit.com/r/computervision | 2021-09-29

    OpenPose: This is supposedly real-time (I assume on a GPU, at 24fps?) and they provide training code.

  • GitHub repo xgboost

    Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
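
As a concrete illustration of what "gradient boosting" means here, the following is a toy pure-Python sketch, not xgboost's actual implementation (which adds regularization, second-order gradients, histogram splits, and much more): fit a depth-1 stump to the current residuals, add it to the ensemble, repeat.

```python
def fit_stump(xs, residuals):
    # Find the split on 1-D inputs that minimizes squared error of the residuals.
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, threshold, lmean, rmean)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv

def predict(x, base, stumps, lr=0.5):
    # The ensemble prediction is the base value plus the scaled stump outputs.
    return base + lr * sum(s(x) for s in stumps)

def boost(xs, ys, rounds=10, lr=0.5):
    # Start from the mean, then repeatedly fit a stump to the residuals.
    base = sum(ys) / len(ys)
    stumps = []
    for _ in range(rounds):
        preds = [predict(x, base, stumps, lr) for x in xs]
        residuals = [y - p for y, p in zip(ys, preds)]
        stumps.append(fit_stump(xs, residuals))
    return base, stumps
```

Each new stump corrects what the ensemble so far gets wrong, which is the core idea shared by xgboost, LightGBM, and the other GBDT libraries on this list.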

  • GitHub repo mxnet

    Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

    Project mention: just released my Clojure AI book | reddit.com/r/Clojure | 2021-05-23

    Clojure and Python also have bindings to the Apache MXNet library. Is there a reason why you didn't use them in some of your projects?

  • GitHub repo DeepSpeech

    DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

    Project mention: Offline speech to text software | reddit.com/r/AskTechnology | 2021-10-16

  • GitHub repo CNTK

    Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit (by microsoft)

  • GitHub repo mediapipe

    Cross-platform, customizable ML solutions for live and streaming media.

    Project mention: Show HN: YoHa – A practical hand tracking engine | news.ycombinator.com | 2021-10-11

    This architecture was also used in the link referenced when bringing up alternative implementations:


  • GitHub repo LightGBM

    A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

    Project mention: Is it possible to clean memory after using a package that has a memory leak in my python script? | reddit.com/r/Python | 2021-04-29

    I'm working on the AutoML python package (Github repo). In my package, I'm using many different algorithms. One of the algorithms is LightGBM. The algorithm after the training doesn't release the memory, even if del is called and gc.collect() after. I created the issue on LightGBM GitHub -> link. Because of this leak, memory consumption is growing very fast during algorithm training.
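
A common workaround for leaks inside native-extension libraries (not from the thread above, and the helper names here are made up) is to run each training in a child process, so the operating system reclaims all memory when the process exits. A stdlib-only sketch, with a dummy stand-in for the actual LightGBM call:

```python
import multiprocessing as mp

def train_once(params, queue):
    # Placeholder for the real training call, e.g. lgb.train(params, dataset).
    # Any memory the native library holds is freed when this process exits.
    score = sum(params.values()) / len(params)  # dummy "metric"
    queue.put(score)

def train_in_subprocess(params):
    # Run training in a child process; the parent keeps only the returned metric,
    # so leaked allocations cannot accumulate across repeated trainings.
    queue = mp.Queue()
    p = mp.Process(target=train_once, args=(params, queue))
    p.start()
    score = queue.get()
    p.join()
    return score
```

The trade-off is process startup and data-transfer overhead per training run, which usually matters less than unbounded memory growth in a long-lived AutoML loop.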

  • GitHub repo Dlib

    A toolkit for making real world machine learning and data analysis applications in C++

    Project mention: Building an object detector for a small dataset with a single class | reddit.com/r/computervision | 2021-10-05

    Here's the example code for using it from python: https://github.com/davisking/dlib/blob/master/python_examples/cnn_face_detector.py

  • GitHub repo vowpal_wabbit

    Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

    Project mention: [Table] We are Microsoft researchers working on machine learning and reinforcement learning. Ask Dr. John Langford and Dr. Akshay Krishnamurthy anything about contextual bandits, RL agents, RL algorithms, Real-World RL, and more! | reddit.com/r/tabled | 2021-07-03

    Can you share some real examples of how your work has made its way into MS products? Is this a requirement for any work that happens at MSR or is it more like an independent entity and is not always required to tie back into something within Microsoft? A simple answer is that Vowpal Wabbit (http://vowpalwabbit.org) is used by the personalizer service (http://aka.ms/personalizer). Many individual research projects have impacted Microsoft in various ways as well. However, many research projects have not. In general, Microsoft Research exists to explore possibilities. Inherent in the exploration of possibilities is the discovery that many possibilities do not work. -John

  • GitHub repo MNN

    MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba

    Project mention: Newbie having error code of cannot build selected target abi x86 no suitable splits configured | reddit.com/r/AndroidStudio | 2021-04-12

    I found a solution on GitHub: in your app's build.gradle, defaultConfig section, you need to add x86 to your NDK abiFilters: `ndk.abiFilters 'armeabi-v7a', 'arm64-v8a', 'x86'`. Hope it will help. You have to find that file and edit it as given here.

  • GitHub repo Open3D

    Open3D: A Modern Library for 3D Data Processing

    Project mention: 3D Reconstruction of Indoor Environments using SLAM and deep learning on RGB-D Data. | reddit.com/r/computervision | 2021-10-08

    Open3D v0.13.0 http://www.open3d.org/

  • GitHub repo onnxruntime

    ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

    Project mention: Export and run other machine learning models | dev.to | 2021-10-14

    txtai primarily has support for Hugging Face Transformers and ONNX models. This enables txtai to hook into the rich model framework available in Python, export this functionality via the API to other languages (JavaScript, Java, Go, Rust) and even export and natively load models with ONNX.

  • GitHub repo tiny-cnn

    header only, dependency-free deep learning framework in C++14

  • GitHub repo serving

    A flexible, high-performance serving system for machine learning models

    Project mention: Running concurrent inference processes in Flask or should I use FastAPI? | reddit.com/r/flask | 2021-03-29

    Don't roll this yourself. Look at Tensorflow Serving: https://github.com/tensorflow/serving.
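
TensorFlow Serving exposes a REST predict endpoint of the form `/v1/models/<name>:predict` that accepts a JSON body like `{"instances": [...]}`. A stdlib-only sketch that builds such a request; the host, port, and model name are illustrative:

```python
import json
import urllib.request

def build_predict_request(host, model, instances):
    # TensorFlow Serving's REST API: POST /v1/models/<model>:predict
    # with a JSON body of the form {"instances": [...]}.
    url = f"http://{host}/v1/models/{model}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(url, data=body,
                                  headers={"Content-Type": "application/json"})

# Building the request does no network I/O; pass it to
# urllib.request.urlopen(req) to query a running server.
req = build_predict_request("localhost:8501", "my_model", [[1.0, 2.0]])
```

Port 8501 is TF Serving's default REST port; the server handles batching and concurrency, which is exactly what you would otherwise have to hand-roll in Flask or FastAPI.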

  • GitHub repo jetson-inference

    Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.

    Project mention: Jetson Nano | reddit.com/r/JetsonNano | 2021-07-18

    Jetson-Inference is another amazing resource to get started on. This will allow you to try out a number of neural networks (classification, detection, and segmentation) all with your own data or with sample images included in the repo.

  • GitHub repo interpret

    Fit interpretable models. Explain blackbox machine learning.

    Project mention: [N] Google confirms DeepMind Health Streams project has been killed off | reddit.com/r/MachineLearning | 2021-09-01

    Microsoft Explainable Boosting Machine (which is a Gaussian Additive Model and not a Gradient Boosted Trees 🙄 model) is a step in that direction https://github.com/interpretml/interpret
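
To make the GAM point concrete: a generalized additive model predicts with one learned function per feature, summed, which is what makes each feature's contribution directly inspectable. Below is a toy backfitting sketch over binned features; it is nothing like interpret's actual EBM training (which boosts shallow trees per feature), just an illustration of the additive structure:

```python
from collections import defaultdict

def fit_gam(rows, ys, rounds=5):
    # rows: list of feature tuples with small discrete values (bins).
    # Learn one lookup table ("shape function") per feature by backfitting:
    # each table absorbs the mean residual of its feature's bins.
    n_features = len(rows[0])
    base = sum(ys) / len(ys)
    shapes = [defaultdict(float) for _ in range(n_features)]
    for _ in range(rounds):
        for j in range(n_features):
            groups = defaultdict(list)
            for row, y in zip(rows, ys):
                # residual excluding feature j's current contribution
                pred_other = base + sum(shapes[k][row[k]]
                                        for k in range(n_features) if k != j)
                groups[row[j]].append(y - pred_other)
            shapes[j] = defaultdict(float,
                                    {v: sum(rs) / len(rs) for v, rs in groups.items()})
    return base, shapes

def gam_predict(row, base, shapes):
    # Each term shapes[j][value] is directly inspectable per feature.
    return base + sum(shape[v] for shape, v in zip(shapes, row))
```

Because the model is a sum of per-feature terms, "explaining" a prediction is just reading off each term, with no post-hoc approximation needed.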

  • GitHub repo flashlight

    A C++ standalone library for machine learning (by flashlight)

    Project mention: Mozilla Common Voice Adds 16 New Languages and 4,600 New Hours of Speech | news.ycombinator.com | 2021-08-05

    I've had good results with https://github.com/flashlight/flashlight/blob/master/flashli.... Seems to work well with spoken english in a variety of accents. Biggest limitation is that the architecture they have pretrained models for doesn't really work well with clips longer than ~15 seconds, so you have to segment your input files.
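
The segmentation workaround mentioned can be as simple as fixed-length chunking. This is a naive sketch; real pipelines usually split at silences rather than at arbitrary sample boundaries:

```python
def segment(samples, sample_rate, max_seconds=15.0):
    # Split a flat list of audio samples into chunks of at most max_seconds each,
    # so every chunk stays within the model's usable input length.
    chunk = int(max_seconds * sample_rate)
    return [samples[i:i + chunk] for i in range(0, len(samples), chunk)]
```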

  • GitHub repo mlpack

    mlpack: a scalable C++ machine learning library

    Project mention: Top 10 Python Libraries for Machine Learning | dev.to | 2021-09-09

    GitHub Repository: https://github.com/mlpack/mlpack
    Developed By: Community, supported by the Georgia Institute of Technology
    Primary purpose: Multiple ML Models and Algorithms

  • GitHub repo SHOGUN


NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-10-17.


What are some of the best open-source Machine Learning projects in C++? This list will help you:

Project Stars
1 tensorflow 159,776
2 Pytorch 51,334
3 tesseract-ocr 42,155
4 Caffe 31,993
5 openpose 22,240
6 xgboost 21,694
7 mxnet 19,688
8 DeepSpeech 18,273
9 CNTK 17,107
10 mediapipe 14,250
11 LightGBM 13,052
12 Dlib 10,630
13 vowpal_wabbit 7,736
14 MNN 6,130
15 Open3D 5,555
16 onnxruntime 5,514
17 tiny-cnn 5,432
18 serving 5,215
19 jetson-inference 4,976
20 interpret 4,127
21 flashlight 3,958
22 mlpack 3,830
23 SHOGUN 2,854