Top 13 C++ Inference Projects
-
whisper.cpp
Project mention: Show HN: OWhisper – Ollama for realtime speech-to-text | news.ycombinator.com | 2025-08-14
Thank you for taking the time to build something and share it. However, what is the advantage of using this over whisper.cpp's stream example, which can also do real-time transcription?
https://github.com/ggml-org/whisper.cpp/tree/master/examples...
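For context, here is a minimal sketch of transcription with the whisper.cpp C API (the stream example builds a real-time loop on top of the same calls). The model path and the PCM buffer are placeholders, and function names may differ slightly between releases:

```cpp
#include "whisper.h"

#include <cstdio>
#include <vector>

int main() {
    // Load a ggml model (path is illustrative; download via whisper.cpp's model scripts).
    whisper_context* ctx = whisper_init_from_file_with_params(
        "models/ggml-base.en.bin", whisper_context_default_params());
    if (!ctx) return 1;

    // whisper_full() expects 16 kHz mono float PCM; 5 s of silence stands in here.
    std::vector<float> pcm(16000 * 5, 0.0f);

    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    if (whisper_full(ctx, params, pcm.data(), (int) pcm.size()) != 0) return 1;

    // Print the decoded segments.
    for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
        printf("%s\n", whisper_full_get_segment_text(ctx, i));
    }

    whisper_free(ctx);
    return 0;
}
```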
-
mediapipe
Project mention: Google AI Edge – on-device cross-platform AI deployment | news.ycombinator.com | 2025-06-01
This isn't really true. They are different offerings.
CoreML is specific to the Apple ecosystem and lets you convert a PyTorch model to a CoreML .mlmodel that will run with acceleration on iOS/Mac.
Google Mediapipe is a giant C++ library for running ML flows on any device (iOS/Android/Web). It includes TensorFlow Lite (now LiteRT) but is also a graph processor that helps with common ML preprocessing tasks like image resizing, annotation, etc.
Google killing products early is a good meme but Mediapipe is open source so you can at least credit them with that. https://github.com/google-ai-edge/mediapipe
I used a fork of Mediapipe for a contract iOS/Android computer vision product and it was very complex but worked well. A cross-platform solution would not have been possible with CoreML.
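To illustrate the "graph processor" point, here is a minimal sketch of MediaPipe's C++ CalculatorGraph API, loosely based on the upstream hello-world example; the stream names and PassThroughCalculator are placeholders, and real pipelines wire in vision calculators instead:

```cpp
#include <string>
#include <utility>

#include "mediapipe/framework/calculator_graph.h"
#include "mediapipe/framework/port/parse_text_proto.h"

absl::Status RunPassThroughGraph() {
    // A trivial graph: one input stream routed through one calculator.
    mediapipe::CalculatorGraphConfig config =
        mediapipe::ParseTextProtoOrDie<mediapipe::CalculatorGraphConfig>(R"pb(
            input_stream: "in"
            output_stream: "out"
            node {
                calculator: "PassThroughCalculator"
                input_stream: "in"
                output_stream: "out"
            }
        )pb");

    mediapipe::CalculatorGraph graph;
    absl::Status status = graph.Initialize(config);
    if (!status.ok()) return status;

    auto poller_or = graph.AddOutputStreamPoller("out");
    if (!poller_or.ok()) return poller_or.status();
    mediapipe::OutputStreamPoller poller = std::move(poller_or.value());

    status = graph.StartRun({});
    if (!status.ok()) return status;

    // Push a few packets through the graph, then close the input stream.
    for (int i = 0; i < 3; ++i) {
        status = graph.AddPacketToInputStream(
            "in", mediapipe::MakePacket<std::string>("hello").At(mediapipe::Timestamp(i)));
        if (!status.ok()) return status;
    }
    status = graph.CloseInputStream("in");
    if (!status.ok()) return status;

    mediapipe::Packet packet;
    while (poller.Next(&packet)) {
        // packet.Get<std::string>() holds the payload that was passed through the graph.
    }
    return graph.WaitUntilDone();
}
```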
-
ncnn
ncnn is a high-performance neural network inference framework optimized for mobile platforms
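As a quick illustration of the API, a minimal sketch of loading a converted model and running one forward pass with ncnn; the file names and blob names ("data"/"prob") are placeholders that depend on how the model was exported:

```cpp
#include <vector>

#include "net.h"  // ncnn

int main() {
    ncnn::Net net;
    // Param/bin files come from ncnn's model converters (onnx2ncnn, pnnx, etc.).
    if (net.load_param("model.param") != 0) return 1;
    if (net.load_model("model.bin") != 0) return 1;

    // Wrap a raw RGB frame (decoded elsewhere) and resize it to the network's input size.
    std::vector<unsigned char> rgb(640 * 480 * 3, 0);
    ncnn::Mat in = ncnn::Mat::from_pixels_resize(
        rgb.data(), ncnn::Mat::PIXEL_RGB, 640, 480, 224, 224);

    const float mean_vals[3] = {123.675f, 116.28f, 103.53f};
    const float norm_vals[3] = {1 / 58.395f, 1 / 57.12f, 1 / 57.375f};
    in.substract_mean_normalize(mean_vals, norm_vals);  // (sic: ncnn's spelling)

    ncnn::Extractor ex = net.create_extractor();
    ex.input("data", in);

    ncnn::Mat out;
    ex.extract("prob", out);  // 'out' now holds the output blob, e.g. class scores
    return 0;
}
```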
-
TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Project mention: Generative AI Interview for Senior Data Scientists: 50 Key Questions and Answers | dev.to | 2025-05-06
What is the purpose of using ONNX or TensorRT for deployment? When deploying a trained deep learning model into a real-world service environment for inference, optimization to increase execution speed and reduce resource consumption is crucial. ONNX and TensorRT are prominent tools and frameworks widely used for this purpose.
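To make the deployment point concrete, here is a minimal sketch of loading a prebuilt TensorRT engine with the C++ runtime API; the engine path and tensor names are placeholders, and the tensor-address calls assume TensorRT 8.5+ (older releases use binding indices and enqueueV2):

```cpp
#include <NvInfer.h>

#include <fstream>
#include <iostream>
#include <iterator>
#include <memory>
#include <vector>

// TensorRT requires a logger implementation.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
    }
};

int main() {
    Logger logger;

    // Load an engine serialized offline (e.g. with trtexec); the path is illustrative.
    std::ifstream file("model.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    std::unique_ptr<nvinfer1::IRuntime> runtime(nvinfer1::createInferRuntime(logger));
    std::unique_ptr<nvinfer1::ICudaEngine> engine(
        runtime->deserializeCudaEngine(blob.data(), blob.size()));
    std::unique_ptr<nvinfer1::IExecutionContext> context(engine->createExecutionContext());

    // In a real program, allocate device buffers with cudaMalloc, copy the input in,
    // then bind them by tensor name and launch on a CUDA stream:
    //   context->setTensorAddress("input", d_input);
    //   context->setTensorAddress("output", d_output);
    //   context->enqueueV3(stream);
    return 0;
}
```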
-
openvino
Project mention: Court is in session: Top 10 most notorious C and C++ errors in 2024 | dev.to | 2024-12-28
V766 An item with the same key '"SoftPlus"' has already been added. cpu_types.cpp 198
-
jetson-inference
Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
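For flavor, the core of the Hello AI World image-recognition example looks roughly like this; a sketch only, with an illustrative image path (newer releases also let imageNet::Create() take command-line arguments):

```cpp
#include <jetson-inference/imageNet.h>
#include <jetson-utils/loadImage.h>

#include <cstdio>

int main() {
    // Load an image from disk into shared CPU/GPU memory (path is illustrative).
    uchar3* image = nullptr;
    int width = 0, height = 0;
    if (!loadImage("my_image.jpg", &image, &width, &height)) return 1;

    // Create a classification network; Create() loads a default ImageNet model,
    // accelerated with TensorRT under the hood.
    imageNet* net = imageNet::Create();
    if (!net) return 1;

    // Classify the image and print the top class with its confidence.
    float confidence = 0.0f;
    const int classIndex = net->Classify(image, width, height, &confidence);
    if (classIndex >= 0) {
        printf("%s (%.2f%% confidence)\n", net->GetClassDesc(classIndex), confidence * 100.0f);
    }

    delete net;
    return 0;
}
```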
-
TNN
TNN: a uniform deep learning inference framework for mobile, desktop and server, developed by Tencent Youtu Lab and Guangying Lab. TNN is distinguished by several outstanding features, including cross-platform capability, high performance, model compression and code pruning. Based on ncnn and Rapidnet, TNN further strengthens support and performance optimization for mobile devices, and also draws on the good extensibility and high performance of existing open source efforts.
-
CTranslate2
Thanks for the added context on the builds! As a "foreign" BW player and fellow speech processing researcher, I agree shallow contextual biasing should help. While it isn't difficult to implement, most generally available ASR solutions don't make it easy to use. There's a PR in CTranslate2 implementing the same feature so that it could be exposed in faster-whisper: https://github.com/OpenNMT/CTranslate2/pull/1789
-
onnxruntime_backend
Project mention: Zero-Shot Text Classification on a low-end CPU-only machine? | news.ycombinator.com | 2024-10-07
Hah, it actually gets worse. What I was describing was the Triton ONNX backend with the OpenVINO execution accelerator[0] (not the OpenVINO backend itself). Clear as mud, right?
Your issue here is model performance with the additional challenge of offering it over a network socket across multiple requests and doing so in a performant manner.
Triton does things like dynamic batching[1] where throughput is increased significantly by aggregating disparate requests into one pass through the GPU.
A Docker container for torch, ONNX, OpenVINO, etc. isn't even natively going to offer a network socket. This is where people try things like rolling their own FastAPI implementation (or something), only to discover it completely falls apart under any kind of load. That's development effort as well, but it's a waste of time.
[0] - https://github.com/triton-inference-server/onnxruntime_backe...
[1] - https://docs.nvidia.com/deeplearning/triton-inference-server...
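Following up on the dynamic batching point, enabling it in Triton is a model-configuration change rather than application code. A sketch of a config.pbtxt for an ONNX model follows; the model name, batch sizes, and queue delay are illustrative:

```
name: "my_onnx_model"
backend: "onnxruntime"
max_batch_size: 16

# Aggregate individual requests into larger batches, waiting at most 100 microseconds.
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```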
-
C++ Inference related posts
-
Whispercpp – Local, Fast, and Private Audio Transcription for Ruby
-
Build Your Own Siri. Locally. On-Device. No Cloud
-
Whisper.cpp: Looking for Maintainers
-
Court is in session: Top 10 most notorious C and C++ errors in 2024
-
OpenMP 6.0
-
OpenVINO's AI Success: Brilliance or Cracks Beneath the Surface?
-
12 moments of typos and copy-paste, or why AI hallucinates: checking OpenVINO
-
Index
What are some of the best open-source Inference projects in C++? This list will help you find them, ranked by GitHub stars:
| # | Project | Stars |
|---|---------|-------|
| 1 | whisper.cpp | 42,817 |
| 2 | mediapipe | 31,139 |
| 3 | ncnn | 21,982 |
| 4 | TensorRT | 12,079 |
| 5 | openvino | 8,764 |
| 6 | jetson-inference | 8,472 |
| 7 | TNN | 4,572 |
| 8 | CTranslate2 | 3,992 |
| 9 | bark.cpp | 835 |
| 10 | cppflow | 802 |
| 11 | tensorrt-cpp-api | 757 |
| 12 | onnxruntime_backend | 159 |
| 13 | EasyOCR-cpp | 59 |