Top 15 C++ Inference Projects
-
whisper.cpp
Port of OpenAI's Whisper model in C/C++
Project mention: Building a personal, private AI computer on a budget | news.ycombinator.com | 2025-02-11
A great thread with the type of info you're looking for lives here: https://github.com/ggerganov/whisper.cpp/issues/89
But you can likely find similar threads for the llama.cpp benchmark here: https://github.com/ggerganov/llama.cpp/tree/master/examples/...
These are good examples because the llama.cpp and whisper.cpp benchmarks take full advantage of Apple hardware, but also take full advantage of non-Apple hardware with GPU support, AVX support, etc.
It's been true for a while now that the memory bandwidth of modern Apple systems, in tandem with the neural cores and GPU, has made them very competitive with Nvidia for local inference and even training.
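For context on what those benchmarks are actually exercising, here is a minimal sketch of whisper.cpp's C API called from C++. It is an illustration only: the model path is a placeholder and the PCM buffer stands in for real 16 kHz mono audio, neither of which comes from the thread above.

```cpp
// Minimal whisper.cpp transcription sketch (assumes whisper.cpp is built
// and a ggml model has been downloaded; input must be 16 kHz mono float PCM).
#include "whisper.h"

#include <cstdio>
#include <vector>

int main() {
    struct whisper_context* ctx =
        whisper_init_from_file("models/ggml-base.en.bin");  // placeholder path
    if (!ctx) return 1;

    std::vector<float> pcm(16000 * 5, 0.0f);  // stand-in for 5 s of real audio

    whisper_full_params params =
        whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.n_threads = 4;

    if (whisper_full(ctx, params, pcm.data(), (int)pcm.size()) == 0) {
        const int n = whisper_full_n_segments(ctx);
        for (int i = 0; i < n; ++i)
            printf("%s\n", whisper_full_get_segment_text(ctx, i));
    }

    whisper_free(ctx);
    return 0;
}
```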
-
mediapipe
Cross-platform, customizable ML solutions for live and streaming media.
Project mention: Integrating MediaPipe with DeepSeek for Enhanced AI Performance | dev.to | 2025-02-03
Code Examples: Check out the MediaPipe and LLM Integration Examples provided by Google AI Edge.
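MediaPipe pipelines are built as graphs of calculators. As rough orientation, here is a sketch adapted from the framework's C++ hello-world example; a Bazel build of MediaPipe is assumed, and status-macro spellings vary across releases.

```cpp
#include <string>

#include "mediapipe/framework/calculator_graph.h"
#include "mediapipe/framework/port/logging.h"
#include "mediapipe/framework/port/parse_text_proto.h"
#include "mediapipe/framework/port/status.h"

// Adapted from MediaPipe's hello_world example: a one-node graph that
// passes string packets from "in" to "out". Macro names (MP_ASSIGN_OR_RETURN
// vs ASSIGN_OR_RETURN) differ between MediaPipe versions.
absl::Status RunPassThroughGraph() {
  auto config = mediapipe::ParseTextProtoOrDie<mediapipe::CalculatorGraphConfig>(R"pb(
    input_stream: "in"
    output_stream: "out"
    node {
      calculator: "PassThroughCalculator"
      input_stream: "in"
      output_stream: "out"
    }
  )pb");

  mediapipe::CalculatorGraph graph;
  MP_RETURN_IF_ERROR(graph.Initialize(config));
  MP_ASSIGN_OR_RETURN(mediapipe::OutputStreamPoller poller,
                      graph.AddOutputStreamPoller("out"));
  MP_RETURN_IF_ERROR(graph.StartRun({}));

  MP_RETURN_IF_ERROR(graph.AddPacketToInputStream(
      "in", mediapipe::MakePacket<std::string>("hello").At(mediapipe::Timestamp(0))));
  MP_RETURN_IF_ERROR(graph.CloseInputStream("in"));

  mediapipe::Packet packet;
  while (poller.Next(&packet)) LOG(INFO) << packet.Get<std::string>();
  return graph.WaitUntilDone();
}
```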
-
ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
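As an illustration of how lightweight ncnn's API is, here is a minimal inference sketch. The model files and blob names ("data", "output") are placeholders for whatever an exported model actually uses.

```cpp
// Minimal ncnn inference sketch (assumes a model exported to ncnn's
// .param/.bin format; blob names are placeholders).
#include "net.h"  // ncnn

int main() {
    ncnn::Net net;
    if (net.load_param("model.param")) return 1;  // 0 means success
    if (net.load_model("model.bin")) return 1;

    // 224x224, 3-channel input filled with zeros as a stand-in for a real image.
    ncnn::Mat in(224, 224, 3);
    in.fill(0.0f);

    ncnn::Extractor ex = net.create_extractor();
    ex.input("data", in);

    ncnn::Mat out;
    ex.extract("output", out);
    // out now holds the network's output blob (e.g. class scores).
    return 0;
}
```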
-
TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Extensions: Jan supports extensions like TensorRT and Inference Nitro for customizing and enhancing your AI models.
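Typical TensorRT deployment splits into an offline build step (e.g. trtexec producing a serialized engine) and a runtime step that deserializes it. A minimal runtime sketch follows, with the engine path as a placeholder.

```cpp
#include "NvInfer.h"

#include <fstream>
#include <iostream>
#include <vector>

// Minimal TensorRT runtime sketch: deserialize a prebuilt engine file and
// create an execution context ("model.engine" is a placeholder path).
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cerr << msg << "\n";
    }
};

int main() {
    // Read the serialized engine produced offline (e.g. by trtexec).
    std::ifstream file("model.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    Logger logger;
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size());
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    // Real use: allocate device buffers for each I/O tensor, copy inputs in,
    // then launch with context->enqueueV3(stream) or context->executeV2(bindings).

    delete context;  // TensorRT 8+ supports plain delete instead of destroy()
    delete engine;
    delete runtime;
    return 0;
}
```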
-
jetson-inference
Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
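The Hello AI World tutorial boils classification down to a few calls. Below is a sketch adapted from its imagenet example; the image path is a placeholder, and exact signatures vary between library releases.

```cpp
#include <jetson-inference/imageNet.h>
#include <jetson-utils/loadImage.h>

#include <cstdio>

// Sketch of jetson-inference image classification (requires a Jetson with
// the library installed; "my_image.jpg" is a placeholder).
int main() {
    imageNet* net = imageNet::Create("googlenet");  // downloads/loads the model
    if (!net) return 1;

    uchar3* image = nullptr;
    int width = 0, height = 0;
    if (!loadImage("my_image.jpg", &image, &width, &height)) return 1;

    float confidence = 0.0f;
    const int class_id = net->Classify(image, width, height, &confidence);
    if (class_id >= 0)
        printf("%s (%.1f%%)\n", net->GetClassDesc(class_id), confidence * 100.0f);

    delete net;
    return 0;
}
```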
-
openvino
OpenVINO™ is an open-source software toolkit for optimizing and deploying deep learning models.
Project mention: Court is in session: Top 10 most notorious C and C++ errors in 2024 | dev.to | 2024-12-28
V766 An item with the same key '"SoftPlus"' has already been added. cpu_types.cpp 198
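That V766 diagnostic flags a duplicated key in an initializer list. A self-contained illustration of the bug class (not OpenVINO's actual code) shows why it is silent:

```cpp
#include <iostream>
#include <map>
#include <string>

// Illustration of the bug class behind V766: when an initializer list
// repeats a key, std::map keeps the first entry and silently discards
// the second — the code compiles and runs without any warning.
int main() {
    const std::map<std::string, int> type_to_id = {
        {"Relu",     0},
        {"SoftPlus", 1},
        {"SoftPlus", 2},  // duplicate key: this mapping is silently dropped
    };
    std::cout << type_to_id.size() << "\n";          // prints 2, not 3
    std::cout << type_to_id.at("SoftPlus") << "\n";  // prints 1
    return 0;
}
```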
-
TNN
TNN: a uniform deep learning inference framework for mobile, desktop, and server, developed by Tencent Youtu Lab and Guangying Lab. TNN is distinguished by several outstanding features, including cross-platform capability, high performance, model compression, and code pruning. Based on ncnn and Rapidnet, TNN further strengthens support and performance optimization for mobile devices, while drawing on the good extensibility and high performance of existing open-source efforts.
-
CTranslate2
Fast inference engine for Transformer models
Thanks for the added context on the builds! As a "foreign" BW player and fellow speech processing researcher, I agree shallow contextual biasing should help. While not difficult to implement, most generally available ASR solutions don't make it easy to use. There's a PR in ctranslate2 implementing the same feature so that it could be exposed in faster-whisper: https://github.com/OpenNMT/CTranslate2/pull/1789
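For reference, basic CTranslate2 usage from C++ looks like the sketch below, adapted from the project README. The model directory and tokens are placeholders, and constructor details vary between releases.

```cpp
#include <ctranslate2/translator.h>

#include <iostream>
#include <string>
#include <vector>

// Minimal CTranslate2 sketch: translate one pre-tokenized sentence with a
// converted model ("ende_ctranslate2/" is a placeholder directory).
int main() {
    ctranslate2::Translator translator("ende_ctranslate2/",
                                       ctranslate2::Device::CPU);

    const std::vector<std::vector<std::string>> batch = {
        {"▁H", "ello", "▁world", "!"}};  // SentencePiece-style tokens

    const std::vector<ctranslate2::TranslationResult> results =
        translator.translate_batch(batch);

    for (const auto& token : results[0].output())
        std::cout << token << ' ';
    std::cout << '\n';
    return 0;
}
```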
-
lightseq
LightSeq: A High Performance Library for Sequence Processing and Generation
-
bark.cpp
Port of Suno AI's Bark in C/C++ for fast realistic audio generation
-
cppflow
Run TensorFlow models in C++ without installation and without Bazel
-
tensorrt-cpp-api
TensorRT C++ API Tutorial
-
dlstreamer
This repository is home to the Intel® Deep Learning Streamer (Intel® DL Streamer) Pipeline Framework. Pipeline Framework is a streaming media analytics framework for creating complex media analytics pipelines, based on the GStreamer* multimedia framework.
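DL Streamer pipelines are usually expressed as GStreamer launch strings. A sketch of driving one from C++ via GStreamer's standard parse-launch API follows; the model and video paths are placeholders, and the gvadetect/gvawatermark elements require DL Streamer to be installed.

```cpp
#include <gst/gst.h>

// Launch a DL Streamer analytics pipeline from C++ (paths are placeholders).
int main(int argc, char* argv[]) {
    gst_init(&argc, &argv);

    GError* error = nullptr;
    GstElement* pipeline = gst_parse_launch(
        "filesrc location=input.mp4 ! decodebin ! "
        "gvadetect model=person-detection.xml device=CPU ! "
        "gvawatermark ! videoconvert ! autovideosink",
        &error);
    if (!pipeline) {
        g_printerr("Failed to build pipeline: %s\n", error->message);
        g_error_free(error);
        return 1;
    }

    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    // Block until an error or end-of-stream message arrives on the bus.
    GstBus* bus = gst_element_get_bus(pipeline);
    GstMessage* msg = gst_bus_timed_pop_filtered(
        bus, GST_CLOCK_TIME_NONE,
        (GstMessageType)(GST_MESSAGE_ERROR | GST_MESSAGE_EOS));

    if (msg) gst_message_unref(msg);
    gst_object_unref(bus);
    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(pipeline);
    return 0;
}
```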
-
onnxruntime_backend
The Triton backend for the ONNX Runtime.
Project mention: Zero-Shot Text Classification on a low-end CPU-only machine? | news.ycombinator.com | 2024-10-07
Hah, it actually gets worse. What I was describing was the Triton ONNX backend with the OpenVINO execution accelerator[0] (not the OpenVINO backend itself). Clear as mud, right?
Your issue here is model performance with the additional challenge of offering it over a network socket across multiple requests and doing so in a performant manner.
Triton does things like dynamic batching[1] where throughput is increased significantly by aggregating disparate requests into one pass through the GPU.
A Docker container for Torch, ONNX, OpenVINO, etc. isn't even natively going to offer a network socket. This is where people try things like rolling their own FastAPI implementation (or something), only to discover it completely falls apart under any kind of load. That's development effort as well, but it's a waste of time.
[0] - https://github.com/triton-inference-server/onnxruntime_backe...
[1] - https://docs.nvidia.com/deeplearning/triton-inference-server...
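To make the dynamic-batching idea described above concrete, here is a toy, self-contained C++ sketch of the pattern — emphatically not Triton's implementation: requests arriving within a short window are grouped so the model runs once per batch instead of once per request.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <vector>

// Toy dynamic batcher: callers submit requests concurrently; a single
// inference thread drains them in batches, trading a small latency
// window for much higher throughput per model invocation.
struct Request { std::vector<float> input; };

class DynamicBatcher {
public:
    DynamicBatcher(size_t max_batch, std::chrono::milliseconds window)
        : max_batch_(max_batch), window_(window) {}

    void submit(Request r) {
        std::lock_guard<std::mutex> lock(mu_);
        pending_.push_back(std::move(r));
        if (pending_.size() >= max_batch_) cv_.notify_one();
    }

    // Called by the inference thread: returns once the batch is full or
    // the wait window expires, whichever comes first.
    std::vector<Request> next_batch() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait_for(lock, window_,
                     [&] { return pending_.size() >= max_batch_; });
        std::vector<Request> batch;
        batch.swap(pending_);
        return batch;  // may be empty if nothing arrived in the window
    }

private:
    size_t max_batch_;
    std::chrono::milliseconds window_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::vector<Request> pending_;
};
```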
-
EasyOCR-cpp
C++ Inference related posts
-
Whisper.cpp: Looking for Maintainers
-
Court is in session: Top 10 most notorious C and C++ errors in 2024
-
OpenMP 6.0
-
OpenVINO's AI Success: Brilliance or Cracks Beneath the Surface?
-
12 moments of typos and copy-paste, or why AI hallucinates: checking OpenVINO
-
Intel releases OpenVINO 2024.2 with broader LLM and quantization support
-
Show HN: I ported Suno AI's Bark model in C for fast realistic audio generation
-
Index
What are some of the best open-source Inference projects in C++? This list will help you:
# | Project | Stars |
---|---|---|
1 | whisper.cpp | 39,345 |
2 | mediapipe | 29,432 |
3 | ncnn | 21,332 |
4 | TensorRT | 11,478 |
5 | jetson-inference | 8,241 |
6 | openvino | 8,129 |
7 | TNN | 4,499 |
8 | CTranslate2 | 3,751 |
9 | lightseq | 3,244 |
10 | bark.cpp | 802 |
11 | cppflow | 799 |
12 | tensorrt-cpp-api | 693 |
13 | dlstreamer | 548 |
14 | onnxruntime_backend | 141 |
15 | EasyOCR-cpp | 55 |