Top 13 C++ Inference Projects
-
whisper.cpp
Project mention: Show HN: OWhisper – Ollama for realtime speech-to-text | news.ycombinator.com | 2025-08-14
Thank you for taking the time to build something and share it. However, what is the advantage of using this over whisper.cpp's stream example, which can also do real-time transcription?
https://github.com/ggml-org/whisper.cpp/tree/master/examples...
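For context, here is a minimal sketch of transcription with the whisper.cpp C API (the stream example builds a real-time loop on top of the same calls). The model path and the PCM buffer are placeholders, and function names may differ slightly between releases:

```cpp
#include "whisper.h"

#include <cstdio>
#include <vector>

int main() {
    // Load a ggml model (path is illustrative; download via whisper.cpp's model scripts).
    whisper_context* ctx = whisper_init_from_file_with_params(
        "models/ggml-base.en.bin", whisper_context_default_params());
    if (!ctx) return 1;

    // whisper_full() expects 16 kHz mono float PCM; 5 s of silence stands in here.
    std::vector<float> pcm(16000 * 5, 0.0f);

    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    if (whisper_full(ctx, params, pcm.data(), (int) pcm.size()) != 0) return 1;

    // Print the decoded segments.
    for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
        printf("%s\n", whisper_full_get_segment_text(ctx, i));
    }

    whisper_free(ctx);
    return 0;
}
```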
-
mediapipe
Project mention: Google AI Edge – on-device cross-platform AI deployment | news.ycombinator.com | 2025-06-01
This isn't really true. They are different offerings.
CoreML is specific to the Apple ecosystem and lets you convert a PyTorch model to a CoreML .mlmodel that will run with acceleration on iOS/Mac.
Google Mediapipe is a giant C++ library for running ML flows on any device (iOS/Android/Web). It includes TensorFlow Lite (now LiteRT) but is also a graph processor that helps with common ML preprocessing tasks like image resizing, annotation, etc.
Google killing products early is a good meme but Mediapipe is open source so you can at least credit them with that. https://github.com/google-ai-edge/mediapipe
I used a fork of Mediapipe for a contract iOS/Android computer vision product and it was very complex but worked well. A cross-platform solution would not have been possible with CoreML.
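To illustrate the "graph processor" point, here is a minimal sketch of MediaPipe's C++ CalculatorGraph API, loosely based on the upstream hello-world example; the stream names and PassThroughCalculator are placeholders, and real pipelines wire in vision calculators instead:

```cpp
#include <string>
#include <utility>

#include "mediapipe/framework/calculator_graph.h"
#include "mediapipe/framework/port/parse_text_proto.h"

absl::Status RunPassThroughGraph() {
    // A trivial graph: one input stream routed through one calculator.
    mediapipe::CalculatorGraphConfig config =
        mediapipe::ParseTextProtoOrDie<mediapipe::CalculatorGraphConfig>(R"pb(
            input_stream: "in"
            output_stream: "out"
            node {
                calculator: "PassThroughCalculator"
                input_stream: "in"
                output_stream: "out"
            }
        )pb");

    mediapipe::CalculatorGraph graph;
    absl::Status status = graph.Initialize(config);
    if (!status.ok()) return status;

    auto poller_or = graph.AddOutputStreamPoller("out");
    if (!poller_or.ok()) return poller_or.status();
    mediapipe::OutputStreamPoller poller = std::move(poller_or.value());

    status = graph.StartRun({});
    if (!status.ok()) return status;

    // Push a few packets through the graph, then close the input stream.
    for (int i = 0; i < 3; ++i) {
        status = graph.AddPacketToInputStream(
            "in", mediapipe::MakePacket<std::string>("hello").At(mediapipe::Timestamp(i)));
        if (!status.ok()) return status;
    }
    status = graph.CloseInputStream("in");
    if (!status.ok()) return status;

    mediapipe::Packet packet;
    while (poller.Next(&packet)) {
        // packet.Get<std::string>() holds the payload that was passed through the graph.
    }
    return graph.WaitUntilDone();
}
```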
-
ncnn
ncnn is a high-performance neural network inference framework optimized for mobile platforms
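As a quick illustration of the API, a minimal sketch of loading a converted model and running one forward pass with ncnn; the file names and blob names ("data"/"prob") are placeholders that depend on how the model was exported:

```cpp
#include <vector>

#include "net.h"  // ncnn

int main() {
    ncnn::Net net;
    // Param/bin files come from ncnn's model converters (onnx2ncnn, pnnx, etc.).
    if (net.load_param("model.param") != 0) return 1;
    if (net.load_model("model.bin") != 0) return 1;

    // Wrap a raw RGB frame (decoded elsewhere) and resize it to the network's input size.
    std::vector<unsigned char> rgb(640 * 480 * 3, 0);
    ncnn::Mat in = ncnn::Mat::from_pixels_resize(
        rgb.data(), ncnn::Mat::PIXEL_RGB, 640, 480, 224, 224);

    const float mean_vals[3] = {123.675f, 116.28f, 103.53f};
    const float norm_vals[3] = {1 / 58.395f, 1 / 57.12f, 1 / 57.375f};
    in.substract_mean_normalize(mean_vals, norm_vals);  // (sic: ncnn's spelling)

    ncnn::Extractor ex = net.create_extractor();
    ex.input("data", in);

    ncnn::Mat out;
    ex.extract("prob", out);  // 'out' now holds the output blob, e.g. class scores
    return 0;
}
```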
-
TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Project mention: Generative AI Interview for Senior Data Scientists: 50 Key Questions and Answers | dev.to | 2025-05-06
What is the purpose of using ONNX or TensorRT for deployment? When deploying a trained deep learning model into a real-world service environment for inference, optimization to increase execution speed and reduce resource consumption is crucial. ONNX and TensorRT are prominent tools and frameworks widely used for this purpose.
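To make the deployment point concrete, here is a minimal sketch of loading a prebuilt TensorRT engine with the C++ runtime API; the engine path and tensor names are placeholders, and the tensor-address calls assume TensorRT 8.5+ (older releases use binding indices and enqueueV2):

```cpp
#include <NvInfer.h>

#include <fstream>
#include <iostream>
#include <iterator>
#include <memory>
#include <vector>

// TensorRT requires a logger implementation.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
    }
};

int main() {
    Logger logger;

    // Load an engine serialized offline (e.g. with trtexec); the path is illustrative.
    std::ifstream file("model.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    std::unique_ptr<nvinfer1::IRuntime> runtime(nvinfer1::createInferRuntime(logger));
    std::unique_ptr<nvinfer1::ICudaEngine> engine(
        runtime->deserializeCudaEngine(blob.data(), blob.size()));
    std::unique_ptr<nvinfer1::IExecutionContext> context(engine->createExecutionContext());

    // In a real program, allocate device buffers with cudaMalloc, copy the input in,
    // then bind them by tensor name and launch on a CUDA stream:
    //   context->setTensorAddress("input", d_input);
    //   context->setTensorAddress("output", d_output);
    //   context->enqueueV3(stream);
    return 0;
}
```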
-
openvino
Project mention: Court is in session: Top 10 most notorious C and C++ errors in 2024 | dev.to | 2024-12-28
V766 An item with the same key '"SoftPlus"' has already been added. cpu_types.cpp 198
-
jetson-inference
Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
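For flavor, the core of the Hello AI World image-recognition example looks roughly like this; a sketch only, with an illustrative image path (newer releases also let imageNet::Create() take command-line arguments):

```cpp
#include <jetson-inference/imageNet.h>
#include <jetson-utils/loadImage.h>

#include <cstdio>

int main() {
    // Load an image from disk into shared CPU/GPU memory (path is illustrative).
    uchar3* image = nullptr;
    int width = 0, height = 0;
    if (!loadImage("my_image.jpg", &image, &width, &height)) return 1;

    // Create a classification network; Create() loads a default ImageNet model,
    // accelerated with TensorRT under the hood.
    imageNet* net = imageNet::Create();
    if (!net) return 1;

    // Classify the image and print the top class with its confidence.
    float confidence = 0.0f;
    const int classIndex = net->Classify(image, width, height, &confidence);
    if (classIndex >= 0) {
        printf("%s (%.2f%% confidence)\n", net->GetClassDesc(classIndex), confidence * 100.0f);
    }

    delete net;
    return 0;
}
```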
-
TNN
TNN: a uniform deep learning inference framework for mobile, desktop and server, developed by Tencent Youtu Lab and Guangying Lab. TNN is distinguished by several outstanding features, including cross-platform capability, high performance, model compression and code pruning. Based on ncnn and Rapidnet, TNN further strengthens support and performance optimization for mobile devices, and also draws on the good extensibility and high performance of existing open source efforts.
-
CTranslate2
Thanks for the added context on the builds! As a "foreign" BW player and fellow speech processing researcher, I agree shallow contextual biasing should help. While it isn't difficult to implement, most generally available ASR solutions don't make it easy to use. There's a PR in CTranslate2 implementing the same feature so that it could be exposed in faster-whisper: https://github.com/OpenNMT/CTranslate2/pull/1789
-
onnxruntime_backend
Project mention: Zero-Shot Text Classification on a low-end CPU-only machine? | news.ycombinator.com | 2024-10-07
Hah, it actually gets worse. What I was describing was the Triton ONNX backend with the OpenVINO execution accelerator[0] (not the OpenVINO backend itself). Clear as mud, right?
Your issue here is model performance with the additional challenge of offering it over a network socket across multiple requests and doing so in a performant manner.
Triton does things like dynamic batching[1] where throughput is increased significantly by aggregating disparate requests into one pass through the GPU.
A Docker container for torch, ONNX, OpenVINO, etc. isn't even natively going to offer a network socket. This is where people try things like rolling their own FastAPI implementation (or something), only to discover it completely falls apart under any kind of load. That's development effort as well, but it's a waste of time.
[0] - https://github.com/triton-inference-server/onnxruntime_backe...
[1] - https://docs.nvidia.com/deeplearning/triton-inference-server...
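Following up on the dynamic batching point, enabling it in Triton is a model-configuration change rather than application code. A sketch of a config.pbtxt for an ONNX model follows; the model name, batch sizes, and queue delay are illustrative:

```
name: "my_onnx_model"
backend: "onnxruntime"
max_batch_size: 16

# Aggregate individual requests into larger batches, waiting at most 100 microseconds.
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```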
-
C++ Inference related posts
-
Whispercpp – Local, Fast, and Private Audio Transcription for Ruby
-
Build Your Own Siri. Locally. On-Device. No Cloud
-
Whisper.cpp: Looking for Maintainers
-
Court is in session: Top 10 most notorious C and C++ errors in 2024
-
OpenMP 6.0
-
OpenVINO's AI Success: Brilliance or Cracks Beneath the Surface?
-
12 moments of typos and copy-paste, or why AI hallucinates: checking OpenVINO
-
Index
What are some of the best open-source Inference projects in C++? This list will help you find them, ranked by GitHub stars:
| # | Project | Stars |
|---|---------|-------|
| 1 | whisper.cpp | 42,817 |
| 2 | mediapipe | 31,139 |
| 3 | ncnn | 21,982 |
| 4 | TensorRT | 12,079 |
| 5 | openvino | 8,764 |
| 6 | jetson-inference | 8,472 |
| 7 | TNN | 4,572 |
| 8 | CTranslate2 | 3,992 |
| 9 | bark.cpp | 835 |
| 10 | cppflow | 802 |
| 11 | tensorrt-cpp-api | 757 |
| 12 | onnxruntime_backend | 159 |
| 13 | EasyOCR-cpp | 59 |