Top 23 Inference Open-Source Projects

ColossalAI

41 37,775 9.7 Python

Making large AI models cheaper, faster and more accessible

Project mention: Making large AI models cheaper, faster and more accessible | news.ycombinator.com | 2024-03-21
DeepSpeed

51 32,447 9.8 Python

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Project mention: Can we discuss MLOps, Deployment, Optimizations, and Speed? | /r/LocalLLaMA | 2023-12-06

DeepSpeed can handle parallelism concerns, and even offload data/model to RAM, or even NVMe (!?). I'm surprised I don't see this project used more.
whisper.cpp

187 29,540 9.8 C

Port of OpenAI's Whisper model in C/C++

Project mention: Show HN: I created automatic subtitling app to boost short videos | news.ycombinator.com | 2024-04-09

whisper.cpp [1] has a karaoke example that uses ffmpeg's drawtext filter to display rudimentary karaoke-like captions. It also supports diarisation. Perhaps it could be a starting point to create a better script that does what you need.
--
1: https://github.com/ggerganov/whisper.cpp/blob/master/README....
mediapipe

49 25,331 9.9 C++

Cross-platform, customizable ML solutions for live and streaming media.

Project mention: Mediapipe openpose Controlnet model for SD | /r/localdiffusion | 2023-11-15

mediapipe/docs/solutions/pose.md at master · google/mediapipe · GitHub
ncnn

12 19,125 9.4 C++

ncnn is a high-performance neural network inference framework optimized for the mobile platform

Project mention: AMD Funded a Drop-In CUDA Implementation Built on ROCm: It's Open-Source | news.ycombinator.com | 2024-02-12

ncnn uses Vulkan for GPU acceleration, I've seen it used in a few projects to get AMD hardware support.
https://github.com/Tencent/ncnn
vllm

30 17,656 9.9 Python

A high-throughput and memory-efficient inference and serving engine for LLMs

Project mention: Mistral AI Launches New 8x22B Moe Model | news.ycombinator.com | 2024-04-09

The easiest is to use vllm (https://github.com/vllm-project/vllm) to run it on a couple of A100s, and you can benchmark this using this library (https://github.com/EleutherAI/lm-evaluation-harness).
ts-pattern

38 10,832 8.5 TypeScript

🎨 The exhaustive Pattern Matching library for TypeScript, with smart type inference.

Project mention: You Don't Need React | news.ycombinator.com | 2024-02-08

ts-pattern has been a decent band-aid for the lack of native pattern matching, but obviously has downsides that could be avoided if it was built into the language.
https://github.com/gvergnaud/ts-pattern
amazon-sagemaker-examples

17 9,477 9.3 Jupyter Notebook

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.

Project mention: Thesis Project Help Using SageMaker Free Tier | /r/aws | 2023-09-23

I need to use AWS Sagemaker (required, can't use easier services) and my adviser gave me this document to start with: https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/question_answering_retrieval_augmented_generation/question_answering_langchain_jumpstart.ipynb
TensorRT

22 9,031 5.0 C++

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

Project mention: AMD MI300X 30% higher performance than Nvidia H100, even with optimized stack | news.ycombinator.com | 2023-12-17

> It's not rocket science to implement matrix multiplication in any GPU.
You're right, it's harder. Saying this as someone who's done more work on the former than the latter. (I have, with a team, built a rocket engine. And not your school or backyard project size, but nozzle bigger than your face kind. I've also written CUDA kernels and boy is there a big learning curve to the latter that you gotta fundamentally rethink how you view a problem. It's unquestionable why CUDA devs are paid so much. Really it's only questionable why they aren't paid more)
I know it is easy to think this problem is easy, it really looks that way. But there's an incredible amount of optimization that goes into all of this and that's what's really hard. You aren't going to get away with just N for loops for a tensor rank N. You got to chop the data up, be intelligent about it, manage memory, how you load memory, handle many data types, take into consideration different results for different FMA operations, and a whole lot more. There's a whole lot of non-obvious things that result in high optimization (maybe obvious __after__ the fact, but that's not truthfully "obvious"). The thing is, the space is so well researched and implemented that you can't get away with naive implementations, you have to be on the bleeding edge.
Then you have to do that and make it reasonably usable for the programmer too, abstracting away all of that. CUDA also has a huge head start and momentum is not a force to be reckoned with (pun intended).
Look at TensorRT[0]. The software isn't even complete and it still isn't going to cover all neural networks on all GPUs. I've had stuff work on a V100 and H100 but not an A100, then later get fixed. They even have the "Apple Advantage" in that they have control of the hardware. I'm not certain AMD will have the same advantage. We talk a lot about the difficulties of being first mover, but I think we can also recognize that momentum is an advantage of being first mover. And it isn't one to scoff at.
[0] https://github.com/NVIDIA/TensorRT
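
To illustrate the comment's point about "chopping the data up" instead of relying on plain nested loops, here is a minimal sketch in pure Python. It is not how TensorRT or CUDA kernels are written; the function names `matmul_naive` and `matmul_tiled` and the `tile` parameter are hypothetical, chosen only for this illustration. Both functions compute the same product, but the tiled version walks the matrices block by block, which is the access pattern that real GPU kernels build on to reuse data held in fast memory.

```python
from typing import List

Matrix = List[List[float]]


def matmul_naive(a: Matrix, b: Matrix) -> Matrix:
    """Textbook triple loop: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a: Matrix, b: Matrix, tile: int = 2) -> Matrix:
    """Same product, with the work chopped into tile x tile blocks.

    The three outer loops pick a block; the three inner loops do the
    multiply-accumulate inside that block. Edge blocks are clipped with
    min() so the matrix sizes need not be multiples of `tile`.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, p, tile):
            for k0 in range(0, m, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, p)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + tile, m)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
    print(matmul_naive(a, b))
    print(matmul_tiled(a, b, tile=2))
```

In pure Python the tiled version is not faster. The sketch only shows the restructuring; the payoff comes on hardware where each block can be staged in cache or shared memory, and that is where the memory management and data-type handling mentioned in the comment come in.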
faster-whisper

22 8,578 8.3 Python

Faster Whisper transcription with CTranslate2

Project mention: Using Groq to Build a Real-Time Language Translation App | dev.to | 2024-04-05

For our real-time STT needs, we'll employ a fantastic library called faster-whisper.
text-generation-inference

28 7,722 9.6 Python

Large Language Model Text Generation Inference

Project mention: Zephyr 141B, a Mixtral 8x22B fine-tune, is now available in Hugging Chat | news.ycombinator.com | 2024-04-12

I wanted to write that TGI inference engine is not Open Source anymore, but they have reverted the license back to Apache 2.0 for the new version TGI v2.0: https://github.com/huggingface/text-generation-inference/rel...
Good news!
jetson-inference

11 7,294 8.5 C++

Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
server

24 7,277 9.5 Python

The Triton Inference Server provides an optimized cloud and edge inferencing solution. (by triton-inference-server)

Project mention: FLaNK Weekly 08 Jan 2024 | dev.to | 2024-01-08
io-ts

80 6,593 4.9 TypeScript

Runtime type system for IO decoding/encoding

Project mention: TDD | /r/CharruaDevs | 2023-12-07

Nice. If you get the chance, put a strong code review process in place, and for I/O try https://github.com/Effect-TS/schema or https://github.com/gcanti/io-ts, which give you an obvious solution to the problem of "types for whatever the backend returns", although it is actually much more capable than that.
openvino

17 5,818 10.0 C++

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference

Project mention: FLaNK Stack 05 Feb 2024 | dev.to | 2024-02-05
adversarial-robustness-toolbox

8 4,433 9.7 Python

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
torch2trt

5 4,376 3.1 Python

An easy to use PyTorch to TensorRT converter
TNN

1 4,267 2.5 C++

TNN: developed by Tencent Youtu Lab and Guangying Lab, a uniform deep learning inference framework for mobile, desktop and server. TNN is distinguished by several outstanding features, including its cross-platform capability, high performance, model compression and code pruning. Based on ncnn and Rapidnet, TNN further strengthens the support and performance optimization for mobile devices, and also draws on the good extensibility and high performance of existing open source efforts.
open_model_zoo

5 3,934 8.7 Python

Pre-trained Deep Learning models and demos (high quality and extremely fast)

Project mention: FLaNK Stack Weekly 06 Nov 2023 | dev.to | 2023-11-06
AutoGPTQ

19 3,703 9.5 Python

An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.

Project mention: Setting up LLAMA2 70B Chat locally | /r/developersIndia | 2023-08-18
grakn

11 3,666 9.3 Java

TypeDB: the polymorphic database powered by types

Project mention: Datomic Is Now Free | news.ycombinator.com | 2023-04-27
lightseq

1 3,080 3.7 C++

LightSeq: A High Performance Library for Sequence Processing and Generation
deepsparse

21 2,858 9.6 Python

Sparsity-aware deep learning inference runtime for CPUs

Project mention: Fast Llama 2 on CPUs with Sparse Fine-Tuning and DeepSparse | news.ycombinator.com | 2023-11-23

Interesting company. Yannic Kilcher interviewed Nir Shavit last year and they went into some depth: https://www.youtube.com/watch?v=0PAiQ1jTN5k DeepSparse is on GitHub: https://github.com/neuralmagic/deepsparse

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-12.

Inference related posts

AI Inference Now Available in Supabase Edge Functions
1 project | news.ycombinator.com | 16 Apr 2024
Twinny: Locally hosted (or API hosted) AI code completion for Visual Studio Code
1 project | news.ycombinator.com | 10 Apr 2024
Show HN: I created automatic subtitling app to boost short videos
1 project | news.ycombinator.com | 9 Apr 2024
Hugging Face reverts the license back to Apache 2.0
1 project | news.ycombinator.com | 8 Apr 2024
karpathy/llm.c
10 projects | news.ycombinator.com | 8 Apr 2024
Apple Explores Home Robotics as Potential 'Next Big Thing'
3 projects | news.ycombinator.com | 4 Apr 2024
Using Groq to Build a Real-Time Language Translation App
3 projects | dev.to | 5 Apr 2024

Index

What are some of the best open-source Inference projects? This list will help you:

#	Project	Stars
1	ColossalAI	37,775
2	DeepSpeed	32,447
3	whisper.cpp	29,540
4	mediapipe	25,331
5	ncnn	19,125
6	vllm	17,656
7	ts-pattern	10,832
8	amazon-sagemaker-examples	9,477
9	TensorRT	9,031
10	faster-whisper	8,578
11	text-generation-inference	7,722
12	jetson-inference	7,294
13	server	7,277
14	io-ts	6,593
15	openvino	5,818
16	adversarial-robustness-toolbox	4,433
17	torch2trt	4,376
18	TNN	4,267
19	open_model_zoo	3,934
20	AutoGPTQ	3,703
21	grakn	3,666
22	lightseq	3,080
23	deepsparse	2,858