Top 23 Python Inference Projects
-
ColossalAI
Project mention: ColossalChat: An Open-Source Solution for Cloning ChatGPT with a RLHF Pipeline | news.ycombinator.com | 2023-04-04
> open-source a complete RLHF pipeline ... based on the LLaMA pre-trained model
I've gotten to the point where, when I see "open source AI", I now know it means "well, except for $some_other_dependencies"
Anyway: https://scribe.rip/@yangyou_berkeley/colossalchat-an-open-so... and https://github.com/hpcaitech/ColossalAI#readme (Apache 2) can save you some medium.com heartache at least
-
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
DeepSpeed Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales (https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-chat)
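On the inference side, a minimal hedged sketch of wrapping an existing Hugging Face model with DeepSpeed's inference engine might look like this; the model name, precision, and generation settings are illustrative assumptions, not taken from the entry above:

```python
# Hedged sketch: wrap a Hugging Face model with DeepSpeed's inference engine.
# The model name and settings below are illustrative assumptions.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical; substitute your own model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Replace supported modules with DeepSpeed's optimized inference kernels.
engine = deepspeed.init_inference(
    model,
    mp_size=1,                       # no tensor parallelism
    dtype=torch.float16,             # half precision for inference
    replace_with_kernel_inject=True, # use fused kernels where available
)

inputs = tokenizer("DeepSpeed makes inference", return_tensors="pt").to("cuda")
outputs = engine.module.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```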
-
nebuly
Project mention: What are you building with LLMs? I'm writing an article about what people are building with LLMs | /r/programming | 2023-03-27
Hi everyone. I'm the creator of ChatLLaMA https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/chatllama, an open-source framework to train LLMs with limited resources. There has been amazing usage of LLMs these days, from chatbots that retrieve information about a company's products, to cooking assistants for traditional dishes, and much more. And you? What are you building, or what would you love to build, with LLMs? Let me know and I'll share the article about your stories soon. https://qpvirevo4tz.typeform.com/to/T3PruEuE Cheers
-
server
The Triton Inference Server provides an optimized cloud and edge inferencing solution. (by triton-inference-server)
Project mention: Single RTX 3080 or two RTX 3060s for deep learning inference? | /r/computervision | 2023-04-12
For inference of CNNs, memory should really not be an issue; if it is, that is a software engineering problem, not a hardware issue. FP16 or Int8 for weights is fine, and weight size won't increase due to the high resolution. During inference, memory used for hidden-layer tensors can be reused as soon as the last consumer layer has been processed. You are likely using something that is designed for training to do inference, which blows up the memory requirement; or, if you are using TensorRT or something like that, you need to be careful that every task does not load its own copy of the library code onto the GPU. Maybe look at https://github.com/triton-inference-server/server
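As a concrete illustration of querying a model hosted by Triton from Python, a hedged sketch using the tritonclient package might look like the following; the model name, input/output names, and shape are assumptions and must match your model's config.pbtxt:

```python
# Hedged sketch: send an inference request to a running Triton server.
# Model name, input/output names, and shape are assumptions that must
# match the model's config.pbtxt in your model repository.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# A single 224x224 RGB image in NCHW layout, FP16 as discussed above.
image = np.random.rand(1, 3, 224, 224).astype(np.float16)

inputs = [httpclient.InferInput("input__0", image.shape, "FP16")]
inputs[0].set_data_from_numpy(image)
outputs = [httpclient.InferRequestedOutput("output__0")]

result = client.infer(model_name="resnet50", inputs=inputs, outputs=outputs)
print(result.as_numpy("output__0").shape)
```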
-
adversarial-robustness-toolbox
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
Project mention: [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? | /r/MachineLearning | 2023-01-22
-
faster-whisper
Since then I figured out live transcription and also how to get faster-whisper running. I still need to write things down in detail at some point, though.
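For reference, a hedged sketch of basic (non-live) transcription with faster-whisper, assuming a local audio file and a small model, looks roughly like this:

```python
# Hedged sketch: offline transcription with faster-whisper.
# The audio path and model size are assumptions.
from faster_whisper import WhisperModel

# int8 on CPU keeps memory low; use device="cuda" with float16 on a GPU.
model = WhisperModel("small", device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.wav", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```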
-
text-generation-inference
Project mention: Falcon 40B LLM which beats Llama license changed to Apache 2.0 | news.ycombinator.com | 2023-05-31
For fast inference, the Hugging Face co-founder Thom Wolf recommends their text-generation-inference library: https://github.com/huggingface/text-generation-inference
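Once a text-generation-inference server is running (typically via the project's Docker image), it exposes a simple HTTP API. A hedged sketch of calling its /generate endpoint with requests, assuming the server listens on localhost:8080:

```python
# Hedged sketch: query a running text-generation-inference server over HTTP.
# Assumes the server was started (e.g. via the official Docker image) and
# is listening on localhost:8080.
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is the capital of France?",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```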
-
deepsparse
Inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application
Project mention: [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? | /r/MachineLearning | 2022-10-28
For 1), what is the easiest way to speed up inference (assume only PyTorch, primarily GPU but also some CPU)? I have been using ONNX and TorchScript, but there is a bit of a learning curve and sometimes it can be tricky to get the model to actually work. Is there anything else worth trying? I am enthused by things like TorchDynamo (although I have not tested it extensively) due to its apparent ease of use. I also saw the post yesterday about Kernl using (OpenAI) Triton kernels to speed up transformer models, which also looks interesting. Are things like SageMaker Neo or NeuralMagic worth trying? My only reservation with some of these is that they still seem to be pretty model/architecture-specific. I am a little reluctant to put much time into these unless I know others have had some success first.
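For the ONNX route the poster mentions, a hedged sketch of exporting a PyTorch model and running it with ONNX Runtime looks like this; the model (a torchvision ResNet-18) and input shape are illustrative assumptions, and the resulting .onnx file is also the format CPU runtimes such as DeepSparse consume:

```python
# Hedged sketch: export a PyTorch model to ONNX and run it with ONNX Runtime.
# The model and input shape are illustrative assumptions.
import numpy as np
import torch
import torchvision
import onnxruntime as ort

model = torchvision.models.resnet18(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)

# Export once; the same .onnx file can also be fed to engines like DeepSparse.
torch.onnx.export(model, dummy, "resnet18.onnx",
                  input_names=["input"], output_names=["output"])

session = ort.InferenceSession("resnet18.onnx", providers=["CPUExecutionProvider"])
outputs = session.run(["output"], {"input": dummy.numpy().astype(np.float32)})
print(outputs[0].shape)  # (1, 1000)
```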
-
transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
Project mention: [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? | /r/MachineLearning | 2022-10-28
For 2), I am aware of a few options. Triton Inference Server is an obvious one, as is the 'transformer-deploy' version from LDS. My only reservation here is that they require model compilation or are architecture-specific. I am aware of others like Bento, Ray Serve and TorchServe. Ideally I would have something that allows any PyTorch model to be used without the extra compilation effort (or at least optionally), and has some convenience features: ease of use, easy to deploy, easy to host multiple models, and some dynamic batching. Anyway, I am really interested to hear people's experience here as I know there are now quite a few options! Any help is appreciated! Disclaimer: I have no affiliation with, and am not connected in any way to, the libraries or companies listed here. These are just the ones I know of. Thanks in advance.
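As a baseline for the "any PyTorch model, no compilation" requirement the poster describes, a hedged FastAPI sketch is shown below; the model and endpoint are entirely illustrative, and it has none of the dynamic batching that dedicated servers like Triton or TorchServe provide:

```python
# Hedged sketch: a minimal FastAPI server for an arbitrary PyTorch model.
# No compilation step, but also no dynamic batching. The model is a stub.
from typing import List

import torch
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = torch.nn.Linear(4, 2).eval()  # placeholder for any torch.nn.Module

class PredictRequest(BaseModel):
    features: List[float]

@app.post("/predict")
def predict(req: PredictRequest):
    with torch.inference_mode():
        out = model(torch.tensor(req.features).unsqueeze(0))
    return {"scores": out.squeeze(0).tolist()}

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
```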
-
optimum
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
Optimum to accelerate inference of transformers with hardware optimization
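A hedged sketch of what that looks like with Optimum's ONNX Runtime integration; the model name is illustrative, and in older Optimum releases the export flag was from_transformers=True rather than export=True:

```python
# Hedged sketch: run a Transformers model through ONNX Runtime via Optimum.
# Model name is illustrative; older Optimum versions use from_transformers=True
# instead of export=True.
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Optimum made this noticeably faster."))
```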
-
filetype.py
Small, dependency-free, fast Python package to infer binary file types by checking the magic number signature
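Usage is essentially a one-liner; a small hedged sketch (the file path is an assumption):

```python
# Hedged sketch: detect a file's type from its magic numbers with filetype.
import filetype

kind = filetype.guess("unknown_download")  # hypothetical path
if kind is None:
    print("Cannot guess file type")
else:
    print(f"Extension: {kind.extension}, MIME type: {kind.mime}")
```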
-
pinferencia
Python + Inference - Model Deployment library in Python. Simplest model inference server ever.
Project mention: Show HN: Pinferencia, Deploy Your AI Models with Pretty UI and REST API | news.ycombinator.com | 2022-07-04
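A hedged sketch of the quickstart pattern, assuming a trivial placeholder model and that the file is saved as app.py:

```python
# Hedged sketch of the pinferencia quickstart pattern (save as app.py).
# The model here is a trivial placeholder.
from pinferencia import Server

class SumModel:
    def predict(self, data):
        return sum(data)

service = Server()
service.register(model_name="sum", model=SumModel(), entrypoint="predict")

# Start with: uvicorn app:service --reload
# Then POST to the generated REST API, e.g. /v1/models/sum/predict
```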
-
AutoGPTQ
Project mention: Introducing Basaran: self-hosted open-source alternative to the OpenAI text completion API | /r/LocalLLaMA | 2023-06-01
Instead of integrating GPTQ-for-LLaMa, use AutoGPTQ.
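A hedged sketch of loading a GPTQ-quantized checkpoint with AutoGPTQ; the model directory is a hypothetical placeholder and a CUDA GPU is assumed:

```python
# Hedged sketch: load and run a GPTQ-quantized model with AutoGPTQ.
# The quantized model directory is a hypothetical placeholder.
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

quantized_dir = "path/to/quantized-model"  # hypothetical
tokenizer = AutoTokenizer.from_pretrained(quantized_dir)
model = AutoGPTQForCausalLM.from_quantized(quantized_dir, device="cuda:0",
                                           use_safetensors=True)

inputs = tokenizer("AutoGPTQ makes 4-bit inference", return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```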
-
hidet
Project mention: Hidet: A Deep Learning Compiler for Efficient Model Serving | news.ycombinator.com | 2023-04-28
Hey @bructhemoose2, can you file an issue? We will try to fix it ASAP: https://github.com/hidet-org/hidet/issues
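Hidet plugs in as a torch.compile backend; a hedged sketch, assuming PyTorch 2.x, a CUDA GPU, and an illustrative torchvision model:

```python
# Hedged sketch: use hidet as a torch.compile backend (PyTorch 2.x, CUDA GPU).
# The model is an illustrative torchvision network.
import hidet  # importing hidet makes the 'hidet' backend available
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).cuda().eval()
x = torch.randn(1, 3, 224, 224, device="cuda")

compiled = torch.compile(model, backend="hidet")
with torch.inference_mode():
    y = compiled(x)
print(y.shape)
```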
-
fastT5
I've tried https://github.com/Ki6an/fastT5 but it works on CPU only.
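For reference, a hedged sketch of the fastT5 usage the poster tried (CPU-only ONNX export of a T5 model; the model name and prompt are illustrative):

```python
# Hedged sketch: export a T5 model to ONNX with fastT5 and generate on CPU.
# Model name and prompt are illustrative.
from fastT5 import export_and_get_onnx_model
from transformers import AutoTokenizer

model_name = "t5-small"
model = export_and_get_onnx_model(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

batch = tokenizer("translate English to German: The house is wonderful.",
                  return_tensors="pt")
tokens = model.generate(input_ids=batch["input_ids"],
                        attention_mask=batch["attention_mask"],
                        num_beams=2)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```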
-
model_analyzer
Triton Model Analyzer is a CLI tool that helps you understand the compute and memory requirements of models served by the Triton Inference Server.
-
Project mention: EleutherAI announces it has become a non-profit | news.ycombinator.com | 2023-03-02
> My big gripe, and for obvious reasons, is that we need to step away from cloud-based inference, and it doesn't seem like anyone's working on that.
I think there are steps being taken in this direction (check out [1] and [2] for interesting lightweight transpile / ad-hoc training projects) but there is a lack of centralized community for these constrained problems.
-
graphsignal-python
Project mention: Show HN: Python Monitoring for LLMs, OpenAI, Inference, GPUs | news.ycombinator.com | 2023-04-04
We've built it for apps that use LLMs and other ML models. The lightweight Python agent auto-instruments OpenAI, LangChain, Banana, and other APIs and frameworks. By adding one line of code you'll be able to monitor and analyze latency, errors, compute and costs. Profiling using cProfile, PyTorch Kineto or Yappi can be enabled if code-level statistics are necessary.
Here is a short demo screencast for a LangChain/OpenAI app: https://www.loom.com/share/17ba8aff32b74d74b7ba7f5357ed9250
In terms of data privacy, we only send metadata and statistics to https://graphsignal.com, so no raw data, such as prompts or images, leaves your app.
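The one-line setup mentioned above is essentially a configure call; a hedged sketch, with the API key and deployment name as placeholders:

```python
# Hedged sketch: one-line Graphsignal setup.
# The API key and deployment name are placeholders.
import graphsignal

graphsignal.configure(api_key="YOUR_API_KEY", deployment="my-llm-app")

# From here, supported libraries (OpenAI, LangChain, ...) are auto-instrumented,
# so existing calls are traced without further code changes.
```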
-
Python Inference related posts
- AutoGPTQ - An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm
- Which architecture does Hugging Face use for model serving? Are they using KServe?
- Need help with auto_gptq install, module not found on windows install
- How to QLoRA a 33B model on a GPU with 24GB of VRAM
- Open Source LLM Quantization Library
-
Index
What are some of the best open-source Inference projects in Python? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | ColossalAI | 30,025 |
2 | DeepSpeed | 25,390 |
3 | nebuly | 8,152 |
4 | server | 5,418 |
5 | torch2trt | 3,967 |
6 | adversarial-robustness-toolbox | 3,711 |
7 | faster-whisper | 2,237 |
8 | text-generation-inference | 1,592 |
9 | deepsparse | 1,491 |
10 | transformer-deploy | 1,399 |
11 | budgetml | 1,312 |
12 | optimum | 1,200 |
13 | BERT-NER | 1,123 |
14 | DeepSpeed-MII | 740 |
15 | filetype.py | 526 |
16 | pinferencia | 525 |
17 | AutoGPTQ | 517 |
18 | hidet | 438 |
19 | fastT5 | 436 |
20 | sparktorch | 297 |
21 | model_analyzer | 258 |
22 | emlearn | 256 |
23 | graphsignal-python | 171 |