vllm

A high-throughput and memory-efficient inference and serving engine for LLMs (by vllm-project)

Vllm Alternatives

Similar projects and alternatives to vllm

NOTE: The number of mentions on this list reflects mentions in common posts plus user-suggested alternatives, so a higher number suggests a better or more similar vllm alternative.

vllm reviews and mentions

Posts with mentions or reviews of vllm. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-11-08.
  • Running Phi 3 with vLLM and Ray Serve
    6 projects | dev.to | 8 Nov 2024
    vLLM stands for virtual large language models. It is one of the open-source libraries for fast LLM inference and serving. As the name suggests, 'virtual' borrows the concepts of virtual memory and paging from operating systems, which vLLM applies through PagedAttention to maximize resource utilization and speed up token generation. Traditional LLM serving stores the large attention key and value tensors contiguously in GPU memory, leading to inefficient memory usage.
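
    To make the paging analogy concrete, here is a minimal sketch of the idea behind PagedAttention (hypothetical illustration code, not vLLM's actual implementation): the KV cache is carved into fixed-size physical blocks that are handed to sequences on demand and returned to a shared pool when a request finishes, much like OS pages.

    # Hypothetical sketch of block-based KV-cache allocation, the core idea
    # behind PagedAttention. Not vLLM's real data structures.
    BLOCK_SIZE = 16  # tokens per physical block (16 is a common vLLM default)

    class PagedKVCache:
        def __init__(self, num_blocks: int):
            self.free_blocks = list(range(num_blocks))  # shared physical pool
            self.block_tables = {}  # seq_id -> list of physical block ids

        def append_token(self, seq_id: str, position: int) -> int:
            """Return the physical block holding this token, allocating lazily."""
            table = self.block_tables.setdefault(seq_id, [])
            if position % BLOCK_SIZE == 0:            # current block is full
                table.append(self.free_blocks.pop())  # claim a free block
            return table[-1]

        def free(self, seq_id: str) -> None:
            """Return a finished sequence's blocks to the pool for reuse."""
            self.free_blocks.extend(self.block_tables.pop(seq_id, []))

    cache = PagedKVCache(num_blocks=1024)
    for pos in range(40):                  # a 40-token request...
        cache.append_token("req-1", pos)   # ...occupies only ceil(40/16) = 3 blocks
    cache.free("req-1")                    # blocks become reusable immediately
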
  • AIM Weekly for 04Nov2024
    29 projects | dev.to | 4 Nov 2024
    ๐ŸŒ Composed Image Retrieval ๐Ÿ“Ž Intro to Multimodal LLama 3.2 ๐Ÿ› ๏ธ Multi Agent Concierge ๐Ÿ’ป RAG with Langchain Granite, Milvus ๐Ÿซถ Download content โœ… Transformer Replacement? ๐Ÿค– vLLM for runing models ๐ŸŒ Amphion ๐Ÿ“ Autogluon ๐Ÿš™ Notebook LLama like Google's Notebook LLM ๐Ÿซถ Monocle2ai for tracing GenAI app code LFA&D Project ๐Ÿค– Bee Agent Framework โœ… LLama RFP Response โ–ถ๏ธ GenAI Script ๐Ÿ‘ฝ Simular AI Agent S ๐Ÿฆพ DrawDB with AI โœจ Ollama with LLama 3.2 Vision!!!! Preview ๐Ÿš• Powerful RAG Checker ๐Ÿ“Š SQL Generator ๐Ÿ’ป Role of LLMs ๐Ÿ Document Extraction ๐Ÿ•ถ๏ธ Open Source Vector DB Reddit ๐Ÿ” The Practical Guide to Self Hosting LLM ๐Ÿฆพ Stagehand Controller ๐Ÿ•ถ๏ธ Understanding HNSWLIB ๐Ÿ Best practices in RAG ๐Ÿ’ป Enigma Agent ๐Ÿ“ Langchain, Ollama, Phi3 for Function Calling ๐Ÿ”‹ Compass Judger ๐Ÿ“ Princeton NLP SimPO ๐Ÿ” Princeton NLP ProLong ๐Ÿ”‹ Princeton NLP HELMET ๐Ÿง Ollama Cheatsheet ๐Ÿš• Princeton NLP CopyCat ๐Ÿ“Š Princeton NLP Shp ๐Ÿ•ถ๏ธ Can LLM Solve Hard Github Issues ๐Ÿ“ Enabling Large Language Models to Generate Text with Citations ๐Ÿ”‹ Princeton NLP CharXiv ๐Ÿ“Š Awesome AI Agents List ๐Ÿฆพ Nomicโ€™s Matryoshka text embedding model
  • Quantized Llama models with increased speed and a reduced memory footprint
    7 projects | news.ycombinator.com | 24 Oct 2024
    Yes, I've used the v3.2 3B-Instruct model in a Slack app. Specifically using vLLM, with a template: https://github.com/vllm-project/vllm/blob/main/examples/tool...

    Works as expected if you provide a few system prompts with context

  • Tutorial: Deploying Llama 3.1 405B on GKE Autopilot with 8 x A100 80GB
    2 projects | dev.to | 7 Oct 2024
    git clone https://github.com/vllm-project/vllm.git
    cd vllm/benchmarks
    wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
    python3 benchmark_serving.py --backend openai \
        --base-url http://localhost:8000/openai \
        --dataset-name=sharegpt --dataset-path=ShareGPT_V3_unfiltered_cleaned_split.json \
        --model llama-3.1-405b-instruct-fp8-a100 \
        --seed 12345 --tokenizer neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8
  • Show HN: We made glhf.chat – run almost any open-source LLM, including 405B
    1 project | news.ycombinator.com | 23 Jul 2024
    Hey there!

    We currently use vllm under the hood and vllm doesn't support Codestral (yet). We're working on expanding our model support. Hence (almost) any model.

    Thanks for testing! :)

    https://github.com/vllm-project/vllm/issues/6479

    - Billy :)

  • Codestral Mamba
    15 projects | news.ycombinator.com | 16 Jul 2024
  • vLLM, a fast and easy-to-use library for LLM inference and serving
    1 project | news.ycombinator.com | 15 Jul 2024
  • Deploy the vLLM Inference Engine to Run Large Language Models (LLM) on Koyeb
    2 projects | dev.to | 26 Jun 2024
    vLLM is a high-performance, easy-to-use library for running inference workloads. It allows you to download popular models from Hugging Face, run them on local hardware with a custom configuration, and serve an OpenAI-compatible API server as an interface. Using vLLM, you can experiment with different models and build LLM-based applications without relying on externally hosted services.
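
    As a small usage sketch (the model name and port below are placeholders, not part of the original post): once vLLM is serving its OpenAI-compatible API, the standard openai Python client can talk to it by pointing base_url at the local endpoint.

    # Assumes a server was started with something like:
    #   vllm serve mistralai/Mistral-7B-Instruct-v0.2 --port 8000
    # (model and port are placeholders; any served model works the same way)
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    response = client.chat.completions.create(
        model="mistralai/Mistral-7B-Instruct-v0.2",  # must match the served model
        messages=[{"role": "user", "content": "Summarize what vLLM does in one sentence."}],
        temperature=0.7,
    )
    print(response.choices[0].message.content)
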
  • Best LLM Inference Engines and Servers to Deploy LLMs in Production
    6 projects | dev.to | 5 Jun 2024
    GitHub repository: https://github.com/vllm-project/vllm
  • AI leaderboards are no longer useful. It's time to switch to Pareto curves
    1 project | news.ycombinator.com | 30 Apr 2024
    I guess the root cause of my claim is that OpenAI won't tell us whether or not GPT-3.5 is an MoE model, and I assumed it wasn't. Since GPT-3.5 is clearly nondeterministic at temp=0, I believed the nondeterminism was due to FPU stuff, and this effect was amplified with GPT-4's MoE. But if GPT-3.5 is also MoE then that's just wrong.

    What makes this especially tricky is that small models are truly 100% deterministic at temp=0 because the relative likelihoods are too coarse for FPU issues to be a factor. I had thought 3.5 was big enough that some of its token probabilities were too fine-grained for the FPU. But that's probably wrong.

    On the other hand, it's not just GPT, there are currently floating-point difficulties in vllm which significantly affect the determinism of any model run on it: https://github.com/vllm-project/vllm/issues/966 Note that a suggested fix is upcasting to float32. So it's possible that GPT-3.5 is using an especially low-precision float and introducing nondeterminism by saving money on compute costs.

    Sadly I do not have the money[1] to actually run a test to falsify any of this. It seems like this would be a good little research project.

    [1] Or the time, or the motivation :) But this stuff is expensive.
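
    For readers who want to see the underlying effect, here is a small self-contained illustration (an assumption-level demo of floating-point non-associativity, not a reproduction of the linked vLLM issue): accumulating the same values in low precision versus float32 yields slightly different totals, and at temperature 0 that difference can flip which token wins the argmax.

    # Demonstrates why upcasting to float32 (the fix suggested in the issue)
    # changes results: summing identical values in float16 vs. float32 gives
    # slightly different totals, enough to flip a near-tied greedy choice.
    import numpy as np

    rng = np.random.default_rng(0)
    contributions = rng.standard_normal(4096).astype(np.float16)

    logit_fp16 = contributions.sum()                     # accumulated in float16
    logit_fp32 = contributions.astype(np.float32).sum()  # upcast first, then sum
    print(float(logit_fp16), float(logit_fp32))
    # If a rival token's logit lands between these two values, greedy decoding
    # (temperature 0) picks a different token depending on the precision used.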

Stats

Basic vllm repo stats
Mentions: 40
Stars: 31,096
Activity: 10.0
Last commit: 3 days ago

vllm-project/vllm is an open-source project licensed under the Apache License 2.0, an OSI-approved license.

The primary programming language of vllm is Python.

