vllm VS OpenPipe

Compare vllm vs OpenPipe and see what their differences are.

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs (by vllm-project)

OpenPipe

Turn expensive prompts into cheap fine-tuned models (by OpenPipe)
                vllm                OpenPipe
Mentions        31                  13
Stars           19,344              2,381
Growth          12.6%               2.0%
Activity        9.9                 9.9
Last commit     2 days ago          about 1 month ago
Language        Python              TypeScript
License         Apache License 2.0  Apache License 2.0
Mentions - the total number of mentions of a project that we've tracked, plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

vllm

Posts with mentions or reviews of vllm. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-09.
  • AI leaderboards are no longer useful. It's time to switch to Pareto curves
    1 project | news.ycombinator.com | 30 Apr 2024
    I guess the root cause of my claim is that OpenAI won't tell us whether or not GPT-3.5 is an MoE model, and I assumed it wasn't. Since GPT-3.5 is clearly nondeterministic at temp=0, I believed the nondeterminism was due to FPU stuff, and this effect was amplified with GPT-4's MoE. But if GPT-3.5 is also MoE then that's just wrong.

    What makes this especially tricky is that small models are truly 100% deterministic at temp=0 because the relative likelihoods are too coarse for FPU issues to be a factor. I had thought 3.5 was big enough that some of its token probabilities were too fine-grained for the FPU. But that's probably wrong.

    On the other hand, it's not just GPT, there are currently floating-point difficulties in vllm which significantly affect the determinism of any model run on it: https://github.com/vllm-project/vllm/issues/966 Note that a suggested fix is upcasting to float32. So it's possible that GPT-3.5 is using an especially low-precision float and introducing nondeterminism by saving money on compute costs.

    Sadly I do not have the money[1] to actually run a test to falsify any of this. It seems like this would be a good little research project.

    [1] Or the time, or the motivation :) But this stuff is expensive.
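
    To make the floating-point argument concrete, here is a minimal sketch (plain NumPy, unrelated to vllm's actual kernels) of how accumulation order in reduced precision changes results; when two tokens' logits sit within that gap, greedy decoding at temp=0 becomes order-dependent:

        import numpy as np

        # Pretend these are the per-element contributions to one logit.
        np.random.seed(0)
        contributions = np.random.normal(0, 1, size=1000).astype(np.float16)

        # Sum the same values in two different orders, as different batch
        # sizes or kernel schedules might on a GPU.
        sum_forward = np.float16(0)
        for x in contributions:
            sum_forward += x

        sum_reverse = np.float16(0)
        for x in contributions[::-1]:
            sum_reverse += x

        # In float16 the two sums typically differ by a few ULPs. If two
        # candidate tokens' logits are closer than that, the argmax -- and
        # hence the "deterministic" temp=0 output -- depends on execution
        # order. Upcasting the accumulation to float32 shrinks the gap.
        print(sum_forward, sum_reverse, sum_forward == sum_reverse)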

  • Mistral AI Launches New 8x22B MoE Model
    4 projects | news.ycombinator.com | 9 Apr 2024
    The easiest is to use vllm (https://github.com/vllm-project/vllm) to run it on a couple of A100s, and you can benchmark it using this library (https://github.com/EleutherAI/lm-evaluation-harness)
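
    For reference, a minimal sketch of that setup using vllm's offline Python API; the Hugging Face repo id and GPU count below are assumptions to adjust for your hardware:

        # Requires a multi-GPU node; weights are sharded across the GPUs
        # via tensor parallelism.
        from vllm import LLM, SamplingParams

        llm = LLM(
            model="mistralai/Mixtral-8x22B-Instruct-v0.1",  # assumed repo id
            tensor_parallel_size=4,  # e.g. 4x A100
        )
        params = SamplingParams(temperature=0.0, max_tokens=128)
        outputs = llm.generate(["Explain mixture-of-experts briefly."], params)
        print(outputs[0].outputs[0].text)

    lm-evaluation-harness can then be pointed at the same weights to produce comparable benchmark numbers.
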
  • FLaNK AI for 11 March 2024
    46 projects | dev.to | 11 Mar 2024
  • Show HN: We got fine-tuning Mistral-7B to not suck
    4 projects | news.ycombinator.com | 7 Feb 2024
    Great question! Scheduling workloads onto GPUs in a way that utilises VRAM efficiently was quite the challenge.

    What we found was that the IO latency for loading model weights into VRAM will kill responsiveness if you don't "re-use" sessions (i.e. where the model weights remain loaded and you run multiple inference sessions over the same loaded weights).

    Obviously projects like https://github.com/vllm-project/vllm exist, but we needed to build out a scheduler that can run a fleet of GPUs for a matrix of text/image vs inference/finetune sessions.

    disclaimer: I work on Helix
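
    The session-reuse point above is the crux; a minimal sketch of the pattern, with hypothetical load_weights/run_inference helpers standing in for any real backend:

        import time

        def load_weights(model_id: str):
            """Stand-in for the expensive step: copying many GB of weights
            from disk or network into VRAM. Hypothetical helper."""
            time.sleep(5)  # pretend this takes seconds to minutes
            return {"model_id": model_id}  # the loaded model handle

        def run_inference(model, prompt: str) -> str:
            return f"completion for: {prompt}"  # stand-in for a forward pass

        # Anti-pattern: pay the weight-loading IO cost on every request.
        for prompt in ["a", "b", "c"]:
            model = load_weights("my-model")
            run_inference(model, prompt)

        # Session reuse: load once, run many requests over the same weights.
        model = load_weights("my-model")
        for prompt in ["a", "b", "c"]:
            run_inference(model, prompt)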

  • Mistral CEO confirms 'leak' of new open source AI model nearing GPT4 performance
    5 projects | news.ycombinator.com | 31 Jan 2024
    FYI, vLLM also just added experimental multi-LoRA support: https://github.com/vllm-project/vllm/releases/tag/v0.3.0

    Also check out the new prefix caching; I see huge potential for batch processing purposes there!
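
    For context, the multi-LoRA API added in v0.3.0 looks roughly like the sketch below; the base model id, adapter name, and path are placeholders:

        from vllm import LLM, SamplingParams
        from vllm.lora.request import LoRARequest

        # One base model stays in VRAM; adapters are swapped in per request.
        llm = LLM(model="mistralai/Mistral-7B-v0.1", enable_lora=True)
        params = SamplingParams(temperature=0.7, max_tokens=64)

        outputs = llm.generate(
            ["Summarize this support ticket: ..."],
            params,
            # (adapter name, integer id, local path) -- all placeholders
            lora_request=LoRARequest("support-adapter", 1, "/path/to/lora"),
        )
        print(outputs[0].outputs[0].text)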

  • VLLM Sacrifices Accuracy for Speed
    1 project | news.ycombinator.com | 23 Jan 2024
  • Easy, fast, and cheap LLM serving for everyone
    1 project | news.ycombinator.com | 17 Dec 2023
  • vllm
    1 project | news.ycombinator.com | 15 Dec 2023
  • Mixtral Expert Parallelism
    1 project | news.ycombinator.com | 15 Dec 2023
  • Mixtral 8x7B Support
    1 project | news.ycombinator.com | 11 Dec 2023

OpenPipe

Posts with mentions or reviews of OpenPipe. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-23.
  • Ask HN: How does deploying a fine-tuned model work
    4 projects | news.ycombinator.com | 23 Apr 2024
    - Fireworks: $0.20

    If you're looking for an end-to-end flow that will help you gather the training data, validate it, run the fine-tune, and then define evaluations, you could also check out my company, OpenPipe (https://openpipe.ai/). In addition to hosting your model, we help you organize your training data, relabel it if necessary, define evaluations on the finished fine-tune, and monitor its performance in production. Our inference prices are higher than the above providers', but once you're happy with your model you can always export your weights and host them on one of the above!

  • OpenAI: Improvements to the fine-tuning API and expanding our cus
    1 project | news.ycombinator.com | 4 Apr 2024
    Btw, if you've tried fine-tuning OpenAI models before January and came away unimpressed with the quality of the finished model, it's worth trying again. They made some unannounced changes in the last few months that make the fine-tuned models much stronger.

    That said, we've found that Mixtral fine-tunes still typically outperform GPT-3.5 fine-tunes, and are far cheaper to serve. It's a bit of a plug, but I honestly think we have the simplest platform for fine-tuning multiple models (both API-based like OpenAI and open-source alternatives) side by side and comparing their quality. https://openpipe.ai

  • GPT-4, without specialized training, beat a GPT-3.5 class model that cost $10B
    3 projects | news.ycombinator.com | 24 Mar 2024
    IMO it's possible to over-generalize from this datapoint (lol). While it's true that creating a general "finance" model that's stronger than GPT-4 is hard, training a task-specific model is much easier. E.g. "a model that's better than GPT-4 at answering finance-related questions": very hard. "A model that's better than GPT-4 at extracting forward-looking financial projections in a standard format": very easy.

    And in practice, most tasks people are using GPT-4 for in production are more like the latter than the former.

    (Disclaimer: building https://openpipe.ai, which makes it super easy to productize this workflow).

  • Fine Tuning LLMs to Process Massive Amounts of Data 50x Cheaper than GPT-4
    3 projects | dev.to | 8 Jan 2024
    In this article I'll share how I used OpenPipe to effortlessly fine-tune Mistral 7B, reducing the cost of one of my prompts by 50x. I've included tips and recommendations if you are doing this for the first time, because I definitely left some performance increases on the table. Skip to Fine Tuning Open Recommender if you are specifically interested in what the fine-tuning process looks like. You can always DM me on Twitter (@experilearning) or leave a comment if you have questions!
  • OpenAI Switch Kit: Swap OpenAI with any open-source model
    5 projects | news.ycombinator.com | 6 Dec 2023
    The problem is that most non-OpenAI models haven't actually been fine-tuned with function calling in mind, and getting a model to output function-calling-like syntax without having been trained on it is quite unreliable. There are a few alternatives that have been (OpenHermes 2.5 has some function calling in its dataset and does a decent job with it, and the latest Claude does as well), but for now it just doesn't work great.

    That said, it's not that hard to fine-tune a model to understand function calling -- we do that as part of all of our OpenPipe fine-tunes, and you can see the serialization method we use here: https://github.com/OpenPipe/OpenPipe/blob/main/app/src/model...

    It isn't particularly difficult, and I'd expect more general-purpose fine-tunes will start doing something similar as they get more mature!
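
    The linked file documents OpenPipe's actual scheme; purely to illustrate the general idea, serializing function calling into plain-text fine-tuning data can look like this (the format and names below are hypothetical, not OpenPipe's):

        import json

        # Flatten the available tools and the target function call into
        # plain text, so the model learns to emit a parseable call.
        tools = [{
            "name": "get_weather",
            "parameters": {"city": {"type": "string"}},
        }]

        training_example = {
            "prompt": (
                "You can call these functions:\n"
                + json.dumps(tools)
                + "\nUser: What's the weather in Paris?"
            ),
            # Target completion is structured output, not prose.
            "completion": json.dumps(
                {"function": "get_weather", "arguments": {"city": "Paris"}}
            ),
        }
        print(training_example["completion"])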

  • OpenAI is too cheap to beat
    4 projects | news.ycombinator.com | 12 Oct 2023
    Eh, OpenAI is too cheap to beat at their own game.

    But there are a ton of use-cases where a 1 to 7B parameter fine-tuned model will be faster, cheaper and easier to deploy than a prompted or fine-tuned GPT-3.5-sized model.

    In fact, it might be a strong statement, but I'd argue that most current use-cases for (non-fine-tuned) GPT-3.5 fit in that bucket.

    (Disclaimer: currently building https://openpipe.ai; making it trivial for product engineers to replace OpenAI prompts with their own fine-tuned models.)

  • Show HN: Fine-tune your own Llama 2 to replace GPT-3.5/4
    8 projects | news.ycombinator.com | 12 Sep 2023
    Yep! The linked notebook includes an example of exactly that (fine-tuning a 7b model to match the syntax of GPT-4 function call responses): https://github.com/OpenPipe/OpenPipe/blob/main/examples/clas...
  • Show HN: Automatically convert your GPT-3.5 prompt to Llama 2
    1 project | news.ycombinator.com | 9 Aug 2023
    Hey HN! I'm working on OpenPipe, an open source prompt workshop. I wanted to share a feature we recently released: prompt translations. Prompt translations allow you to quickly convert a prompt between GPT 3.5, Llama 2, and Claude 1/2 compatible formats. The common case would be if you’re using GPT 3.5 in production and are interested in evaluating a Claude or Llama 2 model for your use case. Here's a screen recording to show how it works in our UI: https://twitter.com/OpenPipeLab/status/1687875354311180288

    We’ve found a lot of our users are interested in evaluating Claude or Llama 2, but weren’t sure what changes they need to make to their prompts to get the best performance out of those models. Prompt translations make that easier.

    A bit more background: OpenPipe is an open-source prompt studio that lets you test your LLM prompts against scenarios from your real workloads. We currently support GPT 3.5/4, Claude 1/2, and Llama 2. The full codebase (including prompt translations) is available at https://github.com/OpenPipe/OpenPipe. If you'd prefer a managed experience, you can also sign up for our hosted version at https://openpipe.ai/.

    Happy to answer any questions!
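
    As a rough illustration of what such a translation involves -- this follows Meta's published Llama 2 [INST]/<<SYS>> chat template, not OpenPipe's implementation:

        # Convert an OpenAI-style chat message list into Llama 2's chat
        # format. Illustrates the translation problem only.
        def to_llama2(messages: list[dict]) -> str:
            system = ""
            if messages and messages[0]["role"] == "system":
                system = f"<<SYS>>\n{messages[0]['content']}\n<</SYS>>\n\n"
                messages = messages[1:]
            out = ""
            for i, m in enumerate(messages):
                if m["role"] == "user":
                    # The system prompt is folded into the first user turn.
                    content = (system + m["content"]) if i == 0 else m["content"]
                    out += f"<s>[INST] {content} [/INST]"
                else:  # assistant turn
                    out += f" {m['content']} </s>"
            return out

        print(to_llama2([
            {"role": "system", "content": "You are terse."},
            {"role": "user", "content": "Capital of France?"},
        ]))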

  • Join the Prompt Engineering World Championships -- Kickoff August 14, $15,000 prize!
    1 project | /r/ChatGPT | 4 Aug 2023
    Star our GitHub repo at https://github.com/openpipe/openpipe
  • Patterns for Building LLM-Based Systems and Products
    6 projects | news.ycombinator.com | 1 Aug 2023
    This is fantastic! I found myself nodding along in many places. I've definitely found in practice that evals are critical to shipping LLM-based apps with confidence. I'm actually working on an open-source tool in this space: https://github.com/openpipe/openpipe. Would love any feedback on ways to make it more useful. :)

What are some alternatives?

When comparing vllm and OpenPipe you can also consider the following projects:

TensorRT - NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

ollama - Get up and running with Llama 3, Mistral, Gemma, and other large language models.

CTranslate2 - Fast inference engine for Transformer models

agenta - The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.

lmdeploy - LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

axolotl - Go ahead and axolotl questions

Llama-2-Onnx

llama - Inference code for Llama models

tritony - Tiny configuration for Triton Inference Server

faster-whisper - Faster Whisper transcription with CTranslate2

marsha - Marsha is a functional, higher-level, English-based programming language that gets compiled into tested Python software by an LLM