Turn expensive prompts into cheap fine-tuned models (by OpenPipe)

OpenPipe Alternatives

Similar projects and alternatives to OpenPipe

NOTE: The mention count for each project reflects mentions in shared posts plus user-suggested alternatives, so a higher count suggests a closer or more popular OpenPipe alternative.

OpenPipe reviews and mentions

Posts with mentions or reviews of OpenPipe. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-09-12.
  • Show HN: Fine-tune your own Llama 2 to replace GPT-3.5/4
    8 projects | 12 Sep 2023
    There has been a lot of interest on HN in fine-tuning open-source LLMs recently (e.g. Anyscale's post). I've been playing around with fine-tuning models for a couple of years and wanted to share some insights and practical code. I've condensed what I've learned into a small set of notebooks covering labeling data, fine-tuning, running efficient inference, and evaluating costs/performance. The 7B model we train here matches GPT-4's labels 95% of the time on the test set, and for the 5% of cases where they disagree, it's often because the correct answer is genuinely ambiguous.

    What is fine-tuning? You can think of it as a more powerful form of prompting: instead of writing your instructions in text, you encode them in the weights of the model itself. You do this by training an existing model on example input/output pairs that demonstrate the task you want your fine-tuned model to learn. Fine-tuning can work with as few as 50 examples, but I usually try to get 1,000+ if possible.
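The input/output pairs described above are typically stored one JSON object per line (JSONL), which most fine-tuning toolchains accept. A minimal sketch, with hypothetical prompt/completion pairs for a classification task like the recipe example later in the post:

```python
import json

# Hypothetical labeled examples (prompt/completion pairs) that demonstrate
# the task the fine-tuned model should learn. The field names are
# illustrative; check your toolchain's expected schema.
examples = [
    {"prompt": "Classify this recipe: Spaghetti Carbonara", "completion": "pasta"},
    {"prompt": "Classify this recipe: Chicken Tikka Masala", "completion": "curry"},
]

# Serialize to JSONL: one JSON object per line, no enclosing array.
jsonl = "\n".join(json.dumps(ex) for ex in examples)
print(jsonl)
```

Each line round-trips through `json.loads`, so the same file can be streamed during training without loading everything into memory.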

    Prompting still has some big advantages over fine-tuning. It's much easier and faster to iterate on your instructions than to label data and retrain a model. And operationally it's easier to deploy one big model and adjust its behavior as needed than to deploy many small fine-tuned models that will likely each see lower utilization.

    Fine-tuning has one huge advantage though: it is far more effective at guiding a model's behavior than prompting, so you can often get away with a much smaller model. That gets you faster responses and lower inference costs. A fine-tuned Llama 7B model is 50x cheaper than GPT-3.5 on a per-token basis, and for many use cases can produce results that are as good or better!

    For example, classifying the 2M recipes in the dataset with GPT-4 would cost $23k. Even with GPT-3.5 it would cost over $1k. The model we fine-tuned performs similarly to GPT-4 and costs just $19 to run over the entire dataset.
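The roughly 50x per-token gap cited above can be sanity-checked with back-of-the-envelope arithmetic. A sketch with illustrative prices (these are assumptions for the comparison, not quoted rates):

```python
# Assumed prices per 1K tokens: GPT-3.5 at ~$0.002, and a self-hosted
# fine-tuned Llama 7B at ~$0.00004 (i.e. 50x cheaper per token).
GPT35_PER_1K = 0.002
LLAMA7B_PER_1K = 0.00004

def inference_cost(tokens: int, price_per_1k: float) -> float:
    """Total cost in dollars for a workload of `tokens` tokens."""
    return tokens / 1000 * price_per_1k

total_tokens = 500_000_000  # hypothetical workload size
gpt35 = inference_cost(total_tokens, GPT35_PER_1K)
llama = inference_cost(total_tokens, LLAMA7B_PER_1K)
print(f"GPT-3.5: ${gpt35:,.0f}  Llama 7B: ${llama:,.2f}  ratio: {gpt35 / llama:.0f}x")
```

With these assumed prices, a workload that costs about $1,000 on GPT-3.5 comes to about $20 on the fine-tuned 7B model, in the same ballpark as the $1k-vs-$19 figures above.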

    Disclaimer: My brother David and I are working on an open-source product called OpenPipe to help engineers adopt fine-tuning as simply as possible. But none of the information above depends on our startup; this post is just about sharing what we've learned about fine-tuning. I hope it's useful!

    If you're interested in finding out more you can watch our repo, and if you'd just like to chat about a fine-tuning project you're thinking about you can also just email me at [email protected]!

    8 projects | 12 Sep 2023
    Yep! The linked notebook includes an example of exactly that (fine-tuning a 7B model to match the syntax of GPT-4 function call responses).
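Matching GPT-4's function-call syntax amounts to training on completions that carry the same JSON shape GPT-4 emits (a function name plus JSON-encoded arguments). A minimal sketch with a hypothetical `get_weather` function (not from the linked notebook):

```python
import json

# Hypothetical training example: the completion mimics the structure of a
# GPT-4 function call response, so the fine-tuned model learns to emit it.
example = {
    "prompt": "Extract the city and unit from: 'Weather in Paris, celsius please'",
    "completion": json.dumps({
        "function_call": {
            "name": "get_weather",
            # Arguments are themselves a JSON-encoded string, as in the
            # OpenAI function-calling format.
            "arguments": json.dumps({"city": "Paris", "unit": "celsius"}),
        }
    }),
}

# Downstream code can then parse the fine-tuned model's output exactly
# like a GPT-4 function call.
parsed = json.loads(example["completion"])
print(parsed["function_call"]["name"])
```

Because the completion is strict JSON, existing function-call handling code needs no changes when swapping the fine-tuned model in behind the same interface.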
  • Patterns for Building LLM-Based Systems and Products
    6 projects | | 1 Aug 2023
    This is fantastic! I found myself nodding along in many places. I've definitely found in practice that evals are critical to shipping LLM-based apps with confidence. I'm actually working on an open-source tool in this space, and would love any feedback on ways to make it more useful. :)
  • Llama 2 – Meta AI
    16 projects | | 18 Jul 2023
    It depends -- do you mean as a general end-user of a chat platform or do you mean to include a model as part of an app or service?

    As an end user, what I've found works in practice is to use one of the models until it gives me an answer I'm unhappy with. At that point I'll try another model and see whether the response is better. Do this for long enough and you'll get a sense of the various models' strengths and weaknesses (although the tl;dr is that if you're willing to pay GPT-4 is better than anything else across most use cases right now).

    For evaluating models for app integrations, I can plug an open-source combined playground + eval harness I'm currently developing.

    We're working on integrating Llama 2 so users can test it against other models for their own workloads head to head. (We're also working on a hosted SaaS version so people don't have to download/install Postgres and Node!)



Basic OpenPipe repo stats
5 days ago

OpenPipe/OpenPipe is an open-source project licensed under the Apache License 2.0, which is an OSI-approved license.

The primary programming language of OpenPipe is TypeScript.
