ragas vs OpenPipe

ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines (by explodinggradients)

llm llmops

Source Code

docs.ragas.io

Suggest alternative

Edit details

OpenPipe

Turn expensive prompts into cheap fine-tuned models (by OpenPipe)

AI llm llmops prompt-engineering

Source Code

openpipe.ai

Suggest alternative

Edit details

InfluxDB - Power Real-Time Data Analytics at Scale

Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

www.influxdata.com

featured

SaaSHub - Software Alternatives and Reviews

SaaSHub helps you find the best software and product alternatives

www.saashub.com

featured

ragas		OpenPipe
	Project
10	Mentions	13
4,874	Stars	2,385
17.7%	Growth	2.2%
9.6	Activity	9.9
7 days ago	Latest Commit	about 2 months ago
Python	Language	TypeScript
Apache License 2.0	License	Apache License 2.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

ragas

Posts with mentions or reviews of ragas. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-21.

Show HN: Ragas – the de facto open-source standard for evaluating RAG pipelines
4 projects | news.ycombinator.com | 21 Mar 2024

congrats on launching! i think my continuing struggle with looking at Ragas as a company rather than an oss library is that the core of it is like 8 metrics (https://github.com/explodinggradients/ragas/tree/main/src/ra...) that are each 1-200 LOC. i can inline that easily in my app and retain full control, or model that in langchain or haystack or whatever.
why is Ragas a library and a company, rather than an overall "standard" or philosophy (eg like Heroku's 12 Factor Apps) that could maybe be more robust?
(just giving an opp to pitch some underappreciated benefits of using this library)
FLaNK 04 March 2024
26 projects | dev.to | 4 Mar 2024
FLaNK Stack 05 Feb 2024
49 projects | dev.to | 5 Feb 2024
SuperDuperDB - how to use it to talk to your documents locally using llama 7B or Mistral 7B?
7 projects | /r/LocalLLaMA | 9 Dec 2023

Also, at some point you'll need to get serious about evaluation (trust me, you will). You may be interested in https://github.com/explodinggradients/ragas
Ragas – Framework for RAG Evaluation
1 project | news.ycombinator.com | 22 Nov 2023
Ragas: Open-source Evaluation framework for RAG pipelines
1 project | news.ycombinator.com | 27 Oct 2023
Building a customer support chatbot using GPT-3.5 and lLamaIndex🚀
3 projects | dev.to | 19 Sep 2023

The problem becomes worse if you want to inspect outputs from not just one, but several different queries. Luckily, there are several free open source packages such as ragas and DeepEval that can help evaluate your chatbot so you don't have to manually do it 😌
Patterns for Building LLM-Based Systems and Products
6 projects | news.ycombinator.com | 1 Aug 2023

We have build RAGAS framework for this https://github.com/explodinggradients/ragas
[R] All about evaluating Large language models
1 project | /r/MachineLearning | 10 Jul 2023

Hi u/thecuteturtle, I am building open-source projects for evaluating LLM-based applications. Check it out https://github.com/explodinggradients/ragas and if you like to collaborate let me know :)

OpenPipe

Posts with mentions or reviews of OpenPipe. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-23.

Ask HN: How does deploying a fine-tuned model work
4 projects | news.ycombinator.com | 23 Apr 2024

- Fireworks: $0.20
If you're looking for an end-to-end flow that will help you gather the training data, validate it, run the fine tune and then define evaluations, you could also check out my company, OpenPipe (https://openpipe.ai/). In addition to hosting your model, we help you organize your training data, relabel if necessary, define evaluations on the finished fine-tune, and monitor its performance in production. Our inference prices are higher than the above providers, but once you're happy with your model you can always export your weights and host them on one of the above!
OpenAI: Improvements to the fine-tuning API and expanding our cus
1 project | news.ycombinator.com | 4 Apr 2024

Btw, if you've tried fine-tuning OpenAI models before January and came away unimpressed with the quality of the finished model, it's worth trying again. They made some unannounced changes in the last few months that make the fine-tuned models much stronger.
That said, we've found that Mixtral fine-tunes still typically outperform GPT-3.5 fine tunes, and are far cheaper to serve. It's a bit of a plug, but I honestly think we have the simplest platform to fine-tune multiple models (both API-based like OpenAI as well as open source alternatives) side by side and compare quality. https://openpipe.ai
GPT-4, without specialized training, beat a GPT-3.5 class model that cost $10B
3 projects | news.ycombinator.com | 24 Mar 2024

IMO it's possible to over-generalize from this datapoint (lol). While it's true that creating a general "finance" model that's stronger than GPT-4 is hard, training a task-specific model is much easier. Eg. "a model that's better than GPT-4 at answering finance-related questions": very hard. "A model that's better than GPT-4 at extracting forward-looking financial projections in a standard format": very easy.
And in practice, most tasks people are using GPT-4 for in production are more like the latter than the former.
(Disclaimer: building https://openpipe.ai, which makes it super easy to productize this workflow).
Fine Tuning LLMs to Process Massive Amounts of Data 50x Cheaper than GPT-4
3 projects | dev.to | 8 Jan 2024

In this article I'll share how I used OpenPipe to effortlessly fine tune Mistral 7B, reducing the cost of one of my prompts by 50x. I included tips and recommendations if you are doing this for the first time, because I definitely left some performance increases on the table. Skip to Fine Tuning Open Recommender if you are specifically interested in what the fine tuning process looks like. You can always DM me on Twitter (@experilearning) or leave a comment if you have questions!
OpenAI Switch Kit: Swap OpenAI with any open-source model
5 projects | news.ycombinator.com | 6 Dec 2023

The problem is that most non-OpenAI models haven't actually been fine-tuned with function calling in mind, and getting a model to output function-calling-like syntax without having been trained on it is quite unreliable. There are a few alternatives that have been (OpenHermes 2.5 has some function calling in its dataset and does a decent job with it, and the latest Claude does as well), but for now it just doesn't work great.
That said, it's not that hard to fine-tune a model to understand function calling -- we do that as part of all of our OpenPipe fine tunes, and you can see the serialization method we use here: https://github.com/OpenPipe/OpenPipe/blob/main/app/src/model...
It isn't particularly difficult, and I'd expect more general-purpose fine-tunes will start doing something similar as they get more mature!
OpenAI is too cheap to beat
4 projects | news.ycombinator.com | 12 Oct 2023

Eh, OpenAI is too cheap to beat at their own game.
But there are a ton of use-cases where a 1 to 7B parameter fine-tuned model will be faster, cheaper and easier to deploy than a prompted or fine-tuned GPT-3.5-sized model.
In fact, it might be a strong statement but I'd argue that most current use-cases for (non-fine-tuned) GPT-3.5 fit in that bucket.
(Disclaimer: currently building https://openpipe.ai; making it trivial for product engineers to replace OpenAI prompts with their own fine-tuned models.)
Show HN: Fine-tune your own Llama 2 to replace GPT-3.5/4
8 projects | news.ycombinator.com | 12 Sep 2023

Yep! The linked notebook includes an example of exactly that (fine-tuning a 7b model to match the syntax of GPT-4 function call responses): https://github.com/OpenPipe/OpenPipe/blob/main/examples/clas...
Show HN: Automatically convert your GPT-3.5 prompt to Llama 2
1 project | news.ycombinator.com | 9 Aug 2023

Hey HN! I'm working on OpenPipe, an open source prompt workshop. I wanted to share a feature we recently released: prompt translations. Prompt translations allow you to quickly convert a prompt between GPT 3.5, Llama 2, and Claude 1/2 compatible formats. The common case would be if you’re using GPT 3.5 in production and are interested in evaluating a Claude or Llama 2 model for your use case. Here's a screen recording to show how it works in our UI: https://twitter.com/OpenPipeLab/status/1687875354311180288
We’ve found a lot of our users are interested in evaluating Claude or Llama 2, but weren’t sure what changes they need to make to their prompts to get the best performance out of those models. Prompt translations make that easier.
A bit more background: OpenPipe is an open-source prompt studio that lets you test your LLM prompts against scenarios from your real workloads. We currently support GPT 3.5/4, Claude 1/2, and Llama 2. The full codebase (including prompt translations) is available at https://github.com/OpenPipe/OpenPipe. If you’d prefer a managed experience, you can also sign up for our hosted version at at https://openpipe.ai/.
Happy to answer any questions!
Join the Prompt Engineering World Championships -- Kickoff August 14, $15,000 prize!
1 project | /r/ChatGPT | 4 Aug 2023

Star our Github repo at https://github.com/openpipe/openpipe
Patterns for Building LLM-Based Systems and Products
6 projects | news.ycombinator.com | 1 Aug 2023

This is fantastic! I found myself nodding along in many places. I've definitely found in practice that evals are critical to shipping LLM-based apps with confidence. I'm actually working on an open-source tool in this space: https://github.com/openpipe/openpipe. Would love any feedback on ways to make it more useful. :)

What are some alternatives?

When comparing ragas and OpenPipe you can also consider the following projects:

deepeval - The LLM Evaluation Framework

ollama - Get up and running with Llama 3, Mistral, Gemma, and other large language models.

chameleon-llm - Codes for "Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models".

agenta - The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.

Local-LLM-Langchain - Load local LLMs effortlessly in a Jupyter notebook for testing purposes alongside Langchain or other agents. Contains Oobagooga and KoboldAI versions of the langchain notebooks with examples.

axolotl - Go ahead and axolotl questions

FastLoRAChat - Instruct-tune LLaMA on consumer hardware with shareGPT data

vllm - A high-throughput and memory-efficient inference and serving engine for LLMs

llama - Inference code for Llama models

text-generation-webui-colab - A colab gradio web UI for running Large Language Models

Llama-2-Onnx

ragas vs deepeval OpenPipe vs ollama ragas vs chameleon-llm OpenPipe vs agenta ragas vs Local-LLM-Langchain OpenPipe vs axolotl ragas vs FastLoRAChat OpenPipe vs vllm ragas vs agenta OpenPipe vs llama ragas vs text-generation-webui-colab OpenPipe vs Llama-2-Onnx

Compare ragas vs OpenPipe and see what are their differences.

ragas

OpenPipe

ragas

OpenPipe

What are some alternatives?