marsha VS OpenPipe

Compare marsha vs OpenPipe and see what their differences are.

marsha

Marsha is a functional, higher-level, English-based programming language that gets compiled into tested Python software by an LLM (by alantech)

OpenPipe

Turn expensive prompts into cheap fine-tuned models (by OpenPipe)
                marsha          OpenPipe
Mentions        12              13
Stars           461             2,406
Growth          0.2%            1.3%
Activity        8.4             9.8
Last commit     7 months ago    16 days ago
Language        Python          TypeScript
License         MIT License     Apache License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

marsha

Posts with mentions or reviews of marsha. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-07-31.
  • LLMs as compilers
    2 projects | dev.to | 31 Jul 2023
    There is already a lot of hay to mow with the current state of affairs in generative AI. LLMs as proper compilers, compiLLMers if you will, can produce correct code reliably enough today given enough guidance. Getting an LLM to generate correct code requires providing various examples and descriptive instructions. The UX of a chat interface to an LLM inherently leads people to write prompts that do not meet these criteria.

    We need to make it easy for people to give LLMs precise descriptions and numerous examples as concisely as possible, via syntaxes that are similar to English so they remain easy to learn and use. Coq is a great example of a functional programming syntax that is verbose and distant from English, but example-driven via assertions. David Ellis, Alejandro Guillen and I recently introduced Marsha as a proposal for what a syntax that meets these requirements can look like.

    It is still early, but LLMs will increasingly give us the power to create more accessible representations of computer programs that read close to English. These representations will be distilled by LLMs into the complexities of today's high-level languages. Knowing Java or Python will become a rare skill, akin to specializing in low-level optimization with C or assembly language today. Instead, the focus of developer experience will shift to the higher-level abstractions built on top of LLMs and to composing those abstractions for different tasks. CompiLLMers will make programming accessible enough in the near future that writing software becomes part of the resume of most knowledge workers.
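    A minimal sketch of what such a compile loop could look like, assuming a hypothetical llm_generate helper standing in for the actual model call (this is illustrative only, not the Marsha pipeline itself): the English description plus examples go in, candidate Python comes out, and the examples double as acceptance tests.

        # Hypothetical LLM-as-compiler loop: the spec and examples drive
        # generation, and the examples are replayed as tests before accepting.
        import subprocess, tempfile

        def llm_generate(spec: str, examples: list[str]) -> str:
            """Stand-in for the model call; returns Python source text."""
            raise NotImplementedError

        def compile_with_llm(spec: str, examples: list[str], attempts: int = 3) -> str:
            for _ in range(attempts):
                source = llm_generate(spec, examples)
                with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                    # Append the examples as assertions so the candidate must pass them.
                    f.write(source + "\n" + "\n".join(f"assert {e}" for e in examples))
                    path = f.name
                if subprocess.run(["python", path]).returncode == 0:
                    return source  # every example passed; accept this candidate
            raise RuntimeError("no candidate satisfied all the examples")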
  • Show HN: Marsha – An LLM-Based Programming Language
    1 project | /r/hypeurls | 27 Jul 2023
    1 project | /r/hackernews | 27 Jul 2023
    7 projects | news.ycombinator.com | 25 Jul 2023
    > You're a bit too black-and-white on this situation.

    While I agree with your other points, I feel this argument doesn't really hold water.

    The output of the C compiler is deterministic.

    I find it very hard to believe that floating-point rounding errors when you compile C will cause it to occasionally emit a binary that is not byte-identical across multiple sequential runs.

    What any program does at runtime is essentially non-deterministic, and that's 100% not what we're talking about here.

    If you consider https://github.com/alantech/marsha/blob/main/examples/web/we... ...

    The generated output of this file is a probability distribution with a sweet spot where the code does what you want; there are multiple outputs of code that sit in the sweet spot. You want one of these.

    The actual output of this file is a probability distribution that includes the examples, but may or may not overlap the sweet spot of 'actually does the right thing'.

    ...and in fact, there's no specific reason to expect that, regardless of the number of examples you provide, the distribution that includes those examples also includes the sweet spot.

    For common examples it will, but I'd argue it's actually provable that there are cases (e.g. where the output length of a valid solution would exceed the model's maximum possible output) for which, regardless of the examples / tests, it's not actually possible to generate a valid solution. Just like how constraint solvers will sometimes tell you there's no solution that matches all the constraints.

    So, that would be like a compiler error. "You've asked for something impossible".

    ...but I imagine it would be very, very difficult to tell the difference between inputs that overlap the sweet spot and those that don't; the ones that don't will have solutions that look right but actually only cover the examples, and there's literally no way to tell the difference between those and a correct solution without RLHF.

    It seems like an intractable problem to me.

    > Different tools for different scenarios, so if that is a huge problem, don't use Marsha as it currently is.

    As you say~
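    The "covers the examples but misses the sweet spot" failure mode described above is easy to demonstrate with a toy pair of candidates (hypothetical code, not Marsha output): both satisfy the same example set, but only one generalizes.

        # Both candidates pass the examples fib(1)=1, fib(2)=1, fib(3)=2,
        # yet only candidate_a actually computes Fibonacci numbers.
        def candidate_a(n: int) -> int:  # generalizes correctly
            a, b = 1, 1
            for _ in range(n - 1):
                a, b = b, a + b
            return a

        def candidate_b(n: int) -> int:  # memorizes the examples, wrong elsewhere
            return {1: 1, 2: 1, 3: 2}.get(n, 2)

        examples = {1: 1, 2: 1, 3: 2}
        # Both candidates clear the example "tests" equally well...
        assert all(candidate_a(n) == v and candidate_b(n) == v for n, v in examples.items())
        print(candidate_a(10), candidate_b(10))  # 55 vs 2: they diverge off-distribution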

  • Marsha, a ChatGPT-based programming language
    1 project | /r/ChatGPTCoding | 27 Jul 2023
  • Marsha is a functional, higher-level, English-based programming language that gets compiled into tested Python software more reliably by ChatGPT
    1 project | /r/programming | 27 Jul 2023
  • Llama 2 – Meta AI
    16 projects | news.ycombinator.com | 18 Jul 2023
    So this comment inspired me to write a Roman Numeral to Integer function in our LLM-based programming language, Marsha: https://github.com/alantech/marsha/blob/main/examples/genera...
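    For context, a plain-Python version of that function might look like the sketch below; this is illustrative, not the code Marsha actually emitted for the linked example.

        def roman_to_int(roman: str) -> int:
            """Convert a Roman numeral such as 'XIV' to its integer value (14)."""
            values = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
            total = 0
            for cur, nxt in zip(roman, roman[1:] + " "):
                v = values[cur]
                # A smaller numeral before a larger one (the I in IV) subtracts.
                total += -v if values.get(nxt, 0) > v else v
            return total

        assert roman_to_int("XIV") == 14 and roman_to_int("MCMXCIV") == 1994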

OpenPipe

Posts with mentions or reviews of OpenPipe. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-23.
  • Ask HN: How does deploying a fine-tuned model work
    4 projects | news.ycombinator.com | 23 Apr 2024
    - Fireworks: $0.20

    If you're looking for an end-to-end flow that will help you gather the training data, validate it, run the fine tune and then define evaluations, you could also check out my company, OpenPipe (https://openpipe.ai/). In addition to hosting your model, we help you organize your training data, relabel if necessary, define evaluations on the finished fine-tune, and monitor its performance in production. Our inference prices are higher than the above providers, but once you're happy with your model you can always export your weights and host them on one of the above!
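    For readers unfamiliar with what "gathering the training data" amounts to in practice: OpenAI-style chat fine-tunes consume JSONL where each line is one complete exchange, roughly like the sketch below (contents are illustrative).

        # Rough shape of chat fine-tuning data: one JSON object per line,
        # each holding a full prompt/completion exchange.
        import json

        rows = [
            {"messages": [
                {"role": "system", "content": "Classify the ticket's urgency."},
                {"role": "user", "content": "The site is down for every customer!"},
                {"role": "assistant", "content": "high"},
            ]},
        ]
        with open("train.jsonl", "w") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")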

  • OpenAI: Improvements to the fine-tuning API and expanding our custom models program
    1 project | news.ycombinator.com | 4 Apr 2024
    Btw, if you've tried fine-tuning OpenAI models before January and came away unimpressed with the quality of the finished model, it's worth trying again. They made some unannounced changes in the last few months that make the fine-tuned models much stronger.

    That said, we've found that Mixtral fine-tunes still typically outperform GPT-3.5 fine-tunes, and are far cheaper to serve. It's a bit of a plug, but I honestly think we have the simplest platform to fine-tune multiple models (both API-based like OpenAI as well as open-source alternatives) side by side and compare quality. https://openpipe.ai

  • GPT-4, without specialized training, beat a GPT-3.5 class model that cost $10B
    3 projects | news.ycombinator.com | 24 Mar 2024
    IMO it's possible to over-generalize from this datapoint (lol). While it's true that creating a general "finance" model that's stronger than GPT-4 is hard, training a task-specific model is much easier. E.g. "a model that's better than GPT-4 at answering finance-related questions": very hard. "A model that's better than GPT-4 at extracting forward-looking financial projections in a standard format": very easy.

    And in practice, most tasks people are using GPT-4 for in production are more like the latter than the former.

    (Disclaimer: building https://openpipe.ai, which makes it super easy to productize this workflow).
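    What makes the narrow task tractable is that its output can be pinned to a fixed schema. A hypothetical prompt for the projection-extraction example above might look like this (template and field names are invented for illustration):

        # Hypothetical prompt template for the narrow extraction task above.
        EXTRACTION_PROMPT = (
            "Extract every forward-looking financial projection from the filing below.\n"
            'Respond with a JSON list of objects, each shaped {{"metric": str, "period": str, "value": str}}.\n\n'
            "Filing:\n{filing_text}\n"
        )
        prompt = EXTRACTION_PROMPT.format(filing_text="We expect FY25 revenue of $1.2B...")
        print(prompt)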

  • Fine Tuning LLMs to Process Massive Amounts of Data 50x Cheaper than GPT-4
    3 projects | dev.to | 8 Jan 2024
    In this article I'll share how I used OpenPipe to effortlessly fine-tune Mistral 7B, reducing the cost of one of my prompts by 50x. I've included tips and recommendations if you are doing this for the first time, because I definitely left some performance increases on the table. Skip to Fine Tuning Open Recommender if you are specifically interested in what the fine-tuning process looks like. You can always DM me on Twitter (@experilearning) or leave a comment if you have questions!
  • OpenAI Switch Kit: Swap OpenAI with any open-source model
    5 projects | news.ycombinator.com | 6 Dec 2023
    The problem is that most non-OpenAI models haven't actually been fine-tuned with function calling in mind, and getting a model to output function-calling-like syntax without having been trained on it is quite unreliable. There are a few alternatives that have been (OpenHermes 2.5 has some function calling in its dataset and does a decent job with it, and the latest Claude does as well), but for now it just doesn't work great.

    That said, it's not that hard to fine-tune a model to understand function calling -- we do that as part of all of our OpenPipe fine-tunes, and you can see the serialization method we use here: https://github.com/OpenPipe/OpenPipe/blob/main/app/src/model...

    It isn't particularly difficult, and I'd expect more general-purpose fine-tunes will start doing something similar as they get more mature!
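    Since the linked serialization file is truncated above, here is a generic illustration only (not necessarily OpenPipe's scheme): one common approach is to render the call as tagged JSON inside the assistant's training text, so a text-only model can learn to emit it.

        # Illustrative function-call serialization for fine-tuning text-only
        # models; not necessarily the scheme OpenPipe uses.
        import json

        def serialize_call(name: str, arguments: dict) -> str:
            payload = json.dumps({"name": name, "arguments": arguments})
            return f"<function_call>{payload}</function_call>"

        print(serialize_call("get_weather", {"city": "Berlin", "unit": "celsius"}))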

  • OpenAI is too cheap to beat
    4 projects | news.ycombinator.com | 12 Oct 2023
    Eh, OpenAI is too cheap to beat at their own game.

    But there are a ton of use-cases where a 1 to 7B parameter fine-tuned model will be faster, cheaper and easier to deploy than a prompted or fine-tuned GPT-3.5-sized model.

    In fact, it might be a strong statement but I'd argue that most current use-cases for (non-fine-tuned) GPT-3.5 fit in that bucket.

    (Disclaimer: currently building https://openpipe.ai; making it trivial for product engineers to replace OpenAI prompts with their own fine-tuned models.)

  • Show HN: Fine-tune your own Llama 2 to replace GPT-3.5/4
    8 projects | news.ycombinator.com | 12 Sep 2023
    Yep! The linked notebook includes an example of exactly that (fine-tuning a 7b model to match the syntax of GPT-4 function call responses): https://github.com/OpenPipe/OpenPipe/blob/main/examples/clas...
  • Show HN: Automatically convert your GPT-3.5 prompt to Llama 2
    1 project | news.ycombinator.com | 9 Aug 2023
    Hey HN! I'm working on OpenPipe, an open source prompt workshop. I wanted to share a feature we recently released: prompt translations. Prompt translations allow you to quickly convert a prompt between GPT 3.5, Llama 2, and Claude 1/2 compatible formats. The common case would be if you’re using GPT 3.5 in production and are interested in evaluating a Claude or Llama 2 model for your use case. Here's a screen recording to show how it works in our UI: https://twitter.com/OpenPipeLab/status/1687875354311180288

    We’ve found a lot of our users are interested in evaluating Claude or Llama 2, but weren’t sure what changes they need to make to their prompts to get the best performance out of those models. Prompt translations make that easier.

    A bit more background: OpenPipe is an open-source prompt studio that lets you test your LLM prompts against scenarios from your real workloads. We currently support GPT 3.5/4, Claude 1/2, and Llama 2. The full codebase (including prompt translations) is available at https://github.com/OpenPipe/OpenPipe. If you’d prefer a managed experience, you can also sign up for our hosted version at https://openpipe.ai/.

    Happy to answer any questions!
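    The gist of a prompt "translation" is mechanical: the same exchange has to be re-rendered in each model family's template. A sketch of OpenAI's message list versus Llama 2's [INST] chat format (the latter per Meta's published template):

        # Re-rendering an OpenAI-style message list into Llama 2's chat template.
        def to_llama2(system: str, user: str) -> str:
            return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

        messages = [
            {"role": "system", "content": "You are a terse assistant."},
            {"role": "user", "content": "Summarize the plot of Hamlet."},
        ]
        print(to_llama2(messages[0]["content"], messages[1]["content"]))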

  • Join the Prompt Engineering World Championships -- Kickoff August 14, $15,000 prize!
    1 project | /r/ChatGPT | 4 Aug 2023
    Star our GitHub repo at https://github.com/openpipe/openpipe
  • Patterns for Building LLM-Based Systems and Products
    6 projects | news.ycombinator.com | 1 Aug 2023
    This is fantastic! I found myself nodding along in many places. I've definitely found in practice that evals are critical to shipping LLM-based apps with confidence. I'm actually working on an open-source tool in this space: https://github.com/openpipe/openpipe. Would love any feedback on ways to make it more useful. :)

What are some alternatives?

When comparing marsha and OpenPipe you can also consider the following projects:

maccarone - AI-managed code blocks in Python ⏪⏩

ollama - Get up and running with Llama 3, Mistral, Gemma, and other large language models.

llama2-chatbot - LLaMA v2 Chatbot

agenta - The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.

llama - Inference code for LLaMA models on CPU and Mac M1/M2 GPU

axolotl - Go ahead and axolotl questions

vllm - A high-throughput and memory-efficient inference and serving engine for LLMs

cog-llama-template - LLaMA Cog template

llama - Inference code for Llama models

programming-languages-genealogical-tree - Programming languages genealogical tree

Llama-2-Onnx
