marsha
OpenPipe
marsha | OpenPipe | |
---|---|---|
12 | 13 | |
461 | 2,406 | |
0.2% | 1.3% | |
8.4 | 9.8 | |
7 months ago | 16 days ago | |
Python | TypeScript | |
MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
marsha
-
LLMs as compilers
There is already a lot of hay to mow with the current state of affairs in generative AI. LLMs as proper compilers, compiLLMers if you will, can produce correct code reliably enough today given enough guidance. Getting an LLM to generate correct code requires providing various examples and descriptive instructions. The UX of a chat interface to an LLM inherently leads people to write prompts that do not meet these criteria. We need to make it easy for people to give LLMs precise descriptions and numerous examples as concisely as possible via syntaxes that are similar to English so they remain easy to learn and use. Coq is a great example of a functional programming syntax that is verbose and distant from English, but example-driven via assertions. David Ellis, Alejandro Guillen and I recently introduced Marsha as a proposal for what a syntax that meets the requirements outlined can look like. It is still early, but LLMs will increasingly give us the power to create more accessible representations of computer programs that look close to English. These representations will be distilled by LLMs into the complexities of the current high-level languages. Knowing Java or Python will become a rare skill, akin to individuals specializing in low-level optimizations using C or assembly language these days. Instead, the focus of developer experience will shift to the higher-level abstractions that are built on top of LLMs and composing these abstractions for different tasks. Compillmers will make programming more accessible in the near future such that writing software becomes part of the resume of most knowledge workers.
-
Show HN: Marsha – An LLM-Based Programming Language
> You're a bit too black-and-white on this situation.
While I agree with your other points, I feel this argument doesn't really hold water.
The output of the c compiler is deterministic.
I struggle very hard to believe that the floating point rounding errors when you compile C will cause it to occasionally emit a binary that is not byte-identical multiple sequential runs in a row.
What any program does at runtime is essentially non-deterministic, and that's 100% not what we're talking about here.
If you consider https://github.com/alantech/marsha/blob/main/examples/web/we... ...
The generated output of this file is a probability distribution with a sweet spot where the code does what you want; there are multiple outputs of code that sit in the sweet spot. You want one of these.
The actual output of this file is a probability distribution that includes the examples, but may or may not overlap the sweet spot of 'actually does the right thing'.
...in fact, and there's no specific reason to expect that, regardless of the number of examples you provide, the distribution that includes those examples also includes the sweet spot.
For common examples it will, but I'd argue that it's actually provable that there are times (eg. where the output length of a valid solution would be > the possible out of the model), that regardless of the examples / tests, it's not actually possible to generate a valid solution from. Just like how constraint solvers will sometimes tell you there's no solution that matches all the constraints.
So, that would be like a compiler error. "You've asked for something impossible".
...but I imagine it would be very very difficult to tell the difference between inputs that overlap the sweet spot and those that don't; the ones that don't will have solutions that look right, but actually only cover the examples; and there's literally no way of telling the difference between that and a correct solution without HFRL.
It seem like an intractable problem to me.
> Different tools for different scenarios, so if that is a huge problem, don't use Marsha as it currently is.
As you say~
- Marsha, a ChatGPT-based programming language
- Marsha is a functional, higher-level, English-based programming language that gets compiled into tested Python software more reliably by ChatGPT
-
Llama 2 – Meta AI
So this comment inspired me to write a Roman Numeral to Integer function in out LLM-based programming language, Marsha: https://github.com/alantech/marsha/blob/main/examples/genera...
OpenPipe
-
Ask HN: How does deploying a fine-tuned model work
- Fireworks: $0.20
If you're looking for an end-to-end flow that will help you gather the training data, validate it, run the fine tune and then define evaluations, you could also check out my company, OpenPipe (https://openpipe.ai/). In addition to hosting your model, we help you organize your training data, relabel if necessary, define evaluations on the finished fine-tune, and monitor its performance in production. Our inference prices are higher than the above providers, but once you're happy with your model you can always export your weights and host them on one of the above!
-
OpenAI: Improvements to the fine-tuning API and expanding our cus
Btw, if you've tried fine-tuning OpenAI models before January and came away unimpressed with the quality of the finished model, it's worth trying again. They made some unannounced changes in the last few months that make the fine-tuned models much stronger.
That said, we've found that Mixtral fine-tunes still typically outperform GPT-3.5 fine tunes, and are far cheaper to serve. It's a bit of a plug, but I honestly think we have the simplest platform to fine-tune multiple models (both API-based like OpenAI as well as open source alternatives) side by side and compare quality. https://openpipe.ai
-
GPT-4, without specialized training, beat a GPT-3.5 class model that cost $10B
IMO it's possible to over-generalize from this datapoint (lol). While it's true that creating a general "finance" model that's stronger than GPT-4 is hard, training a task-specific model is much easier. Eg. "a model that's better than GPT-4 at answering finance-related questions": very hard. "A model that's better than GPT-4 at extracting forward-looking financial projections in a standard format": very easy.
And in practice, most tasks people are using GPT-4 for in production are more like the latter than the former.
(Disclaimer: building https://openpipe.ai, which makes it super easy to productize this workflow).
-
Fine Tuning LLMs to Process Massive Amounts of Data 50x Cheaper than GPT-4
In this article I'll share how I used OpenPipe to effortlessly fine tune Mistral 7B, reducing the cost of one of my prompts by 50x. I included tips and recommendations if you are doing this for the first time, because I definitely left some performance increases on the table. Skip to Fine Tuning Open Recommender if you are specifically interested in what the fine tuning process looks like. You can always DM me on Twitter (@experilearning) or leave a comment if you have questions!
-
OpenAI Switch Kit: Swap OpenAI with any open-source model
The problem is that most non-OpenAI models haven't actually been fine-tuned with function calling in mind, and getting a model to output function-calling-like syntax without having been trained on it is quite unreliable. There are a few alternatives that have been (OpenHermes 2.5 has some function calling in its dataset and does a decent job with it, and the latest Claude does as well), but for now it just doesn't work great.
That said, it's not that hard to fine-tune a model to understand function calling -- we do that as part of all of our OpenPipe fine tunes, and you can see the serialization method we use here: https://github.com/OpenPipe/OpenPipe/blob/main/app/src/model...
It isn't particularly difficult, and I'd expect more general-purpose fine-tunes will start doing something similar as they get more mature!
-
OpenAI is too cheap to beat
Eh, OpenAI is too cheap to beat at their own game.
But there are a ton of use-cases where a 1 to 7B parameter fine-tuned model will be faster, cheaper and easier to deploy than a prompted or fine-tuned GPT-3.5-sized model.
In fact, it might be a strong statement but I'd argue that most current use-cases for (non-fine-tuned) GPT-3.5 fit in that bucket.
(Disclaimer: currently building https://openpipe.ai; making it trivial for product engineers to replace OpenAI prompts with their own fine-tuned models.)
-
Show HN: Fine-tune your own Llama 2 to replace GPT-3.5/4
Yep! The linked notebook includes an example of exactly that (fine-tuning a 7b model to match the syntax of GPT-4 function call responses): https://github.com/OpenPipe/OpenPipe/blob/main/examples/clas...
-
Show HN: Automatically convert your GPT-3.5 prompt to Llama 2
Hey HN! I'm working on OpenPipe, an open source prompt workshop. I wanted to share a feature we recently released: prompt translations. Prompt translations allow you to quickly convert a prompt between GPT 3.5, Llama 2, and Claude 1/2 compatible formats. The common case would be if you’re using GPT 3.5 in production and are interested in evaluating a Claude or Llama 2 model for your use case. Here's a screen recording to show how it works in our UI: https://twitter.com/OpenPipeLab/status/1687875354311180288
We’ve found a lot of our users are interested in evaluating Claude or Llama 2, but weren’t sure what changes they need to make to their prompts to get the best performance out of those models. Prompt translations make that easier.
A bit more background: OpenPipe is an open-source prompt studio that lets you test your LLM prompts against scenarios from your real workloads. We currently support GPT 3.5/4, Claude 1/2, and Llama 2. The full codebase (including prompt translations) is available at https://github.com/OpenPipe/OpenPipe. If you’d prefer a managed experience, you can also sign up for our hosted version at at https://openpipe.ai/.
Happy to answer any questions!
-
Join the Prompt Engineering World Championships -- Kickoff August 14, $15,000 prize!
Star our Github repo at https://github.com/openpipe/openpipe
-
Patterns for Building LLM-Based Systems and Products
This is fantastic! I found myself nodding along in many places. I've definitely found in practice that evals are critical to shipping LLM-based apps with confidence. I'm actually working on an open-source tool in this space: https://github.com/openpipe/openpipe. Would love any feedback on ways to make it more useful. :)
What are some alternatives?
maccarone - AI-managed code blocks in Python ⏪⏩
ollama - Get up and running with Llama 3, Mistral, Gemma, and other large language models.
llama2-chatbot - LLaMA v2 Chatbot
agenta - The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.
llama - Inference code for LLaMA models on CPU and Mac M1/M2 GPU
axolotl - Go ahead and axolotl questions
vllm - A high-throughput and memory-efficient inference and serving engine for LLMs
cog-llama-template - LLaMA Cog template
llama - Inference code for Llama models
programming-languages-genealogical-tree - Programming languages genealogical tree
Llama-2-Onnx