Llama-2-Onnx
dify
Llama-2-Onnx | dify | |
---|---|---|
3 | 14 | |
998 | 33,181 | |
1.5% | 22.7% | |
6.7 | 9.9 | |
5 months ago | 6 days ago | |
Python | TypeScript | |
GNU General Public License v3.0 or later | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Llama-2-Onnx
-
Show HN: Fine-tune your own Llama 2 to replace GPT-3.5/4
System: Here's some docs, answer concisely in a sentence.
YMMV on cost still, depends on cloud vendor, and my intuition & viewpoint agrees with yours, GPT-3.5 is priced low enough that there isn't a case where it makes sense to use another model.
It strikes me now that _very_ likely and not just our intuition: OpenAI's $/GPU hour is likely <= any other vendor's.
The next big step will come from formalizing the stuff rolling around the local LLM community, for months now it's either been one-off $X.c stunts that run on desktop, and the vast majority of the _actual_ usage and progress is coming from porn-y stuff, like all nascent tech.
Microsoft has LLaMa-2 ONNX available on GitHub[1]. There's budding but very small projects in different languages to wrap ONNX. Once there's a genuine cross-platform[2] ONNX wrapper that makes running LLaMa-2 easy, there will be a step change. It'll be "free"[3] to run your fine-tuned model that does as well as GPT-4 .
It's not clear to me exactly when this will occur. It's "difficult" now, but only because the _actual usage_ in the local LLM community doesn't have a reason to invest in ONNX, and it's extremely intimidating to figure out how exactly to get LLaMa-2 running in ONNX. Microsoft kinda threw it up on GitHub and moved on, the sample code even still needs a PyTorch model. I see at least one very small company on HuggingFace that _may_ have figured out full ONNX.
[1] https://github.com/microsoft/Llama-2-Onnx
- FLaNK Stack Weekly for 14 Aug 2023
- Llama 2 on ONNX runs locally
dify
-
What We've Learned from a Year of Building with LLMs
Perhaps this would be of use? https://github.com/langgenius/dify/ I use it for quick workflows and it's pretty intuitive.
-
Ask HN: LLM workflows to avoid copying and pasting from the web interfaces?
This visual IDE for LLM pipelines was posted recently: https://github.com/langgenius/dify
See if it helps.
- FLaNK AI Weekly for 29 April 2024
-
Dify, a visual workflow to build/test LLM applications
> https://github.com/langgenius/dify/blob/main/LICENSE
everyone is apparently a license pioneer
- Dify, an end-to-end, visualized workflow to build/test LLM applications
-
GreptimeAI + Xinference - Efficient Deployment and Monitoring of Your LLM Applications
Xorbits Inference (Xinference) is an open-source platform to streamline the operation and integration of a wide array of AI models. With Xinference, you’re empowered to run inference using any open-source LLMs, embedding models, and multimodal models either in the cloud or on your own premises, and create robust AI-driven applications. It provides a RESTful API compatible with OpenAI API, Python SDK, CLI, and WebUI. Furthermore, it integrates third-party developer tools like LangChain, LlamaIndex, and Dify, facilitating model integration and development.
-
Which LLM framework(s) do you use in production and why?
If you are looking to develop QnA or chat based apps then check out https://dify.ai. Do a quick check and see if it fit your requirements. You can integrate it with your app using the apis it provides
-
New Discoveries in No-Code AI App Building with ChatGPT
As an AI newbie, I used to find coding apps from scratch an absolute nightmare! The learning curve was steep as a ski slope, debugging took endless hours, and developing even a simple AI app nearly drove me insane! But since discovering Dify, it has totally revolutionized my life by enabling app development without any coding skills!
- FLaNK Stack Weekly for 14 Aug 2023
- Interesting LLMOps Tools Dify.ai
What are some alternatives?
vllm - A high-throughput and memory-efficient inference and serving engine for LLMs
langchain-llm-katas - This is a an open-source project designed to help you improve your skills with AI engineering using LLMs and the langchain library
pkgx - the last thing you’ll install
litellm - Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
onnx-coreml - ONNX to Core ML Converter
chainlit - Build Conversational AI in minutes ⚡️
awesome-data-temporality - A curated list to help you manage temporal data across many modalities 🚀.
duet-gpt - A conversational semi-autonomous developer assistant. AI pair programming without the copypasta.
OpenPipe - Turn expensive prompts into cheap fine-tuned models
kudu - Mirror of Apache Kudu
llama.cpp - LLM inference in C/C++
IncognitoPilot - An AI code interpreter for sensitive data, powered by GPT-4 or Code Llama / Llama 2.