lmql
llama.cpp
| | lmql | llama.cpp |
|---|---|---|
| Mentions | 30 | 773 |
| Stars | 3,342 | 57,463 |
| Growth | 2.9% | - |
| Activity | 9.5 | 10.0 |
| Last commit | 6 days ago | about 11 hours ago |
| Language | Python | C++ |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
lmql
- Show HN: Fructose, LLM calls as strongly typed functions
- Prompting LLMs to constrain output

  have been experimenting with guidance and lmql. a bit too early to give any well formed opinions but really do like the idea of constraining llm output.
- [D] Prompt Engineering Seems Like Guesswork - How To Evaluate LLM Application Properly?

  the only time i've ever felt like it was anything other than guesswork was using LMQL. not coincidentally, LMQL works with LLMs as autocomplete engines rather than q&a ones.
- Guidance for selecting a function-calling library?

  lmql
- Show HN: Magentic – Use LLMs as simple Python functions

  This is also similar in spirit to LMQL
  https://github.com/eth-sri/lmql
- Show HN: LLMs can generate valid JSON 100% of the time
- LangChain Agent Simulation – Multi-Player Dungeons and Dragons
- The Problem with LangChain

  LLM calls are just function calls, so most functional composition is already afforded by any general-purpose language out there. If you need fancy stuff, use something like Python's functools.

  Working on https://github.com/eth-sri/lmql (shameless plug, sorry), we have always found that compositional abstractions on top of LMQL are mostly there already, once you internalize prompts being functions.
- Is there a UI that can limit LLM tokens to a preset list?
- Local LLMs: After Novelty Wanes

  LMQL is another.
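
The point quoted above under "The Problem with LangChain", that LLM calls are just function calls and compose with ordinary tools such as Python's functools, can be illustrated with a minimal sketch. Everything named here is hypothetical: `fake_llm` is an offline stand-in for a real model call, and `make_step` and `compose` are illustrative helpers, not part of LMQL or any other library.

```python
from functools import partial, reduce
from typing import Callable

# Hypothetical stand-in for a real model call. It only echoes its prompt,
# so the example runs offline and deterministically.
def fake_llm(prompt: str) -> str:
    return f"<completion of: {prompt}>"

def make_step(template: str, llm: Callable[[str], str]) -> Callable[[str], str]:
    """Turn a prompt template with one {text} slot into a plain function."""
    def step(text: str) -> str:
        return llm(template.format(text=text))
    return step

def compose(*steps: Callable[[str], str]) -> Callable[[str], str]:
    """Chain steps left to right: the output of one step feeds the next."""
    return lambda text: reduce(lambda acc, step: step(acc), steps, text)

# Bind the model once with functools.partial, then build prompt functions.
step = partial(make_step, llm=fake_llm)
summarize = step("Summarize: {text}")
translate = step("Translate to French: {text}")

# The pipeline is itself just a function from str to str.
pipeline = compose(summarize, translate)

if __name__ == "__main__":
    print(pipeline("LLM calls are just function calls."))
```

Because each step is an ordinary `str -> str` function, swapping `fake_llm` for a real client, or reordering steps, needs no framework-specific abstraction.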
llama.cpp
- Better and Faster Large Language Models via Multi-Token Prediction

  For anyone interested in exploring this, llama.cpp has an example implementation here:
  https://github.com/ggerganov/llama.cpp/tree/master/examples/...
- Llama.cpp Bfloat16 Support
- Fine-tune your first large language model (LLM) with LoRA, llama.cpp, and KitOps in 5 easy steps

  Getting started with LLMs can be intimidating. In this tutorial we will show you how to fine-tune a large language model using LoRA, facilitated by tools like llama.cpp and KitOps.
- GGML Flash Attention support merged into llama.cpp
- Phi-3 Weights Released

  well https://github.com/ggerganov/llama.cpp/issues/6849
- Lossless Acceleration of LLM via Adaptive N-Gram Parallel Decoding
- Llama.cpp Working on Support for Llama3
- Embeddings are a good starting point for the AI curious app developer

  Have just done this recently for local chat with pdf feature in https://recurse.chat. (It's a macOS app that has built-in llama.cpp server and local vector database)

  Running an embedding server locally is pretty straightforward:

  - Get llama.cpp release binary: https://github.com/ggerganov/llama.cpp/releases
- Mixtral 8x22B
- Llama.cpp: Improve CPU prompt eval speed
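
The "local chat with pdf" comment above describes pairing an embedding server with a vector database. As a rough sketch of the retrieval step only, the code below ranks text chunks by cosine similarity to a query. The embedding function is passed in as a parameter; in the setup the comment describes it would be backed by a local llama.cpp server, but that server's API is not shown here, and `toy_embed`, `top_chunks`, and `KEYWORDS` are hypothetical names used purely so the example runs offline.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]

def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_chunks(
    query: str,
    chunks: List[str],
    embed: Callable[[str], Vector],
    k: int = 2,
) -> List[str]:
    """Return the k chunks whose embeddings are most similar to the query."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

# Hypothetical toy embedding: keyword counts instead of a real model.
KEYWORDS = ("llama", "embedding", "license")

def toy_embed(text: str) -> List[float]:
    words = text.lower().split()
    return [float(words.count(keyword)) for keyword in KEYWORDS]

if __name__ == "__main__":
    docs = ["llama runs locally", "embedding server setup", "MIT license text"]
    print(top_chunks("embedding", docs, toy_embed, k=1))
```

A real vector database would replace the linear scan in `top_chunks` with an index, but the similarity measure and the ranking idea stay the same.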
What are some alternatives?
guidance - A guidance language for controlling large language models.
ollama - Get up and running with Llama 3, Mistral, Gemma, and other large language models.
guidance - A guidance language for controlling large language models. [Moved to: https://github.com/guidance-ai/guidance]
gpt4all - gpt4all: run open-source LLMs anywhere
simpleaichat - Python package for easily interfacing with chat apps, with robust features and minimal code complexity.
text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
NeMo-Guardrails - NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.
GPTQ-for-LLaMa - 4 bits quantization of LLaMA using GPTQ
guardrails - Adding guardrails to large language models.
ggml - Tensor library for machine learning
basaran - Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.
alpaca.cpp - Locally run an Instruction-Tuned Chat-Style LLM