llama_index vs text-generation-webui

| | llama_index | text-generation-webui |
|---|---|---|
| Mentions | 78 | 884 |
| Stars | 40,945 | 43,250 |
| Growth | 4.1% | 1.5% |
| Activity | 9.9 | 9.7 |
| Latest Commit | 7 days ago | 2 days ago |
| Language | Python | Python |
| License | MIT License | GNU Affero General Public License v3.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
llama_index
- Complete Large Language Model (LLM) Learning Roadmap
Resource: LlamaIndex Documentation
- Quick tip: Replace MongoDB® Atlas with SingleStore Kai in LlamaIndex
The notebook is adapted from the LlamaIndex GitHub repo.
- Show HN: Route your prompts to the best LLM
- LlamaIndex: A data framework for your LLM applications
- FLaNK AI - 01 April 2024
- Show HN: Ragdoll Studio (fka Arthas.AI) is the FOSS alternative to character.ai
For anyone curious about llamaindex's "prompt mixins", they're actually dead simple: https://github.com/run-llama/llama_index/blob/8a8324008764a7... - and maybe no longer supported.
I basically reinvented this wheel in ragdoll but made it more dynamic: https://github.com/bennyschmidt/ragdoll/blob/master/src/util...
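For readers who don't want to dig through the links, here is a rough, hypothetical sketch of the "prompt mixin" pattern being described: an object exposes its prompt templates by name and lets callers override them, recursing into any sub-modules that also hold prompts. This is a simplification for illustration, not LlamaIndex's or ragdoll's actual code.

```python
# Simplified sketch of the "prompt mixin" idea (illustrative, not the real class).
class PromptMixin:
    def _get_prompts(self) -> dict:
        """Prompts owned directly by this object; override in subclasses."""
        return {}

    def _get_prompt_modules(self) -> dict:
        """Child objects that also carry prompts; override in subclasses."""
        return {}

    def get_prompts(self) -> dict:
        prompts = dict(self._get_prompts())
        for name, module in self._get_prompt_modules().items():
            for key, prompt in module.get_prompts().items():
                prompts[f"{name}:{key}"] = prompt
        return prompts

    def update_prompts(self, new_prompts: dict) -> None:
        for key, prompt in new_prompts.items():
            if key in self._get_prompts():
                setattr(self, f"_{key}", prompt)


class QueryEngine(PromptMixin):
    def __init__(self):
        self._qa_prompt = "Answer using the context:\n{context}\nQuestion: {query}"

    def _get_prompts(self) -> dict:
        return {"qa_prompt": self._qa_prompt}


engine = QueryEngine()
engine.update_prompts({"qa_prompt": "Be terse. Context: {context}\nQ: {query}"})
print(engine.get_prompts()["qa_prompt"])
```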
- LlamaIndex is a data framework for your LLM applications
- How to verify that a snippet of Python code doesn't access protected members
- 🆓 Local & Open Source AI: a kind ollama & LlamaIndex intro
Being able to plug third party frameworks (Langchain, LlamaIndex) so you can build complex projects
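As a rough illustration of that plugging-in, here is a minimal sketch of pointing LlamaIndex at a locally running ollama model. It assumes the `llama-index-llms-ollama` integration package is installed and that `ollama pull mistral` has already been run; the model name and prompt are placeholders.

```python
# Minimal sketch: use a local ollama model as the LLM behind LlamaIndex.
# Assumes `pip install llama-index llama-index-llms-ollama` and a running
# ollama daemon with the "mistral" model already pulled.
from llama_index.llms.ollama import Ollama

llm = Ollama(model="mistral", request_timeout=120.0)
response = llm.complete("Explain retrieval-augmented generation in one sentence.")
print(response)
```

Building a document index on top of this also needs a local embedding model configured; otherwise LlamaIndex will try to fall back to OpenAI embeddings by default.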
- I made an app that runs Mistral 7B 0.2 LLM locally on iPhone Pros
Mistral Instruct does use a system prompt.
You can see the raw format here: https://www.promptingguide.ai/models/mistral-7b#chat-templat... and you can see how LlamaIndex uses it here (as an example): https://github.com/run-llama/llama_index/blob/1d861a9440cdc9...
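A hedged sketch of what that raw format looks like, with the system prompt folded into the first [INST] block. The token spellings follow the publicly documented Mistral-7B-Instruct template; verify against your model's tokenizer config rather than trusting this sketch.

```python
# Illustrative sketch of the Mistral-7B-Instruct chat layout: there is no
# dedicated system role, so the system prompt is prepended to the first
# [INST] block.
from typing import List, Optional, Tuple

def build_mistral_prompt(system: str, turns: List[Tuple[str, Optional[str]]]) -> str:
    """turns = [(user, assistant_or_None), ...]; the final assistant may be None."""
    prompt = "<s>"
    for i, (user, assistant) in enumerate(turns):
        content = f"{system}\n\n{user}" if i == 0 and system else user
        prompt += f"[INST] {content} [/INST]"
        if assistant is not None:
            prompt += f" {assistant}</s>"
    return prompt

print(build_mistral_prompt(
    "You are a concise assistant.",
    [("What is the capital of France?", "Paris."), ("And of Spain?", None)],
))
```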
text-generation-webui
- 1,156 Questions Censored by DeepSeek
total time = 392339.02 ms / 2221 tokens
And my exact command was:
llama-server --model DeepSeek-R1-UD-Q2_K_XL-00001-of-00005.gguf --temp 0.6 -c 9000 --min-p 0.1 --top-k 0 --top-p 1 --timeout 3600 --slot-save-path ~/llama_kv_path --port 8117 -ctk q8_0
(IIRC the slot-save-path argument does absolutely nothing here and is superfluous, but I have been pasting a similar command around and have been too lazy to remove it). -ctk q8_0 reduces memory use a bit for context.
I think my 256 GB is right at the limit of spilling a bit into swap, so I'm pushing the limits :)
To explain for anyone not aware of llama-server: it exposes a (somewhat) OpenAI-compatible API, and then you can use it with any software that speaks that API. llama-server itself also has a UI, but I haven't used it.
I had some SSH tunnels set up to use the server interface with https://github.com/oobabooga/text-generation-webui where I hacked an "OpenAI" client into it (that UI doesn't have one natively). The only reason I use the oobabooga UI is out of habit, so I don't recommend this setup to others.
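For anyone who wants to skip a UI entirely, a minimal sketch of talking to that endpoint with the standard openai Python client, assuming the llama-server command above is running on port 8117 (the api_key is a dummy value since no --api-key was set):

```python
# Minimal sketch: call llama-server's OpenAI-compatible endpoint
# from the command above (--port 8117) with the openai client.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8117/v1", api_key="dummy")

resp = client.chat.completions.create(
    model="local",  # llama-server serves whichever model it was launched with
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
    temperature=0.6,
)
print(resp.choices[0].message.content)
```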
- DeepSeek-R1 with Dynamic 1.58-bit Quantization
Can't this kind of repetition be dealt with at the decoder level, like for any models? (see DRY decoder for instance: https://github.com/oobabooga/text-generation-webui/pull/5677)
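For context, a very rough sketch of the idea behind that DRY sampler (a simplification for illustration, not the linked PR's actual implementation): when the tail of the generated sequence repeats an earlier span, the token that followed the earlier occurrence gets its logit pushed down, with a penalty that grows with the length of the repeat.

```python
def dry_penalty(logits, generated, multiplier=0.8, base=1.75, allowed_length=2):
    """Naive O(n^2) sketch of a DRY-style repetition penalty.

    For every earlier position whose preceding tokens match the current
    suffix of `generated` by more than `allowed_length` tokens, penalize
    the token that followed that earlier occurrence.
    """
    n = len(generated)
    for start in range(n - 1):
        # length of the common suffix between text ending at `start`
        # and the text ending at the current position
        match_len = 0
        while (match_len <= start
               and generated[start - match_len] == generated[n - 1 - match_len]):
            match_len += 1
        if match_len > allowed_length:
            next_token = generated[start + 1]
            logits[next_token] -= multiplier * base ** (match_len - allowed_length)
    return logits
```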
- I Run LLMs Locally
Still nothing better than oobabooga (https://github.com/oobabooga/text-generation-webui) in terms of a maximalist/"Pro"/"Prosumer" LLM UI/UX, à la Blender, Photoshop, Final Cut Pro, etc.
Embarrassing, and any VCs reading this can contact me to talk about how to fix that. lm-studio is today the closest competition (but not close enough), and Adobe or Microsoft could do it if they fired their current folks who prevent this from happening.
If you're not using Oobabooga, you're likely not playing with the settings on models, and if you're not playing with your models' settings, you're hardly even scratching the surface of their total capabilities.
- Yi-Coder: A Small but Mighty LLM for Code
I understand your situation. It sounds super simple to me now, but I remember having to spend at least a week trying to get the concepts and figuring out what prerequisite knowledge I would need, somewhere on a continuum between just using chatgpt and learning the relevant vector math etc. It is much closer to the chatgpt side, fortunately. I don't like ollama per se (because I can't reuse its models, due to it compressing them into its own format), but it's still a very good place to start. Any interface that lets you download models as gguf from huggingface will do just fine. Don't be turned off by the roleplaying/waifu-sounding frontend names. They are all fine. This is what I mostly prefer: https://github.com/oobabooga/text-generation-webui
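If "download models as gguf from huggingface" sounds abstract, here is a small sketch using the huggingface_hub package; the repo and filename are illustrative examples, not a recommendation.

```python
# Hedged sketch: fetch a single GGUF file from Hugging Face so a local
# runner (text-generation-webui, llama.cpp, etc.) can load it.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",   # example repo
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",    # example quantization
)
print("Downloaded to:", path)
```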
- XTC: An LLM sampler that boosts creativity, breaks writing clichés
- Codestral Mamba
Why do people recommend this instead of the much better oobabooga text-gen-webui?
https://github.com/oobabooga/text-generation-webui
It's like you hate settings, features, and access to many backends!
- Why I made TabbyAPI
The issue is running the model. Exl2 is part of the ExllamaV2 library, but to run a model, a user needs an API server. The only option out there was using text-generation-webui (TGW), a program that bundled every loader out there into a Gradio webui. Gradio is a common “building-block” UI framework for python development and is often used for AI applications. This setup was good for a while, until it wasn’t.
- Take control! Run ChatGPT and Github Copilot yourself!
What I described here is the most optimal workflow I have found to work for me. There are multiple ways to run open source models locally worth mentioning, like Oobabooga WebUI or LM Studio, however I didn't find them to be as seamless or as good a fit for my workflow.
- Ask HN: What is the current (Apr. 2024) gold standard of running an LLM locally?
Some of the tools offer a path to doing tool use (fetching URLs and doing things with them) or RAG (searching your documents). I think Oobabooga https://github.com/oobabooga/text-generation-webui offers the latter through plugins.
Our tool, https://github.com/transformerlab/transformerlab-app also supports the latter (document search) using local llms.
- Ask HN: How to get started with local language models?
You can use webui https://github.com/oobabooga/text-generation-webui
Once you get a version up and running, make a copy before you update it; several times updates have broken my working version and caused headaches.
A decent explanation of the parameters outside of reading arXiv papers: https://github.com/oobabooga/text-generation-webui/wiki/03-%...
a news ai website:
What are some alternatives?
langchain - 🦜🔗 Build context-aware reasoning applications
ollama - Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.
gpt-llama.cpp - A llama.cpp drop-in replacement for OpenAI's GPT endpoints, allowing GPT-powered apps to run off local llama.cpp models instead of OpenAI.
koboldcpp - Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
langchain - ⚡ Building applications with LLMs through composability ⚡ [Moved to: https://github.com/langchain-ai/langchain]
SillyTavern - LLM Frontend for Power Users.