Large language models are having their Stable Diffusion moment

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

    A massively diverse community is working on the AUTOMATIC1111 of textgen at https://github.com/oobabooga/text-generation-webui/

    Ooba's textgen webui runs 4-bit LLaMA on consumer video cards with 8GB of VRAM.

    KoboldAI (https://github.com/henk717/KoboldAI/), by the creator of the AI Horde of Stable Diffusion fame, also supports LLaMA and has forked the Stable Horde for textgen. The horde is currently running half a dozen LLaMA models, accessible at http://lite.koboldai.net/
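
    The 4-bit-on-8GB claim above can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is only a rough estimate, not a measurement: it assumes the nominal LLaMA parameter counts (7B, 13B, 30B, 65B), counts weights only, and ignores activations, the KV cache, and quantization metadata. The function name `weight_memory_gib` is made up for this illustration.

```python
# Rough weight-memory estimate for a quantized model (weights only).
# Assumption: nominal parameter counts; real checkpoints differ slightly.
# Activations, the KV cache, and quantization scales are NOT included.

def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30

NOMINAL_SIZES = {"7B": 7e9, "13B": 13e9, "30B": 30e9, "65B": 65e9}

for name, n_params in NOMINAL_SIZES.items():
    fp16 = weight_memory_gib(n_params, 16)
    int4 = weight_memory_gib(n_params, 4)
    print(f"{name}: 16-bit ~{fp16:.1f} GiB, 4-bit ~{int4:.1f} GiB")
```

    Under these assumptions the 7B weights shrink from about 13 GiB at 16 bits to a little over 3 GiB at 4 bits, which is why an 8GB card becomes plausible. The comment does not say which model size was actually run.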

  • llama

    Inference code for Llama models

    The link from GP is the CPU-only one, implemented in C++.

    The Python + GPU one can be found in the official Facebook repo: https://github.com/facebookresearch/llama (presumably GP thought this was already known to everyone, so they pasted the other link).

  • stable-diffusion-webui

    Stable Diffusion web UI

    One thing I think will be different and that had totally escaped my radar until recently is just the enormous and diverse community that has been developing around Stable Diffusion, which I think will be less likely to form with language models.

    I just recently tried out one of the most popular [0] Stable Diffusion WebUIs locally, and I'm positively surprised at how different it is from the rest of the space around ML research/computing. I consider myself to be a competent software engineer, but I still often find it pretty tricky to get e.g. HuggingFace models running and doing what I envision them to do. SpeechT5, for instance, is reported to do voice transformations, but it took me a good bit of time and hair-pulling to figure out how to extract voice embeddings from .wav files. I'm sure the way to do this is obvious to most researchers, maybe to the point of feeling like it needs no mention in the documentation, but it certainly wasn't clear to me.

    The community around Stable Diffusion is much more inclusive, though. Tools go the extra mile to be easy to use, and documentation for community-created models/scripts/tools is so accessible as to be perfectly usable by a non-technical user who is willing to venture a little bit into the world of hardcore computing by following instructions. Sure, nothing is too polished and you often get the feeling that it's "an ugly thing, but an ugly thing that works", but the point is that it's incredibly accessible. People get to actually use these models to build their stories and fantasy worlds, and to work, and things get progressively more impressive as the community builds upon itself (I loved the style of [1] and even effortlessly merged its style with another one in the WebUI, and ControlNet [2] is amazing and gives me ideas for integrating my photography with AI).

    I think the general interest in creating images is larger than the interest in LLMs with their current limitations (especially on current consumer-available hardware). I do wonder how much this community interest will boost the space in the longer run, but right now I can't help but be impressed by the difference in usability and collaborative development between image-generative models and other types of models.

    [0] https://github.com/AUTOMATIC1111/stable-diffusion-webui

    [1] https://civitai.com/models/4998/vivid-watercolors

    [2] https://github.com/Mikubill/sd-webui-controlnet

  • sd-webui-controlnet

    WebUI extension for ControlNet

  • sentencepiece

    Unsupervised text tokenizer for Neural Network-based text generation.

  • KoboldAI

  • gpt_index

    [Discontinued] LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLMs with external data. [Moved to: https://github.com/jerryjliu/llama_index]

    This is exactly what LlamaIndex is meant to solve!

    A set of data structures to augment LLMs with your data: https://github.com/jerryjliu/gpt_index

  • langchain

    [Discontinued] ⚡ Building applications with LLMs through composability ⚡ [Moved to: https://github.com/langchain-ai/langchain] (by hwchase17)

    The langchain project is an example of the iterative queries approach. It comes with constructs for working memory, factual lookup / calculation agents, etc.

    https://github.com/hwchase17/langchain

    The general (non-technical) guideline is that an LLM can "answer" anything you have just given it the answer for. So you give it a problem, ask it how to solve it, tell it to use that method and explain what data it needs, give it that data, and then show it everything at once: "With this data you requested and summarized, use this technique to answer this question".
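
    The steps in that guideline can be sketched as a small loop of prompts. This is only an illustration of the idea, not langchain's API: `iterative_answer`, the `llm` callable, and `fetch_data` are hypothetical names, and the model is replaced by a stub so the sketch runs offline.

```python
from typing import Callable

# Hypothetical type: any function that maps a prompt to a completion.
LLM = Callable[[str], str]

def iterative_answer(llm: LLM, question: str,
                     fetch_data: Callable[[str], str]) -> str:
    """Ask for a method, ask what data it needs, then answer with both in view."""
    # Step 1: give the model the problem and ask how to solve it.
    method = llm(f"How would you solve this problem?\n{question}")
    # Step 2: tell it to use that method and explain which data it needs.
    data_request = llm(
        f"Using this method:\n{method}\n"
        f"What data do you need to answer:\n{question}"
    )
    # Step 3: look the data up outside the model (search, database, calculator).
    data = fetch_data(data_request)
    # Step 4: show the model everything at once.
    final_prompt = (
        f"With this data you requested and summarized:\n{data}\n"
        f"use this technique:\n{method}\n"
        f"to answer this question:\n{question}"
    )
    return llm(final_prompt)

# Stub model for demonstration: records each prompt and returns a numbered reply.
prompts_seen = []

def stub_llm(prompt: str) -> str:
    prompts_seen.append(prompt)
    return f"reply {len(prompts_seen)}"

answer = iterative_answer(stub_llm, "What is 2 + 2?", lambda request: "2 and 2")
print(len(prompts_seen), "prompts sent; final answer:", answer)
```

    Swapping `stub_llm` for a real model call and `fetch_data` for a real lookup keeps the same shape: the final prompt always carries the method, the data, and the question together.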

