State-of-the-art open-source chatbot, Vicuna-13B, just released model weights

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • FastChat

    An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
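
    As a concrete starting point, the FastChat README documents a simple command-line chat interface; a minimal sketch, assuming you already have assembled Vicuna weights (the model path below is a placeholder):

        # chat with a local model from the terminal using FastChat
        python3 -m fastchat.serve.cli --model-path /path/to/vicuna-13b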

  • Yes, you need to convert the original LLaMA model to the huggingface format, according to https://github.com/lm-sys/FastChat#vicuna-weights and https://huggingface.co/docs/transformers/main/model_doc/llam...
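
    For concreteness, the two documented steps look roughly like this: first convert the original LLaMA checkpoint with the script that ships in the transformers repository, then apply the Vicuna delta with FastChat. This is a sketch; all paths are placeholders, and the delta name shown is an assumption (check the FastChat README for the current version):

        # 1) convert original LLaMA weights to the Hugging Face format
        python src/transformers/models/llama/convert_llama_weights_to_hf.py \
            --input_dir /path/to/llama --model_size 13B --output_dir /path/to/llama-13b-hf

        # 2) apply the Vicuna delta on top of the converted base model
        python3 -m fastchat.model.apply_delta \
            --base /path/to/llama-13b-hf \
            --target /path/to/vicuna-13b \
            --delta lmsys/vicuna-13b-delta-v1.1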

  • ai-guide

    (Discontinued) A guide for getting started with FOSS text generation.

  • Hi! Funnily enough, I couldn't find much on it either, so that's exactly what I've been working on for the past few months, just in case this kind of question got asked.

    I've recently opened a GitHub repository that includes information on both AI model series[0] and the frontends you can use to run them[1]. I've also written a Reddit post that's messier, but a lot more technical[2].

    I try to keep them as up-to-date as possible, but I might've missed something or my info may not be completely accurate. It's mostly to help get people's feet wet.

    [0] - https://github.com/Crataco/ai-guide/blob/main/guide/models.m...

  • llama-tools

    Tools for the LLaMA language model

  • > Unfortunately there's a mismatch between the model generated by the delta patcher and the tokenizer (32001 vs. 32000 tokens). There's a tool to fix this at llama-tools (https://github.com/Ronsor/llama-tools). Add one token (e.g. a control token), and then run the conversion script.

  • rllama

    Rust+OpenCL+AVX2 implementation of LLaMA inference code

  • No, my project is called rllama. No relation to GGML. https://github.com/Noeda/rllama

  • dalai

    The simplest way to run LLaMA on your local machine
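
    The README documents a two-command setup; a sketch only (the 13B size is substituted for the README's 7B example, and the CLI may have changed since):

        # download and prepare a LLaMA model, then serve a local web UI
        npx dalai llama install 13B
        npx dalai serve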

  • -> https://old.reddit.com/user/Crataco/comments/zuowi9/opensour...

    https://github.com/cocktailpeanut/dalai

    The 4-bit quantized version of LLaMA 13B runs on my laptop without a dedicated GPU, and I'd guess the same applies to a quantized Vicuna 13B, though I haven't tried that yet (converted as in this link, but for 13B instead of 7B: https://github.com/ggerganov/llama.cpp#usage ; see the sketch below).

    The GPT4All LoRA also works, and it has given perhaps the most compelling results I've gotten on my local machine so far. I have to try quantized Vicuna to see how that one goes, but processing the files to get a 4-bit quantized version will take many hours, so I'm a bit hesitant.

    PS: converting 13B LLaMA took my laptop's i7 around 20 hours and required a large swap file on top of its 16 GB of RAM.

    Feel free to reply if you're trying any of these things this week (later I might lose track).
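
    For reference, the llama.cpp usage steps that comment links to looked roughly like this at the time; treat it as a sketch, since the repository has since moved to the GGUF format and the scripts have been renamed:

        # convert the PyTorch checkpoint to ggml FP16 format
        python3 convert-pth-to-ggml.py models/13B/ 1

        # quantize the FP16 model to 4 bits (q4_0)
        ./quantize ./models/13B/ggml-model-f16.bin ./models/13B/ggml-model-q4_0.bin 2

        # run inference on the quantized model
        ./main -m ./models/13B/ggml-model-q4_0.bin -n 128 -p "Hello, world"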

  • llama.cpp

    LLM inference in C/C++


  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • The default loader doesn't seem to let you load quantized models, but if you use something like https://github.com/oobabooga/text-generation-webui you can 1) load the model with `--load-in-8bit`, which halves the memory (it then runs on my 24 GB consumer card without an issue, and would probably fit on a 16 GB card), or 2) run a 4-bit quantized model, probably with `anon8231489123/vicuna-13b-GPTQ-4bit-128g --model_type LLaMA --wbits 4 --groupsize 128`, although there have been reports that bitsandbytes has problems with 4-bit performance on some cards: https://github.com/TimDettmers/bitsandbytes/issues/181
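
    Spelled out as commands, that advice looks roughly like the sketch below. The flags are the ones quoted above; the local folder name is an assumption based on how the webui's download script names model directories, and `your-vicuna-13b-hf` is a placeholder for full-precision converted weights:

        # fetch the 4-bit GPTQ weights into text-generation-webui/models/
        python download-model.py anon8231489123/vicuna-13b-GPTQ-4bit-128g

        # option 1: load full-precision weights in 8-bit to halve memory use
        python server.py --model your-vicuna-13b-hf --load-in-8bit

        # option 2: load the 4-bit GPTQ quantization
        python server.py --model anon8231489123_vicuna-13b-GPTQ-4bit-128g \
            --model_type LLaMA --wbits 4 --groupsize 128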

  • bitsandbytes

    Accessible large language models via k-bit quantization for PyTorch.


Related posts

  • More Agents Is All You Need: LLMs performance scales with the number of agents

    2 projects | news.ycombinator.com | 6 Apr 2024
  • Show HN: macOS GUI for running LLMs locally

    1 project | news.ycombinator.com | 18 Sep 2023
  • Ask HN: What are the capabilities of consumer grade hardware to work with LLMs?

    1 project | news.ycombinator.com | 3 Aug 2023
  • Meta to release open-source commercial AI model

    3 projects | news.ycombinator.com | 14 Jul 2023
  • How can I run a large language model locally?

    1 project | /r/learnprogramming | 11 Jul 2023