- FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
- text-generation-webui: A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), and Llama models.
Yes, you need to convert the original LLaMA model to the huggingface format, according to https://github.com/lm-sys/FastChat#vicuna-weights and https://huggingface.co/docs/transformers/main/model_doc/llam...
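As a sketch, the conversion described in those two links uses the `convert_llama_weights_to_hf.py` script that ships with transformers, followed by FastChat's delta applier for Vicuna. Paths, the model size flag, and the delta name below are placeholders you'd adjust for your setup:

```shell
# Convert the original LLaMA checkpoints to the Hugging Face format.
# /path/to/llama must contain the tokenizer files plus a 7B/ (or 13B/, etc.) folder.
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/llama \
    --model_size 7B \
    --output_dir /path/to/llama-7b-hf

# Apply the Vicuna delta weights on top of the converted base model.
python3 -m fastchat.model.apply_delta \
    --base /path/to/llama-7b-hf \
    --target /path/to/vicuna-7b \
    --delta lmsys/vicuna-7b-delta-v1.1
```

Both steps load full-precision weights into RAM, so expect the memory and runtime costs mentioned elsewhere in this thread.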
Hi! Funnily enough, I couldn't find much on it either, so that's exactly what I've been working on myself for the past few months, just in case this kind of question got asked.
I recently opened a GitHub repository with information on both AI model series[0] and the frontends you can use to run them[1]. I've also written a Reddit post that's messier, but a lot more technical[2].
I try to keep them as up to date as possible, but I might've missed something, or my info may not be completely accurate. It's mostly to help people get their feet wet.
[0] - https://github.com/Crataco/ai-guide/blob/main/guide/models.m...
> Unfortunately there's a mismatch between the model generated by the delta patcher and the tokenizer (32001 vs. 32000 tokens). There's a tool to fix this at llama-tools (https://github.com/Ronsor/llama-tools). Add one token (e.g. a control token), and then run the conversion script.
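In other words, the patched model ends up with 32001 embedding rows while the tokenizer only knows 32000 tokens, so the fix is to grow the tokenizer's vocabulary by one entry before converting. A minimal pure-Python sketch of the idea (the dict-based vocab and the `add_control_token` helper are illustrative, not the actual llama-tools API):

```python
def add_control_token(vocab, token):
    """Append one token to a vocab mapping (token -> id) if it's missing."""
    if token not in vocab:
        vocab[token] = len(vocab)  # the new token gets the next free id
    return vocab

# A tokenizer with 32000 entries, versus a model with 32001 embedding rows.
vocab = {f"<tok_{i}>": i for i in range(32000)}
add_control_token(vocab, "<extra_control>")
print(len(vocab))  # 32001 -- now matches the patched model
```

After the vocab sizes match, the conversion script no longer trips over the off-by-one mismatch.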
No, my project is called rllama. No relation to GGML. https://github.com/Noeda/rllama
-> https://old.reddit.com/user/Crataco/comments/zuowi9/opensour...
https://github.com/cocktailpeanut/dalai
The 4-bit quantized version of LLaMA 13B runs on my laptop without a dedicated GPU, and I'd guess the same applies to quantized Vicuna 13B, though I haven't tried that yet (converted as in this link, but for 13B instead of 7B: https://github.com/ggerganov/llama.cpp#usage ).
GPT4All's LoRA also works; it's given me perhaps the most compelling results yet on my local machine. I want to try quantized Vicuna to see how it goes, but producing a 4-bit quantized version will take many hours, so I'm a bit hesitant.
PS: converting 13B LLaMA took my laptop's i7 around 20 hours and required a large swap file on top of its 16GB of RAM.
Feel free to answer back if you're trying any of these things this week (later I might lose track).
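For reference, the llama.cpp README linked above described roughly this workflow at the time (script names, paths, and the magic numbers are taken from that README and may have changed since; the f16 conversion is the slow, RAM-hungry step mentioned in the PS):

```shell
# Convert the 13B PyTorch checkpoints to ggml f16
# (the trailing 1 selects f16 output).
python3 convert-pth-to-ggml.py models/13B/ 1

# Quantize the f16 weights down to 4 bits
# (the trailing 2 selects the q4_0 quantization type).
./quantize ./models/13B/ggml-model-f16.bin ./models/13B/ggml-model-q4_0.bin 2

# Run inference on the quantized model.
./main -m ./models/13B/ggml-model-q4_0.bin -n 128
```

Only the first step needs the full-precision weights in memory; once the q4_0 file exists, inference fits in a fraction of the RAM.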
The default loader doesn't seem to let you load quantized models, but with something like https://github.com/oobabooga/text-generation-webui you can: 1) load the model with `--load-in-8bit`, which halves the memory (it runs on my 24GB consumer card without issue, and would probably fit on a 16GB card); or 2) use a 4-bit quantized model, probably with `anon8231489123/vicuna-13b-GPTQ-4bit-128g --model_type LLaMA --wbits 4 --groupsize 128`, although there have been reports that bitsandbytes has problems with 4-bit performance on some cards: https://github.com/TimDettmers/bitsandbytes/issues/181
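Putting those two options together, launching text-generation-webui would look something like this (the local model directory name is an assumption based on the comment; the webui expects downloaded models under its `models/` folder):

```shell
# Option 1: 8-bit loading -- halves memory, fits in 24 GB comfortably.
python server.py --model vicuna-13b --load-in-8bit

# Option 2: 4-bit GPTQ loading with the flags from the comment.
python server.py --model anon8231489123_vicuna-13b-GPTQ-4bit-128g \
    --model_type LLaMA --wbits 4 --groupsize 128
```

The `--wbits`/`--groupsize` values must match how the checkpoint was quantized (here, 4 bits with group size 128, as the model name suggests).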