text-generation-webui
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
Some results here: https://github.com/ggerganov/llama.cpp/discussions/406
tl;dr quantizing the 13B model gives up about 30% of the improvement you get from moving from 7B to 13B - so quantized 13B is still much better than unquantized 7B. Similar results for the larger models.
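To make that arithmetic concrete, here is a minimal sketch; the perplexity values are invented placeholders for illustration, not numbers from the linked discussion:

```python
# Rough arithmetic behind the "gives up ~30% of the improvement" claim.
# The perplexity values below are illustrative placeholders, NOT
# measurements from the linked thread.
ppl_7b_f16 = 6.0     # hypothetical 7B full-precision perplexity
ppl_13b_f16 = 5.0    # hypothetical 13B full-precision perplexity

gain = ppl_7b_f16 - ppl_13b_f16           # improvement from 7B -> 13B
ppl_13b_q4 = ppl_13b_f16 + 0.30 * gain    # quantization gives back ~30% of it

print(ppl_13b_q4)  # 5.3 -- still well below the 7B baseline of 6.0
```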
I wonder where such a difference between llama.cpp and the [1] repo comes from. The F16 perplexity difference is 0.3 on the 7B model, which is not insignificant. The ggml quirks definitely need to be fixed.
[1] https://github.com/qwopqwop200/GPTQ-for-LLaMa
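For context on why a 0.3 gap could come from implementation details alone: perplexity is just the exponential of the mean per-token negative log-likelihood, so anything that shifts the NLL slightly (tokenization, context stride, F16 vs. F32 accumulation) shifts the reported number. A minimal sketch of the definition:

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token).

    Small implementation differences (tokenizer, context window,
    eval stride, accumulation precision) all move this number,
    which may account for part of the gap between repos.
    """
    return math.exp(sum(token_nlls) / len(token_nlls))

# Example: a mean NLL of ~1.7 nats/token gives perplexity ~5.5.
print(perplexity([1.6, 1.8, 1.7, 1.7]))
```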
Define "comprehensive"?
There are some benchmarks here: https://www.reddit.com/r/LocalLLaMA/comments/1248183/i_am_cu... and here: https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-i...
Check out the original GPTQ paper, which has some benchmarks: https://arxiv.org/pdf/2210.17323.pdf, and this paper, which also has benchmarks and explains how they determined that 4-bit quantization is optimal compared to 3-bit: https://arxiv.org/pdf/2212.09720.pdf
I also think the discussion of that second paper here is interesting, though it doesn't have its own benchmarks: https://github.com/oobabooga/text-generation-webui/issues/17...
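For anyone who wants the core idea in code: here is a minimal sketch of plain round-to-nearest 4-bit quantization, the naive baseline those papers measure GPTQ against. GPTQ's error-correcting weight updates are not shown, and real implementations use per-group scales rather than one scale per tensor.

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Symmetric round-to-nearest quantization: the simple baseline
    the papers above compare against. GPTQ itself minimizes layer-wise
    reconstruction error and is more involved than this sketch."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for signed 4-bit
    scale = np.abs(w).max() / qmax             # one scale per tensor here
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_rtn(w, bits=4)
print(np.abs(w - dequantize(q, scale)).max())  # worst-case quantization error
```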
I'd assume that a 33B model should fit with this (the only repo I know of that implements SparseGPT and GPTQ for LLaMA): https://github.com/lachlansneff/sparsellama
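Rough sizing behind that assumption (a back-of-the-envelope sketch; the 50% sparsity level is just an example, and real sparse formats carry index overhead):

```python
def approx_weight_gb(n_params_billion, bits, sparsity=0.0):
    """Back-of-the-envelope weight footprint. Ignores quantization
    scales/zero-points, sparse-index overhead, KV cache, and activations."""
    dense_gb = n_params_billion * bits / 8   # 1e9 params * (bits/8) bytes / 1e9
    return dense_gb * (1.0 - sparsity)

print(approx_weight_gb(33, 16))               # ~66 GB at F16
print(approx_weight_gb(33, 4))                # ~16.5 GB at 4-bit
print(approx_weight_gb(33, 4, sparsity=0.5))  # ~8.25 GB at 4-bit + 50% sparse
```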
I'm curious whether someone will have to port these enhancements elsewhere, e.g. https://github.com/rustformers/llama-rs