Llama.cpp 30B runs with only 6GB of RAM now

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • llama.cpp

    LLM inference in C/C++

  • Some results here: https://github.com/ggerganov/llama.cpp/discussions/406

    tl;dr quantizing the 13B model gives up about 30% of the improvement you get from moving from 7B to 13B - so quantized 13B is still much better than unquantized 7B. Similar results for the larger models.
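The trade-off above is easier to see next to the raw memory numbers. A back-of-envelope sketch (headline parameter counts like "7B" = 7e9 are approximations, and this counts weights only, ignoring activations and the KV cache):

```python
# Rough RAM needed just to hold the weights at a given precision.
def weight_ram_gib(n_params_billions, bits_per_weight):
    return n_params_billions * 1e9 * bits_per_weight / 8 / 2**30

for name, n in [("7B", 7), ("13B", 13), ("30B", 30)]:
    print(f"{name}: f16 ~ {weight_ram_gib(n, 16):.1f} GiB, "
          f"4-bit ~ {weight_ram_gib(n, 4):.1f} GiB")
```

By this estimate, 4-bit 13B weights (~6 GiB) are less than half the size of f16 7B weights (~13 GiB), which is why quantized 13B is the attractive option when it also wins on quality.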

  • GPTQ-for-LLaMa

    4 bits quantization of LLaMA using GPTQ
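For intuition about what 4-bit quantization means, here is a minimal round-to-nearest sketch. GPTQ itself is smarter (it chooses roundings that minimize each layer's output error), but the storage idea is the same: a handful of integer levels plus a shared float scale.

```python
# Round-to-nearest 4-bit quantization sketch: map each weight to a
# symmetric integer level in -7..7 plus one shared float scale.
def quantize_4bit(weights):
    scale = max(abs(w) for w in weights) / 7
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

w = [0.12, -0.5, 0.33, 0.9]
q, s = quantize_4bit(w)
print(dequantize(q, s))  # each value within half a step of the original
```

Real implementations quantize in small groups (e.g. 32 weights per scale) to keep the rounding error low across a whole weight matrix.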

  • I wonder where such a difference between llama.cpp and the repo in [1] comes from. The F16 perplexity difference is 0.3 on the 7B model, which is not insignificant. ggml's quirks definitely need to be fixed.

    [1] https://github.com/qwopqwop200/GPTQ-for-LLaMa
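Perplexity, the metric being compared here, is just the exponentiated average negative log-likelihood per token. A sketch with made-up log-probabilities:

```python
import math

# Perplexity from per-token natural-log probabilities. Lower is better;
# a 0.3 gap at these scales is a noticeable quality difference.
def perplexity(token_log_probs):
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

print(perplexity([-2.0, -1.5, -1.8]))  # roughly 5.85
```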

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • Define "comprehensive"?

    There are some benchmarks here: https://www.reddit.com/r/LocalLLaMA/comments/1248183/i_am_cu... and here: https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-i...

    Check out the original paper on quantization, which has some benchmarks: https://arxiv.org/pdf/2210.17323.pdf and this paper, which also has benchmarks and explains how they determined that 4-bit quantization is optimal compared to 3-bit: https://arxiv.org/pdf/2212.09720.pdf

    I also think the discussion of that second paper here is interesting, though it doesn't have its own benchmarks: https://github.com/oobabooga/text-generation-webui/issues/17...

  • chatai

    Official Repo for Chat AI

  • sparsellama

  • I'd assume that the 33B model should fit with this (the only repo I know of that implements both SparseGPT and GPTQ for LLaMA): https://github.com/lachlansneff/sparsellama
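Whether 33B fits is easy to sanity-check with a back-of-envelope estimate. The 50% density figure below is an assumption for illustration, and the overhead of storing sparse indices is ignored:

```python
# Rough weight storage after combining unstructured pruning
# (SparseGPT-style) with low-bit quantization (GPTQ-style).
def sparse_quant_gib(n_params_billions, bits_per_weight, density):
    kept = n_params_billions * 1e9 * density  # surviving weights
    return kept * bits_per_weight / 8 / 2**30

print(f"{sparse_quant_gib(33, 4, 0.5):.1f} GiB")  # ~7.7 GiB of weights
```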

  • llm

    An ecosystem of Rust libraries for working with large language models

  • I'm curious whether someone will have to port these enhancements elsewhere, e.g. https://github.com/rustformers/llama-rs

NOTE: The number of mentions on this list counts mentions on common posts plus user-suggested alternatives; a higher number means a more popular project.

Suggest a related project

Related posts

  • LangChain Go

    2 projects | dev.to | 11 May 2024
  • SB-1047 will stifle open-source AI and decrease safety

    2 projects | news.ycombinator.com | 29 Apr 2024
  • Show HN: LlamaGym – fine-tune LLM agents with online reinforcement learning

    2 projects | news.ycombinator.com | 10 Mar 2024
  • Show HN: Geppetto, an open source AI companion for your Slack teams

    3 projects | news.ycombinator.com | 6 Feb 2024
  • Rabbit R1, Designed by Teenage Engineering

    4 projects | news.ycombinator.com | 9 Jan 2024