[P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.
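
The title's claim that a 4-bit 13b model fits in under 9 GiB of VRAM can be sanity-checked with rough arithmetic. The sketch below is only a back-of-the-envelope estimate: it assumes a nominal 13 billion parameters and counts the weights alone, ignoring quantization metadata (scales and zero points), activations, and the attention cache, all of which add to the real footprint. The function name `weight_footprint_gib` is made up for this illustration and does not come from any of the projects listed here.

```python
GIB = 2**30  # bytes per GiB


def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB


# Assumption: "13b" taken as exactly 13e9 parameters; the real count differs slightly.
n_params = 13e9

fp16 = weight_footprint_gib(n_params, 16)  # half-precision baseline
int4 = weight_footprint_gib(n_params, 4)   # 4-bit quantized weights

print(f"fp16 weights:  {fp16:.1f} GiB")
print(f"4-bit weights: {int4:.1f} GiB")
print(f"headroom under 9 GiB: {9 - int4:.1f} GiB")
```

Under these assumptions the 4-bit weights take roughly a quarter of the half-precision footprint, which leaves a few GiB below the 9 GiB mark for everything the estimate leaves out.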

  • llama.cpp

    LLM inference in C/C++

  • I'm running it using https://github.com/ggerganov/llama.cpp. The 4-bit version of 13b runs ok without GPU acceleration.

  • yal-discord-bot

    Yet Another LLaMA/ALPACA Discord Bot

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • My question seemed to have been answered here, and it is a VRAM limitation. Also, that last link seems to support 4-bit models as well. Doesn't seem too bad to set up.... Though I installed A1111 when it first came out, so I learned through the garbage of that. Lol.

  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

  • GPTQ-for-LLaMa

    4 bits quantization of LLaMA using GPTQ

  • pifs

    πfs - the data-free filesystem!

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Ask HN: Self-hosted/open-source ChatGPT alternative? Like Stable Diffusion

    4 projects | news.ycombinator.com | 12 Dec 2022
  • Lossless Acceleration of LLM via Adaptive N-Gram Parallel Decoding

    3 projects | news.ycombinator.com | 21 Apr 2024
  • AI enthusiasm #6 - Finetune any LLM you want💡

    2 projects | dev.to | 16 Apr 2024
  • Zephyr 141B, a Mixtral 8x22B fine-tune, is now available in Hugging Chat

    3 projects | news.ycombinator.com | 12 Apr 2024
  • Schedule-Free Learning – A New Way to Train

    3 projects | news.ycombinator.com | 6 Apr 2024