[P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM

llama.cpp

773 56,891 10.0 C++

LLM inference in C/C++

I'm running it using https://github.com/ggerganov/llama.cpp. The 4-bit version of 13b runs ok without GPU acceleration.

yal-discord-bot

5 72 6.5 Python

Yet Another LLaMA/ALPACA Discord Bot

text-generation-webui

876 36,293 9.9 Python

A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

My question seems to have been answered here: it is a VRAM limitation. Also, that last link seems to support 4-bit models as well. It doesn't seem too bad to set up, though I installed A1111 when it first came out, so I learned by working through the mess of that. Lol.
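
As a rough illustration of why VRAM is the limiting factor and why 4-bit weights help, the sketch below estimates the memory needed for the weights alone at different precisions. It assumes a nominal 13 billion parameters for a "13b" model and ignores quantization metadata (scales, zero points), the KV cache, and activations, so real usage will be higher. The function name `weight_memory_gib` is hypothetical and not part of any project listed here.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB


# Assumption: a "13b" model has roughly 13e9 parameters.
N_PARAMS = 13e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

Under these assumptions the 4-bit weights come to roughly 6 GiB, which leaves some headroom under the 9 GiB figure in the post title for everything the estimate leaves out.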

transformers

176 125,369 10.0 Python

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

GPTQ-for-LLaMa

75 2,916 8.6 Python

4 bits quantization of LLaMA using GPTQ

pifs

65 6,555 0.0 C

πfs - the data-free filesystem!

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Ask HN: Self-hosted/open-source ChatGPT alternative? Like Stable Diffusion
4 projects | news.ycombinator.com | 12 Dec 2022

Lossless Acceleration of LLM via Adaptive N-Gram Parallel Decoding
3 projects | news.ycombinator.com | 21 Apr 2024

AI enthusiasm #6 - Finetune any LLM you want💡
2 projects | dev.to | 16 Apr 2024

Zephyr 141B, a Mixtral 8x22B fine-tune, is now available in Hugging Chat
3 projects | news.ycombinator.com | 12 Apr 2024

Schedule-Free Learning – A New Way to Train
3 projects | news.ycombinator.com | 6 Apr 2024

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.
Tags: NLP, Chatbot, Natural Language Processing, chatgpt, Pytorch
Post date: 12 Mar 2023
