Thank you for bringing that to my attention! I can't spend more than around 100 (without starving to death) until I can afford another real computer. I guess I'll poke around and check out that part about Docker anyway. However, I'll need to dig further, since https://github.com/oobabooga/text-generation-webui mentions that I should be setting TORCH_CUDA_ARCH_LIST based on my GPU, and I have no idea what the equivalent is for my poor man's GPU, Intel integrated graphics.
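For what it's worth, TORCH_CUDA_ARCH_LIST only matters when compiling CUDA kernels for an NVIDIA GPU; a minimal sketch of how you might discover the right value, assuming PyTorch is installed. On Intel integrated graphics, torch.cuda.is_available() returns False, and for a CPU-only setup the variable can simply be left unset:

```python
# Minimal sketch: figure out what TORCH_CUDA_ARCH_LIST should be set to.
# Only meaningful on NVIDIA GPUs; assumes PyTorch is installed.
import torch

if torch.cuda.is_available():
    # Compute capability of the first CUDA device, e.g. (8, 6) on an RTX 30-series card.
    major, minor = torch.cuda.get_device_capability(0)
    print(f"TORCH_CUDA_ARCH_LIST={major}.{minor}")
else:
    # Intel integrated graphics (or any non-CUDA machine): there is no arch to list.
    # For CPU-only inference the variable does not apply and can be left unset.
    print("No CUDA device found; TORCH_CUDA_ARCH_LIST does not apply.")
```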
I run vicuna-7b in the browser on my MacBook Pro M1 via https://github.com/mlc-ai/mlc-llm.
As far as I know, you only need a single GGML .bin file for CPU inference. I use koboldcpp, and it's just drag & drop: drop the .bin onto the .exe to make it work.
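Dropping a file onto an .exe on Windows just passes the file path as the first command-line argument, so the same thing can be scripted; a rough sketch with a hypothetical model filename:

```python
# Sketch of the CLI equivalent of drag & drop: dropping model.bin onto
# koboldcpp.exe passes the model path as the first argument.
# The model filename below is hypothetical; adjust to your own files.
import subprocess

subprocess.run(["koboldcpp.exe", "ggml-vicuna-7b-q5_1.bin"], check=True)
```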
Of course, the bigger the model, the longer it takes: 7B q5_1 generation takes about 400-450 ms/token, 13B q5_1 about 700-800 ms/token. Thanks to a flood of optimizations, things have been improving steadily, and work like "Proof of concept: GPU-accelerated token generation" will soon provide another much-needed and welcome boost.
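To put those latencies in more familiar terms, converting to throughput is just the reciprocal (plain arithmetic on the numbers above, nothing else assumed):

```python
# Convert per-token latency (ms/token) to throughput (tokens/second).
def tokens_per_second(ms_per_token: float) -> float:
    return 1000.0 / ms_per_token

print(tokens_per_second(425))  # 7B q5_1, ~400-450 ms/token -> roughly 2.4 tokens/s
print(tokens_per_second(750))  # 13B q5_1, ~700-800 ms/token -> roughly 1.3 tokens/s
```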