exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
I made a fork of alpaca_lora_4bit that contains the whole project plus some notes. Aside from a small hack to read plaintext training data and to raise the configured sequence length beyond the default of 2048, the only real change from the main repo is a horribly messy attention patch that awkwardly bodges a pre-allocated K/V cache scheme into the HF Llama implementation.
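The idea behind a pre-allocated K/V cache is to reserve the full-length key/value tensors up front and write each step's entries in place, instead of concatenating onto a growing cache every token the way the stock HF implementation does. The class and shapes below are an illustrative sketch of that scheme, not code lifted from the actual patch:

```python
import torch

class PreallocatedKVCache:
    """Fixed-size K/V cache: allocate once, write new entries in place.

    This avoids the per-step torch.cat() of the stock HF path, which
    reallocates (and briefly duplicates) the whole cache on every token.
    Names and shapes here are assumptions for illustration only.
    """

    def __init__(self, batch, n_heads, max_seq_len, head_dim,
                 dtype=torch.float16, device="cpu"):
        shape = (batch, n_heads, max_seq_len, head_dim)
        self.k = torch.zeros(shape, dtype=dtype, device=device)
        self.v = torch.zeros(shape, dtype=dtype, device=device)
        self.seq_len = 0  # number of positions filled so far

    def append(self, new_k, new_v):
        """Write new_k/new_v of shape (batch, n_heads, t, head_dim)
        at the current offset, then return views over the valid prefix."""
        t = new_k.shape[2]
        self.k[:, :, self.seq_len:self.seq_len + t] = new_k
        self.v[:, :, self.seq_len:self.seq_len + t] = new_v
        self.seq_len += t
        # Views, not copies: attention reads only the filled prefix.
        return self.k[:, :, :self.seq_len], self.v[:, :, :self.seq_len]
```

During generation, each decoding step appends one token's K/V and attends over the returned views; memory usage is fixed at `max_seq_len` from the start rather than growing (and fragmenting) as the sequence lengthens.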
The README.md has some details about what I did and how it went, but it ends on a list of next steps that I've yet to get to, because I want to work some more on this other project first. The reason is that the Transformers library is just too limiting to work with: it's very poorly suited to these kinds of experiments. You end up patching functionality in and out, instantiating models in weird and hacky ways only to overwrite their weights afterwards, shuffling layers around, wondering where all your VRAM went, and so on. I hope this new project can serve as a better platform for experimenting with LoRAs, among other things, and then I'll get back to the long-range adapter. I still haven't concluded that it can't work, just that it takes more than ten hours of training on an A100, and since I pay for that by the hour I want to make it count. ;)
There's https://github.com/saharNooby/rwkv.cpp which seems to work, and might be compatible with text-generation-webui.