4096 Context length (and beyond)

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

  • alpaca_lora_4bit

  • I made a fork of alpaca_lora_4bit that contains the whole project plus some notes. The only real changes from the main repo are a small hack to read plaintext training data, a tweak to raise the configured sequence length beyond the default 2048, and a horribly messy attention patch that awkwardly bodges a pre-allocated K/V cache scheme into the HF Llama implementation.
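    A pre-allocated K/V cache of the kind described above can be sketched in plain PyTorch. This is a hypothetical illustration of the general technique, not the fork's actual patch; the class name, shapes, and method are assumptions:

    ```python
    import torch

    class StaticKVCache:
        """Allocate K/V buffers once at the maximum sequence length,
        then fill them in place as tokens are generated.

        Hypothetical sketch; not the code from the alpaca_lora_4bit fork.
        """
        def __init__(self, batch, n_heads, max_seq_len, head_dim,
                     dtype=torch.float16):
            shape = (batch, n_heads, max_seq_len, head_dim)
            self.k = torch.zeros(shape, dtype=dtype)
            self.v = torch.zeros(shape, dtype=dtype)
            self.len = 0  # number of positions filled so far

        def append(self, k_new, v_new):
            # Write new positions in place instead of torch.cat,
            # avoiding a fresh allocation of the whole cache each step.
            n = k_new.shape[2]
            self.k[:, :, self.len:self.len + n] = k_new
            self.v[:, :, self.len:self.len + n] = v_new
            self.len += n
            # Return views over the filled region for attention.
            return self.k[:, :, :self.len], self.v[:, :, :self.len]
    ```

    The point of pre-allocating is that VRAM usage is fixed up front at the chosen maximum context length, rather than growing (and fragmenting) as generation proceeds.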

  • exllama

    A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

  • The README.md has some details about what I did and how it went, but it ends on a list of next steps that I've yet to get to, because I want to work some more on this other project first. The reason is that the Transformers library is just too limiting to work with, and very poorly suited for these kinds of experiments. You end up patching functionality in and out, instantiating models in weird and hacky ways only to overwrite their weights afterwards, shuffling layers around, and wondering where all your VRAM went. I hope to be able to use this new project as a better platform for experimenting with LoRAs, among other things, and then I'll get back to the long-range adapter. I still haven't concluded that it can't work, just that it takes more than ten hours of training on an A100, and I pay for that by the hour, so I want to make it count. ;)
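  • The "instantiate, then overwrite the weights" pattern complained about above can be sketched in plain PyTorch using the meta device, which builds a module without allocating real storage. This is a hypothetical illustration of the pattern, not code from either project; the layer and sizes are stand-ins:

    ```python
    import torch
    import torch.nn as nn

    # Build the module on the meta device: parameters exist but have
    # no backing memory, so nothing is wasted on weights we'll discard.
    with torch.device("meta"):
        model = nn.Linear(8, 8, bias=False)  # stand-in for a full model

    # The weights we actually want (e.g. dequantized or remapped).
    real_weight = torch.arange(64, dtype=torch.float32).reshape(8, 8)

    # Materialize real (uninitialized) storage, then overwrite it.
    model.to_empty(device="cpu")
    model.weight.data.copy_(real_weight)
    ```

    This avoids paying for a full random initialization that is immediately thrown away, which is part of why the hack is common when loading quantized checkpoints into HF model classes.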

  • rwkv.cpp

    INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

  • There's https://github.com/saharNooby/rwkv.cpp which seems to work, and might be compatible with text-generation-webui.

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives; a higher number means a more popular project.

Related posts

  • Eagle 7B: Soaring past Transformers

    2 projects | news.ycombinator.com | 28 Jan 2024
  • [R] RWKV: Reinventing RNNs for the Transformer Era

    1 project | /r/MachineLearning | 23 May 2023
  • rwkv.cpp: FP16 & INT4 inference on CPU for RWKV language model (r/MachineLearning)

    1 project | /r/datascienceproject | 2 Apr 2023
  • FLaNK AI - 01 April 2024

    31 projects | dev.to | 1 Apr 2024
  • Half-Quadratic Quantization of Large Machine Learning Models

    1 project | news.ycombinator.com | 14 Mar 2024