[P] rwkv.cpp: FP16 & INT4 inference on CPU for RWKV language model

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.

  • rwkv.cpp

    INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

  • cformers

    SoTA Transformers with C-backend for fast inference on your CPU.

  • It's a combination of things, and removing Python from the loop isn't essential to achieving most of these performance gains. The main trick is quantizing the weights and compiling the model. A concrete example that builds on top of ggml with Python APIs: https://github.com/NolanoOrg/cformers

  • RWKV-LM

    RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

  • The author of RWKV shows that an RWKV model with X billion parameters is comparable to a GPT model with X billion parameters: https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-eval2.png

  • whisper.cpp

    Port of OpenAI's Whisper model in C/C++

  • llama.cpp

    LLM inference in C/C++

  • llm

    An ecosystem of Rust libraries for working with large language models

  • alpaca.cpp

    Discontinued. Locally run an Instruction-Tuned Chat-Style LLM

  • gpt4all.cpp

    Locally run an Assistant-Tuned Chat-Style LLM
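
The comment in the list above names weight quantization as the main trick behind these CPU speedups. As a rough illustration of that idea only, here is a minimal sketch of symmetric 4-bit block quantization in pure Python. It is not the ggml or rwkv.cpp storage format. The function names, the block size of 32, and the rounding scheme are all assumptions chosen for readability.

```python
# Illustrative sketch: symmetric 4-bit block quantization.
# NOT the actual ggml / rwkv.cpp format; names and block size are hypothetical.

def quantize_block(values):
    """Map a block of floats to integers in [-7, 7] plus one float scale."""
    max_abs = max(abs(v) for v in values)
    # One scale per block; fall back to 1.0 for an all-zero block.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quants = [max(-7, min(7, round(v / scale))) for v in values]
    return scale, quants


def dequantize_block(scale, quants):
    """Recover approximate floats from a quantized block."""
    return [q * scale for q in quants]


def quantize(weights, block_size=32):
    """Split a flat weight list into blocks and quantize each one."""
    return [
        quantize_block(weights[i:i + block_size])
        for i in range(0, len(weights), block_size)
    ]


def dequantize(blocks):
    """Concatenate the dequantized blocks back into a flat list."""
    restored = []
    for scale, quants in blocks:
        restored.extend(dequantize_block(scale, quants))
    return restored


if __name__ == "__main__":
    weights = [0.1 * i - 1.0 for i in range(40)]
    blocks = quantize(weights)
    restored = dequantize(blocks)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{len(blocks)} blocks, worst absolute error {worst:.4f}")
```

Each block stores one float scale and a small integer per weight, so the per-weight cost drops from 16 or 32 bits to about 4 bits plus the shared scale. The price is a rounding error of at most half the block's scale per weight.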

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Does openai whisper works on termux ?

    2 projects | /r/termux | 26 May 2023
  • Show HN: Ermine.ai – Record and transcribe speech, 100% client-side (WASM)

    5 projects | news.ycombinator.com | 4 Apr 2023
  • Library for running deep learning models in the browser

    2 projects | /r/webdev | 17 Feb 2023
  • Show HN: I created automatic subtitling app to boost short videos

    1 project | news.ycombinator.com | 9 Apr 2024
  • Prompt Engineering Guide

    1 project | news.ycombinator.com | 30 Mar 2024