Explore large language models on any computer with 512MB of RAM

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

  • languagemodels

    Explore large language models in 512MB of RAM

  • LaMini-LM

    LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions

  • CTranslate2

    Fast inference engine for Transformer models

  • FLAN-T5 models generally perform well for their size, but they are encoder-decoder models and aren't as widely supported for efficient inference. I wanted students to be able to run everything locally on CPU, so I was ideally hoping for something that supported quantization for CPU inference. I explored llama.cpp and GGML, but ultimately landed on CTranslate2 for inference (see the sketch below).
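
The workflow described in that comment (quantizing an encoder-decoder model and running it on CPU with CTranslate2) looks roughly like the sketch below. This is a minimal illustration, assuming the ctranslate2 and transformers packages are installed; the choice of google/flan-t5-base and the output directory name are illustrative and not taken from the original post.

    # One-time conversion to the CTranslate2 format with int8 quantization
    # (shell command installed by the ctranslate2 package):
    #   ct2-transformers-converter --model google/flan-t5-base \
    #       --output_dir flan-t5-base-ct2 --quantization int8

    import ctranslate2
    import transformers

    # Load the tokenizer from the original model and the converted weights for CPU inference
    tokenizer = transformers.AutoTokenizer.from_pretrained("google/flan-t5-base")
    translator = ctranslate2.Translator("flan-t5-base-ct2", device="cpu")

    prompt = "Translate English to German: The house is wonderful."
    input_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

    # translate_batch expects a batch of token lists and returns ranked hypotheses
    results = translator.translate_batch([input_tokens])
    output_ids = tokenizer.convert_tokens_to_ids(results[0].hypotheses[0])

    print(tokenizer.decode(output_ids, skip_special_tokens=True))

Int8 quantization roughly quarters the memory footprint relative to float32 weights, which is what makes small instruction-tuned models practical on machines with only a few hundred megabytes of RAM to spare.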

NOTE: The number of mentions on this list reflects mentions in common posts plus user-suggested alternatives, so a higher number indicates a more popular project.

Related posts

  • StreamingLLM: Efficient streaming technique enables infinite sequence lengths

    2 projects | news.ycombinator.com | 3 Oct 2023
  • CTranslate2: An efficient inference engine for Transformer models

    1 project | news.ycombinator.com | 21 May 2023
  • [D] Faster Flan-T5 inference

    1 project | /r/MachineLearning | 22 Feb 2023
  • [P] CTranslate2: an efficient inference engine for Transformer models

    1 project | /r/MachineLearning | 23 May 2022
  • GDlog: A GPU-Accelerated Deductive Engine

    16 projects | news.ycombinator.com | 3 Dec 2023