Explore large language models on any computer with 512MB of RAM

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

  • languagemodels

    Explore large language models in 512MB of RAM

  • LaMini-LM

    LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions

  • CTranslate2

    Fast inference engine for Transformer models

  • FLAN-T5 models generally perform well for their size, but they are encoder-decoder models and aren't as widely supported for efficient inference. I wanted students to be able to run everything locally on CPU, so I was ideally hoping for something that supported quantization for CPU inference. I explored llama.cpp and GGML, but ultimately landed on CTranslate2 for inference (see the sketch below).
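
The workflow described in that comment (quantizing an encoder-decoder model and running it on CPU with CTranslate2) looks roughly like the sketch below. This is a minimal illustration, assuming the ctranslate2 and transformers packages are installed; the choice of google/flan-t5-base and the output directory name are illustrative and not taken from the original post.

    # One-time conversion to the CTranslate2 format with int8 quantization
    # (shell command installed by the ctranslate2 package):
    #   ct2-transformers-converter --model google/flan-t5-base \
    #       --output_dir flan-t5-base-ct2 --quantization int8

    import ctranslate2
    import transformers

    # Load the tokenizer from the original model and the converted weights for CPU inference
    tokenizer = transformers.AutoTokenizer.from_pretrained("google/flan-t5-base")
    translator = ctranslate2.Translator("flan-t5-base-ct2", device="cpu")

    prompt = "Translate English to German: The house is wonderful."
    input_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

    # translate_batch expects a batch of token lists and returns ranked hypotheses
    results = translator.translate_batch([input_tokens])
    output_ids = tokenizer.convert_tokens_to_ids(results[0].hypotheses[0])

    print(tokenizer.decode(output_ids, skip_special_tokens=True))

Int8 quantization roughly quarters the memory footprint relative to float32 weights, which is what makes small instruction-tuned models practical on machines with only a few hundred megabytes of RAM to spare.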

NOTE: The number of mentions on this list reflects mentions in common posts plus user-suggested alternatives, so a higher number indicates a more popular project.

Related posts

  • StreamingLLM: Efficient streaming technique enables infinite sequence lengths

    2 projects | news.ycombinator.com | 3 Oct 2023
  • CTranslate2: An efficient inference engine for Transformer models

    1 project | news.ycombinator.com | 21 May 2023
  • [D] Faster Flan-T5 inference

    1 project | /r/MachineLearning | 22 Feb 2023
  • [P] CTranslate2: an efficient inference engine for Transformer models

    1 project | /r/MachineLearning | 23 May 2022
  • GDlog: A GPU-Accelerated Deductive Engine

    16 projects | news.ycombinator.com | 3 Dec 2023