FlexGen: Running large language models on a single GPU

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), and Llama models. A minimal sketch of the Gradio pattern such a UI wraps appears after this list.

  • OK, looks like the "LLM retard guide" is "run this installer": https://github.com/oobabooga/text-generation-webui/releases/...

    That's the only thing that worked for me, but it was so easy and worked instantly.

  • FlexGen

    Running large language models on a single GPU for throughput-oriented scenarios. A hedged sketch illustrating the same offloading idea appears after this list.

  • llama.cpp

    LLM inference in C/C++. A sketch using the Python bindings appears after this list.

    Copy and paste the relevant parts of the llama.cpp install steps into ChatGPT and ask it to explain them simply, including any prerequisite steps you might need. If you get stuck, ask ChatGPT again and include any error messages.

    https://github.com/ggerganov/llama.cpp
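
As context for the text-generation-webui entry above: the project wraps model backends behind a Gradio UI. The sketch below is not the project's code; it is a minimal, hedged illustration of the Gradio-plus-transformers pattern such a UI is built on. The gpt2 checkpoint and the generate() helper are placeholders chosen only to keep the example small.

    # Not text-generation-webui itself: a minimal Gradio + transformers sketch of
    # the pattern such a UI wraps. "gpt2" is a small placeholder checkpoint.
    import gradio as gr
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    def generate(prompt: str) -> str:
        # generated_text contains the prompt followed by the model's continuation.
        return generator(prompt, max_new_tokens=64)[0]["generated_text"]

    demo = gr.Interface(fn=generate, inputs="text", outputs="text",
                        title="Minimal text-generation demo")
    demo.launch()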
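
The FlexGen entry above is about fitting a large model on one GPU by offloading. FlexGen ships its own runtime and offloading policies, which the sketch below does not use; it is a hedged illustration of the same idea with Hugging Face transformers and accelerate-style offloading, where device_map="auto" keeps what fits on the GPU and spills the rest to CPU RAM or a scratch folder. The facebook/opt-6.7b checkpoint and the "offload" folder name are placeholder assumptions, not FlexGen code.

    # Not FlexGen's API: a sketch of single-GPU inference with weight offloading
    # via Hugging Face transformers + accelerate. Names below are placeholders.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "facebook/opt-6.7b"  # any causal LM too large for the GPU alone

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,   # half-precision weights to save memory
        device_map="auto",           # keep what fits on the GPU, offload the rest
        offload_folder="offload",    # scratch directory for weights spilled to disk
    )

    prompt = "FlexGen runs large language models on"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))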
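
For the llama.cpp entry above: the project itself is C/C++ and is normally driven from its command-line tools. The sketch below instead uses the separate llama-cpp-python bindings to show the same GGUF-based inference flow in a few lines; the model path, context size, and GPU-layer setting are placeholder assumptions.

    # Uses the separate llama-cpp-python bindings (pip install llama-cpp-python),
    # not llama.cpp's own CLI. The GGUF path below is a placeholder.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/llama-2-7b.Q4_K_M.gguf",  # any quantized GGUF model
        n_ctx=2048,        # context window size
        n_gpu_layers=-1,   # offload all layers to the GPU if one is available
    )

    result = llm(
        "Q: What does llama.cpp do?\nA:",
        max_tokens=64,
        stop=["Q:"],       # stop before the model starts a new question
    )
    print(result["choices"][0]["text"])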

NOTE: The number of mentions on this list indicates mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • Run 70B LLM Inference on a Single 4GB GPU with This New Technique

    3 projects | news.ycombinator.com | 3 Dec 2023
  • Colorful Custom RTX 4060 Ti GPU Clocks Outed, 8 GB VRAM Confirmed

    1 project | /r/hardware | 17 Apr 2023
  • FlexGen: Running large language models on a single GPU

    1 project | /r/hypeurls | 26 Mar 2023
  • FlexGen: Running large language models on a single GPU

    1 project | /r/patient_hackernews | 26 Mar 2023
  • FlexGen: Running large language models on a single GPU

    1 project | /r/hackernews | 26 Mar 2023