[D] Is it possible to run Meta's LLaMA 65B model on consumer-grade hardware?

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • llama-int8

    Quantized inference code for LLaMA models
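Int8 quantization is the key idea behind running large checkpoints in less memory: weights are stored as 8-bit integers plus a scale factor and dequantized on the fly. A minimal sketch of symmetric per-tensor int8 quantization (an illustration of the general technique, not the llama-int8 repo's actual code, which uses per-channel schemes via bitsandbytes):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: w ≈ scale * q, q in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# One int8 weight takes 1 byte instead of 2 (fp16) or 4 (fp32),
# at the cost of a small rounding error bounded by scale / 2.
w = np.random.randn(512, 512).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
max_err = np.abs(w - w_hat).max()
```

Real implementations quantize per output channel (or per block) rather than per tensor, which keeps the rounding error small even when a few outlier weights inflate the scale.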

  • text-generation-webui

  • https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1454798725

  • llama

    Inference code for Llama models

  • FlexGen

    Running large language models on a single GPU for throughput-oriented scenarios.

  • With FlexGen, I believe it should be possible to run on a typical high-end system; they have run a 175B-parameter model with it. See here: https://github.com/FMInference/FlexGen
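Whether 65B fits on consumer hardware is ultimately a memory question. A back-of-the-envelope estimate of weight storage alone (my own arithmetic, ignoring activations and the KV cache):

```python
# Weight memory for a 65B-parameter model at different precisions.
PARAMS = 65e9

def weight_gib(bytes_per_param):
    """GiB needed just to hold the weights at a given precision."""
    return PARAMS * bytes_per_param / 2**30

fp16 = weight_gib(2)    # ~121 GiB: far beyond any consumer GPU
int8 = weight_gib(1)    # ~61 GiB:  still multiple 24 GB cards
int4 = weight_gib(0.5)  # ~30 GiB:  plausible with offloading or two 24 GB GPUs
print(f"fp16 ~{fp16:.0f} GiB, int8 ~{int8:.0f} GiB, int4 ~{int4:.0f} GiB")
```

This is why the projects below cluster around two strategies: shrinking the weights (int8/int4 quantization) or spilling them to CPU RAM and disk (FlexGen-style offloading).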

  • wrapyfi-examples_llama

    Inference code for facebook LLaMA models with Wrapyfi support

  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • See here for full details: https://github.com/oobabooga/text-generation-webui/issues/147

  • llama-cpu

    Fork of Facebook's LLaMA model to run on CPU

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts