Now that ExLlama is out with reduced VRAM usage, are there any GPTQ models bigger than 7b which can fit onto an 8GB card?
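As a rough sanity check on what fits, the weight footprint of a 4-bit model can be estimated as parameters × bits/8 bytes, plus some headroom for activations and the KV cache. The sketch below uses a hypothetical fixed 1.5 GB overhead allowance (actual usage depends on context length, group size, and the loader), not a measured ExLlama figure:

```python
# Back-of-envelope VRAM estimate for 4-bit GPTQ models.
# The overhead allowance is an assumption for illustration,
# not a measured value; real usage varies with context length,
# quantization group size, and the loader.

def gptq_vram_gb(n_params_billion: float, bits: int = 4,
                 overhead_gb: float = 1.5) -> float:
    """Approximate VRAM in GiB: weights at bits/8 bytes per
    parameter, plus a fixed allowance for activations/KV cache."""
    weights_gb = n_params_billion * 1e9 * bits / 8 / 1024**3
    return weights_gb + overhead_gb

for size in (7, 13, 30):
    print(f"{size}B at 4-bit: ~{gptq_vram_gb(size):.1f} GB")
```

Under these assumptions a 13B model lands just under 8 GB, which is why it is the natural candidate for an 8GB card, while 30B is clearly out of reach.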

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

  • exllama

    A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. It is an optimized implementation of GPTQ-for-LLaMa that runs 4-bit quantized language models on GPU at high speed.

  • GPTQ-for-LLaMa

    4-bit quantization of LLaMA using GPTQ.

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

