Do large language models need all those layers?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • gptq

    Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

  • I think it's not that LLMs have redundant layers in general; it's a problem specific to OPT-66B, not something else.

    A 2022 paper, "Scaling Language Models: Methods, Analysis & Insights from Training Gopher" (http://arxiv.org/abs/2112.11446), captures this well on page 103, Appendix G:

    > The general finding is that whilst compressing models for a particular application has seen success, it is difficult to compress them for the objective of language modelling over a diverse corpus.

    Appendix G explores various techniques, including pruning and distillation, but finds that neither is an efficient way to obtain better loss at a lower parameter count.

    So why does pruning work for OPT-66B in particular? I'm not sure, but there is evidence that OPT-66B is an outlier: one piece of evidence is a footnote on page 7 of the GPTQ paper ("GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers", https://arxiv.org/abs/2210.17323):

    > [2] Upon closer inspection of the OPT-66B model, it appears that this is correlated with the fact that this trained
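    For context on what GPTQ improves upon: the simplest post-training compression baseline is round-to-nearest (RTN) quantization, where each row of a weight matrix is scaled and rounded to a low-bit grid. The sketch below is a minimal, hypothetical illustration of that baseline (not the GPTQ algorithm itself, which additionally corrects rounding error using second-order information); function and variable names are my own.

    ```python
    import numpy as np

    def quantize_rtn(W, bits=4):
        """Round-to-nearest per-row quantization of a weight matrix.

        Returns dequantized weights so the error against W can be measured.
        """
        qmax = 2 ** (bits - 1) - 1          # e.g. 7 for signed 4-bit
        scale = np.abs(W).max(axis=1, keepdims=True) / qmax
        scale[scale == 0] = 1.0             # avoid division by zero for all-zero rows
        Q = np.clip(np.round(W / scale), -qmax - 1, qmax)
        return Q * scale                    # map the integer grid back to floats

    rng = np.random.default_rng(0)
    W = rng.normal(size=(8, 16)).astype(np.float32)
    W_hat = quantize_rtn(W, bits=4)
    print("mean abs quantization error:", float(np.abs(W - W_hat).mean()))
    ```

    RTN works reasonably well when weight magnitudes within a row are similar, but a few outlier values inflate the per-row scale and crush the rest of the row onto a coarse grid, which is one reason outlier-heavy models like OPT-66B behave so differently under compression.
    
    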

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts

  • MicroPython in WASM

    1 project | news.ycombinator.com | 14 May 2024
  • Dagger.io : La nouvelle ère du CI/CD dans le monde DevOps

    3 projects | dev.to | 14 May 2024
  • Building a Parenting Assistant using Lyzr SDK

    1 project | dev.to | 14 May 2024
  • Show HN: EmuBert – the first open encoder model for Australian law

    1 project | news.ycombinator.com | 14 May 2024
  • GPT-4o

    9 projects | news.ycombinator.com | 13 May 2024