The Secret Sauce behind 100K context window in LLMs: all tricks in one place

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • RWKV-LM

    RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

  • I've been pondering the same thing, as simply extending the context window in a straightforward manner would lead to a significant increase in computational resources. I've had the opportunity to experiment with Anthropics' 100k model, and it's evident that they're employing some clever techniques to make it work, albeit with some imperfections. One interesting observation is that their prompt guide recommends placing instructions after the reference text when inputting lengthy text bodies. I noticed that the model often disregarded the instructions if placed beforehand. It's clear that the model doesn't allocate the same level of "attention" to all parts of the input across the entire context window.

    Moreover, the inability to cache transformers makes the use of large context windows quite costly, as all previous messages must be sent with each call. In this context, the RWKV-LM project on GitHub (https://github.com/BlinkDL/RWKV-LM) might offer a solution. They claim to achieve performance comparable to transformers using an RNN, which could potentially handle a 100-page document and cache it, thereby eliminating the need to process the entire document with each subsequent query. However, I suspect RWKV might fall short in handling complex tasks that require maintaining multiple variables in memory, such as mathematical computations, but it should suffice for many scenarios.

    On a related note, I believe Anthropics' Claude is somewhat underappreciated. In some instances, it outperforms GPT4, and I'd rank it somewhere between GPT4 and Bard overall.

  • long-range-arena

    Long Range Arena for Benchmarking Efficient Transformers

  • https://github.com/google-research/long-range-arena

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • DDG does have their own index, but also use Bing and many other sources. See the CEO's numerous comments to that effect: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu..., https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts