StreamingLLM: Efficient streaming technique enables infinite sequence lengths

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • streaming-llm

    [ICLR 2024] Efficient Streaming Language Models with Attention Sinks

  • CTranslate2

    Fast inference engine for Transformer models

  • Etc.

    Now, what this allows you to do is reuse the attention computed from the previous turns (since the prefix is the same).

    In practice, people often have a system prompt before the conversation history, which (as far as I can tell) makes this technique inapplicable: the input prefix will change as soon as the conversation history is long enough that we need to start dropping the oldest turns.

    In that case, you could at least cache the system prompt (see the sketch after this list). This is also possible with https://github.com/OpenNMT/CTranslate2/blob/2203ad5c8baf878a...

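To make the system-prompt caching concrete, here is a minimal sketch using CTranslate2's Python API. The static_prompt/cache_static_prompt options, the model directory, and the tokenizer file are assumptions based on the code linked above; check which options the CTranslate2 version you have installed actually exposes.

    # Sketch: cache a fixed system prompt so its attention state is computed once
    # and reused across calls, while the conversation history is still re-encoded.
    # Assumed setup: a converted model in "llama-ct2/" and a SentencePiece
    # tokenizer in "tokenizer.model".
    import ctranslate2
    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
    generator = ctranslate2.Generator("llama-ct2/", device="cuda")

    # Tokens of the system prompt; assumed to be cached by the generator
    # as long as cache_static_prompt=True.
    system_tokens = sp.encode("You are a helpful assistant.", out_type=str)

    def reply(conversation: str) -> str:
        # Only the (possibly truncated) conversation history is tokenized here;
        # the cached system prompt is prepended by the generator.
        tokens = sp.encode(conversation, out_type=str)
        results = generator.generate_batch(
            [tokens],
            static_prompt=system_tokens,
            cache_static_prompt=True,
            max_length=256,
            sampling_temperature=0.7,
        )
        return sp.decode(results[0].sequences_ids[0])

Because the static prompt never changes across requests, dropping the oldest conversation turns no longer invalidates the cached prefix; only the history passed to reply() has to be re-encoded each turn.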
NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • Explore large language models on any computer with 512MB of RAM

    4 projects | /r/LocalLLaMA | 17 Jun 2023
  • CTranslate2: An efficient inference engine for Transformer models

    1 project | news.ycombinator.com | 21 May 2023
  • [D] Faster Flan-T5 inference

    1 project | /r/MachineLearning | 22 Feb 2023
  • [P] CTranslate2: an efficient inference engine for Transformer models

    1 project | /r/MachineLearning | 23 May 2022
  • GDlog: A GPU-Accelerated Deductive Engine

    16 projects | news.ycombinator.com | 3 Dec 2023