[D] Why are decoder-only models used for autoregressive generation instead of encoder-only models? What value is the causal mask if the new token doesn't exist yet?

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.

  • streaming-llm

    [ICLR 2024] Efficient Streaming Language Models with Attention Sinks

  • That is a great question. I wish I had a mathematical explanation for it, but I can only provide some intuitive "yeah, but then again..." reasoning. FWIW, a recent paper showed that the first few tokens of any sequence, starting with the special '[START]' token, do hold special information compared to all other tokens (the authors call this the Attention Sink). Here is a link to that paper: https://arxiv.org/abs/2309.17453
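To make the two ideas in this thread concrete, here is a minimal sketch in plain Python. It is an illustration, not code from the streaming-llm repository. `causal_mask` builds the lower-triangular mask the question asks about: at each position, only the current and earlier tokens are visible. `sink_window_positions` shows one way to read the attention-sink idea, assuming the cache keeps the first few tokens plus a sliding window of recent ones. Both function names, and the parameters `num_sink` and `window` with their default values, are hypothetical and chosen only for illustration.

```python
def causal_mask(seq_len):
    """Return a seq_len x seq_len boolean mask.

    mask[i][j] is True when position i may attend to position j,
    i.e. only when j is at or before i.
    """
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def sink_window_positions(seq_len, num_sink=4, window=8):
    """Return the token positions kept in the cache (illustrative values).

    Assumption: the first `num_sink` tokens are always kept as attention
    sinks, together with the most recent `window` tokens.
    """
    sinks = list(range(min(num_sink, seq_len)))
    # Start the recent window after the sinks so no position is listed twice.
    recent_start = max(len(sinks), seq_len - window)
    return sinks + list(range(recent_start, seq_len))


if __name__ == "__main__":
    for row in causal_mask(4):
        print(["x" if allowed else "." for allowed in row])
    print(sink_window_positions(20))
```

The mask matters during training, where every position in a sequence is predicted in parallel and must not see the tokens after it. During generation it means the keys and values of earlier tokens never change, so they can be cached and reused.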

