Show HN: NeuralFlow – Visualize the intermediate output of Mistral 7B

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • NeuralFlow

    Visualize the intermediate output of Mistral 7B

  • A few days ago I saw a post using NeuralFlow to help explain the repetition problem.

    https://old.reddit.com/r/LocalLLaMA/comments/1ap8mxh/what_ca...

    > I’ve done some investigation into this. In a well-trained model, if you plot the intermediate output for the last token in the sequence, you see the values update gradually from layer to layer. In a model that produces repeating sequences I almost always see a sudden discontinuity at some specific layer. The residual connections are basically flooding the next layer with a distribution of values outside anything else in the dataset.

    > The discontinuity is pretty classic overfitting. You’ve both trained a specific token to attend primarily to itself and also incentivized that token to be sampled more often. The result is that if that token is ever included at the end of the context the model is incentivized to repeat it again.

    ...

    > Literally just plotting the output of the layer, normalized between zero and one. For one token in Mistral 7B it’s a 4096-dimensional tensor. Because of the residual connections, if you plot that graph for every layer you get a really nice visualization.

    > Edit: Here's my visualization. It’s a simple idea but I've never personally seen it done before. AFAIK this is a somewhat novel way to look at transformer layer output.

    > Initial output: https://imgur.com/sMwEFEw

    > Over-fit output: https://imgur.com/a0obyUj

    > Second edit: Code to generate the visualization: https://github.com/valine/NeuralFlow
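The technique quoted above can be sketched in a few lines. This is a minimal illustration, not the NeuralFlow code itself: it assumes per-layer min-max normalization of the last token's hidden state, and uses synthetic arrays standing in for Mistral 7B's 32 layers of 4096-dimensional states (in practice you would collect real hidden states, e.g. via a forward pass with hidden-state outputs enabled). The function name and the discontinuity score are illustrative.

```python
import numpy as np

def layer_output_image(hidden_states):
    """Stack the last-token hidden state from each layer and min-max
    normalize each layer's vector to [0, 1], giving a (layers, dim)
    image you can render with plt.imshow(img, aspect="auto")."""
    rows = []
    for h in hidden_states:              # each h has shape (seq_len, dim)
        v = h[-1]                        # last token's hidden state
        lo, hi = v.min(), v.max()
        rows.append((v - lo) / (hi - lo + 1e-9))
    return np.stack(rows)

# Synthetic stand-in for Mistral 7B: 32 layers, 4096-dim hidden states.
rng = np.random.default_rng(0)
states = [rng.normal(size=(8, 4096)) for _ in range(32)]
img = layer_output_image(states)         # shape (32, 4096)

# A sudden jump between adjacent rows is the "discontinuity" the author
# associates with repetition; one crude score is the mean absolute
# difference between consecutive layers' normalized rows.
jumps = np.abs(np.diff(img, axis=0)).mean(axis=1)   # shape (31,)
suspect_layer = int(jumps.argmax()) + 1
```

In a well-behaved model the rows of `img` should change smoothly down the image; an overfit model would show one row where `jumps` spikes.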

