A few days ago I saw a post using NeuralFlow to help explain the repetition problem.
https://old.reddit.com/r/LocalLLaMA/comments/1ap8mxh/what_ca...
> I’ve done some investigation into this. In a well trained model, if you plot the intermediate output for the last token in the sequence, you see the values update gradually layer to layer. In a model that produces repeating sequences I almost always see a sudden discontinuity at some specific layer. The residual connections are basically flooding the next layer with a distribution of values outside anything else in the dataset.
> The discontinuity is pretty classic overfitting. You’ve both trained a specific token to attend primarily to itself and also incentivized that token to be sampled more often. The result is that if that token is ever included at the end of the context the model is incentivized to repeat it again.
...
> Literally just plotting the output of the layer normalized between zero and one. For one token in mistral 7B it’s a 4096 dimension tensor. Because of the residual connections if you plot that graph for every layer you get a really nice visualization.
> Edit: Here's my visualization. It’s a simple idea but I've never personally seen it done before. AFAIK this is a somewhat novel way to look at transformer layer output.
> Initial output: https://imgur.com/sMwEFEw
> Over-fit output: https://imgur.com/a0obyUj
> Second edit: Code to generate the visualization: https://github.com/valine/NeuralFlow
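The idea in the quote — stack the residual-stream vector for the last token at every layer into a (layers × hidden_dim) image, min-max normalized to [0, 1] — can be sketched roughly as below. Random data stands in for real activations here (loading Mistral 7B is out of scope); with Hugging Face transformers the vectors would come from `model(..., output_hidden_states=True)`, and `n_layers`/`hidden_dim` match Mistral 7B's 32 layers and 4096-dim hidden size. This is a sketch of the general technique, not NeuralFlow's exact code.

```python
import numpy as np

# Mistral 7B: 32 transformer layers, 4096-dim residual stream.
n_layers, hidden_dim = 32, 4096
rng = np.random.default_rng(0)

# Stand-in for per-layer activations of the LAST token. With transformers
# you would instead take hidden_states[l][0, -1, :] from
# model(input_ids, output_hidden_states=True).
hidden_states = [rng.normal(size=hidden_dim).astype(np.float32)
                 for _ in range(n_layers)]

def layer_image(hidden_states):
    """Stack per-layer vectors into a (layers x dim) array and min-max
    normalize to [0, 1], as the quoted post describes."""
    img = np.stack(hidden_states)                     # (n_layers, hidden_dim)
    img = (img - img.min()) / (img.max() - img.min())  # global min-max norm
    return img

img = layer_image(hidden_states)
print(img.shape, float(img.min()), float(img.max()))
```

A sudden band of extreme values at one layer in this image is the "discontinuity" the post attributes to overfitting; `plt.imshow(img, aspect="auto")` renders it directly.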