Transformers from Scratch

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • scratch-www

    Standalone web client for Scratch

  • The capital "S" in "Scratch" in the title made me think that the article was about implementing transformers on https://scratch.mit.edu/ -- which would be amazing, but it's not the case.

  • workshop

  • There are a few common ways you might see this done, but they broadly work by assigning fixed or learned embeddings to each position in the input token sequence. These embeddings can be added to our matrix above so that the first row gets the embedding for the first position added to it, the second row gets the embedding for the second position, and so on. Now, if the tokens are reordered, the embedding matrix will not be the same. Alternatively, these embeddings can be concatenated horizontally to our matrix: this guarantees the positional information is kept entirely separate from the linguistic information (at the cost of a larger combined embedding that the block must support). A small sketch of both options follows at the end of this comment.

    I put together this repository at the end of last year to help visualize the internals of a transformer block applied to a toy problem: https://github.com/rstebbing/workshop/tree/main/experiments/.... It is not very long, and the point is to better distinguish the quantities you referred to by seeing them (which is possible when the embeddings are low-dimensional).

    I hope this helps!
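
    To make the add-versus-concatenate distinction concrete, here is a minimal NumPy sketch. The sizes and random embeddings are illustrative assumptions of mine, not code from the linked repository:

    ```python
    import numpy as np

    # Toy sizes (my own choice): 4 tokens, embedding dimension 8.
    seq_len, d_model = 4, 8
    rng = np.random.default_rng(0)

    tok_emb = rng.normal(size=(seq_len, d_model))  # one row per token
    pos_emb = rng.normal(size=(seq_len, d_model))  # fixed or learned, one row per position

    # Option 1: add the position embeddings row-wise. The shape is unchanged,
    # and positional information is mixed into the token embeddings.
    x_added = tok_emb + pos_emb

    # Option 2: concatenate horizontally. Positional information stays
    # entirely separate, but the block must now accept width 2 * d_model.
    x_concat = np.concatenate([tok_emb, pos_emb], axis=1)  # [seq_len, 2 * d_model]

    # Reordering the tokens now changes the representation: swapping the
    # first two tokens while keeping the position rows fixed does not give
    # back a row permutation of x_added.
    perm = np.array([1, 0, 2, 3])
    assert not np.allclose(tok_emb[perm] + pos_emb, x_added[perm])
    ```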

  • picoGPT

    An unnecessarily tiny implementation of GPT-2 in NumPy.

  • I wrote a minimal implementation in NumPy here (the forward pass code is only 40 lines): https://github.com/jaymody/picoGPT

    Note that this is for a decoder-only transformer (aka GPT), so it doesn't include the encoder part. A sketch of the core attention step follows below.
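
    For a sense of what that forward pass boils down to, here is a hedged NumPy sketch of its core piece, causal self-attention. The names, shapes, and random weights below are toy assumptions of mine, not the actual picoGPT code:

    ```python
    import numpy as np

    def softmax(x):
        # Numerically stable softmax over the last axis.
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def causal_self_attention(x, w_q, w_k, w_v):
        # x: [seq_len, d_model]; w_*: [d_model, d_head] projection matrices.
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        scores = q @ k.T / np.sqrt(k.shape[-1])           # [seq_len, seq_len]
        # Causal mask: position i may only attend to positions <= i.
        future = np.triu(np.ones_like(scores), k=1).astype(bool)
        scores = np.where(future, -1e10, scores)
        return softmax(scores) @ v                        # [seq_len, d_head]

    # Toy usage with random weights (shapes are illustrative, not GPT-2's).
    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 16))
    w_q, w_k, w_v = (rng.normal(size=(16, 16)) for _ in range(3))
    out = causal_self_attention(x, w_q, w_k, w_v)
    assert out.shape == (5, 16)
    ```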

  • potatogpt

    Pure TypeScript, dependency-free, ridiculously slow implementation of GPT-2 for educational purposes

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • The Impact of API Response Time on Performance: What You Need to Know

    2 projects | dev.to | 16 May 2024
  • Ask HN: Running LLMs Locally

    2 projects | news.ycombinator.com | 15 May 2024
  • GPUsGoBurr: Get up to 2x higher performance by Tuning LLM Inference Deployment

    1 project | news.ycombinator.com | 15 May 2024
  • Show HN: Tarsier – vision for text-only LLM web agents that beats GPT-4o

    8 projects | news.ycombinator.com | 15 May 2024
  • PaliGemma: Open-Source Multimodal Model by Google

    5 projects | news.ycombinator.com | 15 May 2024