-
potatogpt
Pure TypeScript, dependency-free, ridiculously slow implementation of GPT-2 for educational purposes
The capital letter on "Scratch" in the title made me think that the article was about implementing transformers on https://scratch.mit.edu/ -- which would be amazing, but it's not the case.
- There are a few common ways you might see this done, but they broadly work by assigning fixed or learned embeddings to each position in the input token sequence. These embeddings can be added to our matrix above so that the first row gets the embedding for the first position added to it, the second row gets the embedding for the second position, and so on. Now if the tokens are reordered, the embedding matrix will not be the same. Alternatively, these embeddings can be concatenated horizontally to our matrix: this guarantees the positional information is kept entirely separate from the linguistic (at the cost of having a larger combined embedding that the block must support).
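The two options described above (adding position embeddings versus concatenating them) can be sketched in a few lines of NumPy. The sizes and the random token embeddings are toy placeholders; the fixed embeddings use the sinusoidal scheme from "Attention Is All You Need", but learned embeddings would slot in the same way:

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, d_model = 4, 8  # toy sizes, chosen only for illustration
tok_emb = rng.normal(size=(seq_len, d_model))  # one row per input token

# Fixed sinusoidal position embeddings: each row encodes its position index.
pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
i = np.arange(d_model // 2)[None, :]         # (1, d_model // 2)
angles = pos / (10000 ** (2 * i / d_model))
pos_emb = np.empty((seq_len, d_model))
pos_emb[:, 0::2] = np.sin(angles)
pos_emb[:, 1::2] = np.cos(angles)

# Option 1: add. Same shape, positional and linguistic information mixed
# into the same coordinates.
added = tok_emb + pos_emb                    # (seq_len, d_model)

# Option 2: concatenate horizontally. Positional information stays in its
# own coordinates, at the cost of a wider combined embedding.
concatenated = np.concatenate([tok_emb, pos_emb], axis=1)  # (seq_len, 2*d_model)

assert added.shape == (seq_len, d_model)
assert concatenated.shape == (seq_len, 2 * d_model)
```

Either way, row 0 now always carries the position-0 information, so permuting the token order no longer produces the same matrix.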
I put together this repository at the end of last year to help visualize the internals of a transformer block applied to a toy problem: https://github.com/rstebbing/workshop/tree/main/experiments/.... It is not very long, and the point is to make the quantities you referred to easier to distinguish by actually seeing them (which is possible when the embeddings are low-dimensional).
I hope this helps!
I wrote a minimal implementation in NumPy here (the forward pass code is only 40 lines): https://github.com/jaymody/picoGPT
Note that this is a decoder-only transformer (aka GPT) and doesn't include the encoder part.
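The piece that makes such a model "decoder-only" is the causal mask in self-attention. A minimal single-head sketch in NumPy (the weight matrices here are random placeholders, not picoGPT's actual parameters):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention: each position attends only to
    itself and earlier positions, never to future tokens."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Upper-triangular -inf mask blocks attention to future positions.
    mask = np.triu(np.full(scores.shape, -np.inf), k=1)
    return softmax(scores + mask) @ v

rng = np.random.default_rng(0)
seq_len, d = 5, 8  # toy sizes, for illustration only
x = rng.normal(size=(seq_len, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out = causal_self_attention(x, w_q, w_k, w_v)
assert out.shape == (seq_len, d)
```

Because of the mask, the first position can only attend to itself, so its output is exactly its own value vector; an encoder block would omit the mask and let every position attend to every other.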