The GPT Architecture, on a Napkin

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • minGPT

    A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

    Don't know. Karpathy has a very compact implementation of GPT [0] using standard technology (it could be even more compact, but he re-implements, for example, the attention layer for teaching purposes), and while he presumably has no access to exactly how the real model was trained, if there were more to it I think he would know and point it out.

    [0] https://github.com/karpathy/minGPT/tree/master/mingpt
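
    For intuition, here is a minimal sketch of the masked (causal) multi-head self-attention layer of the kind minGPT re-implements for teaching purposes; the sizes and names below are illustrative and are not copied from the repository.

    ```python
    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CausalSelfAttention(nn.Module):
        """Single multi-head causal self-attention layer (illustrative sizes)."""

        def __init__(self, n_embd=128, n_head=4, block_size=64):
            super().__init__()
            assert n_embd % n_head == 0
            self.n_head = n_head
            # query, key, value projections for all heads, done in one matmul
            self.qkv = nn.Linear(n_embd, 3 * n_embd)
            self.proj = nn.Linear(n_embd, n_embd)
            # causal mask: token t may only attend to tokens <= t
            mask = torch.tril(torch.ones(block_size, block_size))
            self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

        def forward(self, x):
            B, T, C = x.shape
            q, k, v = self.qkv(x).split(C, dim=2)
            # reshape into (B, n_head, T, head_dim)
            q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
            k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
            v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
            # scaled dot-product attention with the causal mask applied
            att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
            att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
            att = F.softmax(att, dim=-1)
            y = att @ v                                   # (B, n_head, T, head_dim)
            y = y.transpose(1, 2).contiguous().view(B, T, C)
            return self.proj(y)

    # usage: a batch of 2 sequences, 16 tokens each, embedding size 128
    attn = CausalSelfAttention()
    out = attn(torch.randn(2, 16, 128))
    print(out.shape)  # torch.Size([2, 16, 128])
    ```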

  • metaseq

    Discontinued: Repo for external large-scale work

    I work in this field (PhD candidate), and what you say is true for smaller models, but not for GPT-3-scale models. Training large-scale models involves a lot more, as the OP said. It's not just learning-rate schedulers; it's a whole bunch of stuff.

    See this logbook from training the GPT-3-sized OPT model - https://github.com/facebookresearch/metaseq/blob/main/projec...
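
    The logbook itself is narrative, but the flavour of "a whole bunch of stuff" can be sketched in code: guarding against loss spikes, clipping gradients, and checkpointing frequently so a run can roll back after divergence or hardware failure. The snippet below is a toy illustration with hypothetical names, not code from metaseq or OPT.

    ```python
    import torch

    def train_step(model, optimizer, batch, max_grad_norm=1.0):
        """One guarded optimizer step (toy illustration, not metaseq/OPT code)."""
        optimizer.zero_grad()
        loss = model(batch)  # assumes the model returns its training loss
        if not torch.isfinite(loss):
            # loss spike or NaN: skip the update instead of corrupting the weights
            return None
        loss.backward()
        # gradient clipping is one of the many stabilizers large runs rely on
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()
        return loss.item()

    def save_checkpoint(model, optimizer, step, path):
        """Frequent checkpoints let a run roll back after divergence or
        hardware failure and resume, possibly with adjusted hyperparameters."""
        torch.save(
            {"step": step,
             "model": model.state_dict(),
             "optimizer": optimizer.state_dict()},
            path,
        )
    ```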

  • x-transformers

    A concise but complete full-attention transformer with a set of promising experimental features from various papers

    It is all documented here, in writing and in code: https://github.com/lucidrains/x-transformers

    You will want to use rotary embeddings if you do not need length extrapolation.
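
    A minimal usage sketch of a GPT-style decoder with rotary embeddings, following the x-transformers README; check the keyword names (e.g. rotary_pos_emb) against the version you install, since they may differ.

    ```python
    import torch
    from x_transformers import TransformerWrapper, Decoder

    # Decoder-only (GPT-style) model; rotary_pos_emb=True enables rotary embeddings
    model = TransformerWrapper(
        num_tokens=20000,            # vocabulary size (illustrative)
        max_seq_len=1024,
        attn_layers=Decoder(
            dim=512,
            depth=6,
            heads=8,
            rotary_pos_emb=True      # as recommended when length extrapolation is not needed
        )
    )

    tokens = torch.randint(0, 20000, (1, 1024))
    logits = model(tokens)           # (1, 1024, 20000)
    ```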

Related posts

  • x-transformers

    1 project | news.ycombinator.com | 31 Mar 2024
  • A single API call using almost the whole 32k context window costs around 2$.

    1 project | /r/OpenAI | 15 Mar 2023
  • GPT-4 architecture: what we can deduce from research literature

    1 project | news.ycombinator.com | 14 Mar 2023
  • You’ll be able to run chatgpt on your own device quite easily very soon

    2 projects | /r/OpenAI | 13 Mar 2023
  • Thoughts on AI image generators from text

    1 project | /r/conspiracy | 9 Aug 2022
