Neural Networks: Zero to Hero

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • micrograd

    A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
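
    The idea can be sketched in a few dozen lines: each scalar value remembers how it was produced, and backward() walks that graph in reverse, applying the chain rule. The snippet below is a minimal illustrative sketch of that idea (names and structure are simplified), not micrograd's actual implementation.

class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad          # d(out)/d(self) = 1
            other.grad += out.grad         # d(out)/d(other) = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # product rule
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topologically sort the graph, then apply the chain rule node by node
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(-3.0)
c = a * b + a
c.backward()
print(a.grad, b.grad)  # dc/da = b + 1 = -2.0, dc/db = a = 2.0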

  • I'm doing an ML apprenticeship [1] these weeks and Karpathy's videos are part of it. We've gone deep into them, and I found them excellent. Every concept he illustrates is crystal clear in his mind (even when the concepts themselves are complicated), and that shows in his explanations.

    Also, the way he builds everything up is magnificent: starting from basic Python classes, to derivatives and gradient descent, to micrograd [2], and then from a bigram counting model [3] to makemore [4] and nanoGPT [5]. (A minimal sketch of the bigram counting step follows the links below.)

    [1]: https://www.foundersandcoders.com/ml

    [2]: https://github.com/karpathy/micrograd

    [3]: https://github.com/karpathy/randomfun/blob/master/lectures/m...

    [4]: https://github.com/karpathy/makemore

    [5]: https://github.com/karpathy/nanoGPT
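
    The bigram counting model mentioned above can be sketched in a few lines of PyTorch: count how often each character follows each other character, normalize the counts into probabilities, and sample new "words" one character at a time. The tiny word list and variable names below are placeholders for illustration, not the lecture's exact code.

import torch

words = ["emma", "olivia", "ava", "isabella", "sophia"]  # stand-in for a real names file

chars = sorted(set("".join(words)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0  # start/end token
itos = {i: ch for ch, i in stoi.items()}

# count how often each character follows each other character
N = torch.zeros((len(stoi), len(stoi)), dtype=torch.int32)
for w in words:
    chs = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(chs, chs[1:]):
        N[stoi[ch1], stoi[ch2]] += 1

# turn counts into row-normalized probabilities (with add-1 smoothing)
P = (N + 1).float()
P /= P.sum(dim=1, keepdim=True)

# sample a few new "names" one character at a time
g = torch.Generator().manual_seed(2147483647)
for _ in range(3):
    ix, out = 0, []
    while True:
        ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
        if ix == 0:
            break
        out.append(itos[ix])
    print("".join(out))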

  • randomfun

    Notebooks and various random fun

  • makemore

    An autoregressive character-level language model for making more things
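
    To make the "autoregressive" part concrete: training data for such a model is built by sliding a fixed-size context window over each word, so every position becomes a (context -> next character) example. The sketch below assumes a tiny word list and a block size of 3; the names and details are illustrative, not makemore's exact code.

import torch

words = ["emma", "olivia", "ava"]
chars = sorted(set("".join(words)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0  # padding / end-of-word token
block_size = 3  # how many previous characters predict the next one

X, Y = [], []
for w in words:
    context = [0] * block_size                 # start each word with '.' padding
    for ch in w + ".":
        X.append(context)
        Y.append(stoi[ch])
        context = context[1:] + [stoi[ch]]     # slide the window forward

X = torch.tensor(X)  # shape (num_examples, block_size)
Y = torch.tensor(Y)  # shape (num_examples,)
print(X.shape, Y.shape)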

  • nanoGPT

    The simplest, fastest repository for training/finetuning medium-sized GPTs.
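
    The core building block of such a GPT is causal self-attention: each position attends only to positions before it, enforced by a lower-triangular mask. The PyTorch sketch below illustrates that mechanism with made-up dimensions; it is not nanoGPT's exact code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd=64, n_head=4, block_size=32):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # queries, keys, values in one matmul
        self.proj = nn.Linear(n_embd, n_embd)
        # lower-triangular mask so each position only sees the past
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (batch, heads, time, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = att @ v                                # weighted sum of values
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)

x = torch.randn(2, 8, 64)                      # (batch, time, channels)
print(CausalSelfAttention()(x).shape)          # torch.Size([2, 8, 64])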

  • hlb-gpt

    Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wikitext-103 on a single A100 in <100 seconds. Scales to larger models with one parameter change (feature currently in alpha).

  • I made a smaller GPT model, starting from Andrej's code, that converges to a decent loss on an A100 in a short amount of time -- just under 2.5 minutes or so: https://github.com/tysam-code/hlb-gpt

    With the original hyperparameters it took 30-60 minutes; with a pruned-down network and adjusted hyperparameters, about 6 minutes; a variety of optimizations beyond that brought it down further.

    If you want a version that is basically feature-identical to nanoGPT (but pruned down), release 0.0.0 at ~6 minutes or so is your best bet.

    You can get A100s cheaply and securely through Colab or LambdaLabs.

NOTE: The number of mentions on this list reflects mentions in common posts plus user-suggested alternatives, so a higher number indicates a more popular project.
