-
micrograd
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
-
hlb-gpt
Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wikitext-103 on a single A100 in <100 seconds. Scales to larger models with one parameter change (feature currently in alpha).
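To make the micrograd description above concrete, here is a minimal sketch of what "a tiny scalar-valued autograd engine" means — a `Value` object that records the operations applied to it and replays the chain rule backwards. This is my own toy reimplementation of the idea, not the actual micrograd source (the real library also supports subtraction, powers, `relu`, and an `nn` module on top):

```python
class Value:
    """A scalar that tracks its own gradient, in the spirit of micrograd."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None   # how to push gradient to children
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # product rule: each factor's gradient is the other factor
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topologically sort the graph, then apply the chain rule in reverse
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# Build a tiny expression graph and backpropagate through it
a = Value(2.0)
b = Value(-3.0)
c = a * b + a * a    # c = -6 + 4 = -2
c.backward()
print(a.grad)        # dc/da = b + 2a = -3 + 4 = 1.0
print(b.grad)        # dc/db = a = 2.0
```

The "PyTorch-like API" claim is exactly this: you compose ordinary arithmetic on `Value` objects, call `.backward()` once on the result, and read gradients off `.grad`, just as you would with `torch.Tensor`.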
I'm doing an ML apprenticeship [1] these weeks, and Karpathy's videos are part of it. We've gone deep into them, and I found them excellent. Every concept he illustrates is crystal clear in his mind (even when the concepts themselves are complicated), and that shows in his explanations.
Also, the way he builds everything up is magnificent: starting from basic Python classes, to derivatives and gradient descent, to micrograd [2], and then from a bigram counting model [3] to makemore [4] and nanoGPT [5].
[1]: https://www.foundersandcoders.com/ml
[2]: https://github.com/karpathy/micrograd
[3]: https://github.com/karpathy/randomfun/blob/master/lectures/m...
[4]: https://github.com/karpathy/makemore
[5]: https://github.com/karpathy/nanoGPT
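The "bigram counting model" that starts the makemore progression is worth seeing in miniature: count how often each character follows each other character, then normalize the counts into next-character probabilities. This is a hedged sketch of that idea only — the toy name list and the `next_char_probs` helper are my own, not code from the lecture:

```python
from collections import Counter

# Toy dataset; the lecture uses a large file of real names
names = ["emma", "olivia", "ava", "isabella", "sophia"]

# Count character bigrams, with '.' as a start/end-of-word token
counts = Counter()
for name in names:
    chars = ['.'] + list(name) + ['.']
    for a, b in zip(chars, chars[1:]):
        counts[(a, b)] += 1

def next_char_probs(ch):
    """Normalize the counts for one character into a probability distribution."""
    following = {b: n for (a, b), n in counts.items() if a == ch}
    total = sum(following.values())
    return {b: n / total for b, n in following.items()}

# Distribution over what follows 'a' in this toy dataset
print(next_char_probs('a'))
```

Sampling repeatedly from these distributions generates new name-like strings; the rest of the series replaces the count table with learned neural network layers while keeping that same sampling loop.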
I made a smaller GPT model, starting from Andrej's code, that converges to a decent loss in a short amount of time on an A100 -- just under 2.5 minutes: https://github.com/tysam-code/hlb-gpt
With the original hyperparameters it took 30-60 minutes; with a pruned-down network and adjusted hyperparameters, about 6 minutes; a variety of optimizations beyond that brought it down further.
If you want the version that is basically feature-identical to nanoGPT (but pruned down), release 0.0.0 at ~6 minutes or so is your best bet.
You can get A100s cheaply and securely through Colab or LambdaLabs.