hlb-gpt
Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wikitext-103 on a single A100 in <100 seconds. Scales to larger models with one parameter change (feature currently in alpha).
Karpathy's zero to hero series is excellent, and I really recommend it.
I also made a few repos geared toward readability, meant as good 'working code demonstrations' of certain best practices in neural networks. If you're like me and you grok code better than symbols, these could be a helpful adjunct if you want to dig a bit deeper.
https://github.com/tysam-code/hlb-CIFAR10
https://github.com/tysam-code/hlb-gpt
Both of these implementations are pretty straightforward for what they do, but hlb-CIFAR10 has less dynamic scheduling and the like, so it might be easier to fit in your head. Both are meant to be simple and extremely hackable: you can poke around, take pieces apart, or add watchpoints to see how different components evolve. I was partly inspired by, among many things, one of those see-through engine kits I saw in a magazine growing up as a child, which struck me as a very cool, dynamic, hands-on way to watch how the pieces of a difficult topic move. Sometimes that is the best way for our brains to learn, though we are all different and learn best through different mediums, in my experience.
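To make the "watchpoint" idea concrete, here's a toy sketch (all names here are hypothetical, not code from either repo): in a small single-file codebase you can simply stash intermediate values in a dict as they flow through the forward pass, then inspect them however you like.

```python
# Toy two-"layer" forward pass (plain functions standing in for real layers).
# A watchpoint is just a place where you record an intermediate value.
watch = {}

def layer1(x):
    h = x * 2.0
    watch["layer1_out"] = h  # watchpoint: stash the activation for inspection
    return h

def layer2(h):
    return h + 1.0

out = layer2(layer1(3.0))
print(watch["layer1_out"], out)  # see exactly what flowed between the layers
```

In a framework like PyTorch the same idea is usually done with forward hooks, but the point stands: when the whole model fits in one readable file, dropping in a print or a dict entry anywhere is trivial.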
Feel free to let me know if you have any specific questions and I'll endeavor to do my best to help you here. Welcome to an interest in the field!
I guess to briefly touch on one topic: some people focus on the technical details first, like the math of backpropagation, and though math is required heavily for more advanced research, I don't learn concepts well through details alone. Knowing that backprop means "calculate the slope of the error in a high-dimensional space at the point where the neural network was wrong, then take a tiny step toward minimizing that error; after N steps, we converge to a representation that is like a zip file of our input data inside a mathematical function" is probably enough for 90-95% of the use cases you will encounter as an ML practitioner. The math is cool, but there are more important things to sweat over, IMO, and I think messaging to the contrary raises the barrier to entry to the field and distracts from the important things, which we do not need. It's good to learn the math once you have space in your brain for it, after you understand how the whole thing works together. That is just my personal opinion, after all.
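That one-sentence version of the idea can be sketched in a few lines of plain Python (a toy one-parameter example I made up for illustration, not code from either repo): compute the slope of the error, take a tiny step downhill, repeat.

```python
# Toy "training": fit y = 2x with a one-parameter model y_hat = w * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0   # start from a wrong guess
lr = 0.05  # size of the "tiny step"

for step in range(200):
    # slope of the mean squared error with respect to w
    g = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * g  # step toward minimizing the error

print(w)  # converges toward 2.0
```

Real backprop does exactly this, just with millions of parameters and the slopes computed efficiently by the chain rule instead of by hand.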
Much love and care and all that and again feel free to let me know if you have any questions please. :) <3