MIT 6.S191: Recurrent Neural Networks, Transformers, and Attention

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • hlb-CIFAR10

    Train CIFAR-10 in <7 seconds on an A100, the current world record.

  • Karpathy's zero to hero series is excellent, and I really recommend it.

    I also made a few repos that are geared toward readability and toward being a good 'working code demonstration' of certain best practices in neural networks. If you're like me and you grok code better than symbols, they could be a helpful adjunct if you want to dig a bit deeper.

    https://github.com/tysam-code/hlb-CIFAR10

  • hlb-gpt

    Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wikitext-103 on a single A100 in <100 seconds. Scales to larger models with one parameter change (feature currently in alpha).

  • https://github.com/tysam-code/hlb-gpt

    Both of these implementations are pretty straightforward for what they do, but the CIFAR-10 one has less dynamic scheduling and so on, so it might be easier to fit in your head. Both are meant to be simple and extremely hackable, if you want to poke around, take apart some pieces, or add different watchpoints to see how different pieces evolve. I am partially inspired by, among many things, one of those see-through engine kits that I saw in a magazine as a child; I thought it was a very cool, dynamic, and hands-on way to just watch how the pieces moved in a difficult topic. Sometimes that is the best way for our brains to learn, though in my experience we are all different and learn best through different mediums.

    Feel free to let me know if you have any specific questions and I'll endeavor to do my best to help you here. Welcome to an interest in the field!

    I guess to briefly touch on one topic: some people focus on the technical side first, like backprop, and though math is heavily required for more advanced research, I don't learn concepts very well through details alone. Knowing that backprop is "Calculate the slope of the error in this high-dimensional space for how a neural network was wrong at a certain point, then take a tiny step towards minimizing the error. After N steps, we converge to a representation that is like a zip file of our input data within a mathematical function" is probably enough for 90-95% of the use cases you will encounter as an ML practitioner, if you become one. The math is cool, but there are more important things to sweat over IMO, and I think messaging to the contrary raises the barrier to entry to the field and distracts from the important things, neither of which we need. It's good to learn the math once you have space in your brain for it, after you understand how the whole thing works together, though that is just my personal opinion after all.

    Much love and care and all that and again feel free to let me know if you have any questions please. :) <3
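
For readers who grok code better than symbols, the "calculate the slope, then take a tiny step" description above can be sketched in a few lines of plain Python. This is only a toy illustration, not code from either repo: the one-weight model, the data, the function names (`mse_loss`, `mse_grad`, `fit`), and the learning rate are all assumptions chosen for the example, and the gradient is written out by hand rather than computed by backprop through a real network.

```python
# Toy gradient descent: fit y = w * x to a handful of points by repeatedly
# stepping against the slope of the mean squared error.
# All names and numbers here are illustrative assumptions.

def mse_loss(w, xs, ys):
    """Mean squared error of the one-weight model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_grad(w, xs, ys):
    """Slope of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def fit(xs, ys, lr=0.01, steps=500):
    """Start from w = 0 and take `steps` small steps downhill."""
    w = 0.0
    for _ in range(steps):
        w -= lr * mse_grad(w, xs, ys)  # tiny step toward lower error
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]  # generated by y = 3 * x

w = fit(xs, ys)
print(w, mse_loss(w, xs, ys))
```

After the loop, `w` ends up very close to 3, the value that generated the data. A real network has millions of weights and uses backprop to get all the slopes at once, but the update step is the same idea.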

