The Mathematics of Training LLMs

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

  • Thanks for sharing this. How do you think the local LLM movement will evolve? Especially as in the post, you mentioned startups and VCs both hoarding GPUs to attract talent.

    There seems to be a good demand behind tools like llama.cpp or ollama (https://github.com/jmorganca/ollama) to run models locally.

    Maybe as local runners become more efficient, we'll start seeing more training of smaller models, or more fine-tuning, done locally? I'm still trying to wrap my head around this too.

  • hlb-CIFAR10

    Train CIFAR-10 in <7 seconds on an A100, the current world record.

    Sure. Basically everything in https://github.com/tysam-code/hlb-CIFAR10 was directly founded on the concepts in the paper, down to the coding, commenting, and layout styles (which is why I advocate so strongly for it as a requirement for ML; the empirical benefits are clear to me).

    Before I sat down and wrote my first line, I spent a very long time thinking about how to optimize the repo. Not just in terms of information flow during training, but how the code was laid out (minimize the expected value of deltas for changes from a superset of possible code changes), and comments (ratio of space vs mental effort to decode the repo for experienced vs inexperienced developers).

    It's not perfect, but I've used info theory as a strong guiding light for that repo. There's more to say here, but it's a long conversation about the expected utility of doing research a few different kinds of ways.

  • segment-anything

    The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

  • Yeah, they are great and some of the reason (up the causal chain) for some of the work I've done! Seems really fun! <3 :))))

    I think Facebook's Segment Anything Model has a lot of potentially really fun use cases. Plaintext description -> Network segmentation (https://github.com/facebookresearch/segment-anything/blob/ma...) Not sure if that's what you're looking for or not, but I love that impressing your kids is where your heart is. That kind of parenting makes me very, very, very happy. :') <3
