The Mathematics of Training LLMs

ollama

198 mentions | 62,615 stars | activity 9.9 | Go

Get up and running with Llama 3, Mistral, Gemma, and other large language models.

Thanks for sharing this. How do you think the local LLM movement will evolve? I ask especially because, in the post, you mentioned that startups and VCs are both hoarding GPUs to attract talent.
There seems to be strong demand for tools like llama.cpp or ollama (https://github.com/jmorganca/ollama) that run models locally.
Maybe as local runners become more efficient, we'll start seeing more training of smaller models, or more fine-tuning, done locally? I'm still trying to wrap my head around this too.
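To make "running models locally" a little more concrete: ollama serves models through a small HTTP API on the local machine (by default at `http://localhost:11434`), and a POST to `/api/generate` with `"stream": false` returns the whole completion as a single JSON object whose `response` field holds the generated text. The Python sketch below assumes an ollama server is already running and that a model tagged `llama3` has been pulled; the helper names `build_generate_request` and `generate` are hypothetical, chosen for illustration, and are not part of ollama itself.

```python
import json
import urllib.request

# Assumption: an ollama server is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming /api/generate request.

    Hypothetical helper for illustration; not an ollama-provided function.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request to the local server and return the generated text."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False the full completion arrives in one JSON object.
    return body["response"]
```

For example, `generate("llama3", "Explain gradient descent in one sentence.")` would return the model's answer as a plain string, assuming the server and model are available. Building the request in a separate function keeps the payload easy to inspect and test without a live server.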

hlb-CIFAR10

36 mentions | 1,187 stars | activity 3.5 | Python

Train CIFAR-10 in <7 seconds on an A100, the current world record.

Sure. Basically everything in https://github.com/tysam-code/hlb-CIFAR10 was founded directly on the concepts in the paper, down to the coding, commenting, and layout styles (which is why I advocate so strongly for it as a requirement for ML; the empirical benefits are clear to me).
Before I sat down and wrote my first line, I spent a very long time thinking about how to optimize the repo: not just in terms of information flow during training, but also in how the code was laid out (minimizing the expected size of the deltas needed for changes drawn from a superset of possible code changes) and how it was commented (the ratio of space used versus the mental effort needed to decode the repo, for experienced versus inexperienced developers).
It's not perfect, but I've used information theory as a strong guiding light for that repo. There's more to say here, but that's a long conversation about the expected utility of doing research in a few different ways.

segment-anything

56 mentions | 44,158 stars | activity 0.0 | Jupyter Notebook

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Yeah, they are great, and they're part of the reason (up the causal chain) for some of the work I've done! Seems really fun! <3 :))))
Facebook's Segment Anything Model has, I think, a lot of potentially really fun use cases. Plaintext description -> Network segmentation (https://github.com/facebookresearch/segment-anything/blob/ma...). Not sure if that's what you're looking for, but I love that impressing your kids is where your heart is. That kind of parenting makes me very, very, very happy. :') <3

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Train to 94% on CIFAR-10 in 3.29 seconds on a single A100
2 projects | news.ycombinator.com | 4 Apr 2024

Deep Dive into the Vision Transformers Paper (ViT)
3 projects | news.ycombinator.com | 1 Dec 2023

There is no hard takeoff
2 projects | news.ycombinator.com | 11 Aug 2023

In Defense of Pure 16-Bit Floating-Point Neural Networks
2 projects | news.ycombinator.com | 23 May 2023

Neural Network Architecture Beyond Width and Depth
1 project | news.ycombinator.com | 21 May 2023

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Machine Learning Deep Learning world-record single-GPU simple-experimentation-codebase
Post date: 16 Aug 2023
