hlb-gpt
Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wikitext-103 on a single A100 in <100 seconds. Scales to larger models with one parameter change (feature currently in alpha).
It's release day again, and today we're releasing a new repository: hlb-gpt. It's based on nanoGPT, but smaller, with an aggressively trimmed feature set. In this initial release, training performs almost exactly the same as Andrej's library, but a tiny bit faster and a tiny bit more accurately, thanks to using PyTorch-native operators. We keep complexity down by targeting tiny, rapid experiments on a single GPU only. The baseline network we're releasing reaches <3.8 validation loss in just over 6 minutes. A rapidly training network offers a variety of benefits -- this helped a lot when working on hlb-cifar10. Cycle times are king in research, and we rarely need giant models to get enough of a loss signal when prototyping or experimenting with a method.
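To illustrate the "PyTorch-native operators" point: a minimal sketch comparing a hand-rolled causal attention to PyTorch's fused `scaled_dot_product_attention` (available since PyTorch 2.0). Which native operators hlb-gpt actually uses is an assumption here; this just shows why the native path can be faster, since it replaces several separate kernels with one fused call.

```python
# Sketch only -- not hlb-gpt's actual code. Shows a manual attention
# implementation vs. the equivalent PyTorch-native fused operator.
import math
import torch
import torch.nn.functional as F

B, H, T, D = 2, 4, 16, 8  # toy batch, heads, sequence length, head dim
q = torch.randn(B, H, T, D)
k = torch.randn(B, H, T, D)
v = torch.randn(B, H, T, D)

# Manual causal attention: several separate ops and a full (T, T) score matrix.
scores = (q @ k.transpose(-2, -1)) / math.sqrt(D)
causal_mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(causal_mask, float("-inf"))
manual = F.softmax(scores, dim=-1) @ v

# PyTorch-native fused operator: one call, no materialized mask needed.
native = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(torch.allclose(manual, native, atol=1e-5))  # the two paths agree
```

The fused operator also lets PyTorch pick an optimized backend (e.g. FlashAttention) where hardware supports it, which is where most of the speed difference comes from.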
You can find the code for hlb-gpt here: https://github.com/tysam-code/hlb-gpt