-
Compact-Transformers
Escaping the Big Data Paradigm with Compact Transformers, 2021 (Train your Vision Transformers in 30 mins on CIFAR-10 with a single GPU!)
Logged into my personal account for this one! I'm a lead author on a paper that explored exactly this. It does enable faster training and smaller model sizes. For reference, you can get 80% accuracy on CIFAR-10 in ~30 minutes on a CPU (without crazy optimizations). There are open questions about scaling, but at the time we did not have access to big compute (really still don't), and our goal was to address the original ViT paper's claims about data constraints and the necessity of pretraining for smaller datasets (spoiler: augmentation + overlapping patches play a huge role; see the sketch below the links). Basically, we wanted to make a network that lets people train transformers from scratch on their own data, because pretrained models aren't always the best or most practical solution.
Paper: https://arxiv.org/abs/2104.05704
Blog: https://medium.com/pytorch/training-compact-transformers-fro...
CPU compute: https://twitter.com/WaltonStevenj/status/1382045610283397120
Crazy optimizations (no affiliation): 94% on CIFAR-10 in <6.3 seconds on a single A100: https://github.com/tysam-code/hlb-CIFAR10
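To make the "overlapping patches" point more concrete, here's a rough PyTorch sketch contrasting the standard ViT patch embedding (stride == kernel size, so patches don't overlap) with a CCT-style convolutional tokenizer (stride < kernel size, so neighboring tokens share pixels). This is an illustrative simplification, not our actual code, and the layer sizes (dim=256, patch=4, the conv/pool settings) are just assumptions; the real implementation is linked from the paper.

  import torch
  import torch.nn as nn

  class ViTPatchEmbed(nn.Module):
      """Standard ViT tokenizer: non-overlapping patches (stride == kernel)."""
      def __init__(self, in_ch=3, dim=256, patch=4):
          super().__init__()
          self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)

      def forward(self, x):                    # x: (B, 3, 32, 32) for CIFAR-10
          x = self.proj(x)                     # (B, dim, 8, 8)
          return x.flatten(2).transpose(1, 2)  # (B, 64, dim)

  class ConvTokenizer(nn.Module):
      """CCT-style tokenizer: conv with stride < kernel, so patches overlap."""
      def __init__(self, in_ch=3, dim=256):
          super().__init__()
          self.tok = nn.Sequential(
              nn.Conv2d(in_ch, dim, kernel_size=3, stride=1, padding=1),
              nn.ReLU(inplace=True),
              nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
          )

      def forward(self, x):                    # x: (B, 3, 32, 32)
          x = self.tok(x)                      # (B, dim, 16, 16)
          return x.flatten(2).transpose(1, 2)  # (B, 256, dim)

  if __name__ == "__main__":
      img = torch.randn(2, 3, 32, 32)
      print(ViTPatchEmbed()(img).shape)  # torch.Size([2, 64, 256])
      print(ConvTokenizer()(img).shape)  # torch.Size([2, 256, 256])

Either tokenizer feeds a plain transformer encoder; in the paper the convolutional one (together with strong augmentation and sequence pooling instead of a class token) is what lets the small model train from scratch on CIFAR-scale data.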
I also want to point to some better general resources on ViTs. Lucas Beyer is a good source and has some lectures, and Hila Chefer and Sayak Paul have a nice tutorial.
Lucas Beyer: https://twitter.com/giffmana/status/1570152923233144832
Chefer & Paul's All Things ViT: https://all-things-vits.github.io/atv/