Direct Pixel-Space Megapixel Image Generation with Diffusion Models

NATTEN

Mentions: 1 · Stars: 276 · Activity: 7.6 · Language: CUDA

Neighborhood Attention Extension. Bringing attention to a neighborhood near you!
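For readers new to the idea, here is a minimal 1D sketch of neighbourhood attention in plain Python. It is not NATTEN's API or its CUDA implementation; the function names `neighborhood_start` and `neighborhood_attention_1d` are hypothetical, and the sketch assumes the usual neighbourhood attention convention that windows are shifted inwards at the edges, so every query sees exactly `kernel_size` keys rather than padding.

```python
import math


def neighborhood_start(i, length, kernel_size):
    """Start index of the key window for query position `i`.

    The window is centred on `i` where possible and shifted inwards at the
    edges, so it always holds exactly `kernel_size` keys (no padding).
    """
    assert kernel_size % 2 == 1 and kernel_size <= length
    half = kernel_size // 2
    return min(max(i - half, 0), length - kernel_size)


def neighborhood_attention_1d(q, k, v, kernel_size):
    """Naive 1D neighbourhood attention over lists of vectors.

    q, k, v are lists of `length` vectors; each query attends only to the
    `kernel_size` keys in its neighbourhood instead of to all positions.
    """
    length, dim = len(q), len(q[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for i in range(length):
        start = neighborhood_start(i, length, kernel_size)
        idx = range(start, start + kernel_size)
        # Scaled dot-product logits against the neighbouring keys only.
        logits = [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in idx]
        # Numerically stable softmax over the window.
        peak = max(logits)
        weights = [math.exp(logit - peak) for logit in logits]
        total = sum(weights)
        out.append([
            sum(w / total * v[j][d] for w, j in zip(weights, idx))
            for d in range(len(v[0]))
        ])
    return out
```

Cost per query scales with `kernel_size` rather than with sequence length, which is what makes this kind of attention attractive at high resolution.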

this arch is of course nice for high-resolution synthesis, but there's some other cool stuff worth mentioning:

- activations are small! so you can enjoy bigger batch sizes. this is due to the 4x patching we do on the ingress to the model, and the effectiveness of neighbourhood attention in joining patches at the seams.
- the model's inductive biases are pretty different from (for example) a convolutional UNet's. the innermost levels seem to train easily, so images can have good global coherence early in training.
- there are no convolutions! so you don't need to worry about artifacts stemming from convolution padding, or about canvas-edge padding artifacts leaking an implicit position bias.
- we can finally see what high-resolution diffusion outputs look like _without_ latents! personally I think current latent VAEs don't _really_ achieve the high resolutions they claim (otherwise fine details like text would survive a VAE roundtrip faithfully); it's common to see latent diffusion outputs with smudgy skin or blurry fur. what I'd like to see from future latent diffusion work is for it to listen to the Emu paper and use more channels, or a less ambitious upsample.
- it's a transformer! so we can try applying to it everything we know about transformers, like sigma reparameterisation or multimodality. some tricks like masked training will require extra support in [NATTEN](https://github.com/SHI-Labs/NATTEN), but we're very happy with its featureset and performance so far.

but honestly I'm most excited about the efficiency. there's too little work on making pretraining possible at GPU-poor scale. so I was very happy to see HDiT could succeed at small-scale tasks within the resources I had at home (you can get nice oxford flowers samples at 256x256px with half an hour on a 4090). I think with models that are better fits for the problem, perhaps we can get good results with smaller models. and I'd like to see big tech go that direction too!
-Alex Birch
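
The point about small activations can be made concrete with a little arithmetic. The sketch below assumes that "4x patching" means splitting the image into non-overlapping 4x4 pixel patches, each flattened into one token; the helpers `patchify` and `token_grid` are hypothetical illustrations, not code from the HDiT repository.

```python
def token_grid(height, width, patch):
    """Token grid size after splitting into non-overlapping square patches."""
    assert height % patch == 0 and width % patch == 0
    return height // patch, width // patch


def patchify(image, patch):
    """Flatten an H x W x C nested-list image into a list of patch tokens.

    Each token concatenates the channel values of one patch x patch block,
    scanned row by row, so it holds patch * patch * C numbers.
    """
    height, width = len(image), len(image[0])
    token_grid(height, width, patch)  # validates divisibility
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    token.extend(image[y][x])
            tokens.append(token)
    return tokens


# A tiny 8x8 RGB image becomes 4 tokens of 4 * 4 * 3 = 48 values each.
image = [[[y, x, 0] for x in range(8)] for y in range(8)]
tokens = patchify(image, 4)
print(len(tokens), len(tokens[0]))

# At megapixel scale the sequence is 16x shorter than the pixel count.
rows, cols = token_grid(1024, 1024, 4)
print(rows * cols, 1024 * 1024)
```

Under that assumption a 1024x1024 image becomes a 256x256 grid of tokens, so the outermost level of the model works on 65,536 positions instead of 1,048,576 pixels.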

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com. Post date: 23 Jan 2024
