| | NATTEN | DOKSparse |
|---|---|---|
| Mentions | 1 | 2 |
| Stars | 304 | 2 |
| Growth | 8.6% | - |
| Activity | 7.7 | 4.2 |
| Last Commit | 11 days ago | 11 months ago |
| Language | Cuda | Cuda |
| License | GNU General Public License v3.0 or later | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
NATTEN
- Direct Pixel-Space Megapixel Image Generation with Diffusion Models
this architecture is of course nice for high-resolution synthesis, but there's some other cool stuff worth mentioning.
activations are small! so you can enjoy bigger batch sizes. this is due to the 4x patching we do on the ingress to the model, and the effectiveness of neighbourhood attention in joining patches at the seams.
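a toy 1-D sketch of the neighbourhood windowing that makes this seam-joining work (the real NATTEN library implements fused 2-D/3-D CUDA kernels; this just illustrates how windows near patch boundaries reach into adjacent patches):

```python
def neighbourhood(i, length, kernel_size):
    """Indices attended to by query position i in 1-D neighbourhood
    attention with an odd kernel_size. Near the borders the window is
    shifted rather than shrunk, so every query attends to exactly
    kernel_size keys."""
    radius = kernel_size // 2
    # clamp the window start so the window stays inside [0, length)
    start = min(max(i - radius, 0), length - kernel_size)
    return list(range(start, start + kernel_size))

# 16 tokens arranged as 4 patches of 4; the last token of patch 0
# (index 3) attends across the seam into patch 1 (indices 4-6):
print(neighbourhood(3, 16, 7))
```

because adjacent windows overlap, information flows across patch seams within a single attention layer, without any explicit cross-patch machinery.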
the model's inductive biases are pretty different from (for example) a convolutional UNet's. the innermost levels seem to train easily, so images can have good global coherence early in training.
there are no convolutions! so you don't need to worry about artifacts stemming from convolution padding, or having canvas edge padding artifacts leak an implicit position bias.
we can finally see what high-resolution diffusion outputs look like _without_ latents! personally I think current latent VAEs don't _really_ achieve the high resolutions they claim (otherwise fine details like text would survive a VAE roundtrip faithfully); it's common to see latent diffusion outputs with smudgy skin or blurry fur. what I'd like to see in the future of latent diffusion is to listen to the Emu paper and use more channels, or a less ambitious upsample.
it's a transformer! so we can try applying everything we know about transformers to it, like sigma reparameterisation or multimodality. some tricks like masked training will require extra support in [NATTEN](https://github.com/SHI-Labs/NATTEN), but we're very happy with its featureset and performance so far.
but honestly I'm most excited about the efficiency. there's too little work on making pretraining possible at GPU-poor scale, so I was very happy to see HDiT could succeed at small-scale tasks within the resources I had at home (you can get nice oxford flowers samples at 256x256px in half an hour on a 4090). I think with models that are better fits for the problem, perhaps we can get good results with smaller models. and I'd like to see big tech go that direction too!
-Alex Birch
DOKSparse
- GDlog: A GPU-Accelerated Deductive Engine
- tensor.to_sparse() Memory Allocation
If using sparse tensors is a must, you can look into the DOK sparse format, which scipy supports for 2-D matrices. It allows you to access any element of the sparse tensor in constant time, which makes it possible to create your tensor directly in sparse format, skipping the need to create a dense numpy array first. In case you need a GPU version of this, I have a library that implements a sparse DOK tensor in pytorch and cuda; currently it's GPU only.
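the idea behind DOK (dictionary of keys) is just a hash map from index tuples to values. here's a minimal pure-Python sketch of the format (the `DOKTensor` name and interface are illustrative, not the API of scipy's `dok_matrix` or of the pytorch library mentioned above):

```python
class DOKTensor:
    """Minimal dictionary-of-keys sparse tensor: a dict maps index
    tuples to values, so reads and writes are O(1) on average and the
    tensor can be built element by element without ever materialising
    a dense array."""

    def __init__(self, shape):
        self.shape = shape
        self.data = {}  # (i, j, ...) -> nonzero value

    def __setitem__(self, idx, value):
        if value:
            self.data[idx] = value
        else:
            self.data.pop(idx, None)  # storing zero deletes the entry

    def __getitem__(self, idx):
        return self.data.get(idx, 0)  # missing entries are implicit zeros

    @property
    def nnz(self):
        """Number of explicitly stored (nonzero) entries."""
        return len(self.data)


# build a huge logical tensor while storing only one element
t = DOKTensor((1_000_000, 1_000_000))
t[3, 7] = 1.5
```

memory scales with the number of nonzeros, not the logical shape, which is exactly why you can skip the dense intermediate.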
What are some alternatives?
cub - [ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
MegBA - MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment
CUDA-Guide - CUDA Guide
cuhnsw - CUDA implementation of Hierarchical Navigable Small World Graph algorithm
TorchPQ - Approximate nearest neighbor search with product quantization on GPU in pytorch and cuda
instant-ngp - Instant neural graphics primitives: lightning fast NeRF and more
cccl - CUDA C++ Core Libraries
warpcore - A Library for fast Hash Tables on GPUs
gdlog
FuXi - Chimezie Ogbuji's FuXi reasoner. NON-FUNCTIONING, RETAINED FOR ARCHIVAL PURPOSES. For working code plus version and associated support requirements see:
cugraph - cuGraph - RAPIDS Graph Analytics Library