Stable Diffusion PR optimizes VRAM, generate 576x1280 images with 6 GB VRAM

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • stable-diffusion

    Optimized Stable Diffusion modified to run on lower GPU VRAM (by basujindal)

  • Impossible to pinpoint what changed thanks to thousands of lines of completely irrelevant changes and shitty commit messages. It seems the only changeset that might be relevant out of +2,273 −1,531 is the +11 -7 from https://github.com/basujindal/stable-diffusion/pull/103/comm...? Does it even work?

  • stable-diffusion

    A latent text-to-image diffusion model

  • In case anyone is confused by the clashing repos, here is how I was able to easily run this updated code.

    Clone the original SD repo, which is what this code was built off of, and follow all the installation instructions:

    https://github.com/CompVis/stable-diffusion

    In that repo, replace the file ldm/modules/attention.py with this file:

    https://raw.githubusercontent.com/neonsecret/stable-diffusio...

    Now run a new prompt with a larger image. Note that the original model was trained on 512x512 images and may produce repetition, especially if you try to increase both dimensions (this is mentioned in the SD readme), so increase only one dimension.

    For example:

    python scripts/txt2img.py --prompt "a person gardening, by claude monet" --ddim_steps 50 --seed 12000 --scale 9 --n_iter=1 --n_samples=1 --H=512 --W=1024 --skip_grid

    I confirmed that if I run that command with the original attention.py, it fails due to lack of memory. With the new attention.py, it succeeds.

  • diffusers

    🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

  • I wouldn’t recommend using that as-is. Random number generation on MPS is not deterministic, which means that seeds become meaningless and you will never be able to reproduce a result. You can work around it by generating random numbers on the CPU and then moving them to MPS, but a proper fix probably belongs in PyTorch.

    The MPS support issue for diffusers is here:

    https://github.com/huggingface/diffusers/issues/292

    …and it links to the relevant PyTorch issue here:

    https://github.com/pytorch/pytorch/issues/84288
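The CPU-side workaround described above can be sketched as follows. This is a minimal illustration, not code from diffusers or PyTorch; the helper name `seeded_randn` is made up, and the `"mps"` default is just the device the comment is discussing.

```python
import torch

def seeded_randn(shape, seed, device="mps"):
    # Sample on the CPU, where PyTorch's RNG is deterministic for a
    # given seed, then move the tensor to the target device (e.g. MPS).
    generator = torch.Generator(device="cpu").manual_seed(seed)
    return torch.randn(shape, generator=generator, device="cpu").to(device)
```

Drawing the initial latent noise this way makes the seed meaningful again, at the cost of one host-to-device copy per tensor.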

  • ROCm-docker

    Dockerfiles for the various software layers defined in the ROCm software platform

  • Not sure about the 6600, but there is a guide for Linux at least:

    https://m.youtube.com/watch?v=d_CgaHyA_n4&feature=emb_logo

    And this is somehow relevant (possibly), as I kept the link open.

    https://github.com/RadeonOpenCompute/ROCm-docker/issues/38

  • Pytorch

    Tensors and Dynamic neural networks in Python with strong GPU acceleration

  • Yeah, but with the CompVis derived repos, it’s pretty easy to go in and change all the calls to PyTorch random number generators.

    Having said that, the last comment [0] on the PyTorch issue gave me the idea of monkey patching the random functions. The supplied code assumes you’re always passing in a generator, which is not true in this case. However, if you monkey patch the three rand/randn/randn_like functions so that they do nothing but swap the device parameter for 'cpu' and then call .to('mps') on the return value, that’s enough to get stable seed behavior in the CompVis-derived repos without modifying their code, so I’m guessing it will probably work for diffusers as well.

    Also, it’s probably a bug in the CompVis code, but even after you fix the random number generator, the very first run in a session uses an incorrect seed. The workaround is to generate and discard one image whenever you start a new session.

    [0] https://github.com/pytorch/pytorch/issues/84288#issuecomment...
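The monkey patch described in the comment might look roughly like this. It is an illustrative sketch: the function name `patch_torch_rng` is invented here, and, as the comment notes, a caller-supplied non-CPU generator would still need separate handling.

```python
import torch

def patch_torch_rng(target_device):
    """Replace torch.rand / torch.randn / torch.randn_like with wrappers
    that always sample on the CPU, then move the result to target_device.
    A caller-supplied CUDA/MPS generator would still break this."""
    for name in ("rand", "randn", "randn_like"):
        original = getattr(torch, name)

        def wrapper(*args, _original=original, **kwargs):
            kwargs["device"] = "cpu"  # force deterministic CPU sampling
            return _original(*args, **kwargs).to(target_device)

        setattr(torch, name, wrapper)
```

Calling `patch_torch_rng("mps")` once before loading the pipeline would route every noise draw through the CPU RNG, which is what makes seeds reproducible again.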

  • stable-diffusion

  • stable-diffusion

    Go to lstein/stable-diffusion for all the best stuff and a stable release. This repository is my testing ground and it's very likely that I've done something that will break it. (by magnusviri)

  • https://github.com/magnusviri/stable-diffusion/commit/d0b168...

    Copying this change fixed seeds on M1 for me.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • PyTorch 1.8 adds AMD ROCm support

    3 projects | /r/Amd | 5 Mar 2021
  • PyTorch 2.3: User-Defined Triton Kernels, Tensor Parallelism in Distributed

    1 project | news.ycombinator.com | 10 May 2024
  • Clasificador de imágenes con una red neuronal convolucional (CNN)

    2 projects | dev.to | 1 May 2024
  • penzai: JAX research toolkit for building, editing, and visualizing neural nets

    4 projects | news.ycombinator.com | 21 Apr 2024
  • Tinygrad: Hacked 4090 driver to enable P2P

    5 projects | news.ycombinator.com | 12 Apr 2024