Stable Diffusion 2.0 Release

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • stablediffusion

    High-Resolution Image Synthesis with Latent Diffusion Models

  • Did you miss the first part of the article?

    > It is our pleasure to announce the open-source release of Stable Diffusion Version 2.[0]

    > The original Stable Diffusion V1 led by CompVis changed the nature of open source AI models and spawned hundreds of other models and innovations all over the world. It had one of the fastest climbs to 10K Github stars of any software, rocketing through 33K stars in less than two months.

    [0] https://github.com/Stability-AI/stablediffusion

  • Dreambooth-Stable-Diffusion

    Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion

  • To put things in perspective, the dataset it's trained on is ~240TB and Stability has over ~4000 Nvidia A100s (each of which is much, much faster than a 1080 Ti). Without those ingredients, you're not gonna get a model that's worth using (it'll produce mostly useless outputs).

    That argument also makes little sense when you consider that the model itself is only a couple of gigabytes. It can't memorize 240TB of data, so it "learned".

    But if you want to create custom versions of SD, you can always try out dreambooth: https://github.com/XavierXiao/Dreambooth-Stable-Diffusion, that one is actually feasible without spending millions of dollars on GPUs.
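
    The size argument above can be made concrete with a rough back-of-the-envelope calculation. This is only a sketch: the 240 TB figure comes from the comment, while reading "a couple gigabytes" as exactly 2 GB is an assumption made here for illustration.

```python
TB = 10**12  # decimal terabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes


def data_to_model_ratio(dataset_bytes: float, model_bytes: float) -> float:
    """Return how many bytes of training data exist per byte of model weights."""
    return dataset_bytes / model_bytes


dataset_bytes = 240 * TB  # "~240TB" as stated in the comment
model_bytes = 2 * GB      # assumption: "a couple gigabytes" taken as 2 GB

ratio = data_to_model_ratio(dataset_bytes, model_bytes)
print(f"{ratio:,.0f} bytes of training data per byte of model")
```

    Under these assumptions the weights are about five orders of magnitude smaller than the training data, which is the point the comment is making about memorization.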

  • open_clip

    An open source implementation of CLIP.

  • > Writing a training loop for CLIP manually wound up with me banging against all sorts of strange roadblocks and missing bits of documentation, and I still don't have it working.

    There is working training code for openCLIP https://github.com/mlfoundations/open_clip

    But training multi-modal text-to-image models is still a _very_ new thing, in terms of the software world. Given that, my experience has been that it's never been easier to get to work on this stuff from the software POV. The hardware is the tricky bit (and preventing bandwidth issues on distributed systems).

  • Real-ESRGAN

    Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.

  • You're probably better off using Real-ESRGAN: https://github.com/xinntao/Real-ESRGAN. It's pretty solid and fast. The upscaler that comes with this might work for you, but I suspect it'll do a better job at upscaling Stable Diffusion output than a natural image (might be wrong though).

  • xformers

    Hackable and optimized Transformers building blocks, supporting a composable construction.

  • Awesome. I'm installing on Ubuntu 22.04 right now.

    Ran into a few errors with the default instructions related to CUDA version mismatches with my nvidia driver. Now I'm trying without conda at all. Made a venv. I upgraded to the latest that Ubuntu provides and then downloaded and installed the appropriate CUDA from [1].

    That got me farther. Then I ran into the fact that the xformers binaries I had from my earlier attempts are now incompatible with my current drivers and CUDA, so I'm rebuilding that one. I'm in the 30-minute compile, but I did the `pip install ninja` as recommended by [2] and it's running on a few of my 32 threads now. Ope! Done in 5 mins. Test info from `python -m xformers.info` looks good.

    Damn, still hitting CUDA out-of-memory issues. I knew I should have bought a bigger GPU back in 2017.

    `torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 5.93 GiB total capacity; 5.62 GiB already allocated; 15.44 MiB free; 5.67 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF`

    Guess I better go read those docs... to be continued.

    [1] https://developer.nvidia.com/cuda-downloads?target_os=Linux&...

    [2] https://github.com/facebookresearch/xformers
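
    For readers hitting the same error, here is a minimal sketch of how the numbers in that message can be read and how the `max_split_size_mb` hint can be applied. The helper names (`parse_oom`, `set_max_split_size`) and the value 128 are hypothetical choices for illustration; only the `PYTORCH_CUDA_ALLOC_CONF` variable and the `max_split_size_mb` option come from the error text itself.

```python
import os
import re

# Numeric part of the error message quoted in the comment above.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 5.93 GiB total capacity; "
    "5.62 GiB already allocated; 15.44 MiB free; 5.67 GiB reserved in total by PyTorch)"
)

_UNITS_IN_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom(message: str) -> dict:
    """Extract the key figures from an OOM message, converted to MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    stats = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find {name!r} in message")
        value, unit = match.groups()
        stats[name] = float(value) * _UNITS_IN_MIB[unit]
    return stats


def set_max_split_size(mib: int) -> str:
    """Set the allocator option named in the error message.

    This only has an effect if it runs before PyTorch initializes CUDA;
    the simplest way to guarantee that is to call it before importing torch.
    """
    value = f"max_split_size_mb:{mib}"
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value


stats = parse_oom(OOM_MESSAGE)
# Memory PyTorch has reserved from the driver but not handed out to tensors.
slack_mib = stats["reserved"] - stats["allocated"]
print(f"reserved but unallocated: {slack_mib:.1f} MiB")

set_max_split_size(128)  # 128 is an arbitrary illustrative value, not a recommendation
```

    In this message the gap between reserved and allocated memory is only about 51 MiB on a 5.93 GiB card, so the "reserved >> allocated" condition from the hint is not clearly met. The allocator setting may or may not help here; the card is simply close to full.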

  • lucide

    Beautiful & consistent icon toolkit made by the community. Open-source project and a fork of Feather Icons.

  • Train the model with https://lucide.dev/ and ask it to generate a few more?

  • Txt2Vectorgraphics

    Custom script for the AUTOMATIC1111 StableDiffusion-WebUI.

  • upscayl

    🆙 Upscayl - Free and Open Source AI Image Upscaler for Linux, MacOS and Windows built with Linux-First philosophy.

  • If you have an Nvidia GPU, I've been using Upscayl locally and for free with decent results: https://github.com/upscayl/upscayl

    Note that on some image types it tends to make things look digitally painted rather than detailed. I recommend you try a few different tools and see what works best for the type of photography you do.

  • stable-diffusion-webui

    Stable Diffusion web UI

  • SHARK

    SHARK - High Performance Machine Learning Distribution

  • Try SHARK on your AMD GPUs for SD. Follow the setup here: https://github.com/nod-ai/SHARK/tree/main/shark/examples/sha....

    It works with PyTorch -> torch-mlir -> MLIR / IREE -> Vulkan, on both Windows and Linux. It also has a simple Gradio web UI, https://github.com/nod-ai/SHARK/tree/main/web, and we plan to enable better UI integrations very soon.

    Join us on Discord at https://discord.gg/RUqY2h2s9u if you have any trouble. We appreciate any and all feedback.

  • InvokeAI

    InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.

  • I'm not comparing with the others because I don't have experience with them, but https://invoke-ai.github.io/InvokeAI/ is great, with an easy install and active development.
