39.7 it/s with a 4090 on Linux!

This page summarizes the projects mentioned and recommended in the original post on /r/StableDiffusion

  • SHARK

    SHARK - High Performance Machine Learning Distribution

  • stable-diffusion-webui

    Stable Diffusion web UI

  • python: 3.10.6  •  torch: 1.13.1+cu117  •  xformers: 0.0.16+814314d.d20230119  •  commit: 54674674  •  checkpoint: 61a37adf76 — I get 18.79 it/s with everything installed (triton, deepspeed, tensorrt); haven't tested with torch 2.0.

  • DeepSpeed

    DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

  • I tried installing PyTorch 2.0.0 with triton from microsoft/DeepSpeed#2694 and compiling my own xformers, and it made my inference even slower: at 512x512, batch size 1, any sampling method, I went from 17-18 it/s down to around 16-17 it/s, and with batch size 8 it dropped from 5.65 it/s to 4.66 it/s.
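The it/s figures quoted in these comments are sampler steps per wall-clock second. If you want to compare numbers like these yourself, a minimal, UI-agnostic way to measure them is a timing loop with a few warmup iterations (a generic sketch, not how any particular web UI reports its numbers):

```python
import time

def iterations_per_second(step_fn, n_steps=20, warmup=3):
    """Measure throughput of step_fn in iterations per second.

    step_fn is any zero-argument callable standing in for one
    sampling step (hypothetical placeholder for your own workload).
    """
    # Warmup: first iterations are often slower due to lazy
    # initialization, CUDA kernel compilation, caches, etc.
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_steps / elapsed
```

Note that for GPU workloads you would also need to synchronize the device (e.g. `torch.cuda.synchronize()`) before reading the clock, since CUDA calls are asynchronous; otherwise the measured time only reflects kernel launch overhead.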

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
