39.7 it/s with a 4090 on Linux!

This page summarizes the projects mentioned and recommended in the original post on /r/StableDiffusion

  • SHARK

    SHARK - High Performance Machine Learning Distribution

  • stable-diffusion-webui

    Stable Diffusion web UI

  • python: 3.10.6  •  torch: 1.13.1+cu117  •  xformers: 0.0.16+814314d.d20230119  •  commit: 54674674  •  checkpoint: 61a37adf76 — I get 18.79 it/s with everything installed (triton, deepspeed, tensorrt); haven't tested with torch 2.0.

  • DeepSpeed

    DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

  • I tried installing PyTorch 2.0.0 with triton from microsoft/DeepSpeed#2694 and compiling my own xformers, and it made my inference even slower: at 512x512, batch size 1, any sampling method, I went from 17-18 it/s down to around 16-17 it/s, and with batch size 8 it dropped from 5.65 it/s to 4.66 it/s.
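The it/s figures quoted in these comments are sampler steps per wall-clock second. If you want to compare numbers like these yourself, a minimal, UI-agnostic way to measure them is a timing loop with a few warmup iterations (a generic sketch, not how any particular web UI reports its numbers):

```python
import time

def iterations_per_second(step_fn, n_steps=20, warmup=3):
    """Measure throughput of step_fn in iterations per second.

    step_fn is any zero-argument callable standing in for one
    sampling step (hypothetical placeholder for your own workload).
    """
    # Warmup: first iterations are often slower due to lazy
    # initialization, CUDA kernel compilation, caches, etc.
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_steps / elapsed
```

Note that for GPU workloads you would also need to synchronize the device (e.g. `torch.cuda.synchronize()`) before reading the clock, since CUDA calls are asynchronous; otherwise the measured time only reflects kernel launch overhead.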

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
