axodox-machinelearning
This repository contains a pure C++ ONNX implementation of multiple offline AI models, such as StableDiffusion (1.5 and XL), ControlNet, Midas, HED and OpenPose.
-
Olive
Olive is an easy-to-use hardware-aware model optimization tool that composes industry-leading techniques across model compression, optimization, and compilation. (by microsoft)
-
diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
And since I am a nice guy, I have decided to turn it into an open-source library (see the link for technical details) so anybody can use it - and hopefully enhance it further so we all benefit. I am releasing it under the MIT license, so you can take it and use it as you see fit in your own projects.
I also started building an app of my own on top of it called Unpaint (which you can download and try via the link), targeting Windows and (for now) DirectML. The app provides the basic Stable Diffusion pipelines - txt2img, img2img, and inpainting - and also implements some advanced prompting features (attention, scheduling) and the safety checker. It is lightweight and starts up quickly, and it is only ~2.5GB with a model included, so you can easily put it on your fastest drive. Performance-wise, single images are on par for me with CUDA and Automatic1111 on a 3080 Ti, though it seems to use more VRAM at higher batch counts - still a good start in my opinion. It also has an integrated model manager powered by Hugging Face - for now I have restricted it to avoid vandalism, but you can still convert existing models and install them offline (I will post a guide soon). And as you can see in the images above, it also has a simple but nice user interface.
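To illustrate the "attention" prompting feature mentioned above, here is a minimal sketch of the common "(text:weight)" emphasis syntax popularized by Automatic1111-style UIs. This is purely illustrative - Unpaint's actual parser is not shown in the post, and `parse_attention` is a hypothetical name:

```python
import re

# Matches the "(text:weight)" emphasis syntax, e.g. "(red:1.3)".
# Illustrative only; the app's real grammar may support more forms.
TOKEN_RE = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")

def parse_attention(prompt: str) -> list:
    """Split a prompt into (fragment, weight) pairs.

    Plain text gets weight 1.0; "(red:1.3)" yields ("red", 1.3).
    """
    result = []
    pos = 0
    for m in TOKEN_RE.finditer(prompt):
        if m.start() > pos:  # plain text before the weighted span
            result.append((prompt[pos:m.start()], 1.0))
        result.append((m.group(1), float(m.group(2))))
        pos = m.end()
    if pos < len(prompt):  # trailing plain text
        result.append((prompt[pos:], 1.0))
    return result
```

The per-fragment weights would then typically scale the corresponding token embeddings before they reach the text encoder's output conditioning.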
I use Microsoft Olive to optimize my networks. It works rather well: it made my inference some 2-3 times faster, shrank the models, and reduced VRAM usage as well, though this requires the latest NVIDIA drivers.
Sounds interesting. ONNX Runtime - which is what I use - can also run with WebAssembly and on CPU, as well as on all major GPUs, and it has bindings for many programming languages, though C++ is its native API.
E.g. look at this repo: ggerganov/llama.cpp - Port of Facebook's LLaMA model in C/C++ (github.com). If Python were enough, why would anybody star it?
It would be great if there were some automated way of converting .ckpt or .safetensors models built into the app. I was able to do it using the scripts in https://github.com/huggingface/diffusers/tree/main/scripts, but it was a two-step process: first extract the model, then convert it to ONNX. Although maybe it's not easy to do this without including all the Python libs, since the checkpoint is basically a Python pickle file?
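The pickle point above is why the formats differ so much in how hard they are to handle outside Python. A .ckpt is a PyTorch checkpoint - a ZIP archive wrapping Python pickle data - while .safetensors is a simple container: an 8-byte little-endian header length followed by a JSON header and raw tensor bytes. As a sketch (the function name and return strings are my own, not from any library), the two can be told apart from their magic bytes alone:

```python
import struct

def detect_model_format(path: str) -> str:
    """Guess a model container format from its leading bytes.

    Illustrative sketch only - real tools should validate further.
    """
    with open(path, "rb") as f:
        head = f.read(8)
    # PyTorch .ckpt files are ZIP archives containing pickled data.
    if head[:4] == b"PK\x03\x04":
        return "ckpt (ZIP-wrapped pickle)"
    # safetensors: u64 little-endian header size, then a JSON object.
    if len(head) == 8:
        (hdr_len,) = struct.unpack("<Q", head)
        with open(path, "rb") as f:
            f.seek(8)
            if hdr_len > 0 and f.read(1) == b"{":
                return "safetensors"
    return "unknown"
```

Since unpickling a .ckpt can execute arbitrary Python code, a native app that wants built-in conversion would either need to embed a Python runtime or reimplement a restricted pickle reader - which is likely why the two-step, out-of-app conversion flow exists.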