In the aws-sagemaker-stable-diffusion repo you will find everything needed to spin up your own personal public endpoint with a Stable Diffusion model deployed on AWS SageMaker to show your friends 😎
For Python, it's recommended to use pyenv, which lets you install several versions of Python side by side with simple commands like this: pyenv install 3.9.13
For my local computer, I used nvidia-docker so that I could run the code inside a container. This way I don't have to worry about installing the right version of CUDA (matching the PyTorch build) on my machine.
With that, we should have what we need in the bucket. Now, let's create the SageMaker endpoint. To manage the endpoint, as stated before, we use the Hugging Face toolkit for SageMaker. As such, the repo has three Python scripts that serve as functions to create, use, and delete the endpoint.
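As a rough illustration of what those scripts do, a minimal sketch with the SageMaker Python SDK might look like this. The bucket path, IAM role, framework versions, instance type, and prompt below are placeholders, not the repo's actual values:

```python
# Minimal sketch of creating, using, and deleting a SageMaker endpoint
# with the Hugging Face toolkit. All concrete values are placeholders.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # or pass an IAM role ARN directly

model = HuggingFaceModel(
    model_data="s3://<your-bucket>/stable-diffusion/model.tar.gz",  # model artifact in your bucket
    role=role,
    transformers_version="4.17",
    pytorch_version="1.10",
    py_version="py38",
)

# "create": deploy the model to a real-time endpoint on a GPU instance
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)

# "use": send a prompt to the endpoint
response = predictor.predict({"inputs": "a photo of an astronaut riding a horse"})

# "delete": tear the endpoint down when done to stop incurring costs
predictor.delete_endpoint()
```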
However, since we are using a diffusion model rather than a transformers one, the default inference code needs to be overridden to use the diffusers package (also from Hugging Face), just like we do in local/code.py.
To override it, the package readme has some general information, and there is also an example in this Jupyter notebook. We do what is necessary via the files inside sagemaker/code, which contain the inference code following SageMaker's requirements, plus a requirements.txt listing the dependencies that will be installed when the endpoint is created.
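For reference, a custom inference script for a diffusers pipeline typically implements the model_fn/predict_fn hooks expected by the SageMaker Hugging Face inference toolkit. The sketch below is illustrative and may differ in details from the code actually shipped in sagemaker/code:

```python
# Sketch of a custom inference.py for a Stable Diffusion pipeline.
# Function names follow the SageMaker Hugging Face inference toolkit contract.
import base64
from io import BytesIO

import torch
from diffusers import StableDiffusionPipeline


def model_fn(model_dir):
    # Load the Stable Diffusion pipeline from the unpacked model artifact.
    pipe = StableDiffusionPipeline.from_pretrained(model_dir, torch_dtype=torch.float16)
    return pipe.to("cuda")


def predict_fn(data, pipe):
    # Generate an image from the prompt and return it base64-encoded.
    prompt = data.pop("inputs", data)
    image = pipe(prompt).images[0]
    buffer = BytesIO()
    image.save(buffer, format="PNG")
    return {"generated_image": base64.b64encode(buffer.getvalue()).decode("utf-8")}
```

The requirements.txt placed alongside it would then list diffusers (and any other extra dependencies) so they get installed when the endpoint is created.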
To clone the repo containing the ML model (stable-diffusion-v1-4/), we will also need git-lfs (which allows versioning of large files). The model is defined as a submodule of this repo, and when you attempt to clone the submodule you will be asked for your Hugging Face account credentials. This is to confirm that you have accepted the license required to access the model weights.