Feb 10, 2025

Tags: easydiffusion, torchruntime, sdkit

Spent the last week or two getting torchruntime fully integrated into Easy Diffusion, and making sure that it handles all the edge cases.

Easy Diffusion now uses torchruntime to automatically install the best possible version of torch for the user’s computer, and to support a wider variety of GPUs (including older ones). It also uses a GPU-agnostic device API, so Easy Diffusion will automatically support additional GPUs as torchruntime adds support for them.

Jan 22, 2025

Tags: rocm, pytorch, easydiffusion, torchruntime

Continued from Part 1.

Spent a few days figuring out how to compile binary wheels of PyTorch and include all the necessary libraries (ROCm libs or CUDA libs).

tl;dr - In Part 2, the compiled PyTorch wheels now include the required libraries (including ROCm). But this isn’t over yet. Torch now starts, but adding two numbers with it on the GPU produces garbage values. There’s probably a bug in the included rocBLAS version; I might need to recompile rocBLAS for gfx803 separately. Will tackle that in Part 3 (tbd).
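
The "garbage values" symptom can be checked by comparing a GPU result against the same computation on the CPU. The snippet below is a minimal sketch, not the test that was actually run: the helper names results_match and check_gpu_add are hypothetical, and it assumes a PyTorch build where the GPU is reachable through the "cuda" device name (which is also the name ROCm builds use).

```python
import math


def results_match(expected, actual, rel_tol=1e-5, abs_tol=1e-6):
    """Return True if every value in `actual` is finite and close to `expected`."""
    if len(expected) != len(actual):
        return False
    return all(
        math.isfinite(a) and math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
        for e, a in zip(expected, actual)
    )


def check_gpu_add(device="cuda"):
    """Add two small tensors on the CPU and on `device`, then compare the results.

    Hypothetical helper: torch is imported lazily so that results_match can be
    used (and tested) on machines without PyTorch installed.
    """
    import torch

    a = torch.tensor([1.0, 2.0, 3.0])
    b = torch.tensor([10.0, 20.0, 30.0])
    expected = (a + b).tolist()  # reference result computed on the CPU
    actual = (a.to(device) + b.to(device)).cpu().tolist()  # same sum on the GPU
    return results_match(expected, actual)
```

On a working install, check_gpu_add() should return True. A False result on a freshly compiled wheel suggests that the GPU math libraries are producing wrong values even though torch itself imports fine.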

Jan 17, 2025

Tags: rocm, pytorch, easydiffusion, torchruntime

Continued in Part 2, where I figured out how to include the required libraries in the wheel.

Spent all of yesterday trying to compile PyTorch with the PYTORCH_ROCM_ARCH=gfx803 environment variable set at compile time.

tl;dr - In Part 1, I compiled wheels for PyTorch with ROCm, in order to add support for older AMD cards like the RX 480. I managed to compile the wheels, but they don’t include the required ROCm libraries. I figured that out in Part 2.

Jan 13, 2025

Tags: easydiffusion, torchruntime, torch, ml

Spent the last few days writing torchruntime, which will automatically install the correct torch distribution based on the user’s OS and graphics card. This package was written by extracting this logic out of Easy Diffusion, and refactoring it into a cleaner implementation (with tests).

It can be installed (on Win/Linux/Mac) using pip install torchruntime.
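
The idea of picking a torch distribution from the OS and graphics card can be pictured roughly as follows. This is a simplified sketch and not torchruntime's actual logic or API: the function names choose_torch_platform and pip_install_args are hypothetical, and the platform tags ("cu124", "rocm6.2") are only illustrative examples of PyTorch wheel index names. The real rules also have to account for specific GPU models and driver versions.

```python
from typing import List, Optional

# Base URL of the PyTorch wheel indexes.
PYTORCH_INDEX = "https://download.pytorch.org/whl"


def choose_torch_platform(os_name: str, gpu_vendor: Optional[str]) -> str:
    """Map an OS and GPU vendor to a wheel index tag (hypothetical, simplified rules)."""
    os_name = os_name.lower()
    vendor = (gpu_vendor or "").lower()

    if os_name == "darwin":
        return "default"  # macOS: use the regular PyPI wheels
    if vendor == "nvidia":
        return "cu124"  # example CUDA build tag
    if vendor == "amd" and os_name == "linux":
        return "rocm6.2"  # example ROCm build tag (Linux only)
    return "cpu"  # fallback when no supported GPU is detected


def pip_install_args(platform_tag: str) -> List[str]:
    """Build the pip command line for the chosen platform tag."""
    args = ["pip", "install", "torch"]
    if platform_tag != "default":
        args += ["--index-url", f"{PYTORCH_INDEX}/{platform_tag}"]
    return args
```

For example, pip_install_args(choose_torch_platform("Linux", "AMD")) builds a pip command that points at the ROCm wheel index, while an unrecognized GPU falls back to the CPU-only wheels.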

The main intention is that it’ll be easier for developers to contribute updates (e.g. for newer or older GPUs). It wasn’t easy to find or modify this code previously, since it was buried deep inside Easy Diffusion’s internals.