Oct 16, 2024
Tags: stable-diffusion, c++, cuda, easydiffusion, lab, performance, featured
tl;dr - Today, I worked on using stable-diffusion.cpp in a simple C++ program, both as a linked library and by compiling sd.cpp from scratch (with and without CUDA). The intent was to get a tiny, fast-starting executable UI for Stable Diffusion working. Also, ChatGPT is very helpful!
Part 1: Using sd.cpp as a library
First, I tried calling the stable-diffusion.cpp library from a simple C++ program (which just loads the model and renders an image), via dynamic linking. That worked: its performance was the same as the example sd.exe CLI, and it detected and used the GPU correctly.
Sep 4, 2024
Tags: easydiffusion, ai, lab, performance, featured
tl;dr: Explored a possible optimization for Flux with diffusers when using enable_sequential_cpu_offload(). It did not work.
While trying to use Flux (nearly 22 GB of weights) with diffusers on a 12 GB graphics card, I noticed that it barely used any GPU memory when using enable_sequential_cpu_offload(), and it was super slow. It turns out that the largest module in Flux’s transformer model is around 108 MB, so because diffusers streams modules one at a time, peak VRAM usage never rose above a few hundred MB.
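To make the arithmetic concrete, here is a minimal sketch of why one-at-a-time streaming keeps peak VRAM so low. It is a toy simulation, not diffusers code: the function name peak_vram_sequential is made up for illustration, and the module layout (200 equal modules of 108 MB, roughly 21.6 GB in total) is a hypothetical stand-in, not Flux's real module list. It counts weights only and ignores activations and other buffers.

```python
def peak_vram_sequential(module_sizes_mb):
    """Simulate streaming modules to the GPU one at a time.

    Returns the peak number of MB of weights resident on the GPU.
    Hypothetical helper for illustration; not part of diffusers.
    """
    resident = 0.0
    peak = 0.0
    for size in module_sizes_mb:
        resident += size            # module weights copied to the GPU
        peak = max(peak, resident)  # track the high-water mark
        resident -= size            # module released after its forward pass
    return peak


# Hypothetical layout: 200 equal modules of 108 MB (about 21.6 GB total).
modules = [108.0] * 200

total_mb = sum(modules)
peak_mb = peak_vram_sequential(modules)

print(f"all weights resident: {total_mb / 1000:.1f} GB")
print(f"sequential offload peak: {peak_mb:.0f} MB")
```

Under these assumptions the peak equals the size of the single largest module, no matter how large the model is in total. That matches the observation above: a 12 GB card sits almost empty while the weights trickle through it one module at a time.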