Nov 21, 2024
Tags: easydiffusion, stable-diffusion, c++
Spent some more time on the v4 experiments for Easy Diffusion (i.e. C++ based, fast-startup, lightweight). stable-diffusion.cpp is missing a few features that Easy Diffusion's typical workflow will need. I wasn't keen on forking stable-diffusion.cpp, but it's probably faster to work on a fork for now.
For now, I've added live previews and per-step progress callbacks (based on a few pending pull requests on sd.cpp), as well as protection from GGML_ASSERT killing the entire process. I've also been looking at the ability to load individual models (like the VAE) without needing to reload the entire SD model.
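Roughly, the shape I have in mind for the per-step callback is something like the sketch below. This is illustrative only: the names and parameters are made up here, not upstream sd.cpp API (upstream only has pending pull requests for this).

```cpp
// Sketch only: an illustrative callback shape, not upstream sd.cpp API.
// preview_rgb is a small decoded preview of the current latent (may be null
// if live previews are disabled).
#include <cstdint>

typedef void (*sd_step_callback_t)(int step,
                                   int total_steps,
                                   const uint8_t* preview_rgb,  // RGB8, preview_w * preview_h * 3 bytes
                                   int preview_w,
                                   int preview_h,
                                   void* user_data);

// A hypothetical setter, called once before generation, would let the UI
// report progress and show a live preview without polling:
// sd_set_step_callback(my_callback, /* user_data */ nullptr);
```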
Nov 19, 2024
Tags: easydiffusion, stable-diffusion
Spent a few days getting a C++-based version of Easy Diffusion working, using stable-diffusion.cpp. I'm working with a fork of stable-diffusion.cpp, to add a few changes like per-step callbacks, live image previews, etc.
It doesn't have a UI yet, and currently hardcodes a model path. It exposes a RESTful API server (written using the Crow C++ library), and uses a simple task manager that runs image-generation tasks on a thread. The generated images are available at an API endpoint, which returns the binary JPEG/PNG image directly (instead of base64-encoding it).
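The image endpoint idea looks roughly like this. Sketch only: the route path and the load_result_for() helper are made up for illustration; only the Crow calls themselves are the library's real API.

```cpp
#include "crow.h"
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: read the finished JPEG for a task from disk.
// (The real app keeps results in the task manager, not necessarily on disk.)
static std::string load_result_for(int task_id) {
    std::ifstream f("results/" + std::to_string(task_id) + ".jpg", std::ios::binary);
    std::ostringstream buf;
    buf << f.rdbuf();
    return buf.str();
}

int main() {
    crow::SimpleApp app;

    // Serve the raw JPEG bytes with the right content type, instead of base64.
    CROW_ROUTE(app, "/image/<int>")([](int task_id) {
        std::string jpeg = load_result_for(task_id);
        if (jpeg.empty()) {
            return crow::response(404);
        }
        crow::response res(jpeg);
        res.set_header("Content-Type", "image/jpeg");
        return res;
    });

    app.port(9000).multithreaded().run();
}
```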
Oct 16, 2024
Tags: stable-diffusion, c++, cuda, easydiffusion, lab, performance, featured
tl;dr - Today, I worked on using stable-diffusion.cpp in a simple C++ program, both as a linked library and by compiling sd.cpp from scratch (with and without CUDA). The intent was to get a tiny, fast-starting executable UI for Stable Diffusion working. Also, ChatGPT is very helpful!
Part 1: Using sd.cpp as a library
First, I tried calling the stable-diffusion.cpp library, via dynamic linking, from a simple C++ program that just loads the model and renders an image. That worked: its performance was the same as the example sd.exe CLI, and it detected and used the GPU correctly.
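(The test program itself linked against the library at build time. As a separate, quick sanity check that the shared library loads and exports the expected C entry points, something like the following also works; it assumes a Windows build that produces stable-diffusion.dll, so adjust the name/path to your build.)

```cpp
// Not the actual test program: just a smoke test that the sd.cpp shared
// library loads and exports its C entry points. Assumes a Windows build
// named stable-diffusion.dll; adjust the name/path for your build.
#include <windows.h>
#include <cstdio>

int main() {
    HMODULE lib = LoadLibraryA("stable-diffusion.dll");
    if (!lib) {
        std::printf("could not load stable-diffusion.dll\n");
        return 1;
    }
    // new_sd_ctx() and txt2img() are the main entry points declared in stable-diffusion.h.
    FARPROC ctx_fn = GetProcAddress(lib, "new_sd_ctx");
    FARPROC gen_fn = GetProcAddress(lib, "txt2img");
    std::printf("new_sd_ctx: %p, txt2img: %p\n", (void*)ctx_fn, (void*)gen_fn);
    FreeLibrary(lib);
    return 0;
}
```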
Sep 4, 2024
Tags: easydiffusion, ai, lab, performance, featured
tl;dr: Explored a possible optimization for Flux with diffusers when using enable_sequential_cpu_offload(). It did not work.
While trying to use Flux (nearly 22 GB of weights) with diffusers on a 12 GB graphics card, I noticed that it barely used any GPU memory when using enable_sequential_cpu_offload(), and that it was super slow. It turns out that the largest module in Flux's transformer model is around 108 MB, and because diffusers streams modules to the GPU one at a time, peak VRAM usage never rose above a few hundred MB.