Sep 4, 2024

Tags: easydiffusion, ai, lab, performance, featured

tl;dr: Explored a possible optimization for Flux with diffusers when using enable_sequential_cpu_offload(). It did not work.

While trying to run Flux (nearly 22 GB of weights) with diffusers on a 12 GB graphics card, I noticed that it barely used any GPU memory when using enable_sequential_cpu_offload(), and it was super slow. It turns out that the largest module in Flux's transformer is only around 108 MB, and because enable_sequential_cpu_offload() streams modules to the GPU one at a time, peak VRAM usage never rose above a few hundred MB.
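The memory behavior is easy to sketch. With sequential offload, only one module's weights are resident on the GPU at any moment, so peak VRAM tracks the largest single module rather than the full model. A minimal illustration, using made-up per-module sizes (not Flux's real layout):

```python
# Hypothetical per-module weight sizes in MB, for illustration only.
# The real Flux transformer has many modules; the key fact from the
# post is that the largest one is around 108 MB.
module_sizes_mb = [108, 54, 96, 27, 108, 64]

# VRAM needed if every module stays resident on the GPU at once.
full_residency_mb = sum(module_sizes_mb)

# VRAM needed when modules are streamed to the GPU one at a time,
# as enable_sequential_cpu_offload() does: only the largest module
# ever has to fit.
streamed_peak_mb = max(module_sizes_mb)

print(f"full residency: {full_residency_mb} MB; streamed peak: {streamed_peak_mb} MB")
```

This is why a 22 GB model can run in a few hundred MB of VRAM under sequential offload, and also why it is so slow: every module's weights get copied host-to-device on every forward pass.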