Sep 4, 2024
Tags: easydiffusion, ai, lab, performance, featured
tl;dr: Explored a possible optimization for Flux with diffusers when using enable_sequential_cpu_offload(). It did not work.
While trying to use Flux (nearly 22 GB of weights) with diffusers on a 12 GB graphics card, I noticed that it barely used any GPU memory when using enable_sequential_cpu_offload(). And it was super slow. It turns out that the largest module in Flux's transformer model is around 108 MB, and because diffusers streams modules to the GPU one at a time, peak VRAM usage never rose above a few hundred MB.
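For reference, here's a minimal sketch of that setup. The checkpoint name, prompt, and step count are illustrative; the per-module size check is my own addition to show what bounds peak VRAM under sequential offload:

```python
import torch
from diffusers import FluxPipeline

# Load Flux; without a device_map, the weights start out on the CPU.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)

# Rough size of the largest submodule in the transformer, counting only
# parameters owned directly by each module. Under sequential offload this
# is roughly what peak VRAM tracks, not the full weight size.
def module_bytes(m):
    return sum(p.numel() * p.element_size() for p in m.parameters(recurse=False))

largest = max(module_bytes(m) for m in pipe.transformer.modules())
print(f"largest module: {largest / 2**20:.0f} MiB")

# Moves each submodule to the GPU only for its forward pass, then back to
# the CPU. VRAM stays low, but every step pays the PCIe transfer cost,
# which is why generation is so slow.
pipe.enable_sequential_cpu_offload()

image = pipe("a photo of a forest", num_inference_steps=4).images[0]
```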