Easy Diffusion v3


Jan 18, 2026

Started the long-pending rewrite of Easy Diffusion’s server code. v4 aims to replace the Python (and PyTorch) based server with a simple C++ version. The goals are sub-second startup time for the UI and a smaller download, since Easy Diffusion won’t need to ship Python or deal with conda/venv etc. It’s also a matter of personal taste, i.e. de-bloating what doesn’t need to be bloated.

Jan 8, 2026

Tags: easydiffusion, sdkit, worklog

For Z-Image, the stock version of chromaForge performs worse than sd.cpp, mainly because chromaForge can’t run the smaller gguf-quantized models that sd.cpp can (chromaForge fails with the errors that I was fixing yesterday).

If I really want to push through with this, the remaining issues with gguf models in chromaForge need to be fixed first. Only then can the performance be compared fairly (in order to decide whether to release this in ED 3.5). I want to compare the smaller gguf models, because that’s what ED’s users will typically run.

Jan 7, 2026

Tags: easydiffusion, sdkit, worklog

Worked on fixing Z-Image support in ED’s fork of chromaForge (a fork of Forge WebUI). Fixed a number of integration issues. It’s now crashing on a matrix multiplication error, which looks like an incorrectly transposed matrix (most likely due to reading the weights in the wrong order).

I’ll try to install a stock version of chromaForge to see its raw performance with Z-Image (and whether it’s worth pursuing the integration), and also use it to help investigate the matrix multiplication error (and any future errors).

Dec 25, 2025

Tags: worklog, easydiffusion

Collecting the worklog over the past few weeks.

Enabled Flash-Attention and CPU offloading by default in sdkit3 (i.e. Easy Diffusion v4).
Added optional VAE tiling (and VAE tile size configuration) via config.yaml in Easy Diffusion v4.
Created Easy Diffusion’s fork of Forge WebUI, in order to apply the patches required to run with ED, and to try adding new features like Z-Image (which are missing in the seemingly-abandoned main Forge repo).
Improved the heuristics used for killing and restarting the backend child process, since /ping requests are unreliable if the backend is under heavy load.
Merged a few PRs (1 2) for torchruntime that improve support for pinning pre-cu128 torch versions and fix the order of detection of DirectML and CUDA (prefers CUDA).
Added progress bars when downloading v4 backend artifacts.
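
The worklog doesn’t spell out the restart heuristic, so the following is only a minimal sketch of one way it could look. The class name, thresholds and method names are all hypothetical and not taken from Easy Diffusion’s code. The idea is that failed /ping requests alone shouldn’t trigger a restart; the backend is restarted only if the process has exited, or if pings keep failing and no activity has been seen for a while.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BackendWatchdog:
    """Hypothetical restart heuristic; names and thresholds are illustrative."""

    max_failed_pings: int = 5          # consecutive failures tolerated
    busy_grace_seconds: float = 120.0  # silence tolerated while under load
    failed_pings: int = 0
    last_activity: float = field(default_factory=time.monotonic)

    def record_ping(self, ok: bool) -> None:
        # A successful ping resets the failure count and counts as activity.
        if ok:
            self.failed_pings = 0
            self.last_activity = time.monotonic()
        else:
            self.failed_pings += 1

    def record_progress(self) -> None:
        # Any other sign of life (e.g. a task reporting a new step).
        self.last_activity = time.monotonic()

    def should_restart(self, process_alive: bool, now: Optional[float] = None) -> bool:
        # A dead process is always restarted.
        if not process_alive:
            return True
        now = time.monotonic() if now is None else now
        too_many_failures = self.failed_pings >= self.max_failed_pings
        silent_too_long = (now - self.last_activity) > self.busy_grace_seconds
        # Failed pings alone are not enough: a busy backend may just be slow.
        return too_many_failures and silent_too_long
```

With this shape, a backend that misses pings while still reporting progress is left alone, and only a backend that is both unresponsive and silent past the grace period is killed.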

Nov 7, 2025

Tags: ml, compiler, onnx, ggml, sdkit, worklog

Wrote a simple script to convert ONNX to GGML. It auto-generates C++ code that calls the corresponding ggml functions (for each ONNX operator). This file can then be compiled and run like a normal C++ ggml program, and will produce the same results as the original model in PyTorch.

The generated file can work on multiple backends (CPU, CUDA, ROCm, Vulkan, Metal etc.) by passing the appropriate flags when configuring with cmake -B, e.g. -D GGML_CUDA=1 for CUDA.
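
To illustrate the code-generation idea, here is a minimal sketch and not the actual script. It assumes the ONNX graph has already been reduced to (op_type, inputs, output) tuples, that a ggml context variable named ctx exists in the generated file, and that each listed operator maps one-to-one onto a ggml function. The helper names (c_name, emit_node, emit_graph) are hypothetical. It also ignores differences in broadcasting and tensor layout between ONNX and ggml, which a real converter would need to handle.

```python
import re

# Assumed one-to-one mapping from ONNX operator types to ggml functions.
ONNX_TO_GGML = {
    "Add": "ggml_add",
    "Sub": "ggml_sub",
    "Mul": "ggml_mul",
    "Div": "ggml_div",
    "Relu": "ggml_relu",
}


def c_name(tensor_name: str) -> str:
    """Turn an ONNX tensor name into a valid C++ identifier."""
    return "t_" + re.sub(r"\W", "_", tensor_name)


def emit_node(op_type: str, inputs: list, output: str) -> str:
    """Emit one line of C++ that calls the ggml function for this operator."""
    fn = ONNX_TO_GGML.get(op_type)
    if fn is None:
        raise NotImplementedError(f"no ggml mapping for ONNX op {op_type!r}")
    args = ", ".join(["ctx"] + [c_name(name) for name in inputs])
    return f"struct ggml_tensor * {c_name(output)} = {fn}({args});"


def emit_graph(nodes: list) -> str:
    """Emit C++ for a list of nodes, in the (topological) order given."""
    return "\n".join(emit_node(*node) for node in nodes)


if __name__ == "__main__":
    nodes = [
        ("Add", ["x", "bias"], "sum"),
        ("Relu", ["sum"], "y"),
    ]
    print(emit_graph(nodes))
```

The emitted lines would still need to be wrapped in the usual ggml setup (context creation, input tensors, graph build and compute) before the file can be compiled.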

Sep 1, 2025

Tags: easydiffusion, admin, worklog

Cleared the backlog of stale issues on ED’s github repo. This brought down the number of open issues from ~350 to 74.

A number of those suggestions and issues are already being tracked on my task board. The others had either been fixed, or were really old (i.e. no longer relevant to reply to).

While I’d have genuinely wanted to solve all of those unresolved issues, I was on a break from this project for nearly 1.5 years, so unfortunately it is what it is.

Jan 28, 2025

Tags: easydiffusion, sdkit, freebird, worklog

Continued to test and fix issues in sdkit, after the change to support DirectML. The change is fairly intrusive, since it replaces direct references to torch.cuda with a layer of abstraction.
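
As a rough illustration of what such an abstraction layer can look like (not sdkit’s actual code), the sketch below routes a memory-cleanup call through a small registry keyed by backend name, instead of calling torch.cuda.empty_cache() directly. The function names and the "directml" backend label are assumptions made for this example.

```python
from typing import Callable, Dict


def _cuda_empty_cache() -> None:
    # Imported lazily so that non-CUDA code paths never touch torch.cuda.
    import torch

    torch.cuda.empty_cache()


def _noop() -> None:
    # Backends with no explicit cache-release call do nothing here.
    pass


# Hypothetical registry: backend name -> function that frees cached memory.
_EMPTY_CACHE: Dict[str, Callable[[], None]] = {
    "cuda": _cuda_empty_cache,
    "cpu": _noop,
    "directml": _noop,  # assumed label for DirectML devices in this sketch
}


def backend_of(device: str) -> str:
    """Return the backend part of a device string, e.g. 'cuda:0' -> 'cuda'."""
    return device.split(":", 1)[0]


def empty_cache(device: str) -> None:
    """Free cached memory for the given device, whatever its backend."""
    backend = backend_of(device)
    if backend not in _EMPTY_CACHE:
        raise ValueError(f"unsupported backend: {backend!r}")
    _EMPTY_CACHE[backend]()
```

Callers would then write empty_cache(device) everywhere, so adding a new backend means adding one registry entry rather than touching every call site.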

Fixed a few regressions, and it now passes all the regression tests for CPU and CUDA support (i.e. existing users). Will test for DirectML next, although it will fail (with out-of-memory) for anything but the simplest tests (since DirectML is quirky with memory allocation).