Nov 19, 2025
Following up on the previous post on sdkit v3’s design:
The initial experiments with generating ggml from onnx models were promising, and it looks like a fairly solid path forward. It produces numerically identical results, and there’s a clear path to performance parity with stable-diffusion.cpp after a few basic optimizations (since both will eventually generate the same underlying ggml graph).
But I think it’s better to use the simpler option first, i.e. use stable-diffusion.cpp directly. It mostly meets the design goals for sdkit v3 (after a bit of performance tuning). Everything else is premature optimization and scope bloat.
Here’s a possible roadmap instead:
- sdkit v3 - Use stable-diffusion.cpp and change whatever’s necessary to support Easy Diffusion’s requirements. Upstream the changes where possible.
- sdkit v4 - Develop graph-compiler further to generate ggml automatically from onnx (for the required models). This will bypass the need for hand-written ports of models like in sd.cpp, and enable further high-level optimizations that’ll run automatically on the graph.
- sdkit v5 - Add automatic GPU kernel code-generation in graph-compiler, bypassing the need for hand-written kernels like in ggml. This will enable further low-level optimizations around tiling, fusion, memory transfers etc.
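To make the v4 idea a bit more concrete, here is a toy sketch of what "generating ggml from onnx" means at its simplest: walking a topologically sorted list of onnx-style nodes and emitting the equivalent ggml C calls. This is an illustration under stated assumptions, not how graph-compiler actually works. The node tuples and the emit_ggml helper are hypothetical; the ggml function names (ggml_add, ggml_mul, ggml_relu, ggml_sigmoid) are real. Only elementwise ops are mapped, because ops like MatMul and Conv need layout and argument-order handling that this sketch deliberately ignores.

```python
# Hypothetical sketch: translate a tiny onnx-style graph into ggml C source lines.
# Nodes are plain tuples (op_type, input_names, output_name), not real onnx protobufs.

# Only elementwise ops whose meaning maps directly onto a single ggml call.
# (ggml's broadcasting rules are narrower than onnx's; a real compiler
# would also need to check shapes, which this sketch does not do.)
ONNX_TO_GGML = {
    "Add": "ggml_add",
    "Mul": "ggml_mul",
    "Relu": "ggml_relu",
    "Sigmoid": "ggml_sigmoid",
}


def emit_ggml(nodes):
    """Return one line of ggml C code per node, in the given (topological) order."""
    lines = []
    for op_type, inputs, output in nodes:
        fn = ONNX_TO_GGML.get(op_type)
        if fn is None:
            raise NotImplementedError(f"no ggml mapping for {op_type}")
        # Every ggml op takes the ggml_context as its first argument.
        args = ", ".join(["ctx"] + list(inputs))
        lines.append(f"struct ggml_tensor * {output} = {fn}({args});")
    return lines


# y = relu(x * w + b), expressed as three onnx-style nodes
graph = [
    ("Mul", ["x", "w"], "t0"),
    ("Add", ["t0", "b"], "t1"),
    ("Relu", ["t1"], "y"),
]

for line in emit_ggml(graph):
    print(line)
```

A real generator would also have to create the input and weight tensors, pick memory layouts, and build the compute graph, and that bookkeeping is exactly what hand-written ports like sd.cpp currently do by hand.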
The benefits are:
- Saves time. For example, I don’t need to reimplement LoRA, ControlNet etc. right away, or write my own GPU kernel code-generator.
- Keeps delivering value to users ASAP, instead of waiting for massive projects to finish first.
- Helps gain experience progressively. For example, the experience of manually optimizing the Conv2D operator bottleneck (and the overall graph) in stable-diffusion.cpp will be useful later, when building an automatic optimizer.