Aug 25, 2025
Tags: tensorRT, torch, easydiffusion, ggml, cuda, vulkan
Experimented with TensorRT-RTX (a new library offered by NVIDIA).
The first step was a tiny toy model, just to get the build and test setup working.
The reference model in PyTorch:
import torch
import torch.nn as nn


class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, stride=1, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(8, 4)  # 4-class toy output

    def forward(self, x):
        x = self.relu(self.conv(x))
        x = self.pool(x).flatten(1)
        return self.fc(x)
I ran this on an NVIDIA 4060 8 GB (Laptop) for 10K iterations, on both Windows and WSL (Ubuntu), with float32 data.
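For reference, here is a rough sketch of the kind of timing loop a run like this needs. It is not the harness I actually used: the `benchmark` helper, its parameters and the warmup count are all made up for illustration. It uses only the standard library, so the workload is passed in as a plain callable.

```python
import time
from statistics import mean, median


def benchmark(fn, iters=10_000, warmup=100, sync=None):
    """Call fn() `iters` times and summarize per-call latency in milliseconds.

    `sync` is an optional callable that blocks until queued work has finished.
    Assumption: for PyTorch on CUDA you would pass torch.cuda.synchronize,
    because kernel launches are asynchronous and would otherwise look faster
    than they are.
    """
    # Warmup calls are not timed (lazy initialization, caches, autotuning).
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1000.0)

    return {
        "iters": iters,
        "mean_ms": mean(samples),
        "median_ms": median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Stand-in CPU workload; swap in a model forward pass to time a real model.
    stats = benchmark(lambda: sum(range(1000)), iters=1000, warmup=10)
    print(stats)
```

To time the toy model, `fn` would be something like `lambda: model(x)` wrapped in `torch.no_grad()`, with `x` a float32 tensor already on the GPU. Reporting the median alongside the mean helps when a few iterations are slowed down by unrelated system activity.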
Oct 16, 2024
Tags: stable-diffusion, c++, cuda, easydiffusion, lab, performance, featured
tl;dr - Today, I worked on using stable-diffusion.cpp in a simple C++ program, both as a linked library and by compiling sd.cpp from scratch (with and without CUDA). The intent was to get a tiny, fast-starting executable UI for Stable Diffusion working. Also, ChatGPT is very helpful!
Part 1: Using sd.cpp as a library
First, I tried calling the stable-diffusion.cpp library from a simple C++ program (which just loads the model and renders an image), via dynamic linking. That worked: its performance matched the example sd.exe CLI, and it detected and used the GPU correctly.