Stable Diffusion Cloud GPU 2026: Run SDXL & Flux Without a Local GPU
SDXL needs 8 GB VRAM. Flux.1-dev needs 16 GB. Buying a GPU that sits idle 90% of the time makes no sense. Here is how to run the best open-source image models on cloud GPUs for fractions of a cent per image — GDPR-compliant, no setup, pay per second.
TL;DR
- SDXL runs well on 8 GB VRAM (RTX 3080 class); Flux.1-dev needs 16 GB, or 8 GB with fp8 quantization
- Cloud GPU costs: $0.50–$0.74/hr for an RTX 4090, so a 1,000-image SDXL batch costs under $0.70
- EU image studios need GDPR-compliant providers, and most GPU clouds don't qualify
- GhostNexus offers RTX 4090s at $0.50/hr, EU-hosted, DPA-ready, billed per second
VRAM Requirements by Model
Your choice of model determines the GPU you need. Here's the minimum VRAM for each major image model in 2026 (float16 unless noted):
| Model | Min VRAM | Batch size | Recommended GPU |
|---|---|---|---|
| SD 1.5 | 4 GB | 1–4 at 512px | RTX 3070 |
| SDXL 1.0 | 8 GB | 1–2 at 1024px | RTX 3080 |
| SDXL + ControlNet | 12 GB | 1 at 1024px | RTX 3080 Ti / 4070 |
| Flux.1-schnell | 12 GB | 1–2 at 1024px | RTX 4070 / 3090 |
| Flux.1-dev | 16 GB | 1 at 1024px | RTX 4080 / A100 |
| Flux.1-dev (fp8) | 8 GB | 1 at 1024px | RTX 3080 |
| Stable Video Diffusion | 20 GB | 1 video | RTX 3090 / A100 |
RTX 4090 (24 GB VRAM) covers all models above without quantization. It's the default GPU on GhostNexus at $0.50/hr.
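If you're unsure which tier a given card falls into, you can query total VRAM at runtime with PyTorch's standard CUDA API and pick a model accordingly. A small sketch, whose thresholds simply mirror the table above:

```python
# Query total VRAM on the current GPU and map it to a model tier from the table.
import torch

props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"{props.name}: {vram_gb:.1f} GB VRAM")

if vram_gb >= 16:
    tier = "Flux.1-dev (float16/bf16)"
elif vram_gb >= 12:
    tier = "Flux.1-schnell or SDXL + ControlNet"
elif vram_gb >= 8:
    tier = "SDXL, or Flux.1-dev with fp8 weights"
else:
    tier = "SD 1.5"
print("Suggested tier:", tier)
```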
Cost Per Image on Cloud GPU
An RTX 4090 generates SDXL images in roughly 4–6 seconds each (20 steps, DPM++ 2M). At $0.50/hr, that works out to $0.0006–$0.0008 per image, so a batch of 1,000 SDXL images costs under $0.70 on GhostNexus. Compare that to Midjourney Pro at $60/month for 1,000 images, or DALL-E 3 at $0.04–$0.08 per image.
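The arithmetic behind those numbers, as a tiny sketch (the helper name is ours):

```python
# Back-of-envelope cost for per-second GPU billing.
def batch_cost_usd(images: int, sec_per_image: float, usd_per_hour: float) -> float:
    return images * sec_per_image * usd_per_hour / 3600

print(f"${batch_cost_usd(1000, 5.0, 0.50):.2f}")  # $0.69 for 1,000 SDXL images at 5 s each
```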
GPU Cloud Provider Comparison (Image Generation)
| Provider | GPU | Price | GDPR | Notes |
|---|---|---|---|---|
| GhostNexus | RTX 4090 (24 GB) | $0.50/hr | Yes | EU-hosted, pay-per-second |
| RunPod | RTX 4090 | $0.74/hr | Partial | Community = no EU DPA |
| Vast.ai | RTX 4090 | $0.35–0.55/hr | No | Random hosts, no compliance |
| Lambda Labs | A10 (24 GB) | $0.60/hr | No | US-only data centers |
| Google Colab Pro+ | A100 (40 GB) | $57/mo flat | No | Session limits, queued access |
| Paperspace | A100 (40 GB) | $3.09/hr | No | US and EU nodes available |
Run SDXL on Cloud GPU: Full Script
Submit this script to GhostNexus with the Python SDK — it generates 4 images and prints the generation time:
```python
import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
import time

# Load SDXL with optimized scheduler
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
pipe.enable_xformers_memory_efficient_attention()

prompts = [
    "A cyberpunk city at night, neon lights, ultra detailed, 8k",
    "Portrait of an astronaut on Mars, cinematic, Hasselblad",
    "Ancient forest waterfall, golden hour, photorealistic",
    "Abstract geometric art, vibrant colors, minimalist",
]

t0 = time.perf_counter()
for i, prompt in enumerate(prompts):
    image = pipe(
        prompt=prompt,
        num_inference_steps=20,
        guidance_scale=7.5,
        width=1024, height=1024,
    ).images[0]
    image.save(f"/tmp/output_{i}.png")
    print(f"Image {i+1}/4 done — {time.perf_counter()-t0:.1f}s elapsed")

total = time.perf_counter() - t0
print(f"\n4 images in {total:.1f}s ({total/4:.1f}s/image)")
print(f"Estimated cost: ${total/3600 * 0.50:.5f}")
```

Submit it with the SDK:

```python
import ghostnexus

client = ghostnexus.Client()
job = client.run("sdxl_batch.py", task_name="sdxl-batch-4")
for chunk in job.stream_logs():
    print(chunk, end="", flush=True)

# Output:
# Image 1/4 done — 5.2s elapsed
# Image 2/4 done — 10.4s elapsed
# Image 3/4 done — 15.7s elapsed
# Image 4/4 done — 20.9s elapsed
#
# 4 images in 20.9s (5.2s/image)
# Estimated cost: $0.00290
```

Running Flux.1 on Cloud GPU
Flux.1-dev produces significantly better image quality than SDXL but requires more VRAM and takes longer per image. On an RTX 4090 (24 GB):
- Flux.1-schnell: 4 inference steps, ~25s/image, best quality/speed ratio. Cost: $0.0035/image.
- Flux.1-dev: 20–50 steps, ~100s/image, maximum quality. Cost: $0.014/image.
- Flux.1-dev (fp8 quantized): runs in 8 GB VRAM on an RTX 3080, ~150s/image, minor quality loss (see the sketch below).
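For the fp8 route, one common approach is weight-only fp8 quantization with optimum-quanto. A minimal sketch, assuming that package is installed (actual memory use and speed vary by setup):

```python
# Sketch: fp8 weight quantization of Flux.1-dev via optimum-quanto (assumed installed).
import torch
from diffusers import FluxPipeline
from optimum.quanto import freeze, qfloat8, quantize

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
quantize(pipe.transformer, weights=qfloat8)  # quantize the DiT backbone weights
freeze(pipe.transformer)
pipe.enable_model_cpu_offload()  # offload idle modules to keep peak VRAM low
```

The schnell variant needs no quantization on a 24 GB card; the script below runs it as-is: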
```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")

image = pipe(
    "A hyperrealistic portrait of a woman in Renaissance style",
    num_inference_steps=4,  # schnell = 4 steps
    guidance_scale=0.0,
    height=1024, width=1024,
).images[0]
image.save("/tmp/flux_output.png")
```

GDPR and Image Generation
If you generate images of real people, use training data from EU users, or run an image generation service for EU customers, GDPR applies to your GPU infrastructure.
Most GPU cloud providers process data on US servers with no valid EU Standard Contractual Clauses (SCCs). That creates legal exposure under GDPR Art. 44–49 (international transfers).
GhostNexus runs on EU infrastructure (Frankfurt), signs a Data Processing Agreement on request, and maintains a sub-processor list in compliance with GDPR Art. 28.
ControlNet and img2img on Cloud
ControlNet requires an additional 1–2 GB VRAM on top of the base model. On an RTX 4090, SDXL + ControlNet generates images in 8–12 seconds at 1024×1024 (a minimal example follows the list below):
- Canny edge, depth, pose, and tile ControlNets all work out of the box
- Upload your source images via `inline=True` with base64-encoded data
- Output images are returned in job logs as base64 or saved to a persistent volume (coming Q3 2026)
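A minimal SDXL + Canny ControlNet sketch with diffusers. The model IDs are the public Hugging Face checkpoints; the source image path and prompt are placeholders, and opencv-python is assumed for edge detection:

```python
# Sketch: SDXL + Canny ControlNet. Source image path is a placeholder.
import torch
import numpy as np
import cv2
from PIL import Image
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Build a Canny edge map from the source image as the conditioning input
source = np.array(Image.open("/tmp/source.png").convert("RGB"))
edges = cv2.Canny(source, 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

image = pipe(
    "A futuristic building, golden hour, photorealistic",
    image=control_image,
    num_inference_steps=20,
    controlnet_conditioning_scale=0.7,
).images[0]
image.save("/tmp/controlnet_output.png")
```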
Try It Now — $15 Free Credits
Sign up and run your first SDXL or Flux.1 batch in under 5 minutes. No credit card required for the free credits.
Use code WELCOME15 at signup
Frequently Asked Questions
Can I use my own custom fine-tuned SDXL model?
Yes. Upload your model weights as part of your script or download them from Hugging Face at job start. The container has internet access during setup only (outbound HTTPS for model downloads).
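In practice that looks something like the sketch below; the repo and file names are placeholders:

```python
# Load a custom SDXL fine-tune at job start (names are placeholders).
import torch
from diffusers import StableDiffusionXLPipeline

# Option 1: pull from a (possibly private) Hugging Face repo during setup
pipe = StableDiffusionXLPipeline.from_pretrained(
    "your-username/your-sdxl-finetune", torch_dtype=torch.float16
)

# Option 2: load a single .safetensors checkpoint shipped with the job
# pipe = StableDiffusionXLPipeline.from_single_file("my_finetune.safetensors")
```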
How do I get the generated images back?
Encode images as base64 in your script and print them to stdout — they appear in the job output logs. Persistent storage and S3-compatible output volumes are on the Q3 2026 roadmap.
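For example, at the end of a generation script (a sketch; the IMG_B64 prefix is just a log-parsing convention we made up):

```python
# Emit a PIL image as base64 on stdout so it appears in the job output logs.
import base64
import io

def emit_image(image, tag: str) -> None:
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    print(f"IMG_B64 {tag}: {base64.b64encode(buf.getvalue()).decode()}")

# emit_image(pipe(...).images[0], "output_0")
```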
How long does it take to start a job?
Cold start (model download + container launch): 30–90 seconds for a 6 GB SDXL checkpoint. Subsequent jobs with cached models: under 10 seconds.
Can I run multiple GPU jobs at once?
Yes. Use the async client to dispatch jobs concurrently. Each job runs on a separate GPU node. There is no multi-GPU single-job support currently.
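A sketch of concurrent dispatch, under the assumption that the async client mirrors the sync API shown earlier; the AsyncClient name and awaitable run() are illustrative, not confirmed API:

```python
# Hypothetical sketch: dispatch several jobs concurrently.
# AsyncClient and the awaitable run() are assumed names, not confirmed API.
import asyncio
import ghostnexus

async def main():
    client = ghostnexus.AsyncClient()
    jobs = await asyncio.gather(
        *(client.run("sdxl_batch.py", task_name=f"sdxl-batch-{i}") for i in range(4))
    )
    print(f"Dispatched {len(jobs)} jobs, one GPU node each")

asyncio.run(main())
```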
Is ComfyUI or InvokeAI supported?
These tools require a UI/browser session, which is not supported. The platform runs Python scripts only. You can use their underlying pipelines (diffusers, comfy-script) directly in your script.