
Stable Diffusion Cloud GPU 2026: Run SDXL & Flux Without a Local GPU

SDXL needs 8 GB of VRAM; Flux.1-dev needs 16 GB. Buying a GPU that sits idle 90% of the time makes no sense. Here is how to run the best open-source image models on cloud GPUs for a fraction of a cent per image: GDPR-compliant, no setup, billed per second.

Updated: April 2026 · 12 min read · Covers: SDXL, Flux.1, ControlNet, SVD

TL;DR

  • SDXL runs well on 8 GB VRAM (RTX 3080 class); Flux.1-dev needs 16 GB, or 8 GB with fp8 quantization
  • Cloud GPU pricing: $0.50–$0.74/hr for an RTX 4090; a 500-image SDXL batch costs about $0.35
  • EU image studios need GDPR-compliant providers; most GPU clouds don't qualify
  • GhostNexus offers RTX 4090s at $0.50/hr, EU-hosted, DPA-ready, billed per second

VRAM Requirements by Model

Your choice of model determines the GPU you need. Here's the minimum VRAM for each major image model in 2026 (float16 unless noted):

| Model | Min VRAM | Batch size | Recommended GPU |
|---|---|---|---|
| SD 1.5 | 4 GB | 1–4 at 512px | RTX 3070 |
| SDXL 1.0 | 8 GB | 1–2 at 1024px | RTX 3080 |
| SDXL + ControlNet | 12 GB | 1 at 1024px | RTX 3080 Ti / 4070 |
| Flux.1-schnell | 12 GB | 1–2 at 1024px | RTX 4070 / 3090 |
| Flux.1-dev | 16 GB | 1 at 1024px | RTX 4080 / A100 |
| Flux.1-dev (fp8) | 8 GB | 1 at 1024px | RTX 3080 |
| Stable Video Diffusion | 20 GB | 1 video | RTX 3090 / A100 |

The RTX 4090 (24 GB VRAM) covers every model above without quantization. It's the default GPU on GhostNexus at $0.50/hr.

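
The table above can double as a quick compatibility check. Here is a small sketch using the float16 minimums from the table (`models_that_fit` is an illustrative helper, not part of any SDK):

```python
# Minimum float16 VRAM per model, in GB (figures from the table above).
MIN_VRAM_GB = {
    "SD 1.5": 4,
    "SDXL 1.0": 8,
    "SDXL + ControlNet": 12,
    "Flux.1-schnell": 12,
    "Flux.1-dev": 16,
    "Flux.1-dev (fp8)": 8,
    "Stable Video Diffusion": 20,
}

def models_that_fit(vram_gb: float) -> list:
    """Return every model whose minimum VRAM fits the given card."""
    return sorted(m for m, need in MIN_VRAM_GB.items() if need <= vram_gb)

print(models_that_fit(8))   # RTX 3080 class
print(models_that_fit(24))  # RTX 4090 class
```

An 8 GB card gets SD 1.5, SDXL, and fp8-quantized Flux.1-dev; a 24 GB card runs everything in the table.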
Cost Per Image on Cloud GPU

An RTX 4090 generates SDXL images in roughly 4–6 seconds each (20 steps, DPM++ 2M sampler). At $0.50/hr, billed per second:

| Model | Time/image | Cost/image at $0.50/hr |
|---|---|---|
| SDXL | ~5 s | $0.0007 |
| Flux.1-schnell | ~25 s | $0.0035 |
| Flux.1-dev | ~100 s | $0.014 |

A batch of 1,000 SDXL images costs under $0.70 on GhostNexus. Compare that to Midjourney Pro at $60/month for 1,000 images, or DALL-E 3 at $0.04–$0.08 per image.
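
The arithmetic behind those figures is plain per-second billing. A minimal sketch (function names are illustrative):

```python
def cost_per_image(seconds_per_image: float, hourly_rate: float) -> float:
    """Dollar cost of one image under per-second billing."""
    return seconds_per_image / 3600 * hourly_rate

def batch_cost(n_images: int, seconds_per_image: float, hourly_rate: float) -> float:
    """Dollar cost of a batch of identical generations."""
    return n_images * cost_per_image(seconds_per_image, hourly_rate)

# SDXL at ~5 s/image on a $0.50/hr RTX 4090:
print(f"${cost_per_image(5, 0.50):.4f} per image")
print(f"${batch_cost(1000, 5, 0.50):.2f} per 1,000 images")
```

Run it and you land on $0.0007/image and about $0.69 for the 1,000-image batch, matching the numbers above.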

GPU Cloud Provider Comparison (Image Generation)

| Provider | GPU | Price | GDPR | Notes |
|---|---|---|---|---|
| GhostNexus | RTX 4090 (24 GB) | $0.50/hr | Yes | EU-hosted, pay-per-second |
| RunPod | RTX 4090 | $0.74/hr | Partial | Community tier = no EU DPA |
| Vast.ai | RTX 4090 | $0.35–0.55/hr | No | Random hosts, no compliance |
| Lambda Labs | A10 (24 GB) | $0.60/hr | No | US-only data centers |
| Google Colab Pro+ | A100 (40 GB) | $57/mo flat | No | Session limits, queued access |
| Paperspace | A100 (40 GB) | $3.09/hr | No | US and EU nodes available |

Run SDXL on Cloud GPU: Full Script

Submit this script to GhostNexus with the Python SDK — it generates 4 images and prints the generation time:

import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
import time

# Load SDXL with optimized scheduler
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
pipe.enable_xformers_memory_efficient_attention()

prompts = [
    "A cyberpunk city at night, neon lights, ultra detailed, 8k",
    "Portrait of an astronaut on Mars, cinematic, Hasselblad",
    "Ancient forest waterfall, golden hour, photorealistic",
    "Abstract geometric art, vibrant colors, minimalist",
]

t0 = time.perf_counter()
for i, prompt in enumerate(prompts):
    image = pipe(
        prompt=prompt,
        num_inference_steps=20,
        guidance_scale=7.5,
        width=1024, height=1024,
    ).images[0]
    image.save(f"/tmp/output_{i}.png")
    print(f"Image {i+1}/4 done — {time.perf_counter()-t0:.1f}s elapsed")

total = time.perf_counter() - t0
print(f"\n4 images in {total:.1f}s ({total/4:.1f}s/image)")
print(f"Estimated cost: ${total/3600 * 0.50:.5f}")

Submit it from your local machine with the SDK:

import ghostnexus

client = ghostnexus.Client()
job = client.run("sdxl_batch.py", task_name="sdxl-batch-4")
for chunk in job.stream_logs():
    print(chunk, end="", flush=True)

# Output:
# Image 1/4 done — 5.2s elapsed
# Image 2/4 done — 10.4s elapsed
# Image 3/4 done — 15.7s elapsed
# Image 4/4 done — 20.9s elapsed
#
# 4 images in 20.9s (5.2s/image)
# Estimated cost: $0.00290

Running Flux.1 on Cloud GPU

Flux.1-dev produces significantly better image quality than SDXL but requires more VRAM and takes longer per image. On an RTX 4090 (24 GB):

  • Flux.1-schnell: 4 inference steps, ~25s/image, best quality/speed ratio. Cost: $0.0035/image.
  • Flux.1-dev: 20–50 steps, ~100s/image, maximum quality. Cost: $0.014/image.
  • Flux.1-dev (fp8 quantized): Runs in 8 GB VRAM on RTX 3080, ~150s/image, minor quality loss.

A minimal diffusers script for Flux.1-schnell:

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16
)
pipe = pipe.to("cuda")

image = pipe(
    "A hyperrealistic portrait of a woman in Renaissance style",
    num_inference_steps=4,  # schnell = 4 steps
    guidance_scale=0.0,
    height=1024, width=1024,
).images[0]
image.save("/tmp/flux_output.png")

GDPR and Image Generation

If you generate images of real people, use training data from EU users, or run an image generation service for EU customers, GDPR applies to your GPU infrastructure.

Most GPU cloud providers process data on US servers with no valid EU Standard Contractual Clauses (SCCs). That creates legal exposure under GDPR Art. 44–49 (international transfers).

GhostNexus runs on EU infrastructure (Frankfurt), signs a Data Processing Agreement on request, and maintains a sub-processor list in compliance with GDPR Art. 28.

ControlNet and img2img on Cloud

ControlNet requires an additional 1–2 GB VRAM on top of the base model. On an RTX 4090, SDXL + ControlNet generates images in 8–12 seconds at 1024×1024:

  • Canny edge, depth, pose, and tile ControlNets all work out of the box
  • Upload your source images via inline=True with base64-encoded data
  • Output images returned in job logs as base64 or saved to a persistent volume (coming Q3 2026)

Try It Now — $15 Free Credits

Sign up and run your first SDXL or Flux.1 batch in under 5 minutes. No credit card required for the free credits.

Use code WELCOME15 at signup

Frequently Asked Questions

Can I use my own custom fine-tuned SDXL model?

Yes. Upload your model weights as part of your script or download them from Hugging Face at job start. The container has internet access during setup only (outbound HTTPS for model downloads).

How do I get the generated images back?

Encode images as base64 in your script and print them to stdout — they appear in the job output logs. Persistent storage and S3-compatible output volumes are on the Q3 2026 roadmap.
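
A minimal standard-library sketch of that round trip (the `IMG:` marker is an arbitrary convention chosen for this example, not a platform feature):

```python
import base64

# In the job script: wrap each image file in a greppable stdout marker.
def emit_image(path: str) -> None:
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("ascii")
    print(f"IMG:{path}:{data}")

# On your machine: recover images from the captured log text.
def extract_images(log_text: str) -> dict:
    images = {}
    for line in log_text.splitlines():
        if line.startswith("IMG:"):
            _, name, b64 = line.split(":", 2)
            images[name] = base64.b64decode(b64)
    return images

# Round trip with stand-in bytes (a real job would emit PNG data).
fake = b"\x89PNG...demo"
log = "IMG:output_0.png:" + base64.b64encode(fake).decode("ascii")
print(extract_images(log)["output_0.png"] == fake)
```

Base64 inflates payloads by about a third, so this works best for a handful of images per job; larger batches will want the persistent volumes once they ship.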

How long does it take to start a job?

Cold start (model download + container launch): 30–90 seconds for a 6 GB SDXL checkpoint. Subsequent jobs with cached models: under 10 seconds.

Can I run multiple GPU jobs at once?

Yes. Use the async client to dispatch jobs concurrently. Each job runs on a separate GPU node. There is no multi-GPU single-job support currently.
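
One way to sketch concurrent dispatch with `asyncio`. Here `fake_submit` is a stand-in for whatever awaitable submit-and-wait call the async client actually exposes (check the SDK docs), and the semaphore caps how many GPU nodes you hold at once:

```python
import asyncio

async def run_batch(submit, scripts, max_concurrent=4):
    # Cap in-flight jobs so we never hold more GPU nodes than intended.
    sem = asyncio.Semaphore(max_concurrent)

    async def one(script):
        async with sem:
            return await submit(script)

    # gather preserves input order regardless of completion order.
    return await asyncio.gather(*(one(s) for s in scripts))

# Stand-in for the real SDK call (assumption: the async client exposes
# an awaitable submit method; the GhostNexus docs have the actual name).
async def fake_submit(script):
    await asyncio.sleep(0.01)
    return f"{script}: done"

results = asyncio.run(run_batch(fake_submit, [f"job_{i}.py" for i in range(6)]))
print(results)
```

Swapping `fake_submit` for the real client call keeps the same structure: one coroutine per job, bounded concurrency, results in submission order.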

Is ComfyUI or InvokeAI supported?

These tools require a UI/browser session, which is not supported. The platform runs Python scripts only. You can use their underlying pipelines (diffusers, comfy-script) directly in your script.
