Fine-Tune Llama 3 on Medical Data — GDPR-Compliant, EU Cloud GPU
Your DPO blocked RunPod. AWS is US infrastructure. Your internal RTX 2080 Ti has a 3-week job queue.
This guide walks through a complete QLoRA fine-tuning pipeline for Llama 3 on sensitive EU data — medical NLP, legal documents, financial records — using GPU compute that never leaves the EEA. All code is runnable. The compliance section is written for DPOs, not just engineers.
Part 1 — Why your DPO is right to block US GPU clouds
Running LLM fine-tuning on cloud infrastructure is a data processing operation under GDPR. When your training dataset contains health records, clinical notes, financial transactions, or legal documents, you are processing personal data — and the legal framework constrains where that processing can happen.
GDPR Article 28 — Processor requirements
Any cloud provider executing your training script is a data processor. Article 28 requires a written contract (DPA) that binds the processor to process data only on your instructions. It also requires that the processor provides sufficient guarantees about technical and organisational measures. A standard SaaS ToS does not satisfy this.
GDPR Article 46 — Transfers to third countries
The United States is not an adequate country under GDPR Article 45. Transferring data to US infrastructure requires an Article 46 mechanism: Standard Contractual Clauses (SCCs), Binding Corporate Rules (BCRs), or a derogation. SCCs are widely used but do not fix the underlying problem — US law (CLOUD Act, FISA 702) can compel US providers to produce data even when SCCs are in place. The EU Court of Justice invalidated two previous adequacy decisions on exactly this basis (Schrems I, Schrems II).
"Privacy Shield 2.0" is not a safe harbor for sensitive data
The EU-US Data Privacy Framework adopted in 2023 is under active legal challenge before the Court of Justice of the EU as of early 2026. Several EU DPAs have issued guidance advising caution for health and financial data specifically. For Article 9 special categories, full EEA data residency is the only position that eliminates legal risk rather than managing it.
The practical consequence inside your organisation
DPO says no to RunPod, Lambda Labs, Vast.ai, AWS, GCP. ML team falls back to the shared internal RTX 2080 Ti. Job queue: 2–3 weeks. One training run that would take 4 hours on an RTX 4090 takes 14 hours on the 2080 Ti — and has to wait for the slot. Projects slip. The model never ships.
Part 2 — Architecture: data flow and isolation
GhostNexus routes your job to a GPU node in Frankfurt or Helsinki. The training container runs with network access fully disabled. Model weights are returned to you. Training data never touches non-EU infrastructure at any point in the pipeline.
[Your dataset (EU origin)]
│
│ HTTPS — encrypted in transit
▼
[GhostNexus API — Frankfurt]
│
│ Job dispatched to nearest EU node
▼
[GPU Node — Hetzner Frankfurt / Helsinki]
│
│ Docker --network=none ← no outbound connections possible
▼
┌─────────────────────────────┐
│ Training container │
│ Llama 3 + QLoRA adapters │
│ your dataset (ephemeral) │
└─────────────────────────────┘
│
│ Training complete
▼
[LoRA adapter weights returned to you]
[Training data deleted — zero retention]
Data never touches non-EU infrastructure.

| Guarantee | Value | Detail |
|---|---|---|
| Data residency | EEA only | Hetzner DE + FI |
| Network isolation | `--network=none` | Docker flag, no exfiltration path |
| Data retention | Zero | Scripts and data ephemeral |
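The `--network=none` guarantee can also be verified from inside the job itself. The preflight check below is an illustrative sketch (not part of any GhostNexus tooling): it attempts a single outbound TCP connection and treats failure as confirmation of isolation.

```python
import socket

def network_is_isolated(host: str = "1.1.1.1", port: int = 53,
                        timeout: float = 2.0) -> bool:
    """True if an outbound TCP connection fails, i.e. the behaviour
    expected inside a container started with --network=none (loopback only)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return False  # connection succeeded: a network path exists
    except OSError:
        return True       # no route or no interface: isolation holds

if __name__ == "__main__":
    print("isolated" if network_is_isolated() else "outbound network reachable")
```

In a job running on an isolated node the check should return `True` before any training data is loaded; running the same script locally will report the opposite.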
Part 3 — Dataset preparation and anonymisation
Before fine-tuning, you should apply at minimum pseudonymisation to your training corpus. For clinical NLP (the example below: diagnosis notes to ICD-10 codes), this means stripping patient identifiers while preserving the clinical signal.
The example below uses a fictional medical NLP dataset. Replace the records list with your own anonymised data. The push_to_hub call is optional — you can load directly from a local JSONL file.
from datasets import Dataset
import json, re

# ─── Fictional clinical NLP examples ────────────────────────────────────────
# Format: free-text note → ICD-10 code (structured label)
# In production: apply a NER-based de-identification step before this point.
records = [
    {"note": "Patient presents with fever 39.1°C, dry cough for 3 days, no dyspnea.", "icd10": "J06.9"},
    {"note": "Acute chest pain radiating to left arm. ECG: ST elevation leads II, III, aVF.", "icd10": "I21.1"},
    {"note": "Type 2 diabetes, poorly controlled. HbA1c 9.2%. Initiating insulin therapy.", "icd10": "E11.65"},
    {"note": "Recurring migraine with aura, 2–3 episodes per month. Photophobia, nausea.", "icd10": "G43.109"},
    {"note": "Post-operative wound infection, right knee arthroplasty, purulent discharge.", "icd10": "T84.50XA"},
    {"note": "Anxiety disorder with panic attacks, sleep onset insomnia. PHQ-9 score: 14.", "icd10": "F41.0"},
]

# ─── Basic pseudonymisation check ───────────────────────────────────────────
# Strip any residual patterns that look like names or dates.
PATTERNS = [
    r"\b(?:Mr|Mrs|Dr|Prof)\.?\s+[A-Z][a-z]+",  # titles + names
    r"\b\d{2}/\d{2}/\d{4}\b",                  # DD/MM/YYYY dates
    r"\b(?:born|DOB)\s+\d{4}\b",               # birth year references
]

def pseudonymise(text: str) -> str:
    for pat in PATTERNS:
        text = re.sub(pat, "[REDACTED]", text)
    return text

records = [{"note": pseudonymise(r["note"]), "icd10": r["icd10"]} for r in records]

# ─── Build HuggingFace Dataset ───────────────────────────────────────────────
dataset = Dataset.from_list(records)
print(dataset)

# Option A: push to private Hub repo (stays in your org's infra)
# dataset.push_to_hub("your-org/medical-icd10-private", private=True)

# Option B: save as JSONL for direct use in training script
with open("medical_train.jsonl", "w") as f:
    for row in records:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
print(f"Saved {len(records)} examples to medical_train.jsonl")

Note on Article 9 data. If your dataset contains health records, the legal basis for processing during fine-tuning must be established before this step — typically Article 9(2)(h) (healthcare purposes) or explicit consent under Article 9(2)(a). Anonymisation that renders data outside GDPR scope entirely (Recital 26) is the cleanest path, but requires a formal anonymisation impact assessment. Pseudonymisation reduces risk but the data remains personal data under GDPR.
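The regex net above is deliberately narrow: it will miss bare surnames, which is why the comment recommends a NER-based de-identification pass in production. A quick spot-check of the patterns (self-contained, repeating the definitions from the script):

```python
import re

PATTERNS = [
    r"\b(?:Mr|Mrs|Dr|Prof)\.?\s+[A-Z][a-z]+",  # titles + names
    r"\b\d{2}/\d{2}/\d{4}\b",                  # DD/MM/YYYY dates
    r"\b(?:born|DOB)\s+\d{4}\b",               # birth year references
]

def pseudonymise(text: str) -> str:
    for pat in PATTERNS:
        text = re.sub(pat, "[REDACTED]", text)
    return text

# Strings that SHOULD be caught…
print(pseudonymise("Seen by Dr. Weber today."))  # → Seen by [REDACTED] today.
print(pseudonymise("Admitted 12/03/2024."))      # → Admitted [REDACTED].
# …and one the regexes will MISS (bare surname, needs NER de-id):
print(pseudonymise("Weber complains of headache."))  # unchanged
```

Checks like these belong in your test suite: every new identifier pattern found in the corpus should become a new assertion.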
Part 4 — QLoRA fine-tuning script (runnable)
This script loads Llama 3 8B Instruct in 4-bit quantization (QLoRA), attaches LoRA adapters to the attention projections, and fine-tunes on the medical dataset. Memory footprint: ~10 GB VRAM for 4-bit with batch size 4. An RTX 3080 (10 GB) is borderline — use gradient accumulation steps of 8 and batch size 2 if you hit OOM. RTX 4090 (24 GB) is comfortable.
| GPU | VRAM | QLoRA headroom |
|---|---|---|
| RTX 3080 | 10 GB | Tight (reduce batch to 2) |
| RTX 3090 / 4090 | 24 GB | Comfortable |
| A100 | 40/80 GB | Ideal for fp16 full-rank |
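As a sanity check on the ~10 GB figure, here is a back-of-the-envelope VRAM estimator. The constants (≈0.55 bytes/parameter for NF4 with double quantisation, ≈8 bytes per trainable parameter for bf16 adapters plus paged AdamW states, a crude activation term, fixed framework overhead) are rough assumptions, not measured values; real usage also includes fragmentation and temporary buffers, so treat the output as a lower bound.

```python
def qlora_vram_gb(n_params: float = 8e9, lora_params: float = 13.6e6,
                  batch: int = 4, seq_len: int = 512,
                  hidden: int = 4096, layers: int = 32) -> float:
    base = n_params * 0.55 / 1e9      # NF4 weights + quantisation constants
    adapters = lora_params * 8 / 1e9  # bf16 adapters + optimiser states
    # one bf16 activation tensor per layer retained despite checkpointing (crude)
    activations = batch * seq_len * hidden * 2 * layers * 2 / 1e9
    overhead = 1.5                    # CUDA context, kernels, buffers
    return round(base + adapters + activations + overhead, 1)

print(qlora_vram_gb())         # batch 4, the default configuration above
print(qlora_vram_gb(batch=2))  # the RTX 3080 fallback configuration
```

The estimate moves in the right direction (halving the batch halves the activation term), which is why dropping to batch size 2 with more gradient accumulation rescues the 10 GB card.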
# fine_tune_llama3_medical.py
# Runs on GhostNexus EU GPU nodes (RTX 4090 recommended).
# pip install transformers peft trl bitsandbytes accelerate datasets
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    BitsAndBytesConfig,
)
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer
from datasets import load_dataset
# ─── Config ─────────────────────────────────────────────────────────────────
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
OUTPUT_DIR = "/tmp/llama3-medical-adapter"
# ─── 4-bit quantization (QLoRA) ─────────────────────────────────────────────
# NF4 quantization is specifically designed for normally-distributed weights.
# double_quant quantizes the quantization constants themselves → saves ~0.4 bits/param.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# ─── Tokenizer ───────────────────────────────────────────────────────────────
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right" # required for SFTTrainer
# ─── Base model ─────────────────────────────────────────────────────────────
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=False,
)
model.config.use_cache = False
model.config.pretraining_tp = 1

# Recommended for QLoRA: casts norms to fp32 and enables input grads,
# which gradient checkpointing on a k-bit model requires.
from peft import prepare_model_for_kbit_training
model = prepare_model_for_kbit_training(model)
# ─── LoRA adapters ──────────────────────────────────────────────────────────
# r=16, alpha=32: a sensible default for domain adaptation.
# Targeting all 4 attention projections captures most of the task-specific signal.
# Trainable params: ~0.2% of total — the quantized base model is frozen.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# → trainable params: 13,631,488 (≈ 0.17% of the 8B base; the totals printed vary with peft/bitsandbytes versions)
# ─── Dataset ─────────────────────────────────────────────────────────────────
# Uses the local JSONL file produced by the Part 3 preparation script.
# Replace with load_dataset("your-org/medical-icd10-private") if using Hub.
dataset = load_dataset("json", data_files="medical_train.jsonl", split="train")
LLAMA3_CHAT_TEMPLATE = (
    "<|begin_of_text|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "{note}"
    "<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "{icd10}"
    "<|eot_id|>"
)
def format_prompt(example):
    return {"text": LLAMA3_CHAT_TEMPLATE.format(**example)}
dataset = dataset.map(format_prompt, remove_columns=dataset.column_names)
# ─── Training arguments ──────────────────────────────────────────────────────
training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch = 16
    gradient_checkpointing=True,    # trades compute for VRAM
    warmup_steps=50,
    learning_rate=2e-4,
    bf16=True,                      # bfloat16 compute, stable on Ampere+
    logging_steps=10,
    save_strategy="epoch",
    optim="paged_adamw_32bit",      # paged optimiser: prevents OOM spikes
    lr_scheduler_type="cosine",
    report_to="none",               # no external telemetry — required for GDPR jobs
    dataloader_pin_memory=False,
)
# ─── Trainer ─────────────────────────────────────────────────────────────────
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    packing=False,
)
trainer.train()
# ─── Save adapter weights only (not the quantized base model) ────────────────
model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)
print(f"LoRA adapter saved to {OUTPUT_DIR}")
print("Merge with base model locally using: model.merge_and_unload()")

Dependencies: transformers>=4.40, peft>=0.10, trl>=0.8, bitsandbytes>=0.43, accelerate>=0.27, datasets>=2.18
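The trainable-parameter count reported by `print_trainable_parameters()` can be sanity-checked by hand: each LoRA pair adds r·(d_in + d_out) parameters per targeted projection. Assuming standard Llama 3 8B dimensions (hidden size 4096, grouped-query KV dimension 1024, 32 layers):

```python
def lora_param_count(r: int = 16, hidden: int = 4096,
                     kv_dim: int = 1024, layers: int = 32) -> int:
    per_layer = 2 * r * (hidden + hidden)   # q_proj + o_proj: hidden → hidden
    per_layer += 2 * r * (hidden + kv_dim)  # k_proj + v_proj: hidden → kv_dim (GQA)
    return per_layer * layers

n = lora_param_count()
print(f"{n:,} trainable params")  # → 13,631,488, about 0.17% of the 8B base
```

Doubling `r` doubles this count (and the adapter file size) linearly, which is why r=16 is a cheap knob to turn when the model underfits the domain.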
Part 5 — Submitting the job to GhostNexus
The client runs locally on your machine (inside the EEA). It uploads the script and dataset, dispatches the job to an EU node, then streams logs and retrieves the adapter weights when training completes.
# submit_training_job.py
# Runs locally. Submits the training script + dataset to a GhostNexus EU node.
# pip install ghostnexus
import ghostnexus
import time
client = ghostnexus.GhostNexus(api_key="YOUR_API_KEY")
# Read the training script
with open("fine_tune_llama3_medical.py", "r") as f:
    script = f.read()
# Submit — GhostNexus automatically selects the nearest EU node
result = client.run(
    script=script,
    task_name="llama3-medical-qlora",
    files=["medical_train.jsonl"],  # uploaded with the job, deleted after
    gpu_type="RTX_4090",            # or RTX_3090, A100_40G
    region="eu",                    # never routes outside EEA
    webhook_url="https://your-server.com/training-complete",  # optional
)
job_id = result["job_id"]
print(f"Job {job_id} queued | Status: {result['status']}")
print(f"Estimated cost: ${result['cost_estimate_usd']:.2f}")
# ─── Poll for completion (or use webhook above) ──────────────────────────────
while True:
    status = client.get_job(job_id)
    print(f"[{status['elapsed_s']}s] {status['status']} — {status['log_tail']}")
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(15)
if status["status"] == "completed":
    # Retrieve LoRA adapter weights to local directory
    client.download_artifacts(job_id, dest="./llama3-medical-adapter/")
    print("Adapter weights downloaded.")
    print(f"Final cost: ${status['cost_usd']:.3f}")
    print("\nLast training logs:")
    print(status["output"][-2000:])

What happens to your data. The training script and JSONL file are encrypted in transit over HTTPS, stored ephemerally in the container filesystem during job execution, and deleted immediately on job completion. GhostNexus does not retain, copy, or inspect training data. The only artefact returned is the LoRA adapter directory.
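If you pass `webhook_url` instead of polling, something on your side must receive the callback. Below is a minimal receiver sketch; the payload field names (`job_id`, `status`) are assumptions mirroring the polling API above, so check the GhostNexus docs for the actual schema.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def should_download(payload: dict) -> bool:
    """Only fetch artifacts for jobs that completed successfully."""
    return payload.get("status") == "completed"

class TrainingWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        if should_download(payload):
            print(f"Job {payload.get('job_id')} finished, fetch artifacts now")
        self.send_response(200)  # acknowledge so the webhook is not retried
        self.end_headers()

# To serve: HTTPServer(("", 8080), TrainingWebhook).serve_forever()
```

Since artifacts are purged after 24 hours (see Part 6), the handler is a good place to trigger `download_artifacts` immediately rather than waiting for a human.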
Part 6 — GDPR compliance: the technical proof
This section is written for DPOs and legal teams reviewing whether GhostNexus constitutes an adequate processor under GDPR Article 28.
Compute is exclusively in the EEA
All GPU nodes run on Hetzner dedicated servers in Nuremberg (NBG1), Falkenstein (FSN1), and Helsinki (HEL1). Hetzner is a German company (Gunzenhausen, Bavaria). No node is outside the EEA. Node selection is constrained by the region=eu parameter enforced at API level.
Network isolation: Docker --network=none
Every training container is started with the --network=none Docker flag. The container has no network interface other than loopback: it cannot make HTTP requests, perform DNS lookups, or open any outbound connection, so there is no network path for data exfiltration from inside the container.
Zero data retention
Training data uploaded with a job is written to a tmpfs volume inside the container. On container exit, the volume is destroyed. GhostNexus infrastructure does not mount persistent storage for training data. Artefacts (model weights) are available for 24 hours in encrypted object storage, then purged.
Per-second billing eliminates retention windows
Billing stops the moment the container exits. There is no minimum billing window that would incentivise keeping a container alive — and with it, any data — longer than necessary.
EU VAT invoice generated automatically
Invoices are issued by GhostNexus (EU entity) with EU VAT. Your accounting team receives a compliant EU invoice, not a US dollar wire transfer to an American company.
DPA available on request
GhostNexus provides a Data Processing Agreement under Article 28(3) covering the scope of processing, sub-processors, deletion obligations, and audit rights. Contact contact@ghostnexus.net.
Full privacy policy and terms of service: ghostnexus.net/privacy and ghostnexus.net/terms
Part 7 — Cost comparison
Prices as of April 2026. A typical QLoRA fine-tuning run (3 epochs, medical NLP dataset, RTX 4090) completes in 45–90 minutes depending on dataset size.
| Setup | GPU equivalent | 1h training | GDPR-safe |
|---|---|---|---|
| Internal RTX 2080 Ti (IT queue, 3× slower, no scale) | Slower, shared | "Free" (queue) | ✓ |
| AWS g5.xlarge (US data centers, Article 46 risk) | ~A10G 24 GB | $1.01/hr | ✗ |
| RunPod RTX 4090 (US/mixed infra, DPO-blocked) | RTX 4090 24 GB | $0.74/hr | ✗ |
| GhostNexus RTX 4090 (EU-only, Docker network isolation) | RTX 4090 24 GB | $0.50/hr | ✓ |
RunPod and AWS prices from public pricing pages, April 2026. GhostNexus RTX 4090 rate is the standard on-demand price — volume discounts available for >100h/month.
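At those rates the arithmetic for a single run is simple: the helper below multiplies the table's hourly prices by the 45–90 minute duration range quoted above.

```python
def run_cost(rate_per_hr: float, minutes: float) -> float:
    """Cost of one training run at a given hourly rate, rounded to cents."""
    return round(rate_per_hr * minutes / 60, 2)

for minutes in (45, 90):
    print(f"{minutes} min on GhostNexus RTX 4090: ${run_cost(0.50, minutes):.2f}")
    print(f"{minutes} min on RunPod RTX 4090:     ${run_cost(0.74, minutes):.2f}")
```

In other words, a full QLoRA run lands well under a dollar; at this scale the deciding factor is compliance, not price.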
Part 8 — FAQ for DPOs and legal teams
Where exactly is the compute running?
Exclusively on Hetzner infrastructure in Germany and Finland — Nuremberg, Falkenstein, and Helsinki. All three locations are in the EEA. No job ever routes outside these datacenters.
Can GhostNexus access our training data or model weights?
No. Your script executes inside a Docker container started with --network=none. The container cannot make outbound network connections. GhostNexus infrastructure has no mechanism to read container memory or filesystem during execution. Scripts and any uploaded data are deleted immediately after the job completes — there is no retention window.
Do you sign a Data Processing Agreement (DPA)?
Yes. GhostNexus acts as a data processor under GDPR Article 28. We provide a standard DPA template and can negotiate custom terms for enterprise contracts. Contact contact@ghostnexus.net with subject 'DPA Request' to start the process.
Does "Privacy Shield 2.0" (EU-US Data Privacy Framework) solve the RunPod problem?
Not reliably for sensitive categories of data (Article 9 GDPR — health, biometric, financial). The EU-US DPF is under active legal challenge as of 2026. For healthcare and banking data, the only defensible position is full EEA data residency — not a contractual mechanism that can be invalidated by a court ruling.
Can we audit the infrastructure?
Yes. The GhostNexus node client is open source. You can inspect the Docker flags applied to every job. We can also provide a signed attestation of the Docker run parameters used for your specific job IDs on request.
Start your first GDPR-compliant training run
New accounts receive $15 free credits — at the standard $0.50/hr RTX 4090 rate, that is 30 GPU-hours, enough for many complete QLoRA fine-tuning runs. No card required to start.
Use code WELCOME15 at registration · EU VAT invoice · DPA available · EEA compute only