Fine-Tune Llama 3 on Medical Data — GDPR-Compliant, EU Cloud GPU
Your DPO blocked RunPod. AWS is US infrastructure. Your internal RTX 2080 Ti has a 3-week job queue.
This guide walks through a complete QLoRA fine-tuning pipeline for Llama 3 on sensitive EU data — medical NLP, legal documents, financial records — using GPU compute that never leaves the EEA. All code is runnable. The compliance section is written for DPOs, not just engineers.
Part 1 — Why your DPO is right to block US GPU clouds
Running LLM fine-tuning on cloud infrastructure is a data processing operation under GDPR. When your training dataset contains health records, clinical notes, financial transactions, or legal documents, you are processing personal data — and the legal framework constrains where that processing can happen.
GDPR Article 28 — Processor requirements
Any cloud provider executing your training script is a data processor. Article 28 requires a written contract (DPA) that binds the processor to process data only on your instructions. It also requires that the processor provides sufficient guarantees about technical and organisational measures. A standard SaaS ToS does not satisfy this.
GDPR Article 46 — Transfers to third countries
The United States is not an adequate country under GDPR Article 45. Transferring data to US infrastructure requires an Article 46 mechanism: Standard Contractual Clauses (SCCs), Binding Corporate Rules (BCRs), or a derogation. SCCs are widely used but do not fix the underlying problem — US law (CLOUD Act, FISA 702) can compel US providers to produce data even when SCCs are in place. The EU Court of Justice invalidated two previous adequacy decisions on exactly this basis (Schrems I, Schrems II).
"Privacy Shield 2.0" is not a safe harbor for sensitive data
The EU-US Data Privacy Framework adopted in 2023 is under active legal challenge before the Court of Justice of the EU as of early 2026. Several EU DPAs have issued guidance advising caution for health and financial data specifically. For Article 9 special categories, full EEA data residency is the only position that eliminates legal risk rather than managing it.
The practical consequence inside your organisation
DPO says no to RunPod, Lambda Labs, Vast.ai, AWS, GCP. ML team falls back to the shared internal RTX 2080 Ti. Job queue: 2–3 weeks. One training run that would take 4 hours on an RTX 4090 takes 14 hours on the 2080 Ti — and has to wait for the slot. Projects slip. The model never ships.
Part 2 — Architecture: data flow and isolation
GhostNexus routes your job to a GPU node in Frankfurt or Helsinki. The training container runs with network access fully disabled. Model weights are returned to you. Training data never touches non-EU infrastructure at any point in the pipeline.
[Your dataset (EU origin)]
│
│ HTTPS — encrypted in transit
▼
[GhostNexus API — Frankfurt]
│
│ Job dispatched to nearest EU node
▼
[GPU Node — Hetzner Frankfurt / Helsinki]
│
│ Docker --network=none ← no outbound connections possible
▼
┌─────────────────────────────┐
│ Training container │
│ Llama 3 + QLoRA adapters │
│ your dataset (ephemeral) │
└─────────────────────────────┘
│
│ Training complete
▼
[LoRA adapter weights returned to you]
[Training data deleted — zero retention]
Data never touches non-EU infrastructure.

| Guarantee | Value | Detail |
|---|---|---|
| Data residency | EEA only | Hetzner DE + FI |
| Network isolation | `--network=none` | Docker flag, no exfiltration path |
| Data retention | Zero | Scripts and data ephemeral |
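The `--network=none` guarantee can also be verified from inside the job itself. The preflight check below is an illustrative sketch (not part of any GhostNexus tooling): it attempts a single outbound TCP connection and treats failure as confirmation of isolation.

```python
import socket

def network_is_isolated(host: str = "1.1.1.1", port: int = 53,
                        timeout: float = 2.0) -> bool:
    """True if an outbound TCP connection fails, i.e. the behaviour
    expected inside a container started with --network=none (loopback only)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return False  # connection succeeded: a network path exists
    except OSError:
        return True       # no route or no interface: isolation holds

if __name__ == "__main__":
    print("isolated" if network_is_isolated() else "outbound network reachable")
```

In a job running on an isolated node the check should return `True` before any training data is loaded; running the same script locally will report the opposite.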
Part 3 — Dataset preparation and anonymisation
Before fine-tuning, you should apply at minimum pseudonymisation to your training corpus. For clinical NLP (the example below: diagnosis notes to ICD-10 codes), this means stripping patient identifiers while preserving the clinical signal.
The example below uses a fictional medical NLP dataset. Replace the records list with your own anonymised data. The push_to_hub call is optional — you can load directly from a local JSONL file.
from datasets import Dataset
import json, re

# ─── Fictional clinical NLP examples ────────────────────────────────────────
# Format: free-text note → ICD-10 code (structured label)
# In production: apply a NER-based de-identification step before this point.
records = [
    {"note": "Patient presents with fever 39.1°C, dry cough for 3 days, no dyspnea.", "icd10": "J06.9"},
    {"note": "Acute chest pain radiating to left arm. ECG: ST elevation leads II, III, aVF.", "icd10": "I21.1"},
    {"note": "Type 2 diabetes, poorly controlled. HbA1c 9.2%. Initiating insulin therapy.", "icd10": "E11.65"},
    {"note": "Recurring migraine with aura, 2–3 episodes per month. Photophobia, nausea.", "icd10": "G43.109"},
    {"note": "Post-operative wound infection, right knee arthroplasty, purulent discharge.", "icd10": "T84.50XA"},
    {"note": "Anxiety disorder with panic attacks, sleep onset insomnia. PHQ-9 score: 14.", "icd10": "F41.0"},
]

# ─── Basic pseudonymisation check ───────────────────────────────────────────
# Strip any residual patterns that look like names or dates.
PATTERNS = [
    r"\b(?:Mr|Mrs|Dr|Prof)\.?\s+[A-Z][a-z]+",  # titles + names
    r"\b\d{2}/\d{2}/\d{4}\b",                  # DD/MM/YYYY dates
    r"\b(?:born|DOB)\s+\d{4}\b",               # birth year references
]

def pseudonymise(text: str) -> str:
    for pat in PATTERNS:
        text = re.sub(pat, "[REDACTED]", text)
    return text

records = [{"note": pseudonymise(r["note"]), "icd10": r["icd10"]} for r in records]

# ─── Build HuggingFace Dataset ───────────────────────────────────────────────
dataset = Dataset.from_list(records)
print(dataset)

# Option A: push to private Hub repo (stays in your org's infra)
# dataset.push_to_hub("your-org/medical-icd10-private", private=True)

# Option B: save as JSONL for direct use in training script
with open("medical_train.jsonl", "w") as f:
    for row in records:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
print(f"Saved {len(records)} examples to medical_train.jsonl")

Note on Article 9 data. If your dataset contains health records, the legal basis for processing during fine-tuning must be established before this step — typically Article 9(2)(h) (healthcare purposes) or explicit consent under Article 9(2)(a). Anonymisation that renders data outside GDPR scope entirely (Recital 26) is the cleanest path, but requires a formal anonymisation impact assessment. Pseudonymisation reduces risk but the data remains personal data under GDPR.
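The regex net above is deliberately narrow: it will miss bare surnames, which is why the comment recommends a NER-based de-identification pass in production. A quick spot-check of the patterns (self-contained, repeating the definitions from the script):

```python
import re

PATTERNS = [
    r"\b(?:Mr|Mrs|Dr|Prof)\.?\s+[A-Z][a-z]+",  # titles + names
    r"\b\d{2}/\d{2}/\d{4}\b",                  # DD/MM/YYYY dates
    r"\b(?:born|DOB)\s+\d{4}\b",               # birth year references
]

def pseudonymise(text: str) -> str:
    for pat in PATTERNS:
        text = re.sub(pat, "[REDACTED]", text)
    return text

# Strings that SHOULD be caught…
print(pseudonymise("Seen by Dr. Weber today."))  # → Seen by [REDACTED] today.
print(pseudonymise("Admitted 12/03/2024."))      # → Admitted [REDACTED].
# …and one the regexes will MISS (bare surname, needs NER de-id):
print(pseudonymise("Weber complains of headache."))  # unchanged
```

Checks like these belong in your test suite: every new identifier pattern found in the corpus should become a new assertion.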
Part 4 — QLoRA fine-tuning script (runnable)
This script loads Llama 3 8B Instruct in 4-bit quantization (QLoRA), attaches LoRA adapters to the attention projections, and fine-tunes on the medical dataset. Memory footprint: ~10 GB VRAM for 4-bit with batch size 4. An RTX 3080 (10 GB) is borderline — use gradient accumulation steps of 8 and batch size 2 if you hit OOM. RTX 4090 (24 GB) is comfortable.
| GPU | VRAM | QLoRA headroom |
|---|---|---|
| RTX 3080 | 10 GB | Tight (reduce batch to 2) |
| RTX 3090 / 4090 | 24 GB | Comfortable |
| A100 | 40/80 GB | Ideal for fp16 full-rank |
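As a sanity check on the ~10 GB figure, here is a back-of-the-envelope VRAM estimator. The constants (≈0.55 bytes/parameter for NF4 with double quantisation, ≈8 bytes per trainable parameter for bf16 adapters plus paged AdamW states, a crude activation term, fixed framework overhead) are rough assumptions, not measured values; real usage also includes fragmentation and temporary buffers, so treat the output as a lower bound.

```python
def qlora_vram_gb(n_params: float = 8e9, lora_params: float = 13.6e6,
                  batch: int = 4, seq_len: int = 512,
                  hidden: int = 4096, layers: int = 32) -> float:
    base = n_params * 0.55 / 1e9      # NF4 weights + quantisation constants
    adapters = lora_params * 8 / 1e9  # bf16 adapters + optimiser states
    # one bf16 activation tensor per layer retained despite checkpointing (crude)
    activations = batch * seq_len * hidden * 2 * layers * 2 / 1e9
    overhead = 1.5                    # CUDA context, kernels, buffers
    return round(base + adapters + activations + overhead, 1)

print(qlora_vram_gb())         # batch 4, the default configuration above
print(qlora_vram_gb(batch=2))  # the RTX 3080 fallback configuration
```

The estimate moves in the right direction (halving the batch halves the activation term), which is why dropping to batch size 2 with more gradient accumulation rescues the 10 GB card.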
# fine_tune_llama3_medical.py
# Runs on GhostNexus EU GPU nodes (RTX 4090 recommended).
# pip install transformers peft trl bitsandbytes accelerate datasets
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    BitsAndBytesConfig,
)
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer
from datasets import load_dataset
# ─── Config ─────────────────────────────────────────────────────────────────
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
OUTPUT_DIR = "/tmp/llama3-medical-adapter"
# ─── 4-bit quantization (QLoRA) ─────────────────────────────────────────────
# NF4 quantization is specifically designed for normally-distributed weights.
# double_quant quantizes the quantization constants themselves → saves ~0.4 bits/param.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# ─── Tokenizer ───────────────────────────────────────────────────────────────
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right" # required for SFTTrainer
# ─── Base model ─────────────────────────────────────────────────────────────
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=False,
)
model.config.use_cache = False
model.config.pretraining_tp = 1

# Recommended for QLoRA: casts norms to fp32 and enables input grads,
# which gradient checkpointing on a k-bit model requires.
from peft import prepare_model_for_kbit_training
model = prepare_model_for_kbit_training(model)
# ─── LoRA adapters ──────────────────────────────────────────────────────────
# r=16, alpha=32: a sensible default for domain adaptation.
# Targeting all 4 attention projections captures most of the task-specific signal.
# Trainable params: ~0.2% of total — the quantized base model is frozen.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# → trainable params: 13,631,488 (≈ 0.17% of the 8B base; the totals printed vary with peft/bitsandbytes versions)
# ─── Dataset ─────────────────────────────────────────────────────────────────
# Uses the local JSONL file produced by the Part 3 preparation script.
# Replace with load_dataset("your-org/medical-icd10-private") if using Hub.
dataset = load_dataset("json", data_files="medical_train.jsonl", split="train")
LLAMA3_CHAT_TEMPLATE = (
    "<|begin_of_text|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "{note}"
    "<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "{icd10}"
    "<|eot_id|>"
)
def format_prompt(example):
    return {"text": LLAMA3_CHAT_TEMPLATE.format(**example)}
dataset = dataset.map(format_prompt, remove_columns=dataset.column_names)
# ─── Training arguments ──────────────────────────────────────────────────────
training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch = 16
    gradient_checkpointing=True,    # trades compute for VRAM
    warmup_steps=50,
    learning_rate=2e-4,
    bf16=True,                      # bfloat16 compute, stable on Ampere+
    logging_steps=10,
    save_strategy="epoch",
    optim="paged_adamw_32bit",      # paged optimiser: prevents OOM spikes
    lr_scheduler_type="cosine",
    report_to="none",               # no external telemetry — required for GDPR jobs
    dataloader_pin_memory=False,
)
# ─── Trainer ─────────────────────────────────────────────────────────────────
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    packing=False,
)
trainer.train()
# ─── Save adapter weights only (not the quantized base model) ────────────────
model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)
print(f"LoRA adapter saved to {OUTPUT_DIR}")
print("Merge with base model locally using: model.merge_and_unload()")

Dependencies: transformers>=4.40, peft>=0.10, trl>=0.8, bitsandbytes>=0.43, accelerate>=0.27, datasets>=2.18
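The trainable-parameter count reported by `print_trainable_parameters()` can be sanity-checked by hand: each LoRA pair adds r·(d_in + d_out) parameters per targeted projection. Assuming standard Llama 3 8B dimensions (hidden size 4096, grouped-query KV dimension 1024, 32 layers):

```python
def lora_param_count(r: int = 16, hidden: int = 4096,
                     kv_dim: int = 1024, layers: int = 32) -> int:
    per_layer = 2 * r * (hidden + hidden)   # q_proj + o_proj: hidden → hidden
    per_layer += 2 * r * (hidden + kv_dim)  # k_proj + v_proj: hidden → kv_dim (GQA)
    return per_layer * layers

n = lora_param_count()
print(f"{n:,} trainable params")  # → 13,631,488, about 0.17% of the 8B base
```

Doubling `r` doubles this count (and the adapter file size) linearly, which is why r=16 is a cheap knob to turn when the model underfits the domain.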
Part 5 — Submitting the job to GhostNexus
The client runs locally on your machine (inside the EEA). It uploads the script and dataset, dispatches the job to an EU node, then streams logs and retrieves the adapter weights when training completes.
# submit_training_job.py
# Runs locally. Submits the training script + dataset to a GhostNexus EU node.
# pip install ghostnexus
import ghostnexus
import time
client = ghostnexus.GhostNexus(api_key="YOUR_API_KEY")
# Read the training script
with open("fine_tune_llama3_medical.py", "r") as f:
    script = f.read()
# Submit — GhostNexus automatically selects the nearest EU node
result = client.run(
    script=script,
    task_name="llama3-medical-qlora",
    files=["medical_train.jsonl"],  # uploaded with the job, deleted after
    gpu_type="RTX_4090",            # or RTX_3090, A100_40G
    region="eu",                    # never routes outside EEA
    webhook_url="https://your-server.com/training-complete",  # optional
)
job_id = result["job_id"]
print(f"Job {job_id} queued | Status: {result['status']}")
print(f"Estimated cost: ${result['cost_estimate_usd']:.2f}")
# ─── Poll for completion (or use webhook above) ──────────────────────────────
while True:
    status = client.get_job(job_id)
    print(f"[{status['elapsed_s']}s] {status['status']} — {status['log_tail']}")
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(15)
if status["status"] == "completed":
    # Retrieve LoRA adapter weights to local directory
    client.download_artifacts(job_id, dest="./llama3-medical-adapter/")
    print("Adapter weights downloaded.")
    print(f"Final cost: ${status['cost_usd']:.3f}")
    print("\nLast training logs:")
    print(status["output"][-2000:])

What happens to your data. The training script and JSONL file are encrypted in transit over HTTPS, stored ephemerally in the container filesystem during job execution, and deleted immediately on job completion. GhostNexus does not retain, copy, or inspect training data. The only artefact returned is the LoRA adapter directory.
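If you pass `webhook_url` instead of polling, something on your side must receive the callback. Below is a minimal receiver sketch; the payload field names (`job_id`, `status`) are assumptions mirroring the polling API above, so check the GhostNexus docs for the actual schema.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def should_download(payload: dict) -> bool:
    """Only fetch artifacts for jobs that completed successfully."""
    return payload.get("status") == "completed"

class TrainingWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        if should_download(payload):
            print(f"Job {payload.get('job_id')} finished, fetch artifacts now")
        self.send_response(200)  # acknowledge so the webhook is not retried
        self.end_headers()

# To serve: HTTPServer(("", 8080), TrainingWebhook).serve_forever()
```

Since artifacts are purged after 24 hours (see Part 6), the handler is a good place to trigger `download_artifacts` immediately rather than waiting for a human.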
Part 6 — GDPR compliance: the technical proof
This section is written for DPOs and legal teams reviewing whether GhostNexus constitutes an adequate processor under GDPR Article 28.
Compute is exclusively in the EEA
All GPU nodes run on Hetzner dedicated servers in Nuremberg (NBG1), Falkenstein (FSN1), and Helsinki (HEL1). Hetzner is a German company (Gunzenhausen, Bavaria). No node is outside the EEA. Node selection is constrained by the region=eu parameter enforced at API level.
Network isolation: Docker --network=none
Every training container is started with the --network=none Docker flag. The container has no network interface other than loopback: it cannot make HTTP requests, perform DNS lookups, or open any outbound connection, so there is no network path for data exfiltration from inside the container.
Zero data retention
Training data uploaded with a job is written to a tmpfs volume inside the container. On container exit, the volume is destroyed. GhostNexus infrastructure does not mount persistent storage for training data. Artefacts (model weights) are available for 24 hours in encrypted object storage, then purged.
Per-second billing eliminates retention windows
Billing stops the moment the container exits. There is no minimum billing window that would incentivise keeping a container alive — and with it, any data — longer than necessary.
EU VAT invoice generated automatically
Invoices are issued by GhostNexus (EU entity) with EU VAT. Your accounting team receives a compliant EU invoice, not a US dollar wire transfer to an American company.
DPA available on request
GhostNexus provides a Data Processing Agreement under Article 28(3) covering the scope of processing, sub-processors, deletion obligations, and audit rights. Contact contact@ghostnexus.net.
Full privacy policy and terms of service: ghostnexus.net/privacy and ghostnexus.net/terms
Part 7 — Cost comparison
Prices as of April 2026. A typical QLoRA fine-tuning run (3 epochs, medical NLP dataset, RTX 4090) completes in 45–90 minutes depending on dataset size.
| Setup | GPU equivalent | 1h training | GDPR-safe |
|---|---|---|---|
| Internal RTX 2080 Ti (IT queue, 3× slower, no scale) | Slower, shared | "Free" (queue) | ✓ |
| AWS g5.xlarge (US data centers, Article 46 risk) | ~A10G 24 GB | $1.01/hr | ✗ |
| RunPod RTX 4090 (US/mixed infra, DPO-blocked) | RTX 4090 24 GB | $0.74/hr | ✗ |
| GhostNexus RTX 4090 (EU-only, Docker network isolation) | RTX 4090 24 GB | $0.50/hr | ✓ |
RunPod and AWS prices from public pricing pages, April 2026. GhostNexus RTX 4090 rate is the standard on-demand price — volume discounts available for >100h/month.
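At those rates the arithmetic for a single run is simple: the helper below multiplies the table's hourly prices by the 45–90 minute duration range quoted above.

```python
def run_cost(rate_per_hr: float, minutes: float) -> float:
    """Cost of one training run at a given hourly rate, rounded to cents."""
    return round(rate_per_hr * minutes / 60, 2)

for minutes in (45, 90):
    print(f"{minutes} min on GhostNexus RTX 4090: ${run_cost(0.50, minutes):.2f}")
    print(f"{minutes} min on RunPod RTX 4090:     ${run_cost(0.74, minutes):.2f}")
```

In other words, a full QLoRA run lands well under a dollar; at this scale the deciding factor is compliance, not price.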
Part 8 — FAQ for DPOs and legal teams
Where exactly is the compute running?
Exclusively on Hetzner infrastructure in Germany and Finland — Nuremberg, Falkenstein, and Helsinki. All three locations are in the EEA. No job ever routes outside these datacenters.
Can GhostNexus access our training data or model weights?
No. Your script executes inside a Docker container started with --network=none. The container cannot make outbound network connections. GhostNexus infrastructure has no mechanism to read container memory or filesystem during execution. Scripts and any uploaded data are deleted immediately after the job completes — there is no retention window.
Do you sign a Data Processing Agreement (DPA)?
Yes. GhostNexus acts as a data processor under GDPR Article 28. We provide a standard DPA template and can negotiate custom terms for enterprise contracts. Contact contact@ghostnexus.net with subject 'DPA Request' to start the process.
Does "Privacy Shield 2.0" (EU-US Data Privacy Framework) solve the RunPod problem?
Not reliably for sensitive categories of data (Article 9 GDPR — health, biometric, financial). The EU-US DPF is under active legal challenge as of 2026. For healthcare and banking data, the only defensible position is full EEA data residency — not a contractual mechanism that can be invalidated by a court ruling.
Can we audit the infrastructure?
Yes. The GhostNexus node client is open source. You can inspect the Docker flags applied to every job. We can also provide a signed attestation of the Docker run parameters used for your specific job IDs on request.
Start your first GDPR-compliant training run
New accounts receive $15 free credits — at the standard $0.50/hr RTX 4090 rate, that is 30 GPU-hours, enough for many complete QLoRA fine-tuning runs. No card required to start.
Use code WELCOME15 at registration · EU VAT invoice · DPA available · EEA compute only