
NHS Clinical Safety and AI Agents: What DCB0129/0160 Actually Requires

I've spent the better part of a decade in cybersecurity, working with digital health organisations and later across the defence sector. When we talk about deploying AI agents in the NHS, we aren't just talking about a new piece of software; we are talking about a fundamental shift in clinical risk. The standards that govern this—DCB0129 for manufacturers and DCB0160 for the deploying organisations—were written in an era of deterministic software. They weren't built for models that can "hallucinate" or agents that can take autonomous actions. Yet, they are the legal framework we have.

The Safety Case is an Argument, Not a Checklist

The most common mistake I see is the belief that a governance toolkit is a substitute for a clinical safety case. It isn't. A clinical safety case is a structured argument, supported by evidence, that a system is safe for its intended use in a specific clinical context.

The toolkit might give you the "what"—audit trails, policy logs, promotion gate records. But it cannot give you the "why" or the "how." It doesn't know your clinical pathway. It doesn't understand that a 2-second delay in a triage agent's response might be "performant" by IT standards but "critical" in an emergency department context. A safety case requires a human Clinical Safety Officer (CSO) to look at the evidence and say: "I have identified these specific hazards, I have applied these controls, and I am satisfied the residual risk is acceptable."

I've sat in too many meetings where a vendor points to their "safety dashboard" as if it were a shield against DCB0129 liability. It's not. If you rely solely on a vendor's dashboard without your own clinical validation, you're likely in breach of DCB0160 before you've even started.

The Eight Hazard Categories for AI Agents

In my work, I've had to adapt the traditional functional failure analysis (FFA) to the world of agentic AI. The following eight hazard categories are what I now consider the "must-haves" for any AI agent safety case:

  1. Clinical Misinformation (Hallucination): The agent provides factually incorrect clinical advice or generates plausible-sounding but false patient data.
  2. Algorithmic Bias: The agent's recommendations vary based on protected characteristics, leading to health inequalities.
  3. Contextual Misinterpretation: The agent fails to understand the clinical nuance or the specific "language" of a specialty.
  4. Automation Bias & Over-reliance: Clinicians stop questioning the agent's output, leading to a "computer says yes" mentality where errors go unnoticed.
  5. Data Integrity & Attribution: The agent incorrectly links data from one patient to another, or fails to provide a clear audit trail.
  6. Prompt Injection & Manipulation: Malicious or accidental input that causes the agent to bypass safety guardrails or leak sensitive information.
  7. Latency & Availability: The agent fails to respond in a clinically acceptable timeframe, or the service goes down during a critical workflow.
  8. De-identification Failure: The agent leaks PII/PHI because its redaction methods fail to catch complex clinical identifiers.
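These categories map naturally onto hazard log entries. A minimal sketch of what such an entry might look like in code follows — the field names and risk scale are illustrative, not drawn from the official DCB0129 templates:

```python
from dataclasses import dataclass
from enum import Enum

class HazardCategory(Enum):
    CLINICAL_MISINFORMATION = "Clinical Misinformation (Hallucination)"
    ALGORITHMIC_BIAS = "Algorithmic Bias"
    CONTEXTUAL_MISINTERPRETATION = "Contextual Misinterpretation"
    AUTOMATION_BIAS = "Automation Bias & Over-reliance"
    DATA_INTEGRITY = "Data Integrity & Attribution"
    PROMPT_INJECTION = "Prompt Injection & Manipulation"
    LATENCY_AVAILABILITY = "Latency & Availability"
    DEIDENTIFICATION_FAILURE = "De-identification Failure"

@dataclass
class HazardLogEntry:
    category: HazardCategory
    description: str        # what could go wrong, in this clinical context
    cause: str              # e.g. a model upgrade, an infrastructure failure
    effect: str             # the clinical consequence if it occurs
    controls: list[str]     # mitigations applied
    residual_risk: str      # e.g. "2 - Low" on an illustrative 1-5 scale
    owner: str = "Clinical Safety Officer"

# Example: the emergency-department latency hazard discussed above.
entry = HazardLogEntry(
    category=HazardCategory.LATENCY_AVAILABILITY,
    description="Triage agent response exceeds 2 seconds in ED workflow",
    cause="Model endpoint cold start under peak load",
    effect="Delayed triage decision for time-critical patients",
    controls=["Response-time SLO with alerting", "Manual triage fallback"],
    residual_risk="2 - Low",
)
```

The point of the structure is that every entry forces the CSO to articulate cause, effect, controls, and residual risk — not just tick a category.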

The Toolkit Gap: Evidence vs. Argument

Microsoft's healthcare agent service and similar toolkits are impressive. They provide the "promotion gates" and "audit trails" that used to take weeks to build manually. But let's be blunt: they are infrastructure, not assurance.

For each toolkit component, here is the DCB0129 evidence it provides — and what's still missing (the CSO's job):

  Audit Trails
    Evidence provided: Technical log of every interaction and model call.
    What's missing: Reviewing those logs for clinical "near misses" and pattern analysis.

  Policy Logs
    Evidence provided: Evidence that specific guardrails were active.
    What's missing: Validating that the guardrails actually work in a clinical edge case.

  Promotion Gate Records
    Evidence provided: Proof that the agent passed a set of tests before production.
    What's missing: Designing the clinical "ground truth" tests that the agent must pass.

  PII Redaction (Regex)
    Evidence provided: Automated masking of names, dates, and common identifiers.
    What's missing: Clinical de-identification (Safe Harbour vs. Expert Determination).

  Model Versioning
    Evidence provided: Tracking which version of the LLM was used for a specific response.
    What's missing: Assessing the impact of "model drift" on clinical recommendations.

  Safety Guardrails
    Evidence provided: Pre-built filters for hate speech, violence, and self-harm.
    What's missing: Custom clinical guardrails (e.g., "never triage an unconscious patient").
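To make the "Promotion Gate Records" row concrete: designing the clinical ground truth is the CSO's job, and it can be expressed as executable test cases. The sketch below is illustrative — `triage_response` is a hard-coded stand-in so the example runs; in a real gate it would call the candidate agent endpoint:

```python
ESCALATION = "ESCALATE_TO_CLINICIAN"

def triage_response(presentation: str) -> str:
    # Stand-in for the real agent call, hard-coded so this example is
    # self-contained. A real promotion gate would query the candidate
    # agent and record its response.
    red_flags = ["unconscious", "not breathing", "chest pain"]
    if any(flag in presentation.lower() for flag in red_flags):
        return ESCALATION
    return "ROUTINE_APPOINTMENT"

# Ground-truth cases designed by the CSO, not the vendor: each pairs a
# presentation with the only clinically acceptable output.
GROUND_TRUTH = [
    ("Patient found unconscious at home", ESCALATION),
    ("Crushing chest pain radiating to left arm", ESCALATION),
    ("Repeat prescription request, stable condition", "ROUTINE_APPOINTMENT"),
]

def run_promotion_gate() -> bool:
    failures = [(case, expected, triage_response(case))
                for case, expected in GROUND_TRUTH
                if triage_response(case) != expected]
    return not failures  # the gate passes only with zero failures
```

The toolkit can record that this gate ran; only a clinician can decide whether these cases are the right ones for your pathway.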

The MHRA Question: Is Your Agent a Medical Device?

If your AI agent is making clinical recommendations—suggesting a diagnosis, a treatment plan, or even just triaging patients—it is almost certainly Software as a Medical Device (SaMD) under MHRA regulations.

I've seen teams try to dodge this by saying the agent is "just an assistant" or "for information only." But if a clinician relies on that information to make a decision, the MHRA doesn't care about your disclaimer. The SaMD classification adds a whole new layer of regulatory burden that DCB0129 alone doesn't cover.

Human-in-the-Loop: A Blocked Agent is Not a Safe One

We talk a lot about "Human-in-the-Loop" (HITL) as a safety control. In many governance toolkits, this manifests as a "blocked" response—the agent hits a guardrail and refuses to answer. From a cybersecurity perspective, that's a win. From a clinical safety perspective, it might be a hazard.

Imagine a clinician using an agent to summarise a complex patient history during a high-pressure consultation. The agent suddenly "blocks" because it misidentified a medical term as a policy violation. You've just introduced a "Latency & Availability" hazard.

A "safe" agent isn't just one that doesn't say the wrong thing; it's one that consistently does the right thing within the clinical pathway.
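One way to keep a blocked agent from becoming a silent failure is to make the fallback explicit and auditable. A minimal sketch, assuming a hypothetical agent interface (`AgentResult` and the wrapper are illustrative, not any toolkit's API):

```python
from dataclasses import dataclass

@dataclass
class AgentResult:
    text: str
    blocked: bool = False  # True when a guardrail suppressed the answer

FALLBACK_MESSAGE = (
    "AI summary unavailable - guardrail triggered or service error. "
    "Please review the patient record directly; this event has been "
    "logged for CSO review."
)

def safe_summarise(call_agent, history: str) -> str:
    """Never return silence: a block or failure becomes an explicit,
    visible fallback instead of an unexplained gap in the workflow."""
    try:
        result = call_agent(history)
    except Exception:
        return FALLBACK_MESSAGE   # availability failure
    if result.blocked:
        return FALLBACK_MESSAGE   # guardrail fired mid-consultation
    return result.text

# A guardrail that misreads a clinical term as a policy violation:
blocked_agent = lambda _: AgentResult(text="", blocked=True)
print(safe_summarise(blocked_agent, "complex cardiac history"))
```

The design point is that the clinician always gets an actionable message, and every fallback event feeds the hazard log rather than disappearing.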

The PII Redaction Trap: Regex is Not Enough

Most toolkits use regex-based PII redaction. It's great for catching "John Smith" or "07700 900123." It is utterly useless for clinical de-identification.

In healthcare, we have the "Safe Harbour" and "Expert Determination" standards. These require more than just masking names. They require looking at the risk of re-identification through "quasi-identifiers"—the combination of a rare condition, a specific clinic date, and a partial postcode.

AI agents process vast amounts of unstructured text. A regex string won't catch "The patient was seen by Dr. Aris at the specialized cardiac unit in Sheffield last Tuesday." While that sentence contains no "PII" in the traditional sense, it is highly identifying in a clinical context.
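The failure mode is easy to demonstrate. A deliberately simple redactor (real toolkit patterns are more elaborate, but fail the same way on quasi-identifiers):

```python
import re

# Toolkit-style direct-identifier patterns: a known-name list and
# UK mobile numbers. Both are illustrative simplifications.
NAME_PATTERN = re.compile(r"\b(John Smith|Jane Doe)\b")
PHONE_PATTERN = re.compile(r"\b07\d{3}\s?\d{6}\b")

def redact(text: str) -> str:
    text = NAME_PATTERN.sub("[NAME]", text)
    return PHONE_PATTERN.sub("[PHONE]", text)

direct = "Contact John Smith on 07700 900123."
print(redact(direct))  # direct identifiers are caught

quasi = ("The patient was seen by Dr. Aris at the specialized "
         "cardiac unit in Sheffield last Tuesday.")
# Passes through unchanged: no "PII" matched, yet clinician + unit +
# city + date may narrow the patient to a handful of people.
assert redact(quasi) == quasi
```

Expert Determination exists precisely because no pattern list can score that re-identification risk; a human has to assess the combination of quasi-identifiers against the population they describe.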

What a CSO Should Ask Before Signing Off

If you are asked to sign off on an AI agent deployment, don't look at the vendor's PowerPoint. Ask these four questions, and don't accept "it's in the toolkit" as an answer:

  1. Show me the Hazard Log: Not a generic one, but one that identifies AI-specific hazards like model drift, contextual misinterpretation, and automation bias in our specific clinical context.
  2. What is the SaMD Determination? Has this been reviewed against MHRA guidance? If it's Class I or above, where is the certification?
  3. How do we validate the "Human-in-the-Loop"? What happens when the agent fails or blocks? Is there a clear, safe fallback that doesn't compromise patient care?
  4. Is the de-identification Expert-Led? If we are using patient data for "learning" or "refinement," who has determined that the risk of re-identification is truly negligible?

Final Thoughts

Deploying AI agents in the NHS is one of the most exciting things I've worked on. But we have to be honest about the risks. Governance toolkits are a great start—they provide the "plumbing" for safety—but they are not the safety case itself.

DCB0129 and DCB0160 aren't just bureaucratic hurdles; they are the framework that keeps our patients safe. Let's start treating them with the respect they deserve.

If you're struggling with the gap between your AI ambitions and your DCB0129/0160 obligations, we're here to help at Periculo.