News · The Broadside

VA keeps clinical AI chatbots below high-impact review

The patient-safety problem is not that clinicians tried AI, but that VA treated clinical documentation prompts like office-productivity tasks.


TL;DR

VA’s Office of Inspector General said the department failed to designate VA GPT and Microsoft 365 Copilot Chat as high-impact AI uses, even as clinicians used them for patient documentation prompts. The gap matters because VA applied high-impact safeguards to its ambient AI scribe tool but not to similar chatbot-driven documentation work. OIG said the setup limits error monitoring, retrospective labeling of AI-generated documentation and patient-safety oversight.

VA’s AI governance problem is no longer a hypothetical spreadsheet risk. The Office of Inspector General found that VA GPT and Microsoft 365 Copilot Chat were being used in clinical settings without the high-impact classification VA gave to its ambient AI scribe tool, even though the watchdog said the chatbot prompts had similar documentation functionality.

That distinction is the story. VA can reasonably say these general-purpose chatbots were not designed to make clinical decisions. It cannot make that fact do all the work. OIG said VA does not centrally curate or evaluate prompts or generative outputs that could be applied to clinical decision-making, and has no AI-specific reporting mechanism or labeling process to identify AI-generated documentation after the fact.

The scale is not trivial. OIG reviewed a VA AI-focused Microsoft Teams channel with 10,997 active users over a 90-day period and identified 135 voluntarily shared prompts, 79 of them clinical. Prompt quality is not a clerical detail in medical generative AI; the watchdog noted that prompt techniques can affect output errors that may influence diagnosis and management.

OIG made three recommendations: address chatbot use and oversight, evaluate whether the tools should be treated as high-impact and carry safeguards, and fold AI-related risk monitoring into existing patient-safety programs. VA concurred with two and concurred in principle with an oversight review. For practitioners, the immediate lesson is narrower and less comfortable: if a tool can produce clinical documentation, governance cannot stop at whether the vendor called it a general chat tool.


Published · Deep Fathom