A Common Grammar for Clinical AI: The Clinical World Model and Skill-Mix Framework

Preprint

Link to Source: arXiv Preprint, Interactive Visualization Page

Authors: Seyed Amir Ahmad Safavi-Naini, Elahe Meftah, Josh Mohess, Pooya Mohammadi Kazaj, Georgios Siontis, Zahra Atf, Peter R. Lewis, Mauricio Reyes, Girish Nadkarni, Roland Wiest, Stephan Windecker, Christoph Grani, Ali Soroush, Isaac Shiri

Summary: This work introduces the Clinical World Model and Clinical AI Skill-Mix, a shared framework that organizes medical AI competency across billions of clinical contexts and reframes the field’s central question from whether clinical AI works to the coordinates in which it has demonstrated reliability, and for whom.

Clinical AI frequently performs well on benchmarks yet degrades in deployment, a gap that reflects the absence of a shared formal model of the clinical world.

Clinical artificial intelligence has progressed rapidly, yet a consistent gap separates benchmark performance from clinical reliability. Models achieve high scores on curated datasets and medical licensing examinations, but performance often degrades when they encounter real patients, heterogeneous equipment, and the uncertainty inherent in clinical reasoning. A systematic review of externally validated radiology models found that fewer than six percent maintained their original performance, with the area under the curve declining by approximately eight percent on external validation. Agentic architectures, which augment language models with planning, memory, and tool use, inherit this unreliability while introducing cascading risk, since an early error can propagate through sequential reasoning into an incorrect recommendation. This gap is not solely technical in origin. Existing work addresses evaluation, regulation, and system design in relative isolation, without a shared formal account of the clinical world to connect these efforts, which leaves stakeholders describing the same systems through incommensurable vocabularies.

Dimensions of the World. Conceptual diagram illustrating the thirteen dimension taxonomies that constitute the clinical world. Normativity and Authority form overarching regulatory arcs that govern all elements below. Context, Actors, Cognition, and Representation are nested within the clinical scene, where multiple actors (providers, patients, AI systems, and ecosystem components) interact through cognitive processes grounded in internal representations. Mandate mediates between the cognitive layer and the informational substrate, defining the scope of permissible action. Information and Biophysical together form the material foundation of clinical care, with Codex representing the enacted practices through which knowledge is applied to the world. Temporality runs along the base axis, situating all dimensions within a temporal frame. An action arc (left) connects cognition to change in world state. Outcome and Adaptation (right) close the loop: outcomes feed back into the system through re-calibrating (adjusting within existing parameters) and re-defining (altering the parameters themselves), enabling the clinical world to evolve over time.

We propose three interconnected models grounded in validated principles of clinical cognition and human factors. The Clinical World Model formalizes care as a tripartite interaction among Patient, Provider, and Ecosystem, recovering structure that prior frameworks share implicitly rather than introducing an independent account. Parallel decision-making architectures specify how providers, patients, and AI agents transform information into action, mapping human cognitive components such as dual-process reasoning, illness scripts, and metacognitive monitoring onto their computational counterparts. The Clinical AI Skill-Mix then operationalizes competency through eight dimensions, five that characterize the clinical scenario (condition, care phase, care setting, provider role, and task) and three that specify how AI engages human reasoning (assigned authority, agent facing, and anchoring layer).
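The eight Skill-Mix dimensions can be read as the fields of a single record. The following is a minimal Python sketch of that reading, not the authors' implementation: the class name CompetencyCell, the helper differing_dimensions, and every example value (including the values chosen for assigned authority, agent facing, and anchoring layer) are hypothetical placeholders introduced only for illustration.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class CompetencyCell:
    """One coordinate in the Skill-Mix space (hypothetical representation)."""

    # Five dimensions that characterize the clinical scenario
    condition: str
    care_phase: str
    care_setting: str
    provider_role: str
    task: str
    # Three dimensions that specify how AI engages human reasoning
    assigned_authority: str
    agent_facing: str
    anchoring_layer: str


def differing_dimensions(a: CompetencyCell, b: CompetencyCell) -> list[str]:
    """Return the names of the dimensions on which two cells disagree."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


# Placeholder values, invented for this example only.
validated = CompetencyCell(
    condition="atrial fibrillation",
    care_phase="diagnosis",
    care_setting="emergency department",
    provider_role="cardiologist",
    task="ECG interpretation",
    assigned_authority="advisory",
    agent_facing="provider",
    anchoring_layer="perception",
)

# The same system proposed for a different care setting is a different cell.
target = replace(validated, care_setting="primary care")

print(differing_dimensions(validated, target))
```

Making the record immutable (frozen=True) lets cells be used as dictionary keys or set members, which is convenient when tracking which coordinates have validation evidence attached.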

The combinatorial product of these dimensions defines a competency space of billions of distinct coordinates, and this scale has a direct structural implication. Validation within one coordinate provides minimal evidence for performance in another, rendering the competency space irreducible and indicating that a single-task model, however accurate, addresses only a small fraction of the competencies required for clinical action. The framework supplies a common grammar through which clinicians, regulators, and developers can specify, evaluate, and bound a given system in consistent terms, including the points at which authority shifts as agents hand off work to one another. On this account, the central question moves from whether clinical AI works to the competency coordinates in which a system has demonstrated reliability, and for whom.
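The scale argument above is a product of per-dimension counts. The sketch below illustrates the arithmetic in Python under a loud assumption: the cardinalities in CARDINALITIES are made-up placeholders chosen only to land in the billions, not figures reported in the preprint, and the function names are hypothetical.

```python
from math import prod

# Placeholder cardinalities, NOT taken from the preprint.
CARDINALITIES = {
    "condition": 10_000,
    "care_phase": 5,
    "care_setting": 6,
    "task": 20,
    "provider_role": 10,
    "assigned_authority": 4,
    "agent_facing": 3,
    "anchoring_layer": 4,
}


def total_cells(cardinalities: dict[str, int]) -> int:
    """Size of the competency space: the product of all dimension counts."""
    return prod(cardinalities.values())


def coverage_fraction(n_validated: int, cardinalities: dict[str, int]) -> float:
    """Share of the space covered by n_validated distinct validated cells."""
    return n_validated / total_cells(cardinalities)


n_total = total_cells(CARDINALITIES)
print(f"{n_total:,} cells")  # 2,880,000,000 with these placeholder counts
print(f"one validated cell covers {coverage_fraction(1, CARDINALITIES):.2e} of the space")
```

Whatever the true counts are, the structure is the same: adding one value to any dimension multiplies the space by a factor rather than adding to it, which is why a single validated coordinate covers a vanishing share.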

The Clinical AI Skill-Mix Cube for comprehensive competency specification. The upper section shows the eight dimensions that define a clinical AI competency cell. The Clinical Competency Space comprises five dimensions (Condition, Care Phase, Care Setting, Care Task, Care Provider Role, shown in red and green). The AI Cognitive Engagement comprises three dimensions (Agent Facing, Anchoring Layer, and Assigned Authority, shown in yellow). The product of the cardinalities of these eight dimensions ($N_{C} \times N_{CP} \times N_{CS} \times N_{CT} \times N_{CR} \times N_{AF} \times N_{AL} \times N_{AA}$) yields the total number of possible competency cells ($N_{\mathrm{Total}}$), represented as the Clinical AI Skill-Mix Cube. The lower section illustrates a concrete example: an AI competency cell [5C × 3A] specification, along with an example Clinical AI instance with this competency