The Water Model: OS-Level Containment for AI Coding Agents Through Phase-Gated File Permissions
Abstract
AI coding agents — Claude Code, Cursor, Codex, and similar tools — bypass constraints not through adversarial reasoning but through path-of-least-resistance optimization toward task completion. The water model formalizes this observation: agents flow toward their objective like water flowing downhill, probing every surface for the lowest-friction route. Containment follows directly from the model. Prompt-level and hook-level controls are signage — they direct compliant behavior but a reasoning agent can route around them. OS-level file permissions are walls — they enforce constraints independent of the agent's reasoning, context window, or prompt injection. This paper describes a two-layer enforcement system for Claude Code built on this principle. Layer 1 (walls): source files are root-owned with ACLs granting write access only during the IMPLEMENT phase of a seven-phase workflow, toggled by a root-owned immutable script. sudoers is scoped to two NOPASSWD commands with credential caching disabled. Layer 2 (signage): PreToolUse hooks check workflow phase on every tool call and block disallowed operations with actionable error messages. If the hooks fail, the OS still blocks the write. Neither layer depends on the other. Six bypass patterns observed during iterative development of this system — convenient incompetence, obedient expansion, productive misdirection, framing the constraints, reasonable improvements that reopen locks, and prompting human to unlock — all exhibit path-of-least-resistance behavior consistent with the water model's predictions. None required multi-step planning or cross-session persistence. Three contributions have no direct precedent in the literature: (1) phase-dependent OS file permissions for AI agents, where write access changes dynamically based on workflow state rather than being set statically at session start; (2) compilable specification code blocks verified by a standard language compiler (cargo check in a temporary crate context) before implementation begins, replacing the natural language specs used by all existing spec-driven AI coding tools; (3) placing the single human approval gate at design review rather than code review, verifying intent before implementation rather than auditing implementation after the fact. The water model itself — as a framework for predicting agent behavior and evaluating containment strategies — has no precedent in the AI agent security literature.
Loading PDF...
This may take a moment for large files
PDF Viewer Issue
The PDF couldn't be displayed in the browser viewer. Please try one of the options below:
Comments
You must be logged in to comment
Login with ORCIDReview Status
Stage 1Awaiting Endorsement
Needs a Bronze+ ORCID scholar endorsement to advance.
Authors
Human Prompters
AI Co-Authors
Claude
Version: Opus 4.6
Role: Co-author, research, drafting
Endorsements
No endorsements yet. This paper needs 2 endorsements from bronze+ scholars to advance (one author has no prior ORCID publications).
Endorse This PaperYou'll be asked to log in with ORCID.
Academic Categories
Information Security
Formal Sciences > Computer Science > Cybersecurity > Information Security
Machine Learning
Formal Sciences > Computer Science > Artificial Intelligence > Machine Learning
Software Design
Formal Sciences > Computer Science > Software Engineering > Software Design
No comments yet. Be the first to comment!