Coding agents produce diffs that don't apply cleanly — wrong line numbers, missing context, corrupted files
When a coding agent (local or cloud) generates a file edit as a diff or a search-and-replace block, the patch frequently fails to apply. The line numbers are wrong because the agent's view of the file is stale. The context lines don't match because the agent hallucinated the surrounding code. Partial edits leave the file in a broken state: half old code, half new code, syntax errors everywhere. This happens 10-20% of the time on real codebases.

So what? Every failed edit requires the developer to manually inspect the diff, work out what the agent intended, and hand-apply the change, which is often more work than writing the code themselves. Worse, when the agent retries a failed edit it often makes a different mistake or compounds the original error. In a 3-step refactoring task, if step 2's diff fails to apply, steps 1 and 3 are wasted as well.

Why does this persist in the first place? LLMs generate diffs as text tokens; they have no structured representation of the file's AST and no real line-number index. The model is guessing line numbers from whatever file content was in its context window, which may be truncated, outdated, or incomplete. And there is no feedback loop: the model generates a diff but never verifies that it applies before presenting it. Structured edit formats (AST-based transforms, tree-sitter patches) exist, but no major agent framework uses them.
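That missing feedback loop is cheap to add in principle. Below is a minimal sketch in Python of what verifying an edit before presenting it could look like: a dry run that checks the lines the agent expects against the current file, relocates the hunk if the claimed line number is stale, and returns a structured error the agent can retry on instead of writing a broken file. The Hunk shape, the try_apply helper, and the error wording are all invented for this illustration; they are not taken from any existing agent framework.

```python
"""Minimal sketch of a verify-before-present loop for agent-generated edits.

Assumes a simplified hunk format (start_line, old_lines, new_lines), where
old_lines are what the agent believes currently sits at start_line. All names
here are hypothetical, not from any agent framework.
"""
from dataclasses import dataclass


@dataclass
class Hunk:
    start: int        # 1-based line number the agent claims the old text is at
    old: list[str]    # lines the agent expects to find there
    new: list[str]    # replacement lines


def try_apply(file_lines: list[str], hunk: Hunk, search_radius: int = 50):
    """Dry-run a hunk: verify the expected lines exist, relocating if stale.

    Returns (new_lines, None) on success, or (None, error_message) so the
    caller can feed the error back to the model instead of corrupting the file.
    """
    n = len(hunk.old)

    def matches_at(i: int) -> bool:
        return file_lines[i:i + n] == hunk.old

    # Try the claimed position first, then scan nearby lines in case the
    # agent's line numbers are stale but the surrounding code still exists.
    claimed = hunk.start - 1
    candidates = [claimed] + [
        claimed + d for r in range(1, search_radius + 1) for d in (-r, r)
    ]
    for i in candidates:
        if 0 <= i <= len(file_lines) - n and matches_at(i):
            return file_lines[:i] + hunk.new + file_lines[i + n:], None

    return None, (
        f"hunk does not apply: expected {hunk.old!r} near line {hunk.start}; "
        "the file has changed or the context lines were hallucinated"
    )


if __name__ == "__main__":
    current = ["def add(a, b):", "    return a + b", "", "print(add(1, 2))"]
    # The agent's line number is stale (claims line 5), but the text is findable.
    hunk = Hunk(start=5, old=["    return a + b"], new=["    return a + b  # checked"])
    patched, err = try_apply(current, hunk)
    print(err or "\n".join(patched))
```

The point is not the relocation heuristic itself but that the apply step runs before the edit is shown to the user, so a mismatch becomes a retriable error message rather than a corrupted file.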
Evidence
Aider's benchmark shows 15-25% of edits fail to apply on the first attempt, depending on the model. Claude Code uses search-and-replace to mitigate this but still fails when old_string is not unique. Cursor applies edits speculatively and sometimes corrupts files. GitHub Copilot Workspace diffs frequently have wrong line numbers on large files.
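To make the old_string failure mode concrete, here is a small illustration (not Claude Code's actual implementation) of a search-and-replace edit guarded by a uniqueness check and a cheap post-edit syntax check. The search_replace helper, its parameters, and its error messages are invented for the example; the old_string/new_string shape is just the commonly described form of such edits.

```python
"""Sketch of why search-and-replace edits fail when old_string is not unique."""
import ast


def search_replace(source: str, old_string: str, new_string: str, *, is_python: bool = True):
    """Apply old_string -> new_string only if the edit is unambiguous and the
    result still parses. Returns (new_source, None) or (None, error_message)."""
    count = source.count(old_string)
    if count == 0:
        return None, "old_string not found: the agent's view of the file is stale"
    if count > 1:
        return None, f"old_string occurs {count} times: ambiguous edit, include more surrounding context"

    new_source = source.replace(old_string, new_string, 1)
    if is_python:
        try:
            ast.parse(new_source)  # cheap post-edit sanity check
        except SyntaxError as exc:
            return None, f"edit would break the file: {exc}"
    return new_source, None


if __name__ == "__main__":
    src = "x = 1\ny = 1\n"
    # "= 1" appears twice, so a naive replace could hit the wrong line.
    print(search_replace(src, "= 1", "= 2"))
    # Including the variable name makes the edit unique.
    print(search_replace(src, "y = 1", "y = 2"))
```

The same shape of check explains the benchmark numbers above: edits fail not because the intended change is wrong, but because the text the agent anchored on is missing, duplicated, or stale.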