There is no specification language for agent tasks — every instruction is ambiguous natural language

Every agent task is described in natural language: "refactor this module," "fix the login bug," "research competitors." There are no formal acceptance criteria, no type-checked task definitions, no machine-verifiable success conditions.

So what? The same instruction produces different results every run. "Fix the login bug" might mean fix the null pointer, fix the UI, fix the error message, or rewrite the whole auth flow, depending on how the model interprets it. You cannot build reliable workflows on top of non-deterministic task interpretation. Every prompt is a prayer, not a specification.

Why does this matter in the first place? Software engineering spent 50 years developing specification languages (types, schemas, contracts, test assertions) precisely because natural language is ambiguous and humans misinterpret each other constantly. We solved this problem for human-to-human communication in code. Now we are reintroducing the same ambiguity in human-to-agent communication and pretending natural language is fine. It is not fine. It is the same problem, and it needs the same class of solution: a formal way to define what "done" means that both humans and agents can agree on before execution starts.

The structural reason the gap persists: building a task specification language requires solving the hard problem of formally describing open-ended work, which is in tension with the core appeal of agents ("just tell it what to do in English").
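To make the idea concrete, here is a minimal sketch of what a typed task definition could look like, written in TypeScript. The `TaskSpec` and `AcceptanceCriterion` shapes, the criterion names, and the shell commands are all hypothetical; nothing here comes from an existing agent framework. The point is only that each acceptance criterion is an executable check agreed on before the agent starts, rather than an instruction left to interpretation.

```typescript
import { exec } from "node:child_process";

// Hypothetical shape for a machine-checkable task specification.
// None of these names come from an existing agent framework.
interface AcceptanceCriterion {
  id: string;
  description: string;            // human-readable intent
  check: () => Promise<boolean>;  // executable, deterministic verification
}

interface TaskSpec {
  goal: string;                   // the natural-language framing, kept for context
  constraints: string[];          // things the agent must not do
  acceptance: AcceptanceCriterion[];
}

// Helper: run a shell command and report whether it exited cleanly.
function runCommand(cmd: string): Promise<boolean> {
  return new Promise((resolve) => exec(cmd, (err) => resolve(err === null)));
}

// "Fix the login bug", pinned down to verifiable conditions.
// The paths and npm scripts below are illustrative, not real.
const fixLoginBug: TaskSpec = {
  goal: "Fix the login bug described in the bug report",
  constraints: [
    "Do not change the session-token format",
    "Do not touch files outside src/auth/",
  ],
  acceptance: [
    {
      id: "regression-test-passes",
      description: "The failing login regression test now passes",
      check: () => runCommand("npm test -- login-regression"),
    },
    {
      id: "still-type-checks",
      description: "The project still compiles with no new type errors",
      check: () => runCommand("npx tsc --noEmit"),
    },
  ],
};
```

The natural-language goal stays in the spec because it is still useful context; the acceptance list is what actually defines "done."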

Evidence

No agent framework supports typed task definitions with machine-checkable acceptance criteria. GitHub Actions, Terraform, and CI/CD pipelines have declarative specs; agent tasks do not. Agent eval benchmarks (SWE-bench, METR, HELM) define tasks as natural language plus a hidden test suite the agent cannot see, which is not a pattern that scales beyond benchmarks.
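The hidden-test-suite pattern gets the incentive backwards: the checks exist, but the party doing the work cannot see them. A visible acceptance gate, sketched below under the same hypothetical `TaskSpec` assumptions as above, lets both the human and the agent run the definition of "done" at any point; none of these names correspond to an existing framework or benchmark API.

```typescript
// Hypothetical acceptance gate. The criteria mirror the TaskSpec sketch
// above; the point is that they are visible and runnable, not hidden.
interface AcceptanceCriterion {
  id: string;
  description: string;
  check: () => Promise<boolean>;
}

// Runs every criterion and returns the ones that fail.
// An empty result means the task is done by the agreed definition.
async function verify(criteria: AcceptanceCriterion[]): Promise<string[]> {
  const failed: string[] = [];
  for (const c of criteria) {
    if (!(await c.check())) failed.push(`${c.id}: ${c.description}`);
  }
  return failed;
}
```

Run it once before handing the task over (the criteria should fail, which proves they actually measure something) and once after the agent finishes; the hidden benchmark suite becomes a shared contract instead.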
