Coding Agents
Coding agent targets evaluate AI coding assistants and CLI-based agents. These targets require a judge_target to run LLM-based evaluators.
Prompt format
Agent providers receive a structured prompt document with two sections: a preread block listing files the agent must read, and the user query containing the eval input.
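The following Python sketch illustrates how such a two-section document could be assembled. `build_agent_prompt` is a hypothetical name, and the layout copies the example prompt shown later on this page rather than a published format. The real harness may differ in wording and whitespace.

```python
from pathlib import Path


def _links(paths: list[Path]) -> str:
    # One markdown link per file; the target is an absolute file:// URI.
    return "\n".join(f"* [{p.name}]({p.resolve().as_uri()})." for p in paths)


def build_agent_prompt(
    guidelines: list[Path], input_refs: list[str], base_dir: Path, query: str
) -> str:
    """Hypothetical: assemble a preread block followed by the user query."""
    sections = []
    if guidelines:
        sections.append("Read all guideline files:\n" + _links(guidelines))
    if input_refs:
        sections.append("Read all input files:\n" + _links([base_dir / r for r in input_refs]))
    sections.append("If any file is missing, fail with ERROR: missing-file <filename> and stop.")
    sections.append("Then apply system_instructions on the user query below.")
    # Input files appear in the query only as reference tags; content is never inlined.
    tags = "\n".join(f'<file: path="{r}">' for r in input_refs)
    sections.append("[[ ## user_query ## ]]\n" + (tags + "\n" if tags else "") + query)
    return "\n\n".join(sections)
```

With `guidelines=[Path("/abs/path/guidelines.md")]`, `input_refs=["./src/example.ts"]` and `base_dir=Path("/abs/path")`, this produces the same shape as the example prompt on this page.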
File handling
When an eval test includes type: file inputs, agent providers do not receive the file content inline. Instead, they receive:

- A preread block with `file://` URIs pointing to absolute paths on disk
- The user query with `<file: path="...">` reference tags
The agent is expected to read the files itself using its filesystem tools.
This differs from LLM providers, which receive file content embedded directly in the prompt as XML:
```xml
<file path="src/example.ts">// file content is inlined here</file>
```

Example prompt
Given an eval with a guideline file and a file input:
```yaml
input:
  - role: user
    content:
      - type: file
        value: ./src/example.ts
      - type: text
        value: Review this code
```

The agent receives a prompt like:
```text
Read all guideline files:
* [guidelines.md](file:///abs/path/guidelines.md).

Read all input files:
* [example.ts](file:///abs/path/src/example.ts).

If any file is missing, fail with ERROR: missing-file <filename> and stop.

Then apply system_instructions on the user query below.

[[ ## user_query ## ]]
<file: path="./src/example.ts">
Review this code
```

The preread block instructs the agent to read both guideline and input files before processing the query. If a system_prompt is configured on the target, it is passed separately through the provider SDK, not in the prompt document.
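On the receiving end, an agent can honor the preread contract before doing any other work. The following minimal Python sketch assumes each preread entry is a markdown link to a `file://` URI, exactly as in the example prompt. `preread_files` and `check_preread` are hypothetical helpers, not part of any provider SDK.

```python
import re
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse
from urllib.request import url2pathname

# Assumed preread entry shape: * [example.ts](file:///abs/path/src/example.ts).
PREREAD_LINK = re.compile(r"\[([^\]]+)\]\((file://[^)]+)\)")


def preread_files(prompt: str) -> list[tuple[str, Path]]:
    """Return (display name, local path) for every file:// link in the prompt."""
    files = []
    for name, uri in PREREAD_LINK.findall(prompt):
        # url2pathname decodes percent-escapes and applies OS path conventions.
        files.append((name, Path(url2pathname(urlparse(uri).path))))
    return files


def check_preread(prompt: str) -> Optional[str]:
    """Return the ERROR line for the first missing file, or None if all exist."""
    for name, path in preread_files(prompt):
        if not path.is_file():
            return f"ERROR: missing-file {name}"
    return None
```

An agent would emit the returned line and stop if `check_preread` returns a string. Otherwise, it reads each path from `preread_files` before acting on the user query.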
Claude
```yaml
targets:
  - name: claude_agent
    provider: claude
    workspace_template: ./workspace-templates/my-project
    judge_target: azure_base
```

| Field | Required | Description |
|---|---|---|
| workspace_template | No | Path to workspace template directory |
| cwd | No | Working directory (mutually exclusive with workspace_template) |
| judge_target | Yes | LLM target for evaluation |
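The two rules in this table also apply to the Codex CLI, Pi Coding Agent and custom CLI targets: judge_target is required, and cwd and workspace_template cannot both be set. The following Python sketch illustrates these checks on a target parsed into a dict. `validate_agent_target` is hypothetical, not the evaluation tool's own validator.

```python
from typing import Any


def validate_agent_target(target: dict[str, Any]) -> list[str]:
    """Hypothetical check of the field rules above; returns a list of problems."""
    name = target.get("name", "<unnamed>")
    problems = []
    if not target.get("judge_target"):
        problems.append(f"{name}: judge_target is required")
    if "cwd" in target and "workspace_template" in target:
        problems.append(f"{name}: cwd and workspace_template are mutually exclusive")
    return problems


claude_agent = {
    "name": "claude_agent",
    "provider": "claude",
    "workspace_template": "./workspace-templates/my-project",
    "judge_target": "azure_base",
}
print(validate_agent_target(claude_agent))  # []
```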
Codex CLI
```yaml
targets:
  - name: codex_target
    provider: codex
    workspace_template: ./workspace-templates/my-project
    judge_target: azure_base
```

| Field | Required | Description |
|---|---|---|
| workspace_template | No | Path to workspace template directory |
| cwd | No | Working directory (mutually exclusive with workspace_template) |
| judge_target | Yes | LLM target for evaluation |
Pi Coding Agent
```yaml
targets:
  - name: pi_target
    provider: pi-coding-agent
    workspace_template: ./workspace-templates/my-project
    judge_target: azure_base
```

| Field | Required | Description |
|---|---|---|
| workspace_template | No | Path to workspace template directory |
| cwd | No | Working directory (mutually exclusive with workspace_template) |
| judge_target | Yes | LLM target for evaluation |
VS Code / Copilot
```yaml
targets:
  - name: vscode_dev
    provider: vscode
    workspace_template: ${{ WORKSPACE_PATH }}
    judge_target: azure_base
```

| Field | Required | Description |
|---|---|---|
| executable | No | Path to VS Code binary. Supports `${{ ENV_VAR }}` syntax or literal paths. Defaults to `code` (or `code-insiders` for the insiders provider). |
| workspace_template | Yes | Path to workspace template directory |
| judge_target | Yes | LLM target for evaluation |
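The table documents the `${{ ENV_VAR }}` form and the defaults, but not the exact expansion rules. The following Python sketch assumes the whole executable value is either one `${{ NAME }}` reference or a literal path. `resolve_executable` is a hypothetical helper used only to illustrate the documented behavior.

```python
import os
import re
from typing import Any, Mapping, Optional

# Assumption: the value is a single ${{ NAME }} reference, spaces inside the braces allowed.
ENV_REF = re.compile(r"^\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}$")


def resolve_executable(target: dict[str, Any], env: Optional[Mapping[str, str]] = None) -> str:
    env = os.environ if env is None else env
    value = target.get("executable")
    if value is None:
        # Documented defaults: code, or code-insiders for the insiders provider.
        return "code-insiders" if target.get("provider") == "vscode-insiders" else "code"
    match = ENV_REF.match(value.strip())
    if match:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]
    return value  # anything else is treated as a literal path
```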
Using a custom executable path:
```yaml
targets:
  - name: vscode_dev
    provider: vscode
    executable: ${{ VSCODE_CMD }}
    workspace_template: ${{ WORKSPACE_PATH }}
    judge_target: azure_base
```

VS Code Insiders
```yaml
targets:
  - name: vscode_insiders
    provider: vscode-insiders
    workspace_template: ${{ WORKSPACE_PATH }}
    judge_target: azure_base
```

Same configuration as VS Code.
Custom CLI Agent
Evaluate any command-line agent:
```yaml
targets:
  - name: local_agent
    provider: cli
    command: 'python agent.py --prompt-file {PROMPT_FILE} --output {OUTPUT_FILE}'
    workspace_template: ./workspace-templates/my-project
    judge_target: azure_base
```

| Field | Required | Description |
|---|---|---|
| command | Yes | Command to run. {PROMPT} is replaced with the inline prompt text, and {PROMPT_FILE} with the path of a temporary file containing the prompt. The example above also passes {OUTPUT_FILE}, a file path for the agent's output. |
| workspace_template | No | Path to workspace template directory |
| cwd | No | Working directory (mutually exclusive with workspace_template) |
| judge_target | Yes | LLM target for evaluation |
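The command in this example runs a script named agent.py. The following is a hypothetical minimal Python version of such a script. It reads the prompt from the path passed as `--prompt-file` and writes its answer to the path passed as `--output`. It assumes the harness reads a plain-text answer from {OUTPUT_FILE}. `run_agent` is a placeholder where a real agent would do its work.

```python
import argparse
from pathlib import Path
from typing import Optional


def run_agent(prompt: str) -> str:
    # Placeholder: a real agent would read the preread files, call a model,
    # and use its tools in the working directory here.
    first_line = prompt.strip().splitlines()[0] if prompt.strip() else ""
    return f"Received a prompt of {len(prompt)} characters; first line: {first_line}"


def main(argv: Optional[list[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Minimal CLI agent sketch")
    parser.add_argument("--prompt-file", required=True, type=Path)
    parser.add_argument("--output", required=True, type=Path)
    args = parser.parse_args(argv)

    prompt = args.prompt_file.read_text(encoding="utf-8")
    answer = run_agent(prompt)
    # Assumption: the harness collects the agent's final answer from this file.
    args.output.write_text(answer, encoding="utf-8")
    return 0
```

To run this as agent.py from the target's command, end the file with the usual `if __name__ == "__main__": raise SystemExit(main())` guard. An agent invoked with {PROMPT} instead of {PROMPT_FILE} would take the prompt text from an argument rather than reading a file.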
Mock Provider
For testing the evaluation harness without calling real providers:
```yaml
targets:
  - name: mock_target
    provider: mock
```