LSP Diagnostics

After an AI agent edits a file, something needs to tell it whether the edit introduced errors. In traditional development, that feedback loop is the developer’s IDE — red squiggly lines under syntax errors, type mismatches highlighted inline. For an autonomous agent, LSP diagnostics serve the same purpose: they provide structured, language-aware feedback that the model can act on without running a full build. The challenges are timing (LSP servers are asynchronous), filtering (not all diagnostics are actionable), and formatting (the model needs feedback it can parse).

Commit: b9050e1d

Aider does not use LSP. Instead, it implements a multi-layer linting pipeline that runs after edits, combining tree-sitter syntax validation, Python-specific compile checks, and optional external linters.

aider/linter.py (305 lines) provides three validation levels:

Level 1 — Tree-Sitter Syntax Check (basic_lint, linter.py:262-269):

def basic_lint(self, fname, code):
    lang = filename_to_lang(fname)
    if not lang:
        return
    parser = get_parser(lang)
    if not parser:
        return
    tree = parser.parse(bytes(code, "utf-8"))
    errors = traverse_tree(tree)  # Finds ERROR and MISSING nodes

This is language-agnostic: any file with a tree-sitter grammar can be syntax-checked. The traverse_tree() function walks the AST looking for ERROR or MISSING node types, which indicate parse failures.
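The walk itself can be sketched as a plain recursive traversal. This is a minimal illustration, not Aider's actual traverse_tree(); the Node class below is a hypothetical stand-in for a tree-sitter node, which exposes the same type, is_missing, and children attributes.

```python
# Hypothetical stand-in for a tree-sitter node; real code would receive
# tree_sitter.Node objects with the same attributes.
from dataclasses import dataclass, field

@dataclass
class Node:
    type: str
    start_point: tuple  # (row, col), 0-indexed as in tree-sitter
    is_missing: bool = False
    children: list = field(default_factory=list)

def traverse_tree(node):
    """Collect ERROR and MISSING nodes, which mark parse failures."""
    errors = []
    if node.type == "ERROR" or node.is_missing:
        kind = "missing" if node.is_missing else "error"
        errors.append((node.start_point[0], kind))
    for child in node.children:
        errors.extend(traverse_tree(child))
    return errors

# A tree whose function body failed to parse:
root = Node("module", (0, 0), children=[
    Node("function_definition", (0, 0), children=[
        Node("ERROR", (1, 4)),
        Node(")", (1, 9), is_missing=True),
    ]),
])
print(traverse_tree(root))  # [(1, 'error'), (1, 'missing')]
```

ERROR nodes cover text the parser could not fit into the grammar; MISSING nodes mark tokens the parser inserted to recover, so both indicate broken syntax.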

Level 2 — Python Compile Check (lint_python_compile, linter.py:118-135):

def lint_python_compile(self, fname, code):
    try:
        compile(code, fname, "exec")
        return ""
    except SyntaxError as err:
        # Extract line number from traceback

Only runs for Python files. Catches syntax errors that tree-sitter might miss (Python’s grammar is complex enough that tree-sitter sometimes accepts invalid code).
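The core of this check fits in a few lines. The helper below is a hypothetical simplification of the same idea, not Aider's exact code: compile() parses the source to bytecode without executing it, and the SyntaxError carries the line number.

```python
# Hypothetical helper sketching the compile()-based syntax check.
def python_syntax_errors(fname, code):
    try:
        compile(code, fname, "exec")  # parse only; nothing is executed
        return ""
    except SyntaxError as err:
        return f"{fname}:{err.lineno}: {err.msg}"

print(python_syntax_errors("ok.py", "x = 1"))            # ""
print(python_syntax_errors("bad.py", "def f(:\n pass"))  # bad.py:1: ...
```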

Level 3 — Flake8 (flake8_lint): Runs flake8 --select=E9,F821,F823,F831,F406,F407,F701,F702,F704,F706 — a curated set of fatal-only checks:

  • E9: Runtime errors (syntax errors, IO errors)
  • F821: Undefined name
  • F823: Local variable referenced before assignment
  • F831: Duplicate argument in function definition
  • F4xx: Import issues
  • F7xx: Statement-level errors

This avoids style warnings (E1-E5, W1-W6) that would create noise.
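The same fatal-only selection can be expressed as a filter over generic (code, message) diagnostics. This sketch mirrors Aider's flake8 --select list; the function and data are hypothetical.

```python
# Mirrors the flake8 --select set: E9 runtime errors plus specific F-codes.
FATAL_PREFIXES = ("E9",)
FATAL_CODES = {"F821", "F823", "F831", "F406", "F407",
               "F701", "F702", "F704", "F706"}

def keep_fatal(diags):
    """Drop style/warning diagnostics; keep only fatal ones."""
    return [
        (code, msg) for code, msg in diags
        if code in FATAL_CODES or code.startswith(FATAL_PREFIXES)
    ]

diags = [
    ("E501", "line too long"),         # style: dropped
    ("F821", "undefined name 'foo'"),  # fatal: kept
    ("E902", "IndentationError"),      # E9 runtime error: kept
    ("W605", "invalid escape"),        # warning: dropped
]
print(keep_fatal(diags))  # [('F821', ...), ('E902', ...)]
```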

linter.py:234-256 uses grep_ast.TreeContext to render errors with surrounding code context:

context = TreeContext(
    fname,
    code,
    color=False,
    line_number=True,
    child_context=False,
    last_line=False,
    margin=0,
    mark_lois=True,  # Lines of Interest
    loi_pad=3,  # 3 lines of context
    show_top_of_file_parent_scope=True,
)
context.add_lines_of_interest(error_line_numbers)
context.add_context()

This produces output that shows the error line (marked with |) plus its enclosing scope (function/class), giving the model enough context to understand what went wrong.

After edit completion (base_coder.py:1599-1623):

if edited and self.auto_lint:
    lint_errors = self.lint_edited(edited)
    self.auto_commit(edited, context="Ran the linter")
    self.lint_outcome = not lint_errors
    if lint_errors:
        ok = self.io.confirm_ask("Attempt to fix lint errors?")
        if ok:
            self.reflected_message = lint_errors
            return

The reflected_message field is the injection mechanism. When set, the next iteration of the agent loop includes the lint errors as a user message, prompting the model to fix them. This creates a lint-fix-lint cycle bounded by the reflection limit (max 3 iterations).
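The bounded cycle can be sketched as a loop that re-injects lint output until the linter passes or the limit is hit. All names here are hypothetical; Aider drives this through reflected_message and its reflection counter rather than an explicit loop.

```python
# Hypothetical sketch of the bounded lint-fix-lint cycle.
MAX_REFLECTIONS = 3

def lint_fix_loop(apply_edit, lint, max_reflections=MAX_REFLECTIONS):
    message = "initial request"
    for attempt in range(max_reflections + 1):
        apply_edit(message)
        errors = lint()
        if not errors:
            return attempt  # clean after this many fix rounds
        message = errors    # re-inject lint output as the next user message
    return None             # gave up: reflection limit reached

# Simulate a model that clears one pending error per round:
pending = ["F821 undefined name 'x'", "E999 SyntaxError"]

def fake_lint():
    return pending.pop() if pending else ""

rounds = lint_fix_loop(apply_edit=lambda msg: None, lint=fake_lint)
print(rounds)  # 2
```

Returning None on exhaustion matters: without the cap, a model that keeps producing broken edits would loop forever.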

commands.py:356-409 provides manual lint invocation:

  1. Scans all dirty files in the git repo
  2. Runs the linter pipeline on each
  3. Shows errors to the user
  4. Optionally creates a separate lint_coder instance to fix errors (line 400-409)
  5. Auto-commits fixes if enabled

set_linter(lang, cmd) allows users to specify custom lint commands per language:

--lint-cmd "python: ruff check --fix {fname}"
--lint-cmd "javascript: eslint {fname}"

The {fname} placeholder is replaced with the file path at invocation time.
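A minimal sketch of that substitution, assuming a "lang: command" spec format as shown above (the parsing helpers are hypothetical, not Aider's code):

```python
import shlex

def parse_lint_cmd(spec):
    """Split 'python: ruff check --fix {fname}' into (lang, command)."""
    lang, _, cmd = spec.partition(":")
    return lang.strip(), cmd.strip()

def build_argv(cmd_template, fname):
    if "{fname}" in cmd_template:
        cmd = cmd_template.replace("{fname}", fname)
    else:
        cmd = f"{cmd_template} {fname}"  # no placeholder: append the path
    # shlex.split after substitution assumes the path has no shell metacharacters
    return shlex.split(cmd)

lang, cmd = parse_lint_cmd("python: ruff check --fix {fname}")
print(lang)                           # python
print(build_argv(cmd, "src/app.py"))  # ['ruff', 'check', '--fix', 'src/app.py']
```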


Commit: 4ab44e2c5

Codex does not implement LSP diagnostics or any post-edit validation. The agent relies entirely on:

  1. Execution feedback: Running commands and observing their output (build errors, test failures)
  2. Model reasoning: The model’s own understanding of code correctness
  3. User feedback: The human reviewing and requesting corrections

There is no lsp/ directory, no diagnostic collection, and no lint integration in the Codex codebase. The config_loader/diagnostics.rs file exists but handles TOML configuration file parsing errors, not code diagnostics.

This is a deliberate design choice: Codex is tightly coupled to the OpenAI models API and relies on the model’s training to produce correct code. Post-edit validation is left to the user or to explicit tool invocations (running tests, building the project).


Commit: 7ed449974

OpenCode has the most comprehensive LSP diagnostic system. It runs 30+ language servers, collects diagnostics after every file edit, filters to errors only, and injects them back into the tool result that the model sees.

Three files form the diagnostic pipeline:

  1. lsp/client.ts (253 lines) — Per-server LSP client with diagnostic storage and debounced waiting
  2. lsp/index.ts (486 lines) — Global diagnostic aggregation across all servers
  3. lsp/server.ts (2047 lines) — 30+ server configurations with auto-download

Each LSP client maintains an in-memory diagnostic map (client.ts:51):

const diagnostics = new Map<string, Diagnostic[]>()

Keyed by normalized file path, populated by the textDocument/publishDiagnostics notification handler (client.ts:52-62):

connection.onNotification("textDocument/publishDiagnostics", (params) => {
  const filePath = Filesystem.normalizePath(fileURLToPath(params.uri))
  const exists = diagnostics.has(filePath)
  diagnostics.set(filePath, params.diagnostics)
  if (!exists && input.serverID === "typescript") return
  Bus.publish(Event.Diagnostics, { path: filePath, serverID: input.serverID })
})

The TypeScript guard on line 60 (if (!exists && input.serverID === "typescript") return) is critical: the TypeScript language server emits diagnostics immediately on textDocument/didOpen, which would flood the system with pre-existing errors. By skipping the first diagnostic publication per file, OpenCode only reports diagnostics caused by agent edits.
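The guard logic translates directly: store the latest diagnostics unconditionally, but suppress the event on the first publication. The sketch below is a hypothetical Python rendering of that pattern, not OpenCode's code.

```python
# Hypothetical sketch of the first-publication guard.
class DiagnosticStore:
    def __init__(self, publish):
        self.diagnostics = {}   # path -> latest diagnostic list
        self.publish = publish  # event-bus callback

    def on_publish_diagnostics(self, path, diags, server_id):
        first_time = path not in self.diagnostics
        self.diagnostics[path] = diags  # always record the latest set
        if first_time and server_id == "typescript":
            return  # swallow pre-existing errors emitted on didOpen
        self.publish(path, server_id)

events = []
store = DiagnosticStore(publish=lambda p, s: events.append((p, s)))
store.on_publish_diagnostics("/a.ts", ["pre-existing"], "typescript")  # skipped
store.on_publish_diagnostics("/a.ts", ["new error"], "typescript")     # published
print(events)  # [('/a.ts', 'typescript')]
```

Note that the first batch is still stored, only its event is suppressed, so later queries still see the server's full picture of the file.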

waitForDiagnostics() (client.ts:210-238) is the timing mechanism that makes the whole system work:

async waitForDiagnostics(input: { path: string }) {
  return await withTimeout(
    new Promise<void>((resolve) => {
      unsub = Bus.subscribe(Event.Diagnostics, (event) => {
        if (
          event.properties.path === normalizedPath &&
          event.properties.serverID === result.serverID
        ) {
          if (debounceTimer) clearTimeout(debounceTimer)
          debounceTimer = setTimeout(() => {
            unsub?.()
            resolve()
          }, DIAGNOSTICS_DEBOUNCE_MS) // 150ms
        }
      })
    }),
    3000, // 3 second timeout
  )
    .catch(() => {})
    .finally(() => {
      if (debounceTimer) clearTimeout(debounceTimer)
      unsub?.()
    })
}

The 150ms debounce (DIAGNOSTICS_DEBOUNCE_MS, client.ts:16): LSP servers often emit diagnostics in phases. TypeScript first emits syntax diagnostics, then semantic diagnostics. Pyright may emit partial results before the full analysis completes. The debounce waits for the server to “settle” — if no new diagnostics arrive within 150ms, the current set is considered complete.

The 3-second timeout: If a server is slow or broken, the agent doesn’t hang. After 3 seconds, whatever diagnostics have arrived are used. The .catch(() => {}) ensures a timeout is treated as “no diagnostics” rather than an error.
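The debounce-plus-timeout pattern can be sketched without an event bus, using an asyncio queue in place of OpenCode's Bus. All names below are hypothetical; the point is the shape of the logic: keep draining batches until the stream goes quiet for the debounce interval, and bail out entirely at the overall timeout.

```python
import asyncio

async def wait_for_settled(queue, debounce=0.15, timeout=3.0):
    """Drain diagnostic batches until none arrive for `debounce` seconds."""
    latest = None

    async def drain():
        nonlocal latest
        latest = await queue.get()  # wait for the first batch
        while True:
            try:
                latest = await asyncio.wait_for(queue.get(), debounce)
            except asyncio.TimeoutError:
                return  # settled: no new batch within the debounce window

    try:
        await asyncio.wait_for(drain(), timeout)
    except asyncio.TimeoutError:
        pass  # slow/broken server: use whatever arrived (possibly nothing)
    return latest

async def demo():
    q = asyncio.Queue()
    q.put_nowait(["syntax pass"])                    # phase 1
    q.put_nowait(["syntax pass", "semantic pass"])   # phase 2, within debounce
    return await wait_for_settled(q, debounce=0.05, timeout=1.0)

print(asyncio.run(demo()))  # ['syntax pass', 'semantic pass']
```

The outer wait_for mirrors withTimeout, and swallowing its TimeoutError mirrors the .catch(() => {}): a hung server degrades to "no diagnostics" instead of an exception.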

touchFile() (index.ts:277-289) is the entry point for notifying LSP servers about file changes:

export async function touchFile(input: string, waitForDiagnostics?: boolean) {
  const clients = await getClients(input)
  await Promise.all(
    clients.map(async (client) => {
      const wait = waitForDiagnostics
        ? client.waitForDiagnostics({ path: input })
        : Promise.resolve()
      await client.notify.open({ path: input })
      return wait
    }),
  )
}

When waitForDiagnostics is true, the function:

  1. Starts listening for diagnostic events before sending the notification
  2. Sends textDocument/didOpen to all matching LSP clients
  3. Waits for the debounced diagnostic response (or 3s timeout)

This ordering is important — starting the listener before the notification prevents a race condition where fast servers emit diagnostics before the listener is ready.

diagnostics() (index.ts:291-301) collects diagnostics from all running LSP servers:

export async function diagnostics() {
  const results: Record<string, LSPClient.Diagnostic[]> = {}
  for (const result of await runAll(async (client) => client.diagnostics)) {
    for (const [path, diagnostics] of result.entries()) {
      const arr = results[path] || []
      arr.push(...diagnostics)
      results[path] = arr
    }
  }
  return results
}

Multiple servers can provide diagnostics for the same file (e.g., TypeScript server + ESLint server). Results are concatenated.
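The merge is a straightforward per-path concatenation. A minimal sketch with hypothetical data:

```python
def merge_diagnostics(per_server):
    """Concatenate each server's path -> diagnostics map into one result."""
    results = {}
    for server_diags in per_server:
        for path, diags in server_diags.items():
            results.setdefault(path, []).extend(diags)
    return results

merged = merge_diagnostics([
    {"/a.ts": ["TS: type error"]},
    {"/a.ts": ["ESLint: no-unused-vars"], "/b.ts": ["ESLint: no-undef"]},
])
print(merged["/a.ts"])  # ['TS: type error', 'ESLint: no-unused-vars']
```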

LSP.Diagnostic.pretty() (index.ts:469-484) formats diagnostics for the model:

export function pretty(diagnostic: LSPClient.Diagnostic) {
  const severityMap = {
    1: "ERROR",
    2: "WARN",
    3: "INFO",
    4: "HINT",
  }
  const severity = severityMap[diagnostic.severity || 1]
  const line = diagnostic.range.start.line + 1 // 0-indexed to 1-indexed
  const col = diagnostic.range.start.character + 1
  return `${severity} [${line}:${col}] ${diagnostic.message}`
}

Diagnostics are injected into the tool call response that the model sees. Each edit tool handles this slightly differently:

Edit Tool (tool/edit.ts:133-143):

let output = "Edit applied successfully."
await LSP.touchFile(filePath, true)
const diagnostics = await LSP.diagnostics()
const issues = diagnostics[normalizedFilePath] ?? []
const errors = issues.filter((item) => item.severity === 1) // ERROR only
if (errors.length > 0) {
  const limited = errors.slice(0, MAX_DIAGNOSTICS_PER_FILE) // Cap at 20
  const suffix =
    errors.length > MAX_DIAGNOSTICS_PER_FILE
      ? `\n... and ${errors.length - MAX_DIAGNOSTICS_PER_FILE} more`
      : ""
  output += `\n\nLSP errors detected in this file, please fix:\n<diagnostics file="${filePath}">\n${limited.map(LSP.Diagnostic.pretty).join("\n")}${suffix}\n</diagnostics>`
}

Apply Patch Tool (tool/apply_patch.ts:234-269): Multi-file version. For each changed file (excluding deletes), touches the file, waits for diagnostics, and appends errors. Uses relative paths for the file label.

Write Tool (tool/write.ts:56-73): Most comprehensive. Reports diagnostics for both the written file and other project files:

const MAX_DIAGNOSTICS_PER_FILE = 20
const MAX_PROJECT_DIAGNOSTICS_FILES = 5

The write tool caps at 5 other files with errors. This catches cascade errors — where writing one file causes errors in files that import it.

The model sees diagnostic feedback as XML-tagged sections appended to the tool result:

Edit applied successfully.

LSP errors detected in this file, please fix:
<diagnostics file="/path/to/file.ts">
ERROR [10:5] Variable 'x' is not defined
ERROR [15:10] Type 'any' is not assignable to type 'string'
... and 3 more
</diagnostics>

The <diagnostics> tags make the error report unambiguous for the model, and the “please fix” instruction guides the model’s next action.

30+ language servers are configured in lsp/server.ts, each with:

  • Extension list: When to activate (e.g., .ts, .tsx, .js, .jsx for TypeScript)
  • Root detection: NearestRoot() searches for project markers (e.g., tsconfig.json, package.json)
  • Auto-download: If the server binary isn’t found, OpenCode downloads it from GitHub releases
  • Initialization options: Server-specific settings passed at startup

Key servers: TypeScript (Deno or tsc), Pyright, Gopls, Rust-Analyzer, Clangd, Ruby-LSP, JDTLS, ESLint, Biome, Svelte, Astro, Bash-LS, Terraform-LS, YAML-LS, Lua-LS.


LSP servers are asynchronous. After sending textDocument/didChange, the server may take anywhere from 10ms (syntax check) to several seconds (full type check on a large project) to emit diagnostics. OpenCode’s 150ms debounce + 3s timeout is a pragmatic compromise:

  • Too short a debounce and you get partial diagnostics (only syntax, missing semantic)
  • Too long and the agent waits unnecessarily
  • No timeout and a broken server hangs the entire agent

OpenCode filters to severity 1 (ERROR) only. This was likely learned through experience:

  • WARN: Style issues, unused variables — the AI shouldn’t try to fix these mid-task
  • INFO/HINT: Suggestions, not problems — noise for an agent
  • ERROR: Broken code — the agent must fix these before continuing

Aider’s approach of using only fatal flake8 checks (E9, F8xx) reflects the same principle.

The TypeScript Initial Diagnostics Problem

Section titled “The TypeScript Initial Diagnostics Problem”

TypeScript server emits diagnostics for pre-existing errors on didOpen. If an agent opens a file that already has 50 type errors, those shouldn’t be blamed on the agent’s edit. OpenCode’s guard (if (!exists && input.serverID === "typescript") return) handles this for TypeScript specifically, but the problem exists for all LSP servers. A more general solution would track diagnostics before and after the edit, reporting only the delta.
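That delta approach can be sketched as a before/after diff. The helper below is hypothetical and not part of any of the three codebases; it keys diagnostics by (line, message), which is the simplest scheme but shifts when the edit inserts or deletes lines, so a production version would need position-independent keys.

```python
# Hypothetical delta tracker: report only diagnostics new since the edit.
def new_diagnostics(before, after):
    seen = {(d["line"], d["message"]) for d in before}
    return [d for d in after if (d["line"], d["message"]) not in seen]

before = [{"line": 3, "message": "Type 'any' is not assignable"}]
after = before + [{"line": 10, "message": "Variable 'x' is not defined"}]
print(new_diagnostics(before, after))
# [{'line': 10, 'message': "Variable 'x' is not defined"}]
```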

A single malformed import can cause hundreds of errors across a TypeScript project. Without the MAX_DIAGNOSTICS_PER_FILE = 20 cap, the tool result could consume the entire context window. The “… and N more” suffix tells the model there are more errors without listing all of them.

The write tool reports diagnostics from other files (up to 5). This catches cascade errors, but it can also surface pre-existing problems in unrelated files. There is no mechanism to distinguish “errors caused by this edit” from “errors that already existed.” In practice, models handle this reasonably well — they tend to focus on the file they just edited.

LSP servers return file:// URIs. Tool inputs use filesystem paths. Windows uses backslashes; Unix uses forward slashes. Filesystem.normalizePath() ensures consistent comparison, but any missed normalization causes diagnostics to silently not match the edited file.
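A sketch of the URI-to-path half of that normalization, for Unix-style paths only (Windows drive letters and backslashes need extra handling, which is exactly where missed cases hide). The helper is hypothetical; OpenCode uses Node's fileURLToPath plus its own Filesystem.normalizePath.

```python
from urllib.parse import unquote, urlparse
from pathlib import PurePosixPath

def uri_to_path(uri):
    """Convert a file:// URI from an LSP server into a filesystem path."""
    parsed = urlparse(uri)
    assert parsed.scheme == "file", f"not a file URI: {uri}"
    # unquote handles percent-encoding, e.g. %20 for spaces
    return str(PurePosixPath(unquote(parsed.path)))

# Server URI and tool-input path must reduce to one canonical form:
print(uri_to_path("file:///home/user/a.ts"))          # /home/user/a.ts
print(uri_to_path("file:///home/user/my%20file.ts"))  # /home/user/my file.ts
```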

Aider’s tree-sitter + linter approach doesn’t require running a language server. This means:

  • No startup time (tree-sitter is instant, LSP servers can take seconds)
  • No resource overhead (language servers consume significant memory)
  • Deterministic results (no async timing issues)

The tradeoff is that tree-sitter catches only syntax errors, not type errors or semantic issues. For many editing workflows, syntax checking is sufficient.

Codex relies on the model being good enough to write correct code, and on the user running tests/builds explicitly. This works for OpenAI’s models (which have strong code generation capabilities) but would be risky for weaker models.


Architecture: Optional LSP with Tree-Sitter Fallback

Section titled “Architecture: Optional LSP with Tree-Sitter Fallback”

Two diagnostic sources, prioritized:

  1. LSP diagnostics (when a server is available and initialized)
  2. Tree-sitter syntax validation (instant, always available, no server needed)
pub struct DiagnosticResult {
    pub file: PathBuf,
    pub diagnostics: Vec<Diagnostic>,
    pub source: DiagnosticSource, // Lsp { server_id } | TreeSitter | Linter { name }
}

pub struct Diagnostic {
    pub severity: Severity,   // Error, Warning, Info, Hint
    pub range: Range,         // start line:col, end line:col
    pub message: String,
    pub code: Option<String>, // e.g., "E0308" for rustc
}

pub enum Severity { Error, Warning, Info, Hint }

pub async fn collect_after_edit(
    files: &[PathBuf],
    lsp_manager: &LspManager,
    timeout: Duration,  // Default 3s
    debounce: Duration, // Default 150ms
) -> Vec<DiagnosticResult> {
    let mut results = Vec::new();
    for file in files {
        if let Some(client) = lsp_manager.client_for(file) {
            // LSP path: notify + wait with debounce
            client.did_change(file).await;
            let diagnostics = client
                .wait_for_diagnostics(file, debounce, timeout)
                .await
                .unwrap_or_default();
            results.push(DiagnosticResult {
                file: file.clone(),
                diagnostics,
                source: DiagnosticSource::Lsp { server_id: client.id() },
            });
        } else {
            // Tree-sitter fallback
            if let Some(diags) = tree_sitter_validate(file) {
                results.push(DiagnosticResult {
                    file: file.clone(),
                    diagnostics: diags,
                    source: DiagnosticSource::TreeSitter,
                });
            }
        }
    }
    results
}
pub fn format_for_model(
    results: &[DiagnosticResult],
    max_per_file: usize,    // Default 20
    max_other_files: usize, // Default 5
    edited_file: &Path,
) -> Option<String> {
    let mut output = String::new();
    let errors: Vec<_> = results
        .iter()
        .flat_map(|r| {
            r.diagnostics
                .iter()
                .filter(|d| d.severity == Severity::Error)
                .map(|d| (&r.file, d))
        })
        .collect();
    if errors.is_empty() {
        return None;
    }
    // Group by file, edited file first
    // Cap per file, cap other files
    // Format as XML: <diagnostics file="...">ERROR [line:col] message</diagnostics>
    Some(output)
}
[diagnostics]
enabled = true
timeout_ms = 3000
debounce_ms = 150
max_per_file = 20
max_other_files = 5
severity_filter = "error" # "error", "warning", "info", "hint"
tree_sitter_fallback = true
[diagnostics.lsp]
auto_download = true
disabled_servers = [] # e.g., ["eslint"] to disable specific servers
  • openoxide-diagnostics — Diagnostic collection, filtering, formatting
  • openoxide-lsp — LSP client management (already designed in integration.md)
  • tree-sitter — Syntax validation fallback
  • tokio — Async debounce/timeout
  • Error-only by default — filter to Severity::Error to reduce noise (OpenCode pattern)
  • Tree-sitter fallback — always available, no server needed (Aider insight)
  • Debounce + timeout — 150ms/3s (OpenCode’s proven values)
  • XML tag formatting — <diagnostics> tags for clear model parsing
  • Delta tracking — track diagnostics before and after edit, report only new errors (improvement over all reference implementations)
  • Per-file and project caps — prevent token explosion (20 per file, 5 other files)