Issue 08: The discovery layer arrives — ARD, agent finder, token-thrifty Copilot, and AI logs you can query

The practical theme this week is simple: agent stacks are getting less static. Tools, registries, repo context, build logs, and Kubernetes telemetry are all being turned into things an AI assistant can discover or query at runtime. That is useful. It also moves more responsibility onto engineering teams to govern what agents can see, how much context they burn, and what they are allowed to install.

Development	What changed	Why product engineers should care
Agentic Resource Discovery + GitHub agent finder	Microsoft introduced ARD with a broad partner group, and GitHub shipped agent finder for Copilot on the same specification. 1 2	Agent capability selection starts to look like search, not a hand-maintained config file.
Hugging Face Discover	Hugging Face published a reference implementation that exposes Skills, ML applications, and MCP servers through ARD search. 3	Public tool ecosystems are becoming searchable surfaces for agents.
GitHub Code Quality pricing	GitHub said Code Quality will become generally available on July 20, 2026, priced at $10 per active committer per month plus usage-based AI work. 4	AI-assisted review and quality gates are moving into budgetable platform spend.
Copilot token-efficiency work in VS Code	The VS Code team detailed prompt caching, tool search, and WebSocket transport changes used to cut token use and latency. 5	The harness around the model is now a major cost-control layer.
GitHub CLI remote repo reads	`gh repo read-file` and `gh repo read-dir` let users and agents inspect repository content without cloning. 6	Lightweight repo inspection becomes scriptable and agent-friendly.
Microsoft Binlog MCP Server + Logfire Kubernetes view	Microsoft exposed MSBuild `.binlog` analysis through an MCP server, while Pydantic added Kubernetes inventory and trace drill-downs to Logfire. 7 8	Build failures and runtime failures are being packaged as queryable agent context.

Capability discovery is becoming a runtime interface

ARD is the week's highest-leverage infrastructure signal. Microsoft describes it as an open specification for publishing, indexing, and discovering AI capabilities, developed with Cisco, Databricks, GitHub, GoDaddy, Google, Hugging Face, Nvidia, Salesforce, ServiceNow, Snowflake, and others. 1 The operating model is intentionally search-like: a publisher exposes structured metadata, a registry indexes it, and an AI client asks which capability fits the current task before invoking that capability through its own protocol. 1
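The publish, index, and search loop described above can be sketched in a few lines. This is a toy model, not the ARD schema: the `CatalogEntry` fields, the `Registry` class, the publisher names, and the word-overlap ranking are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical stand-in for publisher metadata; not the ARD schema."""
    name: str
    kind: str          # e.g. "mcp-server" or "skill"
    description: str
    publisher: str


class Registry:
    """Toy in-memory index. A real registry would be a remote service."""

    def __init__(self) -> None:
        self._entries: list[CatalogEntry] = []

    def publish(self, entry: CatalogEntry) -> None:
        self._entries.append(entry)

    def search(self, task: str, limit: int = 3) -> list[CatalogEntry]:
        # Naive ranking: count task words that also appear in the description.
        words = set(task.lower().split())

        def score(entry: CatalogEntry) -> int:
            return len(words & set(entry.description.lower().split()))

        ranked = sorted(self._entries, key=score, reverse=True)
        return [e for e in ranked if score(e) > 0][:limit]


registry = Registry()
registry.publish(CatalogEntry(
    "binlog-tools", "mcp-server",
    "diagnose msbuild build failure from binlog", "example-publisher-a"))
registry.publish(CatalogEntry(
    "k8s-inventory", "mcp-server",
    "list kubernetes pods and restart counts", "example-publisher-b"))

# The client asks which capability fits the task before invoking anything.
matches = registry.search("why did the msbuild build fail")
print([m.name for m in matches])
```

The point of the shape is the ordering: the client searches first and invokes later, through whatever protocol the matched capability speaks.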

GitHub's agent finder is the first mainstream developer-workflow example. A Copilot user describes a task in plain language; agent finder searches an index of MCP servers, skills, canvases, agents, and tools; Copilot can then pull ranked matches into context on demand. 2 GitHub says enterprises can point it at a curated public catalog or a private registry, scope discovery through managed Copilot settings, and keep installation explicit rather than automatic. 2

Hugging Face's implementation shows how quickly this could spread beyond IDEs. Its Discover Tool wraps Hub search as ARD catalog entries and exposes Skills, ML applications, and MCP servers through a CLI, REST endpoint, and MCP endpoint. 3 The implementation filters for Spaces whose runtime stage is RUNNING, supports media types for AI skills, MCP server entries, and raw Space metadata, and publishes a well-known ai-catalog.json entry point. 3

[Figure: abstract visual of Agentic Resource Discovery connecting agent capabilities.] Microsoft introduced ARD as a discovery layer for AI capabilities before invocation, with partner implementations from GitHub and Hugging Face arriving the same day. 1

The engineering question is no longer "which tools did we preinstall?" It is "which registries do we trust, and what evidence does an agent need before it loads a capability?" Treat ARD adoption like dependency management. Publisher identity, permission boundaries, provenance, and audit logs matter as much as ranking quality.
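One way to make the dependency-management analogy concrete is a small policy gate that runs before any discovered capability is loaded. The sketch below is an assumption-heavy illustration: the `Candidate` fields, the permission strings, and the three-way decision are invented here and are not part of ARD or any GitHub setting.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Candidate:
    """A discovered capability. All fields are illustrative assumptions."""
    name: str
    registry: str
    publisher_verified: bool
    requested_permissions: frozenset[str]


@dataclass
class TrustPolicy:
    trusted_registries: set[str]
    auto_allowed_permissions: set[str]
    audit_log: list[str] = field(default_factory=list)

    def decide(self, c: Candidate) -> str:
        """Return 'deny', 'needs_approval', or 'allow', and record the reason."""
        if c.registry not in self.trusted_registries:
            decision, reason = "deny", "untrusted registry"
        elif not c.publisher_verified:
            decision, reason = "deny", "unverified publisher"
        elif not c.requested_permissions <= self.auto_allowed_permissions:
            decision, reason = "needs_approval", "permissions exceed auto-allow set"
        else:
            decision, reason = "allow", "within policy"
        self.audit_log.append(f"{c.name}: {decision} ({reason})")
        return decision


policy = TrustPolicy({"internal.example"}, {"read:repo"})
a = policy.decide(Candidate("lint-skill", "internal.example", True,
                            frozenset({"read:repo"})))
b = policy.decide(Candidate("deploy-agent", "internal.example", True,
                            frozenset({"write:cluster"})))
c = policy.decide(Candidate("mystery-tool", "unknown.example", True,
                            frozenset()))
print(a, b, c)
```

Ranking quality never appears in the gate. A top-ranked match from an untrusted registry is still denied, and every decision leaves an audit line.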

Cost pressure is moving into the agent harness

The VS Code Copilot team published one of the more useful engineering writeups of the week because it turns token cost into concrete harness mechanics. The starting point is blunt: usage-based Copilot billing makes every token in an agentic session affect credits, latency, and remaining context; the team also says token use per task has been rising across new model generations. 5

[Figure: line chart showing token usage per turn increasing across successive model generations.] VS Code's Copilot team used this chart to frame why harness efficiency now matters: rising tokens per turn make prompt caching and tool deferral product features, not internal cleanup. 5

Three implementation details are worth stealing:

Mechanism	Reported result	Engineering read
Extended prompt caching for supported OpenAI models	VS Code enabled `prompt_cache_retention: "24h"`; for 40-60 minute gaps, the cache hit-rate increase ranged from +279% on GPT-5.3-Codex to +919% on GPT-5.4. 5	Session resumption is a cost feature. If your own product has long agent sessions, measure cold-start prompts after idle gaps.
Tool search for OpenAI models	In a four-day GPT-5.4 and GPT-5.5 experiment, tool search reduced P50 total tokens per turn by 9.81% and 8.61%, respectively; median total session token use fell 8.97% and 10.92%. 5	Stop sending every tool schema on every turn. Keep heavy definitions out of context until the model asks.
WebSocket transport	GitHub made WebSockets the default transport for OpenAI models GPT-5.2 and newer across Copilot products after rollout data showed lower TTFT and completion time versus HTTP. 5	Agent loops are request chains. Transport overhead compounds over tool calls.

The Anthropic side has the same lesson with a different API shape. VS Code reworked cache breakpoints around stable prompt boundaries and rolling anchors; the team says agentic workloads now sit at around a 94% cache hit rate. 5 In a seven-day experiment, deferred Anthropic tool definitions cut P50 prompt tokens by 11.30% per turn and 18.32% per user; total tokens fell 11.09% per turn and 18.03% per user. 5
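The deferral idea is easy to prototype in your own harness. The sketch below is not VS Code's implementation: the tool catalog, the descriptions, and the word-match search are invented, and a real harness would measure tokens rather than bytes.

```python
import json

# Hypothetical tool catalog; names, descriptions, and schemas are invented.
TOOLS = {
    "read_file": {
        "description": "Read a file from the workspace",
        "schema": {"type": "object",
                   "properties": {"path": {"type": "string"}},
                   "required": ["path"]},
    },
    "run_tests": {
        "description": "Run the project test suite",
        "schema": {"type": "object",
                   "properties": {"pattern": {"type": "string"},
                                  "verbose": {"type": "boolean"}}},
    },
    "query_traces": {
        "description": "Query production traces by service name",
        "schema": {"type": "object",
                   "properties": {"service": {"type": "string"},
                                  "since_minutes": {"type": "integer"}},
                   "required": ["service"]},
    },
}


def eager_payload(tools: dict) -> str:
    """Baseline: every full definition is serialized into every turn."""
    return json.dumps(tools)


def deferred_payload(tools: dict) -> str:
    """Deferred: only names and one-line descriptions enter the prompt."""
    return json.dumps({name: t["description"] for name, t in tools.items()})


def search_tools(tools: dict, query: str, limit: int = 1) -> dict:
    """Return full definitions for tools whose description shares a word with the query."""
    words = set(query.lower().split())
    hits = [name for name, t in tools.items()
            if words & set(t["description"].lower().split())]
    return {name: tools[name] for name in hits[:limit]}


eager_bytes = len(eager_payload(TOOLS).encode("utf-8"))
deferred_bytes = len(deferred_payload(TOOLS).encode("utf-8"))
loaded = search_tools(TOOLS, "run test suite")
print(eager_bytes, deferred_bytes, list(loaded))
```

With three tools the saving is small. The gap widens as the catalog grows, because the deferred payload scales with description length rather than schema size.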

For product teams, the actionable pattern is to track harness metrics alongside model metrics: prompt prefix stability, cache hit rate, tool-schema bytes per turn, tool-search miss rate, TTFT, and cost per completed task. Model choice still matters, but the wrapper can waste or save double-digit percentages.
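A minimal sketch of that accounting, assuming you already log per-turn counters. The `Turn` field names and the flat per-1k-token price are assumptions. Real billing usually prices cached and uncached tokens differently, which this simplification ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Turn:
    """One model request. Field names are assumptions, not a vendor schema."""
    prompt_tokens: int
    cached_prompt_tokens: int
    output_tokens: int
    tool_schema_bytes: int
    ttft_ms: float


def summarize(turns: list[Turn], completed_tasks: int,
              usd_per_1k_tokens: float) -> dict:
    """Roll per-turn counters up into harness-level metrics."""
    prompt = sum(t.prompt_tokens for t in turns)
    cached = sum(t.cached_prompt_tokens for t in turns)
    total = prompt + sum(t.output_tokens for t in turns)
    n = len(turns)
    cost: Optional[float] = None
    if completed_tasks:
        # Simplification: every token is charged at the same flat rate.
        cost = (total / 1000 * usd_per_1k_tokens) / completed_tasks
    return {
        "cache_hit_rate": cached / prompt if prompt else 0.0,
        "tool_schema_bytes_per_turn": sum(t.tool_schema_bytes for t in turns) / n if n else 0.0,
        "mean_ttft_ms": sum(t.ttft_ms for t in turns) / n if n else 0.0,
        "cost_per_completed_task": cost,
    }


turns = [
    Turn(prompt_tokens=1000, cached_prompt_tokens=0, output_tokens=200,
         tool_schema_bytes=4000, ttft_ms=800.0),
    Turn(prompt_tokens=1000, cached_prompt_tokens=900, output_tokens=200,
         tool_schema_bytes=4000, ttft_ms=400.0),
]
report = summarize(turns, completed_tasks=1, usd_per_1k_tokens=0.01)
print(report)
```

Dividing cost by completed tasks rather than by turns keeps the metric honest: a harness that takes fewer, larger turns should not look worse than one that retries cheaply and fails.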

GitHub is turning code quality into a priced platform layer

GitHub Code Quality is leaving public preview on July 20, 2026. GitHub says more than 10,000 enterprises used the preview to detect maintainability and reliability issues, enforce quality gates, and track coverage. 4 The paid model has three parts: a $10 per-active-committer monthly license on enabled repositories, usage-based billing for AI-powered capabilities such as Copilot code review and Autofix, and GitHub Actions minutes for deterministic CodeQL analysis. 4

The product shift matters more than the price point. New GA-era capabilities include organization-wide deployment, org-level quality dashboards, code-coverage enforcement through rulesets, repository and organization quality scoring, and APIs for enablement and findings management. 4 That is a move from "AI review as pull-request helper" toward "AI-assisted code governance as an org control plane."

If you run GitHub at org scale, the next task is inventory. Which repositories should be enabled? Which quality gates should block merges? Which AI-powered paths can consume usage-based spend? The danger is letting every repo discover the bill at the same time the feature turns on.
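The inventory can start as a rough spreadsheet-style estimate. In the sketch below, only the $10 license figure comes from GitHub's announcement. The per-minute Actions rate, the AI usage forecasts, and the assumption that a committer active on several enabled repositories is licensed once are placeholders to check against the billing documentation.

```python
from dataclasses import dataclass

LICENSE_USD_PER_COMMITTER = 10.0  # per active committer per month, as announced


@dataclass(frozen=True)
class RepoPlan:
    name: str
    committers: frozenset[str]   # logins active on this repository
    ai_usage_usd: float          # your own forecast of metered AI spend
    actions_minutes: int         # forecast CodeQL analysis minutes


def monthly_estimate(repos: list[RepoPlan], usd_per_actions_minute: float) -> dict:
    # Assumption: a committer active on several enabled repos is licensed once.
    unique = set().union(*(r.committers for r in repos))
    license_usd = len(unique) * LICENSE_USD_PER_COMMITTER
    ai_usd = sum(r.ai_usage_usd for r in repos)
    actions_usd = sum(r.actions_minutes for r in repos) * usd_per_actions_minute
    return {
        "unique_committers": len(unique),
        "license_usd": license_usd,
        "ai_usd": ai_usd,
        "actions_usd": actions_usd,
        "total_usd": license_usd + ai_usd + actions_usd,
    }


plans = [
    RepoPlan("api", frozenset({"ana", "bo"}), ai_usage_usd=40.0, actions_minutes=500),
    RepoPlan("web", frozenset({"bo", "cy"}), ai_usage_usd=15.0, actions_minutes=300),
]
# 0.008 USD per minute is a placeholder rate, not a quoted price.
estimate = monthly_estimate(plans, usd_per_actions_minute=0.008)
print(estimate)
```

Running it per rollout wave, rather than for the whole organization at once, shows which repositories drive the metered portion of the bill.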

Repo context and build logs are becoming callable surfaces

GitHub CLI v2.95.0 added two preview commands, `gh repo read-file` and `gh repo read-dir`, for reading files and directories from a remote repository without cloning it. 9 The changelog says the commands work across public and private repositories the user can access, and GitHub explicitly lists AI agents and workflows as a use case. 6 The release notes add useful automation details: the commands accept `--ref` for branch, tag, or commit targeting, and support `--json`, `--jq`, and templates. 9

That is small but important. Many coding-agent mistakes start with stale or partial context. A tool that reads `go.mod`, `package.json`, `.github/workflows/*`, or a repo policy file without a full checkout is exactly the kind of narrow, cheap context fetch agents should prefer before attempting broad repository ingestion.
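A narrow-first fetch policy might look like the sketch below. The fetcher is injected on purpose. In practice it could wrap `gh repo read-file`, but the exact argument layout should be taken from `gh repo read-file --help` rather than from this example. The probe list, the `ci.yml` filename, the repository name, and the byte budget are assumptions.

```python
from typing import Callable, Optional

# Small, high-signal files to try first. The list is illustrative.
PROBE_PATHS = ["go.mod", "package.json", ".github/workflows/ci.yml"]

# (repo, path, ref) -> file text, or None when the file does not exist.
Fetcher = Callable[[str, str, str], Optional[str]]


def gather_context(repo: str, ref: str, fetch: Fetcher,
                   max_bytes: int = 20_000) -> dict[str, str]:
    """Collect probe files at a pinned ref without exceeding a byte budget."""
    context: dict[str, str] = {}
    used = 0
    for path in PROBE_PATHS:
        text = fetch(repo, path, ref)
        if text is None:
            continue  # file is absent at this ref
        size = len(text.encode("utf-8"))
        if used + size > max_bytes:
            continue  # skip files that would blow the budget
        context[path] = text
        used += size
    return context


# Stand-in fetcher backed by a dict, so the sketch runs without network access.
FAKE_REMOTE = {
    ("example-org/example-repo", "package.json", "main"): '{"name": "demo"}',
    ("example-org/example-repo", ".github/workflows/ci.yml", "main"): "on: push\n",
}


def fake_fetch(repo: str, path: str, ref: str) -> Optional[str]:
    return FAKE_REMOTE.get((repo, path, ref))


ctx = gather_context("example-org/example-repo", "main", fake_fetch)
print(sorted(ctx))
```

Pinning `ref` matters as much as the byte budget: the agent reasons about the same commit the task refers to, not whatever the default branch happens to hold.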

Microsoft's Binlog MCP Server applies the same idea to build debugging. It parses MSBuild `.binlog` files and exposes 15 specialized tools for build failure diagnosis, property tracing, performance analysis, and build comparison. 7 Microsoft groups the tools into build investigation, embedded files, performance analysis, and build comparison; examples include `binlog_errors`, `binlog_explain_property`, `binlog_expensive_targets`, and `binlog_compare`. 7

[Figure: GitHub Copilot in VS Code agent mode calling binlog MCP tools to diagnose a build failure.] The Binlog MCP Server turns a dense MSBuild binary log into a set of callable diagnostic tools for agent-mode assistants. 7

The install path also shows where the ecosystem is heading. Visual Studio can discover the server through Copilot agent mode after installing the `dotnet-msbuild` plugin; VS Code users can enable plugin support or wire the MCP server in `.vscode/mcp.json`; terminal assistants such as Copilot CLI or Claude Code can install it from the `dotnet/skills` marketplace. 7

Observability is being shaped for AI operators

Pydantic's Logfire Kubernetes view is a production-AI-infra release rather than a coding-agent feature, but it fits the same pattern. The new view shows clusters, nodes, namespaces, workloads, pods, and images from one page, with restart counts rolled up at every level and one-click drill-down from a pod, namespace, or workload to the traces it produced. 8

The implementation leans on standard OpenTelemetry plumbing: `kubeletstats` for pod and container metrics, `k8scluster` for cluster inventory, and `k8sattributes` to stamp pod, namespace, and deployment identifiers onto spans. 8 The docs describe six sortable lenses (Clusters, Nodes, Namespaces, Workloads, Pods, and Images) and recommend the upstream `opentelemetry-kube-stack` Helm chart for setup. 10
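Whether that stamping actually happens is cheap to verify on a sample of exported spans. The sketch below assumes spans are available as plain attribute dictionaries and checks for the OpenTelemetry semantic-convention keys `k8s.pod.name`, `k8s.namespace.name`, and `k8s.deployment.name`. The sample values and the choice of required keys are assumptions; workloads that are not Deployments will legitimately lack the last key.

```python
REQUIRED_KEYS = ("k8s.pod.name", "k8s.namespace.name", "k8s.deployment.name")


def missing_k8s_attributes(attributes: dict) -> list[str]:
    """Return the required keys that are absent or empty on one span."""
    return [key for key in REQUIRED_KEYS if not attributes.get(key)]


def enrichment_coverage(spans: list[dict]) -> float:
    """Fraction of spans that carry every required Kubernetes attribute."""
    if not spans:
        return 0.0
    complete = sum(1 for attrs in spans if not missing_k8s_attributes(attrs))
    return complete / len(spans)


# Sample attribute dictionaries with invented values.
sample_spans = [
    {"k8s.pod.name": "chat-7d9f-abcde",
     "k8s.namespace.name": "prod",
     "k8s.deployment.name": "chat"},
    {"k8s.pod.name": "chat-7d9f-fghij",
     "k8s.namespace.name": "prod"},
]

coverage = enrichment_coverage(sample_spans)
gaps = missing_k8s_attributes(sample_spans[1])
print(coverage, gaps)
```

A coverage number well below 1.0 usually points at collector configuration rather than application code, since the attributes are added in the pipeline and not by the service itself.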

This is relevant if your AI product has real traffic behind it. Model latency is rarely the only failure mode. A chatbot that starts returning 500s after a rollout may be failing because one pod is OOMKilled, two pods are on an old image digest, and the trace points to a tokenizer dependency that doubled memory use. That exact debugging path is the scenario Pydantic uses to explain the feature. 8

What to try before next week

Audit your agent's always-loaded tool surface. Count how many tool definitions and JSON schemas enter the prompt on every turn. If the answer is "all of them," copy the Copilot pattern and test deferred loading or tool search.
Add a registry trust policy before ARD shows up by default. Decide which internal or public registries an agent may query, how publisher identity is verified, and what needs human approval before installation.
Use `gh repo read-file` for targeted context. Let agents inspect config files, lockfiles, CI definitions, and docs at a specific ref before they ask for a clone or broad codebase ingestion.
For .NET builds, capture `.binlog` files in failed CI runs. The Binlog MCP Server is most useful when the diagnostic artifact already exists.
For Kubernetes-backed AI apps, check trace enrichment. If spans do not carry pod, namespace, workload, and image metadata, your assistant cannot correlate app traces with rollout and restart state.
Budget Code Quality before July 20. The combination of per-committer licensing and metered AI review means repository enablement should be a deliberate rollout, not a surprise default.

If you only have one hour, start with tool-surface accounting. The week's releases all point in the same direction: agents will get access to more capabilities, but the teams that win will be the ones that load less by default and prove more before execution.