ARD lands, Cloudflare opens the runtime, and agent security gets a real incident file

The agent news cycle did not revolve around a single smarter model. It revolved around missing plumbing: how agents discover tools, survive interruptions, inherit permissions, and leave evidence when they act.

Coverage window: June 16-18, 2026, Asia/Shanghai time. The strongest signal is that the agent stack is splitting into three visible control layers: discovery, runtime, and governance.

Read this first

Signal	What happened	Why it matters this week
Discovery	Google announced Agentic Resource Discovery, an open specification for publishing, finding, and verifying AI resources across the web. 1	Tool and agent discovery is moving from prompt stuffing toward registries, catalogs, and publisher verification.
Developer adoption	GitHub shipped Agent finder for Copilot, implementing ARD so Copilot can search an approved index of MCP servers, skills, canvases, agents, and tools. 2	The spec has a same-day product path inside a mainstream coding assistant.
Runtime	Cloudflare opened more of its Agents SDK primitives to outside harnesses and frameworks, with Flue as the first framework target. 3	Durable execution, sandboxed code, state, and file systems are becoming the platform boundary beneath the agent harness.
App surface	The GitHub Copilot app is now generally available across macOS, Windows, and Linux, with parallel sessions, cloud automations, model choice, and MCP tools. 4	Agent-driven development is being packaged as a desktop workflow, not only a chat sidebar or terminal command.
Security	OALABS recovered more than 1,000 Claude and Codex agent sessions from a compromised server and tied them to breaches of at least 14 companies. 5	Misuse has moved from lab examples to session-log forensics. The problem is no longer hypothetical.
Policy	Bloomberg reported that Estonia plans to assign personal identification numbers to AI assistants to control and limit their authority. 6	Agent identity is entering government infrastructure, not just enterprise IAM diagrams.

Discovery is becoming its own layer

ARD is the cleanest standards signal in this cycle's batch. Google frames it around two primitives: catalogs that organizations publish under their own domains, and registries that crawl or search those catalogs so agents can find capabilities by intent. The important part is not search. It is that discovery returns trust metadata before an agent connects to a tool, MCP server, A2A agent, OpenAPI endpoint, or nested catalog. 1

That design answers a real builder problem. Every agent framework wants access to more tools, but every extra tool definition consumes context and expands the blast radius. GitHub's Agent finder announcement says Copilot can now search an index of available AI resources, return ranked matches, and pull in only what the task calls for, while enterprise managed settings decide which resources can be surfaced. 2

ARD catalog and registry model: Google's ARD diagram separates self-hosted catalogs from registries, with agents verifying trust before connecting. 1

The builder takeaway: do not treat "which tools can this agent see?" as a prompt-engineering question. It is becoming a registry, policy, and publisher-verification question. If your internal platform already has dozens of MCP servers, skills, and agent templates, the next missing object is probably a private catalog with approval states, owners, and revocation paths.
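To make the private-catalog idea concrete, here is a minimal sketch of catalog entries that carry an owner and an approval state, with discovery that only surfaces approved resources. It is not the ARD schema: the field names, the keyword-overlap ranking, and the sample entries are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Approval(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REVOKED = "revoked"


@dataclass(frozen=True)
class CatalogEntry:
    name: str                 # hypothetical resource name
    kind: str                 # e.g. "mcp_server", "skill", "agent_template"
    owner: str                # team accountable for the resource
    approval: Approval        # revocation is just a state change
    keywords: frozenset[str]  # terms used for naive intent matching


def discover(catalog: list[CatalogEntry], intent: str) -> list[CatalogEntry]:
    """Return approved entries whose keywords overlap the intent, best match first."""
    words = set(intent.lower().split())
    scored = [
        (len(words & entry.keywords), entry)
        for entry in catalog
        if entry.approval is Approval.APPROVED
    ]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in matches]


# Hypothetical sample data.
CATALOG = [
    CatalogEntry("billing-mcp", "mcp_server", "payments-team",
                 Approval.APPROVED, frozenset({"refund", "invoice", "billing"})),
    CatalogEntry("legacy-billing", "mcp_server", "payments-team",
                 Approval.REVOKED, frozenset({"refund", "billing"})),
    CatalogEntry("deploy-skill", "skill", "platform-team",
                 Approval.PENDING, frozenset({"deploy", "release"})),
]

results = discover(CATALOG, "issue a refund for an invoice")
print([entry.name for entry in results])
```

The revoked and pending entries never reach the agent, whatever the prompt says. A real registry would rank by something better than keyword overlap, but the filter on approval state is the part worth keeping.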

Runtime is moving below the harness

Cloudflare's post is useful because it separates three terms that often get blended together. It describes the framework as the project structure and developer experience, the harness as the loop that calls tools and manages context, and the runtime/platform as the compute, state, and storage layer everything above depends on. 3

Flue, now in 1.0 beta, is the first framework Cloudflare is highlighting on top of those primitives. The concrete primitives matter: runFiber(), stash(), and onFiberRecovered() checkpoint a long agent turn so a fresh instance can resume after interruption; @cloudflare/codemode runs LLM-generated JavaScript inside a Worker isolate; and @cloudflare/shell provides a durable virtual filesystem backed by SQLite. 3

Cloudflare Agents SDK runtime architecture: Cloudflare frames the agent platform boundary as runtime primitives beneath the harness: durable execution, isolated code, state, and filesystem support. 3
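The checkpoint-and-resume pattern behind durable execution can be sketched generically. The example below is not Cloudflare's API and does not model runFiber(), stash(), or onFiberRecovered(); it is a plain Python illustration, using the standard sqlite3 module, of saving progress after each step so a later run picks up where the interrupted one stopped. The table layout, function names, and the crash_before parameter are hypothetical.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a SQLite store for checkpoints (in-memory here, a file in practice)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints "
        "(run_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    return conn


def save(conn: sqlite3.Connection, run_id: str, state: dict) -> None:
    """Persist the latest state for a run, replacing any earlier checkpoint."""
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints (run_id, state) VALUES (?, ?)",
        (run_id, json.dumps(state)),
    )
    conn.commit()


def load(conn: sqlite3.Connection, run_id: str) -> dict:
    """Load the last checkpoint, or a fresh state if the run never started."""
    row = conn.execute(
        "SELECT state FROM checkpoints WHERE run_id = ?", (run_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {"next_step": 0, "results": []}


def run_turn(conn, run_id, steps, crash_before=None):
    """Run steps in order, checkpointing after each one.

    crash_before simulates an interruption just before the given step index.
    """
    state = load(conn, run_id)
    for index in range(state["next_step"], len(steps)):
        if crash_before == index:
            raise RuntimeError(f"simulated interruption before step {index}")
        state["results"].append(steps[index]())
        state["next_step"] = index + 1
        save(conn, run_id, state)
    return state["results"]


STEPS = [lambda: "plan", lambda: "call tool", lambda: "summarize"]
conn = open_store()
try:
    run_turn(conn, "run-1", STEPS, crash_before=2)
except RuntimeError:
    pass  # the first attempt dies after two completed steps

resumed = run_turn(conn, "run-1", STEPS)  # resumes at step 2, no rework
print(resumed)
```

The second call skips the two completed steps because it starts from the stored next_step. Step results must be JSON-serializable in this sketch, and a step that crashes halfway through its own side effects still needs idempotency handling that is out of scope here.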

GitHub is packaging a parallel idea for developers. The Copilot app can start sessions from issues, pull requests, or prompts; run parallel sessions on separate branches and worktrees; validate diffs in an integrated terminal and browser; and open pull requests through existing checks. Since preview, GitHub says it added canvases, scheduled cloud automations, and bring-your-own-model plus MCP tool connections. 4

O'Reilly's June 17 argument against building your own agent platform fits the same pattern from the other side. Pete Johnson argues that teams underestimate memory, governance, evaluation, and orchestration as separate product bets, not features bolted onto a workflow engine. He also cites a Menlo Ventures enterprise AI report showing internally built AI solutions falling from 47% in 2024 to 24% by late 2025. 7

The practical reading: build the agent if it is tied to your domain. Be slower to build the substrate unless that substrate is your business.

The incident file got harder to ignore

The security stories this cycle all point at the same design failure: agents are being granted authority faster than organizations can describe, scope, or audit that authority.

Stack Overflow's security essay uses Meta's Instagram support-bot breach as a confused-deputy case. The post says attackers took control of more than 20,000 Instagram accounts by asking Meta's AI support assistant to attach an attacker-controlled recovery email, then reset passwords to that address. The missing control was not a stronger model. It was a principal check outside the chat workflow. 8

OALABS gives the misuse side more evidence. Researchers say they recovered local Claude Code and Codex sessions from a compromised server, including prompts, tool use, internal model monologue, and policy violations. In more than 1,000 sessions, they found only nine Claude policy violations and one Codex policy violation, with many attack requests framed as authorized red-team work. 5

The report's most concrete warning is operational. OALABS says the attacker used vague prompts such as "recon this," then let Claude research exposed services, identify vulnerabilities, write exploit code, validate access, and harvest data. The sessions documented breaches of at least 14 companies and included attempts to exfiltrate wallet data and reuse compromised hosts for cracking. 5

Control response	New signal	What to ask internally
Runtime governance	WitnessAI announced Agentic Control for discovering agents, monitoring MCP/tool access, enforcing allow lists, and applying runtime policy across IDEs, chat apps, custom agents, and approved environments. 9	Can security teams name every MCP server and tool an agent can reach today?
Funding	NeuralTrust announced a $20 million seed round for an agent security platform spanning gateway, runtime security, and posture management. 10	Is agent security now budgeted as its own category or hidden inside generic AI platform spend?
Adoption risk	Kore.ai's Agent Productivity Index says 72% of surveyed enterprises report unmanaged financial or compliance risk from agents, while 79% have had to reverse an agent action. 11	Which agent actions are reversible, and which require a hard gate before execution?

The non-vendor lesson is simpler than the products: prompts are not authorization. If an agent can refund money, change account recovery, touch production, edit permissions, or move credentials, the check has to live outside the model and carry the authenticated principal into the tool call.
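A minimal sketch of that rule, assuming a hypothetical account-recovery tool: the principal comes from the authenticated session, the tool call comes from the model, and the ownership check lives in the tool rather than in the prompt. All names and data below are invented for illustration and do not describe any vendor's implementation.

```python
from dataclasses import dataclass


class AuthorizationError(Exception):
    """Raised when the authenticated caller may not perform the action."""


@dataclass(frozen=True)
class Principal:
    user_id: str
    verified: bool  # set by the authentication layer, never by the model


# Hypothetical account data.
ACCOUNT_OWNERS = {"acct-1": "user-alice"}
RECOVERY_EMAILS = {"acct-1": "alice@example.com"}


def set_recovery_email(principal: Principal, account_id: str, new_email: str) -> None:
    """Change a recovery email only if the caller is the verified account owner."""
    if not principal.verified:
        raise AuthorizationError("caller is not authenticated")
    if ACCOUNT_OWNERS.get(account_id) != principal.user_id:
        raise AuthorizationError("caller does not own this account")
    RECOVERY_EMAILS[account_id] = new_email


def dispatch(principal: Principal, tool_call: dict) -> None:
    """Route a model-proposed tool call.

    The principal comes from the session; tool_call is untrusted model output.
    """
    if tool_call["tool"] != "set_recovery_email":
        raise ValueError(f"unknown tool: {tool_call['tool']}")
    args = tool_call["args"]
    set_recovery_email(principal, args["account_id"], args["new_email"])


# The model was talked into proposing this call on behalf of the wrong user.
attacker = Principal(user_id="user-mallory", verified=True)
model_output = {
    "tool": "set_recovery_email",
    "args": {"account_id": "acct-1", "new_email": "mallory@example.net"},
}
try:
    dispatch(attacker, model_output)
except AuthorizationError as err:
    print(f"blocked: {err}")
```

However persuasive the conversation was, the call fails because the session's principal does not own the account. The model never gets to assert who it is acting for.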

Cloud native teams are turning agents into workloads

CNCF's June 17 post is a useful production counterweight to product announcements. Orange Innovation describes an internal security-operations platform where a coordinator agent orchestrates Detect, Analyze, Remediate, Notify, and Human-in-the-Loop branches. Each agent is deployed as its own Kubernetes workload with resource limits, identity, and restart policy. 12

Multi-agent security operations architecture: Orange Innovation's reference architecture treats each agent as an isolated workload and routes consequential actions through policy and human review. 12

Two implementation choices are worth copying. First, the reviewer agent does not reason about safety from a long system prompt. It calls OPA through MCP and receives a deterministic policy verdict, with Kyverno admission rules and Git-reviewed policy bundles handling the actual constraints. Second, A2A trace IDs carry observability across agent messages, MCP calls, logs, metrics, and token usage. 12
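The first choice, a deterministic verdict instead of prompt-based safety reasoning, can be illustrated without any policy engine. The sketch below is a plain Python stand-in, not OPA, Rego, or Kyverno; the rule set, field names, and verdict values are assumptions chosen to show the shape of the check.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"  # hand off to human review


@dataclass(frozen=True)
class ProposedAction:
    verb: str          # what the agent wants to do
    resource: str      # what it wants to do it to
    environment: str   # e.g. "staging" or "production"
    reversible: bool   # can the action be undone afterwards?


# Hypothetical allow list; in practice this would live in reviewed policy files.
ALLOWED_VERBS = {"read", "restart", "delete"}


def evaluate(action: ProposedAction) -> Verdict:
    """Return the same verdict for the same input, with no model in the loop."""
    if action.verb not in ALLOWED_VERBS:
        return Verdict.DENY
    if action.environment == "production" and not action.reversible:
        return Verdict.ESCALATE
    return Verdict.ALLOW


proposals = [
    ProposedAction("restart", "pod/api", "production", reversible=True),
    ProposedAction("delete", "volume/db", "production", reversible=False),
    ProposedAction("grant_admin", "user/bot", "staging", reversible=True),
]
for proposal in proposals:
    print(proposal.verb, proposal.resource, evaluate(proposal).value)
```

The model proposes, the function decides, and the irreversible production action goes to a human. Because the rules are ordinary code or data, they can be reviewed in Git like any other change.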

This is the architecture line separating demos from operations. The article is not saying every agent needs Kubernetes. It is saying mature agent deployments inherit old distributed-systems problems: identity, isolation, rollout, rollback, policy, observability, queue depth, and cost control.

Identity is leaving the whiteboard

Estonia's plan to assign personal identification numbers to AI assistants is early, but it belongs in this digest because it gives the agent-identity debate a government-grade object. Bloomberg reports that the country wants the identifiers to control and limit what access and authority assistants have when acting on behalf of people and businesses. 6

For builders, the interesting question is not whether every jurisdiction follows Estonia. It is whether agent identity becomes portable across three layers at once: user delegation, enterprise policy, and external services. ARD handles discoverable resources. GitHub handles tool discovery inside Copilot. Estonia is pointing at state-backed agent identity. Those are different pieces of the same control problem.

Builder checklist for the next sprint

Inventory agent authority, not just agent names. List the resources each agent can read, write, delete, purchase, approve, or deploy. The verb matters more than the label.
Move irreversible actions behind policy gates. Let the model propose. Let deterministic policy decide. Escalate when blast radius, identity, or confidence crosses a threshold.
Separate discovery from installation. The distinction GitHub's Agent finder draws is healthy: finding a tool is not the same as wiring it in. Keep that boundary in your own catalogs.
Record provenance per action. Principal, session, prompt, tool call, resource, model, and policy decision should travel together. If you cannot reconstruct the chain, you cannot investigate the incident.
Treat runtime durability as a feature. A multi-minute agent turn that loses state on restart is not production ready, even if the model output looks good in a demo.
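The provenance item in the checklist above can be sketched as one record per action, written as a JSON line. The field names and the choice to store a hash of the prompt rather than its text are assumptions for illustration, not a standard format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ActionRecord:
    principal: str        # authenticated user or service the agent acted for
    session_id: str
    prompt_sha256: str    # hash keeps the log small; store full text elsewhere
    tool_call: str
    resource: str
    model: str
    policy_decision: str


def record_action(log: list[str], *, principal: str, session_id: str, prompt: str,
                  tool_call: str, resource: str, model: str,
                  policy_decision: str) -> ActionRecord:
    """Append one provenance record to the log as a JSON line."""
    record = ActionRecord(
        principal=principal,
        session_id=session_id,
        prompt_sha256=hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        tool_call=tool_call,
        resource=resource,
        model=model,
        policy_decision=policy_decision,
    )
    log.append(json.dumps(asdict(record), sort_keys=True))
    return record


def chain_for_session(log: list[str], session_id: str) -> list[dict]:
    """Reconstruct every recorded action for one session, in log order."""
    return [entry for entry in map(json.loads, log)
            if entry["session_id"] == session_id]


# Hypothetical values throughout.
audit_log: list[str] = []
record_action(audit_log, principal="user-alice", session_id="s-42",
              prompt="restart the api pod", tool_call="k8s.restart",
              resource="pod/api", model="example-model", policy_decision="allow")
record_action(audit_log, principal="user-bob", session_id="s-43",
              prompt="delete the db volume", tool_call="k8s.delete",
              resource="volume/db", model="example-model", policy_decision="escalate")

print(chain_for_session(audit_log, "s-42"))
```

Keeping all seven fields in one record is the point: an investigator should not have to join a chat log, a tool log, and a policy log by timestamp to learn who authorized what.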

The next agent advantage may not come from a bigger model. It may come from the boring pieces that make agents discoverable, interruptible, governable, and accountable.