The agents that act before you ask

The proactive AI agent paradigm is replacing the reactive prompt-response model. Google, Anthropic, and Moonshot are racing to own the layer — and the competitive moat won't be the model, it'll be the policy that decides when to act.


Every AI product you use today starts the same way: you open it, you ask, it answers. The human always starts the conversation.
That model is being replaced — and the technical foundation for the replacement shipped in April and May 2026.

What "proactive" actually means

Before getting to the race, the vocabulary needs pinning down. Researchers Nghi D. Q. Bui and Georgios Evangelopoulos published a position paper on May 7, 2026 1 arguing that the field conflates two distinct properties. Autonomy is about executing tasks without step-by-step human involvement. Proactivity is about deciding which tasks matter before anyone asks.
Their taxonomy has three levels:
  • Reactive — waits for a user prompt
  • Scheduled — runs on timers or webhooks
  • Situation-Aware — notices relevant changes across tools, decides when to interrupt, and connects signals you haven't thought to connect yet
The distinction matters for evaluation. An autonomous agent succeeds by completing a task. A proactive agent succeeds by having a good insight policy — knowing what to surface, when to surface it, and how to adapt after feedback. Bui and Evangelopoulos introduce three metrics for this: Insight Decision Quality, Context Grounding Score, and Learning Lift.
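To make that framing concrete, here is a minimal sketch of what an insight policy reduces to in code, assuming a simple threshold rule. The class, field, and function names are illustrative, not anything defined in the paper or its metrics.

    from dataclasses import dataclass

    # Minimal sketch of an insight policy: the agent is scored on its
    # decision to speak, not just on task completion. All names here are
    # illustrative assumptions, not the paper's API.

    @dataclass
    class InsightCandidate:
        signal: str            # the observed change, e.g. "flight delayed"
        grounding: float       # 0-1: how well it ties to real user context
        expected_value: float  # 0-1: estimated benefit of surfacing it now

    def should_surface(c: InsightCandidate, threshold: float = 0.5) -> bool:
        # A reactive or scheduled agent never runs this check; a
        # situation-aware one runs it on every signal it notices.
        return c.grounding * c.expected_value >= threshold

Presumably, whether feedback improves a rule like this over time is the kind of thing a metric like Learning Lift is meant to capture.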
By those measures, current models don't score well. Xiaomi's ProactiveMobile benchmark (3,660 test instances across 14 mobile scenarios) 2 puts GPT-5 at a 7.39% success rate. The best-performing fine-tuned model — Qwen2.5-VL-7B (a 7-billion-parameter vision-language model from Alibaba) — reaches only 19.15%. The takeaway from Dezhi Kong et al.: proactivity is a distinct competency, widely absent in current models yet learnable. The gap between 7.39% and 19.15% is where the product opportunity lives.

The product race

Six weeks of news have made one thing clear: every major AI lab is trying to own the proactive layer.
Google is internally dogfooding "Remy," a 24/7 personal agent running inside an employee-only version of the Gemini app. 3 Internal documents obtained by Business Insider describe it as an agent that "can monitor for things that matter to you, handle complex tasks proactively, and learn your preferences over time." Google shut down Project Mariner — its browser automation agent — on May 4 and folded that technology into Gemini Agent, a signal that the browser-tab-level interaction model is being replaced by something deeper. Google I/O 2026 (May 19–20) is widely expected to feature Remy or a public variant.
Anthropic has been internally testing "Conway," an always-on Claude agent platform leaked by TestingCatalog in April. 4 Conway runs continuously without an active user session, accepts webhook triggers from any event-driven system, includes full browser control, and supports a .cnw.zip extension standard — effectively an app store for the Claude ecosystem. No other platform currently combines all three: always-on operation, event-driven triggers, and an open extension standard.
Moonshot AI (China) shipped Kimi K2.6 on April 20. 5 It's a 1-trillion-parameter Mixture-of-Experts (MoE) model — a design where only a subset of parameters activates per token, keeping inference costs manageable at scale — open-sourced under a Modified MIT License, and its Agent Swarm architecture scales to 300 parallel sub-agents executing 4,000 coordinated steps simultaneously. Moonshot's own infrastructure team ran it continuously for five days — monitoring, incident response, full-cycle operations — without a human in the loop. On benchmarks, Kimi K2.6 scores 54.0 on HLE-Full with tools (GPT-5.4 scores 52.1; Claude Opus 4.6 scores 53.0) and 58.6 on SWE-Bench Pro (GPT-5.4 scores 57.7). That an open-weight model is competitive here matters: proactive agent capability is no longer locked behind closed APIs.
Microsoft provides the enterprise context. According to the 2026 Work Trend Index 6 — a survey of 20,000 workers across 10 countries — active agents in Microsoft 365 grew 15x year-over-year (18x in large enterprises). Organizational factors account for 67% of AI impact versus 32% from individual usage. The implication: proactive agents deployed at the team or workflow level outperform personal assistants.

What PMs need to build

The architecture for proactive agents is settled enough to act on. The UX design is not.
On the technical side, Tian Pan — formerly an engineer at Uber and Brex — published a production-grade guide to proactive agent architecture in April. 7 His central point: cron jobs alone cause overlapping execution bugs when an agent run takes longer than the interval between runs. The solution requires separating the trigger layer from the execution layer, combined with idempotency at every step — every event must carry a globally unique ID, and that ID must commit atomically with the work output. Pan's rule: "Idempotency is not optional, it is the architecture." Event-driven triggers — Change Data Capture (CDC, a technique that streams database row changes in real time) via tools like Kafka or Debezium, webhooks, or direct database triggers — reduce system latency by 70–90% compared to polling and consume no compute while idle. 7
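A minimal sketch of that rule, assuming a SQLite store: the event ID is claimed in the same transaction that commits the output, so a duplicate webhook delivery, or an overlapping run picking up the same event, becomes a no-op. The function and table names here are mine, not Pan's.

    import sqlite3

    db = sqlite3.connect("agent.db")
    db.execute("CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE IF NOT EXISTS results (event_id TEXT, output TEXT)")

    def run_agent_step(payload: str) -> str:
        return f"processed: {payload}"  # stand-in for the real execution layer

    def handle_event(event_id: str, payload: str) -> None:
        try:
            with db:  # one atomic transaction: ID claim and output commit together
                db.execute("INSERT INTO processed_events VALUES (?)", (event_id,))
                db.execute("INSERT INTO results VALUES (?, ?)",
                           (event_id, run_agent_step(payload)))
        except sqlite3.IntegrityError:
            pass  # duplicate delivery: the ID was already claimed, do nothing

If run_agent_step raises, the transaction rolls back and the ID is released, so a retry reprocesses the event cleanly.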
MindStudio 8 identifies four proactive agent patterns in production: scheduled (cron-based), event-triggered (webhook/database change), monitoring (continuous condition checking), and multi-agent workflows (specialist agents chained together). The five most common implementation mistakes: automating too much at once, no observability, poorly scoped triggers, ignoring API rate limits, and no human fallback path.
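As a sketch only, here is the monitoring pattern with two of those safeguards wired in: a per-hour action budget and a human fallback path. All the callables and the budget value are hypothetical, not MindStudio's.

    import time

    MAX_ACTIONS_PER_HOUR = 10  # scoped budget; an assumption, not a product default

    def monitor_loop(check_condition, act, notify_human, interval_s: float = 60.0):
        actions, window_start = 0, time.monotonic()
        while True:
            if time.monotonic() - window_start > 3600:  # reset the hourly budget
                actions, window_start = 0, time.monotonic()
            if check_condition():
                if actions >= MAX_ACTIONS_PER_HOUR:
                    # a real system would de-duplicate these alerts
                    notify_human("action budget exhausted; deferring to a human")
                else:
                    try:
                        act()
                        actions += 1
                    except Exception as exc:  # no silent failure: fall back to a human
                        notify_human(f"agent action failed: {exc}")
            time.sleep(interval_s)

The budget doubles as a crude answer to poorly scoped triggers: even a bad trigger can only burn ten actions an hour before a human hears about it.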
On the UX side, a study published at ACM CHI 2026 (the leading human-computer interaction research conference) 9 — Anirban Mukhopadhyay et al., using 24 participants in collaborative problem-solving tasks — found that a proactive peer agent "occasionally enhanced performance but also disrupted flow, increased workload, and created over-reliance." The University of Washington's Hedwig project 10 takes the opposite design stance: the agent dynamically adjusts its autonomy level across sessions, learning when to act and when to hold back based on earned trust. Hedwig's formative study of 21 software engineers found that static autonomy settings frustrate users because preferences shift across tasks over time.
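Hedwig's stance can be caricatured in a few lines. The level names and the update rule below are my illustration of the idea, not the project's actual algorithm.

    class AutonomyDial:
        # Illustrative levels; Hedwig's actual taxonomy may differ.
        LEVELS = ["suggest_only", "act_with_confirmation", "act_and_report"]

        def __init__(self) -> None:
            self.trust = 0.5  # starts mid-scale: autonomy is earned, not assumed

        def level(self) -> str:
            i = min(int(self.trust * len(self.LEVELS)), len(self.LEVELS) - 1)
            return self.LEVELS[i]

        def feedback(self, accepted: bool) -> None:
            # Accepted interventions earn trust slowly; rejected ones spend
            # it faster, reflecting the CHI finding that unwanted
            # interruptions carry the larger cost.
            if accepted:
                self.trust = min(1.0, self.trust + 0.05)
            else:
                self.trust = max(0.0, self.trust - 0.15)

The asymmetry is the design choice: a bad interruption costs more trust than a good one earns.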
The harder product problem is not building a trigger layer. A practitioner thread on Reddit's r/AI_Agents put it plainly: "The hardest part of a proactive agent isn't triggers or scheduling — it's teaching it when to stay silent. The decision engine is 10x harder." 11
That's the product decision in front of every PM working in this space right now: not whether to build proactive agents, but how aggressive the intervention threshold should be. Google, Anthropic, and Moonshot are each making different bets on that threshold. The competitive moat won't be the model. It'll be the policy that decides when to act — and when to wait.
