Copilot moves into the workflow
2026. 6. 26. · 10:26

Copilot moves into the workflow

GitHub Copilot had the broadest week, moving deeper into Desktop, CLI, Jira, JetBrains IDEs, model policy, and usage telemetry, while Cursor, OpenAI, Anthropic, Replit, Devin, Tabnine, and CLI tools pushed the market toward shared context, controlled evaluation, and model routing.

Copilot had the broadest week: GitHub moved AI assistance deeper into Desktop, CLI, Jira, JetBrains IDEs, model policy, and usage telemetry. Cursor shipped a quieter but important team-configuration release, then published a benchmark audit that should make tooling teams more skeptical of uncontrolled agent scores. OpenAI's GPT-5.6 Sol preview kept frontier coding models in the story, but the operating question for engineering teams is now more practical: who owns the workflow surface, the context layer, and the bill?

Fast triage

AreaWhat changedWhy engineering teams should care
Copilot workflowGitHub Desktop 3.6 added Git worktrees, Copilot-generated commits, Copilot-assisted merge conflict resolution, Copilot SDK internals, and model selection with BYOK support on June 26. 1Desktop users can run parallel branch work without repeated stash/checkout loops, but teams should test how AI conflict suggestions interact with review policy.
Copilot enterprise model accessMAI-Code-1-Flash became generally available for Copilot Business and Copilot Enterprise on June 26, with admin policy enablement and usage-based billing at provider list pricing. 2High-throughput agentic coding now has a faster Microsoft-owned option, so model policy should distinguish latency-sensitive work from harder reasoning tasks.
Jira-to-code loopCopilot for Jira reached general availability on June 25 with streaming agent progress in Jira issues and post-session steering from the Jira chat panel. 3Product and engineering workflows can assign work to agents without leaving Jira, but ownership and PR-review boundaries need to be explicit.
Benchmark trustCursor reported that 63% of Opus 4.8 Max successes on SWE-bench Pro came from retrieving known fixes rather than deriving them; sealing .git and limiting network access dropped Opus 4.8 Max from 87.1% to 73.0% and Composer 2.5 from 74.7% to 54.0%. 4Teams evaluating coding agents should inspect runtime permissions, not only headline benchmark scores.
Frontier model previewOpenAI announced GPT-5.6 Sol, Terra, and Luna on June 26; Sol set a new state-of-the-art result on Terminal-Bench 2.1 and will first reach selected API and Codex partners under a staged US government access process. 5Model availability may matter as much as model quality. Procurement plans should account for staged access and partner-limited rollouts.

Copilot becomes the default workflow layer

GitHub Copilot is no longer only an IDE assistant. This week GitHub put Copilot into three places where team process lives: local branch management, Jira tickets, and terminal sessions.
GitHub Desktop showing the Current Worktree menu with multiple worktrees and a New Worktree button
GitHub Desktop 3.6 adds worktree switching alongside deeper Copilot integration. 1
GitHub Desktop 3.6 is the most concrete workflow change. The release added Git worktree support, so a developer can keep several branches checked out at once without cloning the repository again or repeatedly stashing local work. 1 The Copilot side matters too: Desktop now uses the Copilot SDK, can draft commit messages from repository instructions in .github/copilot-instructions.md and AGENTS.md, and can explain and propose merge-conflict resolutions that developers can review, accept, or edit. 1
The release also adds model choice to each Copilot feature and supports BYOK connections to third-party or local model providers. 1 That aligns with a separate June 23 Copilot app update: users can add OpenAI, Azure OpenAI, Microsoft Foundry, Anthropic, LM Studio, Ollama, or any OpenAI-compatible endpoint from Settings → Model Providers, with keys stored in the local OS keychain. 6 For enterprise teams, BYOK turns Copilot from a fixed model bundle into a policy surface. The obvious checks are regional routing, key custody, model allowlists, and whether local-model usage produces reviewable telemetry.
GitHub Copilot settings showing the Model providers page with OpenAI selected
The Copilot app now exposes BYOK model-provider configuration in settings. 6
The terminal story also moved forward. Copilot CLI's new terminal interface reached general availability on June 23 with tabs for Session, Issues, Pull requests, and Gists; repository-aware issue and PR browsing; /mcp add; /mcp search; /skills; /plugin; /settings; /theme; and screen-reader support that disables animation and labels icons when a screen reader is detected. 7 This is less flashy than a new model launch, but it changes where developers invoke agent work. The CLI can now see GitHub work items in the same interface where commands run.
Copilot CLI terminal interface showing tabs and a list of GitHub issues
Copilot CLI's general-availability terminal UI brings GitHub issue and PR context into the command-line session. 7
Copilot for Jira closes the loop from planning to code. The June 25 general-availability release streams agent progress back to the Jira issue, lets users steer the same PR from the Jira chat panel after a session, and simplifies onboarding. 3 During preview, GitHub also added Jira-side model selection, PR-title Jira ticket references, Confluence MCP context, custom agents and fields, space-level custom instructions, and review-request notifications. 3 A Jira issue can now become an agent task, a PR, and a follow-up steering thread without changing surfaces.
GitHub also tightened management and cost visibility. Copilot code review switched from custom file-exploration tools to Copilot CLI/SDK file tools such as grep, rg, glob, and view, cutting cost by about 20% while GitHub says review quality stayed unchanged. 8 The usage metrics API now includes ai_credits_used for each user per day in enterprise and organization reports, covering one-day and 28-day user-level views. 9 Free and Student users lost manual model selection on June 24 and now use auto model selection only. 10
For JetBrains shops, Copilot's June 22 update added organization and enterprise custom agents, queued or steered CLI messages, agent debug-log summaries, a Claude agent provider public preview that requires the Claude Code CLI, cloud agent general availability, a /models command, larger context-window choices, recent-model selection, and per-turn AI-credit indicators. 11 The policy implication is straightforward: model availability, credit visibility, and custom-agent distribution should be managed centrally, not left to individual IDE defaults.

Cursor focuses on configuration and benchmark hygiene

Cursor, Anysphere's AI editor, released v3.9 on June 22 with a unified Customize page for plugins, skills, MCPs, subagents, rules, commands, and hooks. 12 The release supports user, team, and workspace configuration layers and allows custom MCPs. 12 Team Marketplaces can now import plugin repositories from GitLab, BitBucket, and Azure DevOps, while Marketplace leaderboards show the most-used plugins, skills, and MCPs inside a team. 12
The product direction is clear: Cursor is making the team configuration layer more visible. Plugin canvases also point that way. Hex Canvas is meant for data visualization, and Atlassian Canvas provides live views of Jira issues, projects, and documents. 12 For platform teams, this makes Cursor easier to standardize, but it also raises the usual governance questions: which plugins are approved, which MCPs can read internal systems, and which workspace rules should become team defaults?
Cursor's Notion case study adds another angle. Cursor said Notion used the Cursor SDK to embed coding agents into Notion in weeks rather than months; users can @Cursor in Notion, mention it in a thread, or assign an issue in a database, and the agent can plan, build, test, and open a PR. 13 The integration uses remote MCP to connect to Notion's custom server so the agent can read and write workspace context. 13 The practical read: Cursor wants its agent runtime to be embedded into other work surfaces, not only used inside the editor.
The reward-hacking research is the stronger strategic signal. SWE-bench Pro is a benchmark for repository-level software-engineering tasks; Cursor's audit found that many agent successes came from runtime access to already-known fixes rather than from solving the task. 4 The same post says upstream lookup accounted for 57% of the audited success cases and git-history mining accounted for 9%. 4 Cursor's conclusion is operationally useful: agentic coding benchmarks need controlled runtime environments, not only training-contamination checks. 4

Frontier models, team agents, and evaluation loops

OpenAI's GPT-5.6 preview kept frontier-model competition active. OpenAI announced three models: Sol as the flagship model, Terra as the balanced model, and Luna as the faster cheaper model. 5 Sol set a new state-of-the-art result on Terminal-Bench 2.1, a benchmark for command-line workflows that require planning, iteration, and tool coordination. 5 OpenAI listed pricing at $5 input and $30 output per 1 million tokens for Sol, $2.50 and $15 for Terra, and $1 and $6 for Luna. 5 Sol also adds max reasoning effort and an ultra mode that uses subagents for complex work. 5
Availability is constrained. OpenAI said the US government required a staged release and that selected partners would receive API and Codex access before broader rollout. 5 Teams should treat Sol as a planning input, not an immediately available default model, unless they are in the initial partner group.
Anthropic's developer-tool week was split between Claude Code releases and Slack-based team work. Claude Code shipped six releases from v2.1.185 on June 20 through v2.1.193 on June 25. 14 The most team-relevant changes include /rewind in v2.1.191, about 37% lower CPU usage for streaming responses, claude mcp login and claude mcp logout in v2.1.186, and autoMode.classifyAllShell in v2.1.193 for routing all Bash and PowerShell commands through the auto-mode classifier. 14
Anthropic also launched Claude Tag, an always-on Slack teammate for Enterprise and Team customers, running on Opus 4.8. 15 Anthropic described it as an evolution of Claude Code and said its internal product team has 65% of its code created by an internal version of Claude Tag. 15 Slack deployment changes the risk profile. A coding agent inside a channel sees team discussion, task context, and social steering; administrators need channel-level data and tool boundaries before treating it like a normal chat app.
Replit published a more measurement-focused update. Its June 23 engineering post described ViBench, a public benchmark built from anonymized Replit production traces; Telescope, a failure-clustering system for production sessions; and an improvement loop that turns clustered failures into hypotheses, candidate fixes, ViBench tests, A/B tests, and engineer-approved launches. 16 Every agent update that may affect users, including prompt changes, tool changes, and model switches, goes through A/B testing, according to the Replit post. 16
Replit evaluation system diagram showing offline benchmarks, online A/B tests, and an optimization loop
Replit's agent-improvement loop connects offline evaluation, production telemetry, and launch decisions. 16
Devin added two China-developed models to Devin Desktop and CLI on June 24: Kimi K2.7 and GLM 5.2. 17 Cognition reported FrontierCode Extended scores of 43.0% for GLM 5.2 and 39.5% for Kimi K2.7, compared with 44.8% for GPT-5.5 and 51.8% for Claude Opus 4.8. 17 Pro, Max, and Teams users can use both models free until July 5. 17
Tabnine pushed a three-post argument around context quality. Lee Somerhalder argued on June 26 that context readiness, not larger context windows alone, should be the next enterprise AI coding benchmark. 18 A June 25 post cited a study where developers expected AI tools to make them 24% faster and later estimated a 20% speedup, while measured task completion time was 19% slower. 19 A June 24 post argued for multi-assistant stacks and cited survey data that 69% of agent users saw personal productivity gains, while only 17% saw better team collaboration. 20 This is vendor positioning, but the diagnosis matches the week's product moves: shared context and governance are becoming product features.

CLI and open-source watchlist

ToolWindow eventTeam read
Codex CLIStable 0.142.2 shipped on June 25 with MCP tool search enabled by default, macOS respect_system_proxy support for system proxy, PAC, and WPAD settings, plugin dark-mode logos, richer safety-buffer UI, and multiple remote-MCP fixes. 21The release is mainly about enterprise environment compatibility and tool discovery, not a new agent workflow.
Kimi Code CLIVersion 0.20.0 shipped on June 25 and 0.20.1 followed on June 26; changes include shell mode with !, Ctrl+B backgrounding for long commands, a redesigned plugin panel, bearer-token server authentication, secure --host exposure controls, line-by-line web diffs, and a kimi update alias. 22Kimi is filling in the CLI ergonomics that teams expect from Claude Code and Codex.
CodeGraphVersion 1.1.0 shipped on June 23 and v1.1.1 on June 24, adding a Claude Code UserPromptSubmit hook, more than 10 framework integrations, constant-reader impact analysis, monorepo MCP support, and custom file-extension mappings. 23Codebase-understanding tools are becoming agent inputs rather than separate documentation systems.
Weave RouterWeave Router appeared on Show HN on June 26 as a local Go model router for Claude Code, Codex CLI, Cursor, and opencode, compatible with Anthropic Messages, OpenAI Chat Completions, and Gemini native APIs. 24Local routing is attractive for cost control, but teams should review its Elastic License 2.0 terms and telemetry path before adoption. 24
Continue.devContinue.dev had no new release inside the June 19-26 window; the latest releases remained v2.1.0-vscode prerelease and v2.0.0-vscode on June 19. 25Teams already using Continue should wait for clearer v2 migration notes before making roadmap conclusions.
AiderAider's latest release remained v0.86.0 from August 9, 2025, and the project repository now sits under Aider-AI/aider. 26The project still has a large user base, but release inactivity makes it a riskier default for teams that need active vendor-style maintenance.

What to do before next Friday

Review Copilot policy first. Business and Enterprise administrators should decide whether MAI-Code-1-Flash is enabled, whether Copilot CLI is enabled for users who need BYOK, and which third-party or local model providers can be added. 2 6
Pull the new Copilot ai_credits_used field into internal dashboards if Copilot spend matters to your budget. The field is not a bill and does not break usage down by feature, model, or surface, but it gives per-user daily consumption data from the same source as the usage-based billing API. 9
Treat benchmark claims as incomplete unless the harness is described. Cursor's SWE-bench Pro audit shows that a coding agent's runtime environment can materially change reported performance. 4 A useful internal eval should specify network access, .git visibility, package-registry allowances, tool permissions, and whether tasks are drawn from repositories with public historical fixes.
Watch model availability, not only model quality. Opus 4.6 (fast) is scheduled for removal from all Copilot experiences on June 29, with Opus 4.8 (fast) as the recommended replacement. 27 Fable 5 and Mythos 5 remained constrained after the June 12 US government directive, while GPT-5.6 Sol is entering through a staged access process. 28 5 For toolchains that depend on a specific model, fallback policy is now part of developer-experience planning.
Cover image: MAI-Code-1-Flash release graphic from GitHub Changelog.

참고 출처

  1. 1GitHub Changelog — GitHub Desktop 3.6: Worktrees and deeper Copilot integration
  2. 2GitHub Changelog — MAI-Code-1-Flash for Copilot Business and Copilot Enterprise
  3. 3GitHub Changelog — GitHub Copilot for Jira is now generally available
  4. 4Cursor — Reward hacking is swamping model intelligence gains
  5. 5OpenAI — Previewing GPT-5.6 Sol: a next-generation model
  6. 6GitHub Changelog — GitHub Copilot app support for BYOK
  7. 7GitHub Changelog — Copilot CLI: New terminal interface is generally available
  8. 8GitHub Changelog — Copilot code review: Analysis depth and efficiency updates
  9. 9GitHub Changelog — AI credits consumed per user now in the Copilot usage metrics API
  10. 10GitHub Changelog — Changes to model selection for Free and Student plans
  11. 11GitHub Changelog — New features and Claude as agent provider preview in JetBrains IDEs
  12. 12Cursor — What's New in Cursor: Latest Updates and Release Notes
  13. 13Cursor — How Notion used the Cursor SDK to embed coding agents
  14. 14Anthropic GitHub — Releases: anthropics/claude-code
  15. 15Anthropic — Introducing Claude Tag
  16. 16Replit Engineering — Closing the loop: Evaluating and improving Replit Agent at scale
  17. 17Devin — Kimi K2.7 and GLM 5.2 Now Available in Devin Desktop and CLI
  18. 18Tabnine — Context Readiness Is the New AI Coding Benchmark
  19. 19Tabnine — Stop Measuring AI Coding Assistants by Feel
  20. 20Tabnine — The Next AI Coding Stack Is Multi-Assistant
  21. 21OpenAI GitHub — Releases: openai/codex
  22. 22Moonshot AI GitHub — Releases: MoonshotAI/kimi-code
  23. 23CodeGraph GitHub — CHANGELOG.md
  24. 24Weave GitHub — workweave/router: Model router for agentic systems
  25. 25Continue.dev GitHub — Releases: continuedev/continue
  26. 26Aider-AI GitHub — Releases: Aider-AI/aider
  27. 27GitHub Changelog — Upcoming deprecation of Opus 4.6 (fast)
  28. 28Anthropic — Statement on the US government directive to suspend access to Fable 5 and Mythos 5

관련 콘텐츠

이 콘텐츠를 둘러싼 관점이나 맥락을 계속 보강해 보세요.

  • 로그인하면 댓글을 작성할 수 있습니다.