Gemini computer use, Jalapeño, and Patch the Planet — AI Digest for June 25, 2026
26/6/2026 · 0:24

Today's builder-focused digest covers Gemini 3.5 Flash gaining built-in computer use, GitHub's Copilot workflow updates, OpenAI and Broadcom's inference chip, NVIDIA's MoE fine-tuning work, Kog's latency-first coding model, and Trail of Bits' AI-assisted security patching push.

What changed

Three themes keep repeating in this batch: agent behavior is moving into shipping products, performance work is moving lower into the stack, and AI security work is getting harder to triage. The useful part is not the headline size of each release, but where each one changes the default path a builder would take.

Gemini 3.5 Flash now has built-in computer use

Google says computer use is now built into Gemini 3.5 Flash, available through the Gemini API and Gemini Enterprise Agent Platform, so developers can build agents that see and act across browser, mobile, and desktop without switching to a separate model. Google also added two optional enterprise safeguards: explicit confirmation for sensitive actions and an automatic stop when indirect prompt injection is detected. 1
For teams already building on Gemini, that lowers the friction between "LLM can reason" and "LLM can actually operate the UI."
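The two safeguards Google describes, confirmation for sensitive actions and a stop on suspected prompt injection, can be pictured as a gate in the agent's action loop. The sketch below is not the Gemini API: `Action`, `SENSITIVE_KINDS`, `run_actions`, and the three callbacks are hypothetical names invented for illustration, and the choice to skip a declined action and continue is an assumption about one reasonable policy.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical action kinds that this sketch treats as sensitive.
SENSITIVE_KINDS = {"submit_payment", "delete_file", "send_email"}


@dataclass(frozen=True)
class Action:
    kind: str    # e.g. "click", "type", "submit_payment"
    target: str  # human-readable description of the UI element


def run_actions(
    actions: Iterable[Action],
    execute: Callable[[Action], None],
    confirm: Callable[[Action], bool],
    injection_suspected: Callable[[Action], bool],
) -> list[str]:
    """Run proposed actions, gating sensitive ones and stopping on suspected injection."""
    log: list[str] = []
    for action in actions:
        if injection_suspected(action):
            log.append(f"stopped: {action.kind}")
            break  # halt the whole run, not just this step
        if action.kind in SENSITIVE_KINDS and not confirm(action):
            log.append(f"declined: {action.kind}")
            continue  # skip this action but keep going (an assumed policy)
        execute(action)
        log.append(f"ran: {action.kind}")
    return log
```

In a real agent, `confirm` would prompt a human and `injection_suspected` would be whatever detector the platform provides; here both are plain callables so the control flow is easy to test in isolation.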

GitHub keeps pushing Copilot toward team workflows and terminal work

GitHub's June 22-23 Copilot updates add organization and enterprise agents inside JetBrains IDEs, let users queue or steer messages while a Copilot CLI request is still running, and make the cloud agent generally available. The same wave adds a per-turn AI credits indicator and a tabbed CLI view for issues, pull requests, and gists. 2 3
The practical signal is that GitHub is turning Copilot into something a team can standardize, not just something an individual can chat with.

OpenAI and Broadcom say their first custom chip is for inference

OpenAI and Broadcom unveiled Jalapeño, which they describe as OpenAI's first Intelligence Processor and the first chip in a multi-generation compute platform aimed at LLM inference. Broadcom says the platform is being built with OpenAI's model and serving constraints in mind, with initial deployment planned for 2026. 4
This is the part of the AI stack where model assumptions start leaking directly into silicon design.

NVIDIA's NeMo AutoModel tries to turn MoE fine-tuning into a library upgrade

NVIDIA says NeMo AutoModel sits on top of Transformers v5 and adds Expert Parallelism, DeepEP fused dispatch, and TransformerEngine kernels. In NVIDIA's published benchmarks, that setup delivered 3.4-3.7x higher training throughput and 29-32% less GPU memory on 30B MoE fine-tunes versus the best Transformers v5 configuration. 5
If you work on MoE training, this is a reminder that a big part of the win now comes from execution strategy, not model code alone.
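To make the reported ranges concrete, here is a back-of-the-envelope conversion. The 10-hour, 70 GB baseline is a hypothetical job chosen for illustration, not a figure from NVIDIA, and the calculation assumes the throughput gain translates directly into wall-clock time for a fixed token count.

```python
def scaled_hours(baseline_hours: float, speedup: float) -> float:
    """Wall-clock time if throughput rises by `speedup` and the token count is unchanged."""
    return baseline_hours / speedup


def scaled_memory_gb(baseline_gb: float, reduction_pct: float) -> float:
    """Memory footprint after a percentage reduction."""
    return baseline_gb * (1 - reduction_pct / 100)


# Hypothetical baseline: a 10-hour fine-tune that peaks at 70 GB per GPU.
for speedup, reduction in [(3.4, 29), (3.7, 32)]:
    hours = scaled_hours(10.0, speedup)
    mem = scaled_memory_gb(70.0, reduction)
    print(f"{speedup}x, -{reduction}%: {hours:.1f} h, {mem:.1f} GB")
```

Under those assumptions the job would drop to roughly 2.7 to 2.9 hours and about 47.6 to 49.7 GB per GPU.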

Kog releases Laneformer 2B as a latency-first coding model

Kog says Laneformer 2B is a 2.3B-parameter instruction-tuned coding model built around Delayed Tensor Parallelism, and the Hugging Face model card warns that trust_remote_code=True is required to load the custom architecture. The release page says Kog's served preview reaches 3,000 output tokens/s per request on 8x AMD MI300X and 2,100 output tokens/s on 8x NVIDIA H200, but those numbers apply to Kog's own inference engine, not plain Transformers. 6 7
The point is not that everyone needs this exact model, but that latency is becoming a model design goal in its own right.
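For a sense of scale, the per-request rates translate into generation time as follows. The 500-token completion is a hypothetical size, and the calculation covers only steady-state decoding: it ignores prompt processing, time to first token, and network overhead.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to emit `output_tokens` at a steady per-request decode rate."""
    return output_tokens / tokens_per_second


# Rates quoted for Kog's served preview; the completion length is an assumption.
COMPLETION_TOKENS = 500
for label, rate in [("8x AMD MI300X", 3000.0), ("8x NVIDIA H200", 2100.0)]:
    ms = generation_seconds(COMPLETION_TOKENS, rate) * 1000
    print(f"{label}: {ms:.0f} ms for {COMPLETION_TOKENS} tokens")
```

That works out to about 167 ms on the MI300X setup and about 238 ms on the H200 setup for the same completion.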

Trail of Bits shows what AI security triage looks like in practice

Trail of Bits says the first week of Patch the Planet produced hundreds of bugs, 64 pull requests, 51 issues, and 37 merged patches across 19 projects including cURL, Go, Python, PyPI, and RustCrypto. Their stated goal is to leave each codebase better than they found it, not just to hand maintainers a pile of findings. 8
That is probably the clearest sign of where AI-assisted security research is headed: more findings, but also much more work after the finding.

Bottom line

The pattern across these releases is simple: agent behavior is moving into the browser, IDE, and terminal, while model and chip teams are trying to save milliseconds and memory at the lower layers. For builders, the useful question is where the default workflow just got easier to ship.
