
Top-conf paper digest — week of June 12–15, 2026
Ten papers posted June 11–12 with confirmed ICML 2026 or CVPR 2026 acceptance: one ICML Oral (Riemannian Metric Matching, up to 400× faster geometry estimation), one ICML Spotlight (CSPO safe RL), one CVPR Highlight (ALVTS visual token compression, 96.7% of baseline accuracy retained at an 89% compression ratio), plus seven further ICML 2026 papers covering LLM persona pruning, T2I concept removal, multimodal VAEs, agent UI privacy, constrained flow matching, spatial transcriptomics, and multi-system forecasting.

Research notes
Ten papers posted June 11–12 with confirmed ICML 2026 or CVPR 2026 acceptance. Coverage spans vision–language efficiency, geometric deep learning, safe RL, agent privacy, constrained and multimodal generative models, and scientific ML. One ICML Oral, one ICML Spotlight, one CVPR Highlight.
| # | Paper | Venue | Area |
|---|---|---|---|
| 1 | ALVTS | CVPR 2026 Highlight | Vision / LLM efficiency |
| 2 | Riemannian Metric Matching | ICML 2026 Oral | ML methods |
| 3 | CSPO | ICML 2026 Spotlight | RL |
| 4 | Persona-Pruner | ICML 2026 | LLM |
| 5 | ForceForget | ICML 2026 | Generative / safety |
| 6 | Minim | ICML 2026 | Agents |
| 7 | PolyFlow | ICML 2026 | Generative / robotics |
| 8 | HiST | ICML 2026 | Scientific ML |
| 9 | Once-for-All (ESE) | ICML 2026 | ML methods / forecasting |
| 10 | Hölder++ | ICML 2026 | Generative models |
Vision / LLM efficiency
ALVTS — Adaptive layer-wise visual token selection in LVLMs (CVPR 2026 Highlight)
Area: Vision · arXiv: 2606.14277 · Authors: Yongru Chen et al. · Code: available 1
Core problem. Large vision-language models (LVLMs) carry hundreds of visual tokens through every transformer layer. Existing pruning methods drop tokens once at a fixed layer — those tokens are gone for all subsequent computation, which can flush region-specific features that deeper layers still need.
Method. ALVTS adds a lightweight token selector at each transformer layer. Tokens flagged as low-importance are routed around the current layer (skip-connection), not discarded; both the skipped stream and the kept stream merge again before entering the next layer. The selector itself is trained with an importance-consistency constrained low-rank approximation that mimics full attention without retraining the base model.
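The routing step (skip low-importance tokens, then merge both streams) can be illustrated with a minimal sketch. It assumes per-token importance scores are already produced by the selector and that a fixed top-k fraction is kept at each layer; `layer_with_token_skip`, `keep_ratio`, and `toy_layer` are hypothetical names, and the selector's low-rank training objective is not shown.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[List[Vector]], List[Vector]]


def layer_with_token_skip(
    tokens: Sequence[Vector],
    scores: Sequence[float],
    keep_ratio: float,
    layer: Layer,
) -> List[Vector]:
    """Run `layer` on high-importance tokens and route the rest around it.

    Hypothetical sketch: skipped tokens are not discarded. They bypass this
    layer unchanged (a skip connection) and are merged back at their original
    positions, so the next layer sees the full token sequence again.
    """
    n = len(tokens)
    k = max(1, min(n, round(keep_ratio * n)))
    # Indices of the k highest-scoring tokens, restored to sequence order.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    kept_idx = sorted(ranked[:k])
    processed = layer([tokens[i] for i in kept_idx])
    # Start from a copy of the untouched stream, then overwrite kept positions.
    merged = [list(t) for t in tokens]
    for i, vec in zip(kept_idx, processed):
        merged[i] = vec
    return merged


def toy_layer(batch: List[Vector]) -> List[Vector]:
    # Stand-in for a transformer block: add 1.0 to every feature.
    return [[x + 1.0 for x in vec] for vec in batch]


tokens = [[0.0], [1.0], [2.0], [3.0]]
scores = [0.9, 0.1, 0.8, 0.2]
print(layer_with_token_skip(tokens, scores, 0.5, toy_layer))  # → [[1.0], [1.0], [3.0], [3.0]]
```

In a real LVLM the kept tokens would be processed jointly by attention; the toy layer only makes the routing visible.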
Key result. At an 89% token compression ratio, ALVTS retains 96.7% of the original model's accuracy on LLaVA-1.5, LLaVA-NeXT, and Qwen2.5-VL benchmarks. Prior static-pruning methods at the same compression rate lose substantially more. 1
Takeaway for practitioners. The paper shows that different layers attend to different visual regions, a fact that static pruning ignores. If you're deploying an LVLM under latency constraints, ALVTS is a plug-in accelerator that does not require retraining the base model.
Status: Accepted at CVPR 2026 (Highlight). No code repo linked in the abstract.
ML methods
Riemannian Metric Matching — scalable geometric modeling of distributions (ICML 2026 Oral)
Area: ML methods · arXiv: 2606.14334 · Authors: Jacob Bamberger, Adam Gosztolai, Pierre Vandergheynst, Michael Bronstein, Iolo Jones 2
Core problem. Estimating the Riemannian geometry of high-dimensional data typically requires building k-NN graphs or kernel matrices, both of which scale as O(n²) or O(n³) with dataset size. This makes diffusion-based geometry estimators impractical on large or high-dimensional datasets.
Method. The paper reframes learning the carré du champ operator — the foundational object in diffusion geometry — as a conditional expectation over random perturbations of data points. Because this expectation is computed point-wise, you can train a neural network to produce amortized, sample-wise geometry estimates without constructing any graph or kernel matrix.
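The conditional-expectation view can be illustrated with the standard small-time identity for a flat-space diffusion with generator Δ, where Γ(f, g) = ∇f · ∇g and Γ(f, g)(x) ≈ E[(f(x + ε) − f(x))(g(x + ε) − g(x))] / (2t) with ε ~ N(0, 2t·I). The sketch below is a Monte Carlo estimate at a single point, assuming Euclidean space and isotropic Gaussian perturbations; it is not the paper's training objective, which amortizes such estimates with a neural network. `carre_du_champ_mc` is a hypothetical name.

```python
import random
from typing import Callable, Sequence

Point = Sequence[float]
Func = Callable[[Point], float]


def carre_du_champ_mc(
    f: Func,
    g: Func,
    x: Point,
    t: float = 1e-2,
    n_samples: int = 100_000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of Gamma(f, g)(x) for the flat-space diffusion with generator Laplacian.

    Uses Gamma(f, g)(x) ~= E[(f(x + eps) - f(x)) * (g(x + eps) - g(x))] / (2t),
    eps ~ N(0, 2t * I). Only x and random perturbations of x are needed:
    no k-NN graph or kernel matrix is built.
    """
    rng = random.Random(seed)
    sigma = (2.0 * t) ** 0.5
    fx, gx = f(x), g(x)
    total = 0.0
    for _ in range(n_samples):
        y = [xi + rng.gauss(0.0, sigma) for xi in x]
        total += (f(y) - fx) * (g(y) - gx)
    return total / (n_samples * 2.0 * t)


x0 = [1.0, 0.0]
# In flat space Gamma(f, g) = grad f . grad g, so these should be close to 1, 0 and 2.
print(carre_du_champ_mc(lambda p: p[0], lambda p: p[0], x0))
print(carre_du_champ_mc(lambda p: p[0], lambda p: p[1], x0))
print(carre_du_champ_mc(lambda p: p[0] ** 2, lambda p: p[0], x0))
```

On curved data the same expectation, taken over perturbations constrained to the data distribution, is what encodes the geometry; the flat case is used here only because its answer is known in closed form.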
Key result. Metric matching rivals k-NN-based diffusion geometry estimators in accuracy while enabling amortized inference that is up to 400× faster. On high-dimensional images where nearest-neighbor methods break down entirely, metric matching still produces valid geometric estimates. 2
Takeaway. Any downstream task that uses diffusion distances, Laplacian eigenvectors, or intrinsic dimensionality estimates can substitute metric matching for the k-NN step and get a substantial wall-clock win.
Status: ICML 2026 Oral.
Once-for-All (ESE) — scalable simultaneous multi-system forecasting (ICML 2026)
Area: ML methods / forecasting · arXiv: 2606.13285 · Authors: Beinan Xu, Andy Song, Jiti Gao, Feng Liu 3
Core problem. Many real-world forecasting settings require coordinated predictions across multiple interacting systems (e.g., currency exchange rates, disease spread). Existing methods predict one system at a time, ignoring inter-system dependencies and needing N separate inference passes for N systems.
Method. Equilibrium State Estimation (ESE) first estimates the global equilibrium state across all systems simultaneously, then generates per-system forecasts from the gap between each system's current state and that equilibrium. A single pass covers all systems together, with time complexity linear in N.
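The two-stage, single-pass structure can be sketched under loudly simplifying assumptions: the "equilibrium" is taken to be the mean of the current states, and each forecast moves a fixed fraction `alpha` of the gap toward it. In ESE both pieces are learned; `estimate_equilibrium`, `ese_forecast`, and `alpha` are hypothetical stand-ins that only show the control flow.

```python
from typing import List, Sequence


def estimate_equilibrium(states: Sequence[float]) -> float:
    # Hypothetical placeholder for ESE's learned equilibrium estimate:
    # the mean of all current system states.
    return sum(states) / len(states)


def ese_forecast(states: Sequence[float], alpha: float = 0.5) -> List[float]:
    """One joint pass: estimate a shared equilibrium once, then forecast every
    system from its gap to that equilibrium. Total cost is O(N) for N systems."""
    eq = estimate_equilibrium(states)  # computed once for all systems
    return [s + alpha * (eq - s) for s in states]  # per-system gap correction


print(ese_forecast([1.0, 2.0, 3.0, 6.0]))  # → [2.0, 2.5, 3.0, 4.5]
```

The design point is that the shared quantity is computed once and reused, rather than recomputed inside N independent forecasting calls.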
Key result. ESE matches or exceeds SOTA accuracy on currency exchange and COVID-19 spread benchmarks while delivering a 10–70× speedup over per-system methods as the number of systems increases. ESE also plugs into existing predictors as a wrapper, combining their accuracy with ESE's speed. 3
Status: Accepted at ICML 2026. Code available.
RL
CSPO — constraint-sensitive policy optimization for safe RL (ICML 2026 Spotlight)
Area: RL · arXiv: 2606.14415 · Authors: Ayoub Belouadah, Sylvain Kubler, Yves Le Traon 4
Core problem. Primal-dual safe RL methods (constrained MDPs) rely on Lagrange multipliers to penalize constraint violations. Because multipliers update slowly relative to the policy, violations persist for many steps before the penalty catches up — the classic "delayed correction" problem.
Method. CSPO augments the primal policy objective with a correction term derived from the shortest signed distance to the safety boundary. This gives the optimizer a local geometric signal about how close the current policy is to violating constraints, letting it take smarter recovery steps without waiting for the multiplier to accumulate. The correction preserves the original KKT solutions.
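The contrast between a slow multiplier and an immediate distance-based correction can be sketched for the simplest case: one scalar constraint J_c ≤ d, where the signed distance to the boundary in expected-cost space reduces to d − J_c. The functional form of the correction (`kappa * min(0, distance)`) is a hypothetical choice for illustration, not CSPO's actual term, and all names are hypothetical.

```python
def signed_distance_to_boundary(cost: float, limit: float) -> float:
    # Single scalar constraint J_c <= d: positive inside the safe set,
    # negative when the constraint is violated.
    return limit - cost


def dual_update(lam: float, cost: float, limit: float, lr: float) -> float:
    # Standard projected dual ascent: the multiplier grows only gradually
    # as violations accumulate, which causes the delayed correction.
    return max(0.0, lam + lr * (cost - limit))


def corrected_objective(
    reward: float, cost: float, limit: float, lam: float, kappa: float
) -> float:
    """Primal objective (to maximize) = Lagrangian + hypothetical distance-based correction."""
    lagrangian = reward - lam * (cost - limit)
    # Hypothetical correction: penalize in proportion to how far the policy
    # sits outside the boundary, independent of the current multiplier.
    correction = kappa * min(0.0, signed_distance_to_boundary(cost, limit))
    return lagrangian + correction


# Right after a violation the multiplier is still 0, so the plain Lagrangian
# applies no penalty, while the corrected objective already does.
print(corrected_objective(10.0, 12.0, 10.0, lam=0.0, kappa=2.0))  # → 6.0
print(corrected_objective(10.0, 8.0, 10.0, lam=0.0, kappa=2.0))   # → 10.0
print(dual_update(0.0, 12.0, 10.0, lr=0.1))                       # → 0.2
```

In this toy form the correction vanishes inside the safe set, so it only changes the objective where the constraint is violated.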
Key result. On navigation and locomotion benchmarks, CSPO achieves faster safety recovery and higher constrained returns than state-of-the-art primal-dual and penalty-based baselines. The paper does not report a single-number lift; results are presented as benchmark curves. 4
Takeaway. The distance-to-boundary correction is modular — it can be layered onto existing primal-dual implementations without changing the Lagrangian structure.
Status: ICML 2026 Spotlight.
LLM
Persona-Pruner — sculpting lightweight models for role-playing (ICML 2026)
Area: LLM · arXiv: 2606.14695 · Authors: Jinsu Kim, Jihoon Tack, Noah Lee, Jongheon Jeong · Code: github.com/jsu-kim/Persona-Pruner 5
Core problem. Deploying many simultaneous NPC agents in interactive environments (games, simulators) requires running a full generalist LLM per persona — a compute cost that doesn't scale. Standard pruning doesn't work here: it treats persona-critical weights the same as redundant factual knowledge and destroys role-playing consistency.
Method. Persona-Pruner takes a text description of a character and uses it to isolate a persona-specific sub-network. The insight is that a character's behavioral identity occupies only a fraction of a model's total parameter capacity. The pruning procedure identifies which weights are essential for the target persona rather than pruning uniformly.
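The digest does not spell out Persona-Pruner's scoring rule, so as a point of comparison the sketch below uses a swapped-in, well-known criterion: Wanda-style scoring, where the importance of weight W[i][j] is |W[i][j]| × ‖X_j‖ and pruning is done per output row. Here the activation norms are assumed to come from running the model on the persona description. This is not the paper's procedure; `wanda_scores` and `prune_rows` are hypothetical names.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def wanda_scores(weights: Matrix, act_norms: Sequence[float]) -> Matrix:
    # Importance of W[i][j] = |W[i][j]| * ||X_j||, where ||X_j|| is the norm of
    # input feature j over calibration text (assumed: the persona description).
    return [[abs(w) * act_norms[j] for j, w in enumerate(row)] for row in weights]


def prune_rows(weights: Matrix, act_norms: Sequence[float], keep_per_row: int) -> Matrix:
    """Keep the `keep_per_row` highest-scoring weights in each output row; zero the rest."""
    scores = wanda_scores(weights, act_norms)
    pruned = []
    for row, srow in zip(weights, scores):
        ranked = sorted(range(len(row)), key=lambda j: srow[j], reverse=True)
        top = set(ranked[:keep_per_row])
        pruned.append([w if j in top else 0.0 for j, w in enumerate(row)])
    return pruned


W = [[0.5, -2.0, 0.1], [1.0, 0.2, -0.3]]
norms = [1.0, 0.1, 4.0]  # feature 1 is barely used by this persona's text
print(prune_rows(W, norms, keep_per_row=1))  # → [[0.5, 0.0, 0.0], [0.0, 0.0, -0.3]]
```

Note how the large weight -2.0 is pruned because its input feature is nearly inactive on the calibration text: the kept sub-network depends on what the conditioning text activates, not on weight magnitude alone.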
Key result. Relative to the strongest baseline pruning method, Persona-Pruner shrinks the performance drop from the dense model by up to 93.8% on RoleBench (LLM-as-a-judge score), while still preserving general LLM capabilities. 5
Takeaway. The method suggests that persona-conditional pruning is a substantially different problem from generic model compression — and that persona descriptions alone are enough signal to guide weight selection.
Status: ICML 2026. Code public.
Generative models / safety
ForceForget — reinforcement concept removal for safer T2I models (ICML 2026)
Area: Generative models / safety · arXiv: 2606.14351 · Authors: Dong Han, Yong Li 6
Core problem. Text-to-image (T2I) models can generate unsafe content. Current concept-erasing methods often over-erase: they suppress benign visual concepts that share semantic space with harmful prompts (e.g., removing "knife" from kitchen scenes while targeting violence).
Method. ForceForget formulates concept erasure as an RL problem: it optimizes a concept erasing reward (CER) that penalizes unsafe generation while explicitly preserving semantically safe interpretations. A lightweight "Safe Adapter" is inserted into cross-attention layers to regulate only partial text embeddings, limiting the blast radius of erasure.
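The "regulate only partial text embeddings" idea can be sketched as follows, assuming the Safe Adapter is a low-rank residual applied only to a masked subset of text-token embeddings before they reach cross-attention. The low-rank form, the mask, and the name `safe_adapter` are hypothetical; the CER reward and the RL loop that would train `down` and `up` are not shown.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    # Plain matrix-vector product.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def safe_adapter(
    embeddings: Sequence[Vector],
    mask: Sequence[bool],
    down: Matrix,
    up: Matrix,
) -> List[Vector]:
    """Apply e + up @ (down @ e) only where mask[i] is True.

    Unmasked token embeddings pass through untouched, which confines the
    edit to the selected part of the prompt.
    """
    out = []
    for e, selected in zip(embeddings, mask):
        if selected:
            delta = matvec(up, matvec(down, e))
            out.append([x + d for x, d in zip(e, delta)])
        else:
            out.append(list(e))
    return out


emb = [[1.0, 2.0], [3.0, 4.0]]
down = [[1.0, 1.0]]   # 1 x 2: project to rank 1
up = [[0.5], [-0.5]]  # 2 x 1: project back to embedding size
print(safe_adapter(emb, [True, False], down, up))  # → [[2.5, 0.5], [3.0, 4.0]]
```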
Key result. ForceForget outperforms prior concept-erasing SOTA on safety metrics while maintaining higher fidelity in benign image generation. It also shows better robustness against red-teaming tools and extends cleanly to image-to-image scenarios and general concept removal (styles, objects). The paper does not report a single-number summary metric; results are presented across multiple evaluation axes. 6
Status: ICML 2026.
Hölder++ — quality-coherence trade-off in multimodal VAEs (ICML 2026)
Area: Generative models · arXiv: 2606.13381 · Authors: Huyen Vo, María Martínez-García, Isabel Valera 7
Core problem. Multimodal VAEs face a trade-off: models with high generation quality tend to produce samples that are inconsistent across modalities, and models with high cross-modal coherence tend to generate lower-quality samples. Hölder pooling partially addressed coherence in prior work but used an approximation.
Method. Hölder++ addresses this in three steps: (1) exact (non-approximate) Hölder pooling for multimodal aggregation; (2) an extended architecture (Hölder+) that maintains both shared and modality-specific (private) representations; (3) hierarchical inference (Hölder++) that further disentangles the shared and private spaces.
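To make "Hölder pooling" concrete: assuming it denotes the weighted power (Hölder) mean of the unimodal posterior densities, the exponent α interpolates between mixture-style aggregation (α = 1, arithmetic mean) and product-style aggregation (α → 0, geometric mean). The sketch evaluates the unnormalized pooled density at a single latent point; how the paper obtains the exact pooled posterior is not addressed here, and `holder_pool` is a hypothetical name.

```python
import math
from typing import Sequence


def holder_pool(
    densities: Sequence[float], weights: Sequence[float], alpha: float
) -> float:
    """Weighted power (Holder) mean of unimodal density values at one latent point.

    alpha = 1 gives the arithmetic mean (mixture-like), alpha = 0 the geometric
    mean (product-like), alpha = -1 the harmonic mean. The result is unnormalized.
    """
    if alpha == 0.0:
        # Limit alpha -> 0 of the power mean: weighted geometric mean.
        return math.exp(sum(w * math.log(q) for q, w in zip(densities, weights)))
    return sum(w * q ** alpha for q, w in zip(densities, weights)) ** (1.0 / alpha)


q = [0.4, 0.1]  # two modalities' posterior densities at the same latent point
w = [0.5, 0.5]
for a in (1.0, 0.0, -1.0):
    print(a, holder_pool(q, w, a))  # ≈ 0.25, 0.2, 0.16
```

Lower α gives more weight to the modality that assigns low density, which is the coherence-favoring direction; higher α lets either modality alone support a latent point.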
Key result. Hölder++ consistently improves the quality-coherence trade-off over MMVAE+ and prior Hölder-based baselines, yields more structured latent spaces, and produces shared representations that are more useful for downstream supervised tasks. The paper reports results on standard multimodal benchmarks but does not highlight a single summary number. 7
Status: ICML 2026.
Agents
Minim — privacy-aware minimal UI view for LLM agents (ICML 2026)
Area: Agents · arXiv: 2606.13949 · Authors: Hexuan Yu, Chaoyu Zhang, Heng Jin, Shanghao Shi, Ning Zhang, Y. Thomas Hou, Wenjing Lou · Code: github.com/yyyyhx/MINIM 8
Core problem. LLM agents acting on web or desktop UIs typically send the full UI state (all visible elements) to a remote inference server. Most elements are irrelevant to the current task, but the transmission leaks sensitive content (auth codes, notifications, background app states) that the model never needed.
Method. Minim runs as a trusted local broker between the UI and the remote model. It assigns each UI element two scores: an inherent sensitivity score (how risky the element is if leaked) and a task-conditioned necessity score (whether the current task needs the element). A ternary policy then keeps necessary, non-sensitive elements as-is, abstracts sensitive-but-necessary elements, and drops everything else. The objective penalizes necessity errors more heavily on high-sensitivity content.
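The ternary policy and the sensitivity-weighted objective can be sketched as below. The thresholds, the squared-error loss, the weight `beta`, and the element names are all hypothetical; the paper's scorers and exact objective are not reproduced.

```python
from typing import List, Sequence, Tuple


def ternary_action(
    sensitivity: float, necessity: float, s_thresh: float = 0.5, n_thresh: float = 0.5
) -> str:
    # Hypothetical thresholds on the two per-element scores.
    if necessity < n_thresh:
        return "drop"      # not needed for the task: never sent
    if sensitivity >= s_thresh:
        return "abstract"  # needed but sensitive: send a redacted summary
    return "keep"          # needed and low-risk: send as-is


def weighted_necessity_loss(
    preds: Sequence[float],
    labels: Sequence[float],
    sensitivities: Sequence[float],
    beta: float = 4.0,
) -> float:
    """Squared necessity error, up-weighted on sensitive elements (hypothetical form)."""
    terms = [
        (1.0 + beta * s) * (p - y) ** 2
        for p, y, s in zip(preds, labels, sensitivities)
    ]
    return sum(terms) / len(terms)


# (element, sensitivity, necessity): illustrative values only.
elements: List[Tuple[str, float, float]] = [
    ("otp_code_field", 0.95, 0.9),
    ("search_box", 0.1, 0.9),
    ("chat_notification", 0.8, 0.05),
]
for name, s, n in elements:
    print(name, ternary_action(s, n))  # abstract, keep, drop
```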
Key result. On WebArena-derived real-world UI observations, Minim substantially reduces task-irrelevant sensitive leakage while preserving the semantic context and interactive affordances needed for the agent to complete its task. The paper does not report a single-number leakage reduction rate; results compare leakage and task-success jointly. 8
Takeaway. Minim is positioned as a privacy layer that sits between any existing UI-grounded agent and its remote LLM — no modification to the agent or the model required.
Status: ICML 2026. Code public.
Generative models / robotics
PolyFlow — polytope-constrained flow matching with zero constraint violation (ICML 2026)
Area: Generative models / robotics · arXiv: 2606.13400 · Authors: Jianming Ma, Qiyue Yang, Yang Zhang, Liyun Yan, Zhanxiang Cao, Yazhou Zhang, Yue Gao · Code: github.com/MJianM/PolyFlow 9
Core problem. Flow-based generative models are increasingly used in planning and control, where the generated samples must satisfy hard physical constraints (e.g., joint limits, clearance bounds). Post-hoc projection to a feasible set adds inference latency and can distort the learned distribution.
Method. PolyFlow embeds polyhedral constraints directly into the flow model. It reformulates flow matching as a discrete-time process with a projection-free architecture — meaning the model never needs to call an iterative solver during inference to find a feasible point. Arbitrary polyhedral constraints are satisfied strictly by construction.
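One standard way to keep discrete-time updates strictly inside a polytope {x : Ax ≤ b} without calling an iterative solver is to cap each step at the largest feasible step along the velocity, which has a closed form. The sketch below uses that ray-scaling trick as a swapped-in illustration of "projection-free, feasible by construction"; it is not claimed to be PolyFlow's architecture. It assumes the starting point is already feasible, and `max_feasible_step`, `feasible_euler_step`, and `margin` are hypothetical names.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def max_feasible_step(A: Matrix, b: Sequence[float], x: Sequence[float], v: Sequence[float]) -> float:
    """Largest t >= 0 with A(x + t v) <= b, given A x <= b. Closed form, no solver."""
    t_max = math.inf
    for a_i, b_i in zip(A, b):
        rate = sum(aij * vj for aij, vj in zip(a_i, v))
        if rate > 0.0:  # only constraints the velocity moves toward can bind
            slack = b_i - sum(aij * xj for aij, xj in zip(a_i, x))
            t_max = min(t_max, slack / rate)
    return t_max


def feasible_euler_step(
    A: Matrix, b: Sequence[float], x: Sequence[float], v: Sequence[float],
    dt: float, margin: float = 0.99,
) -> List[float]:
    # Take the usual Euler step, but never more than `margin` of the feasible step.
    step = min(dt, margin * max_feasible_step(A, b, x, v))
    return [xj + step * vj for xj, vj in zip(x, v)]


# Unit box 0 <= x, y <= 1 written as A x <= b.
BOX_A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
BOX_B = [1.0, 0.0, 1.0, 0.0]

x = [0.5, 0.5]
rng = random.Random(0)
for _ in range(1000):
    v = [rng.gauss(0.0, 5.0), rng.gauss(0.0, 5.0)]  # stand-in for a learned velocity
    x = feasible_euler_step(BOX_A, BOX_B, x, v, dt=0.5)
print(x)  # always inside the box
```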
Key result. PolyFlow achieves zero constraint violation across planning and control tasks, while maintaining distributional fidelity (generation quality) comparable to unconstrained baselines. Compared with state-of-the-art constrained generation methods, it significantly reduces inference latency. 9
Status: ICML 2026. Code public.
Scientific ML
HiST — hierarchical sparse transformer for spatial transcriptomics (ICML 2026)
Area: Scientific ML · arXiv: 2606.14251 · Authors: Weiyi Wu, Xinwen Xu, Xingjian Diao, Siting Li, Zhi Wei, Alma Andersson, Jiang Gui 10
Core problem. Spatial transcriptomics (ST) maps gene expression to tissue locations, but the technology is expensive and low-throughput. Inferring gene expression from routine H&E histology slides (gigapixel images) is a computationally hard problem: measured expression sites are sparse and irregularly placed, making standard dense-grid transformers wasteful in both memory and compute.
Method. HiST treats measured tissue locations as a sparse field indexed by tissue coordinates. It builds a dyadic (hierarchical) encoder-decoder directly on the active tissue footprint, using sparse window attention for local geometric correspondence and resolution-changing operators for multiscale context. A bottlenecked slide calibration token captures slide-level acquisition variation without requiring dense global attention. Memory and runtime scale with the number of observed locations, not the full slide area.
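Why memory tracks observed locations rather than slide area can be seen in a minimal sketch of a dyadic hierarchy over a sparse field. It assumes spots sit on an integer grid and uses mean pooling as a hypothetical resolution-changing operator: each level maps a spot to its dyadic parent (coordinates halved), and only occupied cells are ever stored. Sparse window attention and the slide calibration token are not shown; `dyadic_coarsen` and `build_pyramid` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
Field = Dict[Coord, float]


def dyadic_coarsen(field: Field) -> Field:
    """Mean-pool a sparse field onto the next-coarser dyadic grid.

    Only occupied cells are stored, so cost scales with the number of
    active locations, not with the bounding box of the slide.
    """
    sums: Dict[Coord, float] = defaultdict(float)
    counts: Dict[Coord, int] = defaultdict(int)
    for (x, y), value in field.items():
        parent = (x // 2, y // 2)
        sums[parent] += value
        counts[parent] += 1
    return {c: sums[c] / counts[c] for c in sums}


def build_pyramid(field: Field, levels: int) -> List[Field]:
    pyramid = [field]
    for _ in range(levels):
        pyramid.append(dyadic_coarsen(pyramid[-1]))
    return pyramid


# Five measured spots; one is far from the others, yet no empty cells are created.
spots = {(0, 0): 1.0, (1, 0): 3.0, (2, 2): 5.0, (3, 3): 7.0, (40, 41): 2.0}
for level, f in enumerate(build_pyramid(spots, 2)):
    print(level, f)
```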
Key result. On a multi-organ benchmark spanning diverse tissues and acquisition protocols, HiST improves predictive performance over recent baselines while reducing both runtime and peak memory. The paper does not report a single summary metric for the improvement; results are split across tissue types and acquisition sources. 10
Takeaway. The sparse-field framing is the transferable idea: any biological imaging task with sparse, irregularly sampled targets can use this architecture pattern instead of forcing a dense grid.
Status: ICML 2026.