New releases
5
Worth upgrading
2 / 5
Active regressions
2
Items tracked
24

This week's upgrade signal across the five tracked repos splits clearly. huggingface/transformers v5.9.0 and langchain 1.3.2 are both worth taking — transformers adds new architectures and a vision memory-leak fix, while langchain closes a real streaming PII gap. Both pytorch (v2.12.0 breaking changes) and vllm (v0.21.0 Qwen MTP regression) advise holding. llama.cpp continues on build tags with no breaking changes.

| Repo | Release this week | Breaking / deprecation | Verdict |
|---|---|---|---|
| pytorch/pytorch | No (v2.12.0, May 13) | ⚠️ Quantized tensor creation deprecated | Hold |
| langchain-ai/langchain | ✅ 1.3.2 + fireworks 1.4.0 + openai 1.2.2 + perplexity 1.3.0 | ⚠️ fireworks SDK migration | Upgrade cautiously |
| vllm-project/vllm | No (v0.21.0, May 15) | ✅ logit_bias/logit_scale removed | Hold |
| ggml-org/llama.cpp | No formal release (b9329–b9357) | None | Follow latest build |
| huggingface/transformers | ✅ v5.9.0 (May 20) | ⚠️ SAM3 text_embeds input changed | Upgrade |
main since then — a patch is building but not yet cut. 1torch.distributed.nn.functional ops throw RuntimeError under torch.compile (migrate to _functional_collectives), and torchrun now defaults to an OS-assigned port instead of 29500. 1torch.quantize_per_tensor, torch.quantize_per_channel, and related functions producing quint8/qint8/qint32 tensors. PR author vkuzo states: "deprecated and will be removed in a future PyTorch release." Migration path tracked at issue #184982. torchvision already dropped the affected calls. 2sympy.Min/Max construction sites in Inductor with PyTorch's custom torch.utils._sympy.functions.Min/Max classes, eliminating expensive is-connected simplification checks on the hot path. Marked "not user facing" — transparent to end users. 1,696 comments over five days. 3native_group_norm_backward Metal implementation, CUDA sdpa edge-case fix.langchain-fireworks needs a targeted test pass before deploying.PIIMiddleware (PR #37616, 34 comments this week — the most-discussed PR in the repo). Prior to this, PIIMiddleware only scrubbed at the state level via after_model, so "consumers reading the live stream saw raw model text until the run finished," as contributor Nick Hollon described in the PR. The fix extends real-time redaction to text-delta, reasoning-delta, tool-call arguments, and tool-output streams — paths that were previously bypassed entirely. 5block strategy bypass on streaming paths. Those were resolved across 10+ iterations before merge. If your deployment relies on PIIMiddleware in a streaming context, this upgrade directly addresses those paths.langgraph dependency floor raised to >=1.2.2, glob_search results now sorted by mtime (newest first), and langsmith bumped 0.7.31→0.8.0. A fourth package, langchain-perplexity 1.3.0 (released May 27), adds a use_responses_api flag to ChatPerplexity for routing to Perplexity's Agent API instead of standard chat completion — low-risk addition. 4fireworks-ai 1.x SDK. This is an API compatibility change — test against fireworks-ai 1.x before deploying. A 1.4.1 patch followed within 24 hours, fixing retry logic on bare APIConnectionError (default max_retries=2).httpx finalizer crash and switches LLM context-size lookup from hardcoded values to dynamic model profiles. Low-risk upgrade.
logit_bias and logit_scale aliases removed from PoolerConfig (use logit_mean / logit_sigma directly), and vllm/utils/profiling.py deleted entirely (66 lines gone). The cprofile decorator and cprofile_context context manager no longer exist; switch to Python's standard cProfile. 10reasoning_ended=True and force-fed the unconstrained bonus token to the grammar, corrupting its state." Fix adds mid-batch detection and a suppress_accept_errors flag. Under review by WoosukKwon, njhill, mgoin, and others. 11master as usual. 12ffn_latent MUL_MAT flag fix by ServeurpersoCom recovered Nemotron 3 Super 120B Q5_K_M throughput from 64.9 t/s back to 103.22 t/s. 13Gemma4ForCausalLM architecture conversion added.MUL_MAT pipeline.AudioFlamingoNext checkpoints, audio/visual encoder compilability improvements.lru decorator-induced memory leak in vision models.text_embeds input for SAM3, EdgeTAM, and SAM3-Lite-Text now requires full text embeddings rather than pooled output. Update your inference code before upgrading if any of these models are in use. 14deepseek_v4: hc_head, sinks, and position_bias tensors kept in fp32 (PR #46198) — addresses a silent precision downgrade. Issue #46129 (MTP keys silently random-initialized due to missing conversion_mapping entries) remains open as of May 27. 14from_pretrained now supported (PR #46102).ProcessorMixin._load_tokenizer_from_pretrained forces subfolder lookup, breaking root-level tokenizer files — open, tagged deprecation, watch if you use non-standard repo layouts.
围绕这条内容继续补充观点或上下文。