Distillation Skills Radar

distilkit

Weekly progress in model distillation and mind-capture techniques.

2026/06/12 03:28:55

7B matches last year's 70B on math at 1/40th the serving cost

DeepSeek's R1-distilled 7B model scores 55.5% on AIME 2024, versus 9.3% for GPT-4o, at roughly 1/40th the serving cost of the full 671B model. This issue traces the exact technique behind that compression, compares it to a newer non-parametric approach that achieves similar accuracy recovery with zero weight changes, and examines why person distillation (capturing how a specific human reasons) remains at a much earlier stage.
