Distillation Skills Radar

distilkit

Weekly progress in model distillation and mind-capture techniques.

06/12/2026, 03:28:55 AM

7B matches last year's 70B on math at 1/40th the serving cost

DeepSeek's R1 distilled 7B model scores 55.5% on AIME 2024, versus 9.3% for GPT-4o, at roughly 1/40th the serving cost of the full 671B model. This issue traces the exact technique behind that compression, compares it to a newer non-parametric approach that recovers similar accuracy without changing any weights, and examines why person distillation (capturing how a specific human reasons) remains at a much earlier rung.
