Pick the right model for each Notion Custom Agent — and stop overpaying

Notion added four models to Custom Agents in the past week. Claude Opus 4.8 arrived on May 29 and Gemini 3.5 Flash on May 30. 1 Grok 4.3 and Grok Build 0.1 followed on Jun 3. 2 The picker now groups models into three tiers (full-size frontier models, small/fast models, and open models), with Speed / Intelligence / Cost bars shown in a hover tooltip on each. 1

Most PMs leave every agent on Auto or, worse, pin everything to Opus because it "feels safer." Neither is the right move. Notion's own Custom Agents PM, Marina Camim, put it plainly: "Not every task needs the biggest brain in the world." 3 An agent that runs 1,000 times a month on the wrong model tier isn't a slightly bigger expense — it's potentially a 100x one. 4
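
The arithmetic behind that multiplier is simple enough to sketch. This is a minimal example, not Notion's pricing: the per-run credit costs below are hypothetical placeholders picked to show a 100x spread, because Notion does not publish per-model credit rates. The only sourced figure is the add-on pack price of $10 per 1,000 credits.

```python
# Illustrative monthly-cost arithmetic for one Custom Agent.
# ASSUMPTION: per-run credit figures are hypothetical placeholders,
# not Notion's rates (Notion does not publish per-model credit rates).
# Sourced figure: add-on packs cost $10 per 1,000 credits.

PACK_PRICE_USD = 10.0
CREDITS_PER_PACK = 1_000


def monthly_credits(runs_per_month: int, credits_per_run: float) -> float:
    """Total credits one agent consumes in a month."""
    return runs_per_month * credits_per_run


def credits_to_usd(credits: float) -> float:
    """Convert credits to dollars at the add-on pack price."""
    return credits / CREDITS_PER_PACK * PACK_PRICE_USD


if __name__ == "__main__":
    runs = 1_000
    # Hypothetical per-run costs chosen only to illustrate a 100x tier spread.
    small_tier = monthly_credits(runs, credits_per_run=1)
    frontier_tier = monthly_credits(runs, credits_per_run=100)
    print(f"small tier:    {small_tier:,.0f} credits (${credits_to_usd(small_tier):,.2f})")
    print(f"frontier tier: {frontier_tier:,.0f} credits (${credits_to_usd(frontier_tier):,.2f})")
    print(f"multiplier:    {frontier_tier / small_tier:.0f}x")
```

At those placeholder rates, the same 1,000 runs cost 1,000 credits on the small tier and 100,000 on the frontier tier. Swap in the per-run figures you observe in your own workspace.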

This tip gives you a concrete routing rule for each agent type so you can make a deliberate choice rather than a default one.

Prerequisites

Requirement	Detail
Notion plan	Business or Enterprise — Custom Agents and model selection are not available on Plus or Free 5
Credits	Included in Business plan; additional packs at $10 / 1,000 credits 6
Where to find the picker	Agent Settings tab → Model section → click to expand
Tool-calling requirement	If your agent uses any integration or built-in tool (Notion search, Slack, Calendar), you must select a model that supports tool calling — not every model in the picker does 7
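
If you track your agents in a spreadsheet or script, the tool-calling requirement can be expressed as a preflight check. A sketch under stated assumptions: `AgentConfig` and `tool_calling_problems` are hypothetical names (Notion exposes no such API), and the set of tool-capable models is something you fill in yourself from what the picker shows; the set used below is illustrative only and makes no claim about real model support.

```python
from dataclasses import dataclass, field


@dataclass
class AgentConfig:
    """Hypothetical local record of one Custom Agent's settings (not a Notion API)."""
    name: str
    model: str
    tools: list[str] = field(default_factory=list)  # e.g. ["Notion search", "Slack"]


def tool_calling_problems(agent: AgentConfig, tool_capable: set[str]) -> list[str]:
    """Return problems found; an empty list means the pairing passes this check.

    `tool_capable` holds model names you have confirmed, in the picker,
    support tool calling. This sketch assumes nothing about which ones do.
    """
    if agent.tools and agent.model not in tool_capable:
        return [
            f"{agent.name}: uses {', '.join(agent.tools)} but {agent.model} "
            "is not confirmed to support tool calling"
        ]
    return []


if __name__ == "__main__":
    # ASSUMPTION: illustrative set only; replace with what your picker shows.
    confirmed = {"Sonnet 4.6"}
    agents = [
        AgentConfig("Release notes", "Sonnet 4.6", ["Notion search"]),
        AgentConfig("Row enricher", "MiniMax M2.5", ["Slack"]),
    ]
    for agent in agents:
        for problem in tool_calling_problems(agent, confirmed):
            print(problem)
```

Agents with no tools always pass, since the requirement only applies once an integration or built-in tool is attached.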

What the picker actually contains

The Gemini 3.5 Flash announcement screenshot shows three labeled tiers in the current picker: 1

Notion's Custom Agent model picker as of May 30, 2026: Auto plus three tiers (full-size, small, and open models), with Speed / Intelligence / Cost bars on hover. 1

Tier label in UI	Models visible	Use when
Select a model (Beta)	Sonnet 4.6, Opus 4.6 / 4.7 / 4.8, GPT-5.2 / 5.4 / 5.5	Multi-step reasoning, synthesis across many pages, nuanced writing
Small models (Beta)	Haiku 4.5, Gemini 3.5 Flash, GPT-5.4 Mini, GPT-5.4 Nano	Extraction, classification, summarization, high-frequency triggers
Open models (US-provider hosted, Beta)	MiniMax M2.5, DeepSeek V4 Pro	Cost-sensitive batch tasks where output quality is secondary

Notion does not publish a per-model credit rate table. The help page states only that "advanced models use more credits because they handle more complex reasoning." 6 The hover tooltip gives relative Speed / Intelligence / Cost bars rather than absolute numbers, so the routing decision is necessarily qualitative.

Don't assume Auto is a smart router. One third-party help center reports that it picks a single balanced default rather than switching models per request. 7 Notion's own help center says "Auto lets Notion match to a model based on the task," 5 which is vague enough to be consistent with either behavior. Treat Auto as a safe starting point, not a permanent answer.

Routing rules by agent task type

The core principle, echoed by both Notion's PM team and community practitioners: match model tier to the reasoning complexity of the task, not the importance of the output.

Agent task	Recommended tier	Specific model to try	Rationale
Classify / tag incoming items (feature requests, tickets, leads)	Small models	Gemini 3.5 Flash or Haiku 4.5	Pattern matching, not reasoning; runs at high frequency
Summarize meeting notes or weekly status pages	Small models	Gemini 3.5 Flash or GPT-5.4 Mini	Extraction + condensation; no cross-page inference needed
Route tasks to the right owner / database based on content	Small models	Haiku 4.5 or GPT-5.4 Mini	Simple decision tree; Opus is unlikely to improve the result
Write or polish structured documents (PRDs, release notes, OKR commentary)	Select a model	Sonnet 4.6 or GPT-5.2	First-draft quality matters; output is shared externally
Cross-workspace synthesis (pull from 5+ pages, build a coherent brief)	Select a model	Opus 4.8 or GPT-5.5	High page-count context + coherent synthesis
Deep analysis / strategy questions (roadmap tradeoffs, prioritization reasoning)	Select a model	Opus 4.8	Notion AI head Sarah Sachs noted Opus excels at "interpreting what users really want, producing shareable content on the first try" 8
Cost-sensitive batch enrichment (hundreds of rows, low-stakes fields)	Open models	MiniMax M2.5	Reddit community reports up to 10x cost reduction vs. frontier models 9
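
The routing table can be kept as a small lookup so decisions live in one reviewable place, for example in a script that audits your agent inventory. This is a sketch: the task-type keys, `Tier`, and `route` are hypothetical names, and the model lists simply mirror the table above rather than any Notion API. The `needs_tools` guard reflects the tool-calling caveat for open models and is a conservative choice of this sketch, not a Notion rule.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "Small models"
    SELECT = "Select a model"
    OPEN = "Open models"


# Mirrors the routing table: task type -> (tier, models to try first).
# HYPOTHETICAL: the task-type keys are labels invented for this sketch.
ROUTES: dict[str, tuple[Tier, list[str]]] = {
    "classify": (Tier.SMALL, ["Gemini 3.5 Flash", "Haiku 4.5"]),
    "summarize": (Tier.SMALL, ["Gemini 3.5 Flash", "GPT-5.4 Mini"]),
    "route": (Tier.SMALL, ["Haiku 4.5", "GPT-5.4 Mini"]),
    "write_doc": (Tier.SELECT, ["Sonnet 4.6", "GPT-5.2"]),
    "synthesize": (Tier.SELECT, ["Opus 4.8", "GPT-5.5"]),
    "strategy": (Tier.SELECT, ["Opus 4.8"]),
    "batch_enrich": (Tier.OPEN, ["MiniMax M2.5"]),
}


def route(task_type: str, needs_tools: bool = False) -> tuple[Tier, list[str]]:
    """Return the recommended tier and candidate models for a task type."""
    if task_type not in ROUTES:
        raise KeyError(f"unknown task type: {task_type!r}")
    tier, models = ROUTES[task_type]
    if needs_tools and tier is Tier.OPEN:
        # Open models may not support tool calling; force a manual decision
        # instead of silently routing a tool-using agent there.
        raise ValueError(f"{task_type!r} routes to open models; confirm tool calling first")
    return tier, list(models)  # copy so callers cannot mutate the table


if __name__ == "__main__":
    tier, models = route("summarize")
    print(tier.value, models)
```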

Marina Camim's optimization session with "Changelog Carl", an agent running on Opus 4.7 and spending about $100/month, made the case for a model downgrade as the first move: "Changing the model from Opus to something else will already reduce the cost by a lot." 3 Carl's underlying problem was not model capability but context overload, which caused the agent to skip steps. Switching to Auto or Sonnet resolved both the behavior issue and the cost.


The self-optimization shortcut

Once you've made an initial model assignment, there's a faster way to tune it than manual trial-and-error. Marina Camim described it in the same session: "The best way to make your agent better is by collaborating with it. You can talk to it about what problems you have, what you wish was better, and then ask it for his advice on how to make it better." 3

Concretely: open the agent's chat thread, describe the output failure ("the summary skips the risk section even when it's present"), and ask the agent to propose an instruction change. Then apply that change to the agent's instructions yourself. This works best when the agent is already on an appropriate tier: asking Gemini 3.5 Flash to diagnose a reasoning gap it cannot fill is likely to return less useful advice than asking Sonnet.

Gotchas

The multiplier is real, not theoretical. A Reddit user reported consuming over 100,000 credits in a single month after enabling a few agents — their Notion spend "doubled overnight." 10 Credits reset monthly and are shared across the workspace, so a single misconfigured high-frequency agent can exhaust the budget for everyone.
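
Because the pool is shared, it helps to project each agent's monthly draw against it before switching agents on. A minimal sketch, assuming you can estimate runs per month and credits per run for each agent from its usage history; `AgentUsage`, `pool_report`, the agent names, the pool size, and the 50% `max_share` threshold are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgentUsage:
    """Hypothetical per-agent estimate, e.g. taken from observed usage."""
    name: str
    runs_per_month: int
    credits_per_run: float

    @property
    def monthly_credits(self) -> float:
        return self.runs_per_month * self.credits_per_run


def pool_report(agents: list[AgentUsage], pool_credits: float,
                max_share: float = 0.5) -> list[str]:
    """Warn about agents that would take more than `max_share` of the shared
    monthly pool, and about the workspace if total projected use exceeds it."""
    warnings = []
    for a in agents:
        share = a.monthly_credits / pool_credits
        if share > max_share:
            warnings.append(f"{a.name}: {share:.0%} of the pool on its own")
    total = sum(a.monthly_credits for a in agents)
    if total > pool_credits:
        warnings.append(
            f"workspace: projected {total:,.0f} credits vs pool of {pool_credits:,.0f}"
        )
    return warnings


if __name__ == "__main__":
    # HYPOTHETICAL estimates: a high-frequency tagger and a weekly brief.
    agents = [
        AgentUsage("Ticket tagger", runs_per_month=3_000, credits_per_run=30),
        AgentUsage("Weekly brief", runs_per_month=4, credits_per_run=50),
    ]
    for warning in pool_report(agents, pool_credits=50_000):
        print(warning)
```

The per-agent check catches the single misconfigured agent; the total check catches several moderate agents that together exceed the pool.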

Tool-calling support is not universal. The open-model tier (MiniMax M2.5, DeepSeek V4 Pro) may not support tool calling depending on how Notion has integrated each model. If your agent needs to read a Notion database or push to Slack, test the open-model tier in a low-stakes environment before deploying at scale. 7

Stale validation scaffolding becomes a new failure source. If you built retry prompts or instruction checks to compensate for a weaker model's failure modes, those same checks may reject correct outputs from a more capable model after an upgrade. MindStudio's research on agent architecture documents the same pattern. 11 When you upgrade a model, audit the agent's instructions for any check that was compensating for the old model's limitations.

Too many tools degrade tool-selection accuracy. MindStudio's same analysis found that giving an agent more tools makes choosing among them harder, which lowers the quality of the work that follows: "Harder selection problems increase the cognitive load on the model, which reduces the quality of the work it does after making the selection." 11 A focused small-model agent with two tools can outperform an overloaded frontier-model agent with ten.

Context scope matters as much as model tier. Marina Camim flagged this in the optimization session: giving an agent your entire Notion workspace causes it to skip steps and produce lower-quality output — then you compensate by upgrading the model, creating a cost spiral. Narrow the agent's context first, then consider whether you still need the bigger model. 3
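
The last two gotchas can be folded into a quick check to run before reaching for a bigger model. A sketch under stated assumptions: `AgentSpec` and `pre_upgrade_lint` are hypothetical names, `workspace_wide` simply records whether the agent was granted the whole workspace rather than specific pages or databases, and `MAX_TOOLS = 2` borrows the "focused" example above as a soft target, not a Notion limit.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL soft target taken from the two-tool example; not a Notion limit.
MAX_TOOLS = 2


@dataclass
class AgentSpec:
    """Hypothetical local description of one agent's scope and tools."""
    name: str
    model: str
    tools: list[str] = field(default_factory=list)
    workspace_wide: bool = False  # True if the agent can read the entire workspace


def pre_upgrade_lint(agent: AgentSpec) -> list[str]:
    """Suggestions to try before moving the agent to a bigger model."""
    notes = []
    if agent.workspace_wide:
        notes.append(f"{agent.name}: narrow context to the pages and databases the task needs")
    if len(agent.tools) > MAX_TOOLS:
        notes.append(
            f"{agent.name}: {len(agent.tools)} tools configured; trim toward {MAX_TOOLS}"
        )
    return notes


if __name__ == "__main__":
    agent = AgentSpec(
        "Weekly brief",
        "Opus 4.8",
        tools=["Notion search", "Slack", "Calendar"],
        workspace_wide=True,
    )
    for note in pre_upgrade_lint(agent):
        print(note)
```

An empty result means scope and tool count are already tight, so a model upgrade is a more reasonable next step.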

Notion on Threads: https://www.threads.net/@notionhq/post/DY76U1MEmX-