Week 1: Soft 404s are draining your crawl budget — here's how to find and fix them in 30 minutes

Soft 404 errors silently consume Google's crawl budget and suppress indexing, causing traffic to bleed out over weeks. This guide shows indie developers how to spot them in Google Search Console and fix the most common causes in under 30 minutes.

Google Search Console SEO Pitfall Guide
June 3, 2026 · 5:53 PM
Every indie developer knows the gut-punch moment: you check Google Search Console one day and your organic traffic has dropped 40%, 60%, 80% — with no obvious cause. No manual action. No spammy backlinks you know of. The rankings just quietly bleed out.
One of the most common culprits is also one of the easiest to overlook: soft 404 errors. Unlike a hard 404 that clearly signals "this page doesn't exist," a soft 404 returns a normal 200 OK status code while serving an empty or near-empty page. Google crawls it, wastes budget on it, and starts trusting your domain less — all without triggering any obvious alarm.
The full case study behind this week's tip, a 90% traffic collapse traced back to soft 404s across 13 country domains, is documented on Search Engine Land.

What a soft 404 actually does to your site

When Google's crawler visits your site, it works with a finite crawl budget: the number of URLs it is able and willing to crawl in a given period, which depends on how much load your server can handle and how much demand Google sees for your content. Soft 404s eat into that budget without delivering value. The crawler fetches an empty page that claims to be a 200, can't confidently treat it as a 404, and keeps coming back.
The compound effect is brutal. A Search Engine Land investigation of a multinational crypto news publisher documented exactly how this plays out at scale: soft 404 errors accumulated silently across 13 country domains. As they grew, Google reduced the daily crawl rate for the French subdomain from 60,000–70,000 requests per day down to 20,000–30,000. New articles stopped getting indexed within hours. By the time the problem was identified, 513,000 pages on the Brazilian domain alone sat in a "crawled but not indexed" limbo, and overall traffic had dropped 90% from its pre-migration peak. 1
For an indie developer running a SaaS app, a tool directory, or a content site, the trigger is usually something innocuous: a migration that didn't fully clean up old URLs, auto-generated parameter pages (like /convert?from=USD&to=ETH&amount=250), or a CMS that creates empty tag or category pages.

How to find your soft 404s in Google Search Console

Open Search Console and go to Indexing → Pages. In the list of reasons why pages aren't indexed, look for two entries:
  • "Soft 404" — pages Google has explicitly labeled as returning 200 but containing no meaningful content
  • "Crawled – currently not indexed" — a broader signal that Google visited these pages but chose not to include them in the index; soft 404s are one common reason among several
Click into the soft 404 report. Export the URL list and scan for patterns. In most indie projects, you'll find one of three sources:
  1. Auto-generated parameter pages — your app creates combinatorial URLs (user IDs, filter combinations, currency pairs) that produce thin or empty responses for invalid inputs
  2. Deleted content with no redirect — a blog post or product page was removed, but the URL still loads a page shell instead of returning a proper 404 or 410
  3. CMS stub pages — empty tag archives, empty search result pages, empty author profiles that render a template but contain no actual content
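Scanning a long export by eye gets tedious, so here is a minimal sketch of the pattern hunt in Python. It assumes you have already pulled the affected URLs into a plain list; the `sample` URLs and the helper names `url_pattern` and `top_patterns` are made up for illustration, and the grouping rule (first path segment plus sorted query keys) is just one reasonable choice.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def url_pattern(url: str) -> str:
    """Reduce a URL to a coarse pattern: first path segment plus sorted query keys."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    pattern = "/" + segments[0] if segments else "/"
    if len(segments) > 1:
        pattern += "/*"  # collapse everything below the first segment
    keys = sorted({k for k, _ in parse_qsl(parts.query, keep_blank_values=True)})
    if keys:
        pattern += "?" + "&".join(keys)
    return pattern


def top_patterns(urls, n=5):
    """Return the n most common patterns with their counts."""
    return Counter(url_pattern(u) for u in urls).most_common(n)


# Hypothetical sample standing in for the exported soft 404 list.
sample = [
    "https://example.com/convert?from=USD&to=ETH&amount=250",
    "https://example.com/convert?to=BTC&from=EUR&amount=10",
    "https://example.com/tag/empty-topic",
    "https://example.com/tag/another-empty-topic",
    "https://example.com/blog/deleted-post",
]

for pattern, count in top_patterns(sample):
    print(count, pattern)
# 2 /convert?amount&from&to
# 2 /tag/*
# 1 /blog/*
```

A pattern that accounts for most of the list usually maps to one of the three sources above, which tells you where the single fix belongs.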
The test for whether a URL belongs in the index is straightforward: does a human visiting it get useful information? If not, the URL should either return a real 404 or 410, or be kept out of the index entirely.
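If you want a rough automated version of that "would a human get anything here?" check, the sketch below counts visible words in a response body. It is only a heuristic: how Google actually classifies soft 404s is not something this post documents, and the 50-word threshold and the function names are assumptions to tune for your own templates.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_word_count(html: str) -> int:
    """Count whitespace-separated words outside script and style blocks."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks).split())


def looks_like_soft_404(status: int, html: str, min_words: int = 50) -> bool:
    """Heuristic: a 200 response with very little visible text (threshold is assumed)."""
    return status == 200 and visible_word_count(html) < min_words


# Hypothetical page shell of the kind an empty search or tag page might render.
shell = (
    "<html><head><title>Results</title><style>body{margin:0}</style></head>"
    "<body><h1>No results found</h1></body></html>"
)
print(looks_like_soft_404(200, shell))  # True
print(looks_like_soft_404(404, shell))  # False
```

Pages that trip this check while returning 200 are the ones worth comparing against the Search Console list first.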
Soft 404 and crawl budget data from a real-world multi-domain recovery case 1

The fix: three actions, priority order

1. Return proper status codes (urgent)
For pages that genuinely don't exist or no longer have content, configure your server or CMS to return a 404 or 410 status code, not a 200. A 410 (Gone) is the more precise choice if the content is permanently deleted. Google handles the two codes almost identically in practice, though a 410 may get the URL dropped slightly sooner than a 404.
In most frameworks: check your routing config, your 404.html handler, and any custom page templates. If your app generates URLs dynamically, add server-side validation that returns the correct status code when the resource doesn't exist.
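To make the server-side validation concrete, here is a minimal sketch as a plain WSGI app using only the Python standard library. The `/blog/<slug>` route, the `LIVE_POSTS` and `DELETED_POSTS` lookups, and the helper names are all hypothetical; in a real app they would be your framework's router and your database.

```python
# Hypothetical data: posts that exist, and slugs that were deliberately removed.
LIVE_POSTS = {"hello-world": "<h1>Hello, world</h1><p>Post body.</p>"}
DELETED_POSTS = {"old-launch-notes"}

NOT_FOUND = ("404 Not Found", "<h1>Page not found</h1>")


def resolve_post(slug: str):
    """Return (status_line, html) so the status always matches the content."""
    if slug in LIVE_POSTS:
        return "200 OK", LIVE_POSTS[slug]
    if slug in DELETED_POSTS:
        return "410 Gone", "<h1>This post has been removed</h1>"
    return NOT_FOUND


def app(environ, start_response):
    """Minimal WSGI app: unknown URLs get a real 404, never a 200 page shell."""
    path = environ.get("PATH_INFO", "")
    if path.startswith("/blog/"):
        status, html = resolve_post(path[len("/blog/"):].strip("/"))
    else:
        status, html = NOT_FOUND
    body = html.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_for(path: str) -> str:
    """Call the app in-process and return the status line it sends."""
    seen = []
    app({"PATH_INFO": path}, lambda status, headers: seen.append(status))
    return seen[0]


print(status_for("/blog/hello-world"))       # 200 OK
print(status_for("/blog/old-launch-notes"))  # 410 Gone
print(status_for("/blog/never-existed"))     # 404 Not Found
```

The point of the design is that the status decision lives next to the content lookup, so a missing record cannot fall through to a template that renders with a 200. To try it in a browser, the app can be served locally with `wsgiref.simple_server.make_server`.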
2. Block or noindex auto-generated low-value pages
If your app generates parameter-based URLs at scale (user settings pages, filters, search result pages), either:
  • Use robots.txt to disallow those URL patterns entirely
  • Add a <meta name="robots" content="noindex"> tag to the page template
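The distinction matters. robots.txt tells Google not to crawl a URL, which saves budget immediately, but it does not remove that URL from the index: a blocked URL that is linked from elsewhere can still show up in results without a snippet. noindex tells Google not to index a page, but Google has to crawl the page to see the tag, so don't apply both to the same URL. Use robots.txt for URL patterns that have no search value and were never indexed. Use noindex for pages that should stay accessible to users but out of the index, or that are already indexed and need to drop out.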
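Before shipping a robots.txt change, it can help to confirm which URLs a rule actually blocks. The sketch below uses Python's standard `urllib.robotparser`; the rules and URLs are hypothetical examples. One caveat: this parser matches rules as plain path prefixes, so it is only a fair stand-in for simple `Disallow` lines like these, not for wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules blocking two auto-generated URL families.
ROBOTS_TXT = """\
User-agent: *
Disallow: /convert
Disallow: /search
"""


def blocked_urls(robots_txt: str, urls, user_agent: str = "Googlebot"):
    """Return the subset of urls that the given rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


urls = [
    "https://example.com/convert?from=USD&to=ETH&amount=250",
    "https://example.com/search?q=",
    "https://example.com/pricing",
]

for url in blocked_urls(ROBOTS_TXT, urls):
    print("blocked:", url)
# blocked: https://example.com/convert?from=USD&to=ETH&amount=250
# blocked: https://example.com/search?q=
```

Because the match is by prefix, `Disallow: /convert` would also block a path like `/converter`, so it is worth running your real, useful URLs through the same check.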
3. Review canonicalization
For any pages that exist in multiple URL variants (e.g., /product and /product?ref=email), add a canonical tag pointing to the preferred version. Google treats the tag as a strong hint rather than a command, but it reduces the chance that each variant gets crawled and evaluated as a separate, near-duplicate page.
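One way to generate that tag consistently is to derive the canonical URL from the requested URL by dropping parameters that never change the content. This is a minimal sketch: the `TRACKING_PARAMS` set is an assumption, and you should list only parameters you know are purely for tracking on your own site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of parameters that do not change what the page shows.
TRACKING_PARAMS = {
    "ref", "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
}


def canonical_url(url: str) -> str:
    """Drop tracking parameters and any fragment, keep everything else in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


print(canonical_tag("https://example.com/product?ref=email"))
# <link rel="canonical" href="https://example.com/product">
```

Parameters that do change the content, such as a color or size filter, are left in place, so those variants keep their own canonical URLs.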

What "recovery" actually looks like

The turnaround from fixing these issues isn't instant, but it's measurable. After the multinational publisher implemented the soft 404 fixes described above, soft 404 errors across all 13 domains fell from a peak of ~120,000 affected pages to under 20,000 within 12 weeks — an 83% reduction. Indexed page counts on several domains doubled. Traffic on Germany's subdomain climbed from ~8,000 to 12,000–15,000 clicks per day. 1
For smaller indie projects, the fix cycle is faster. A site with a few hundred problematic URLs typically sees Google recrawl and update the index within 2–4 weeks of resolving the status codes. The key is starting with the Pages report, not guessing.

This week's action

Open Google Search Console → Indexing → Pages right now. If your soft 404 count is above zero, export the list and spend 20 minutes identifying the pattern. The fix for most indie projects is one configuration change — either a routing rule or a robots.txt entry — that takes less time than diagnosing the problem.
Getting indexing right is the prerequisite for everything else. Content quality, backlinks, and Core Web Vitals all matter more when Google can actually see your pages.
