Your page might have perfect content, optimized keywords, and fast loading times - but if search engines can't crawl it, none of that matters. Pages blocked from indexing are invisible to organic search.
Search engines respect your instructions about what to crawl and index. When you (intentionally or accidentally) tell them "don't index this page," they comply. The page disappears from search results.
Three mechanisms can block indexing:
1. Meta robots tag
<meta name="robots" content="noindex">
This tells all search engines not to index the page.
2. X-Robots-Tag HTTP header
X-Robots-Tag: noindex
Same effect, but set at the server level rather than in HTML.
3. robots.txt disallow
User-agent: *
Disallow: /private/
Prevents crawlers from fetching the URL at all. Note that robots.txt controls crawling rather than indexing: crawlers never see a noindex on a page they can't fetch, and a disallowed URL can still appear in results without a snippet if other pages link to it.
The most common cause is accidental. A developer adds noindex during staging and forgets to remove it before launch. A robots.txt rule intended to block one directory matches more than expected. A CMS setting gets toggled without understanding the consequences.
The result is the same: search engines skip your page entirely, and you get zero organic traffic regardless of how good your content is.
Check for meta robots tags:
In the Elements panel, press Ctrl/Cmd + F and search for robots. Look for <meta name="robots"> with noindex or none in the content attribute.
// Console check for meta robots
const robotsMeta = document.querySelector('meta[name="robots"]')
if (robotsMeta) {
  const content = robotsMeta.content.toLowerCase()
  if (content.includes('noindex') || content.includes('none')) {
    console.warn('Page is blocked from indexing:', robotsMeta.outerHTML)
  }
}
Check HTTP headers in Network panel:
Look for X-Robots-Tag in the Response Headers (a quick console check for this header is sketched after the audit checklist below).
Run a Lighthouse SEO audit. Look for "Page is blocked from indexing" in the results.
The audit checks:
<meta name="robots"> for noindex or none directives
X-Robots-Tag HTTP header for blocking directives
robots.txt rules that disallow the page URL
unavailable_after directives with past dates
Lighthouse tests against major bot user agents: Googlebot, Bingbot, DuckDuckBot, and others. If all bots are blocked, you fail. If at least one is allowed, you pass with a warning.
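If you want to confirm the header without digging through the Network panel, here is a minimal console sketch; it assumes the page is same-origin and that the server answers HEAD requests:
// Quick console check for an X-Robots-Tag response header (sketch)
const res = await fetch(location.href, { method: 'HEAD' })
console.log(res.headers.get('x-robots-tag') || 'no X-Robots-Tag header')
The Lighthouse audit itself can also be run headlessly, for example with npx lighthouse https://example.com --only-categories=seo (the URL is a placeholder; check the Lighthouse CLI docs for current flags).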
Find and remove or modify the blocking meta tag:
<!-- Remove any of these entirely -->
<meta name="robots" content="noindex">
<meta name="robots" content="noindex, nofollow">
<meta name="robots" content="none">
<!-- Or change to allow indexing -->
<meta name="robots" content="index, follow">
If you need nofollow (don't follow links) but want the page indexed:
<meta name="robots" content="index, nofollow">
For bot-specific tags, check for both generic and specific:
<!-- These also block indexing -->
<meta name="googlebot" content="noindex">
<meta name="bingbot" content="noindex">
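To catch bot-specific tags alongside the generic one, a small console sketch (the list of bot names here is illustrative, not exhaustive):
// Flag any robots-style meta tag that carries a blocking directive
const selector = 'meta[name="robots"], meta[name="googlebot"], meta[name="bingbot"]'
document.querySelectorAll(selector).forEach((tag) => {
  const content = (tag.content || '').toLowerCase()
  if (content.includes('noindex') || content.includes('none')) {
    console.warn('Blocking directive found:', tag.outerHTML)
  }
})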
For the X-Robots-Tag header, the fix depends on your server:
Apache (.htaccess):
# Remove this if it applies site-wide
Header set X-Robots-Tag "noindex"

# Or scope it to paths that should stay out of the index
<If "%{REQUEST_URI} =~ m#^/admin/#">
    Header set X-Robots-Tag "noindex"
</If>
Nginx:
# Remove this if it applies server-wide
add_header X-Robots-Tag "noindex";

# Or scope it to locations that should stay out of the index
location /admin/ {
    add_header X-Robots-Tag "noindex";
}
Node.js/Express:
// Remove this middleware
app.use((req, res, next) => {
  res.setHeader('X-Robots-Tag', 'noindex')
  next()
})

// Or apply selectively
app.use('/admin', (req, res, next) => {
  res.setHeader('X-Robots-Tag', 'noindex')
  next()
})
Check your robots.txt for overly broad rules:
# This blocks the entire site from being crawled
User-agent: *
Disallow: /

# Scope disallow rules to the directories you actually mean
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /tmp/
Common mistakes:
Disallow: /*?*      # Blocks every URL that contains a query string
Disallow: /*.pdf$   # Blocks every PDF on the site
Disallow: /staging  # Also matches /staging-guide, /staging-area
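To see why broad rules over-match, here is a simplified model of how Disallow patterns apply to URL paths. It is illustrative only; real parsers also handle Allow rules, rule precedence, and per-agent groups:
// Simplified model of robots.txt Disallow matching (illustrative only)
function isDisallowed(path, disallowRules) {
  return disallowRules.some((rule) => {
    const pattern = '^' + rule
      .replace(/[.+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
      .replace(/\\\$$/, '$')                 // a trailing $ anchors the end of the URL
      .replace(/\*/g, '.*')                  // * matches any sequence of characters
    return new RegExp(pattern).test(path)
  })
}

console.log(isDisallowed('/staging-guide', ['/staging']))  // true - prefix rules over-block
console.log(isDisallowed('/guide?page=2', ['/*?*']))       // true - every query string is blocked
console.log(isDisallowed('/docs/manual.pdf', ['/*.pdf$'])) // true - every PDF is blocked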
Test your robots.txt rules:
curl https://yourdomain.com/robots.txt
Use Google Search Console's URL Inspection tool or robots.txt report to verify specific URLs are allowed.
Some setups add noindex based on conditions:
// Environment checks can go wrong: if the variable isn't set correctly in
// production, the noindex ships to production too
if (process.env.NODE_ENV !== 'production') {
  meta.push({ name: 'robots', content: 'noindex' }) // fine only when NODE_ENV is reliable
}

// A custom flag is riskier still - if PRODUCTION is simply missing, this always adds noindex
if (!process.env.PRODUCTION) {
  meta.push({ name: 'robots', content: 'noindex' })
}
Verify your production environment doesn't have staging configurations.
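One defensive pattern is to gate noindex on an explicit NODE_ENV check and log loudly when it fires. A minimal sketch (the buildMeta helper and the meta array shape are made up for illustration):
// Sketch: add noindex only outside production, and make the decision visible in logs
const isProduction = process.env.NODE_ENV === 'production'

function buildMeta() {
  const meta = []
  if (!isProduction) {
    console.warn('NODE_ENV is not "production" - adding a noindex robots tag to this build')
    meta.push({ name: 'robots', content: 'noindex' })
  }
  return meta
}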
For Next.js (App Router), check the robots field of your metadata export:
// Remove robots noindex from your metadata
export const metadata = {
  robots: {
    index: true, // Explicitly allow indexing
    follow: true
  }
}

// For pages that should not be indexed (intentional)
export const metadata = {
  robots: {
    index: false,
    follow: false
  }
}
Also check next.config.js for any custom headers that add X-Robots-Tag (a sketch of such a headers() entry follows the Nuxt example below).
For Nuxt, check nuxt.config.ts and page-level settings:
// nuxt.config.ts - remove if blocking unintentionally
export default defineNuxtConfig({
  app: {
    head: {
      meta: [
        // Remove this if present
        // { name: 'robots', content: 'noindex' }
      ]
    }
  }
})
<script setup>
// Page-level override - make sure pages that should rank explicitly allow indexing
useSeoMeta({
  robots: 'index, follow'
})
</script>
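As mentioned above, an X-Robots-Tag header can also come from next.config.js. A hedged sketch of the kind of headers() entry to look for (the /admin path and pattern are illustrative):
// next.config.js - an entry like this sends X-Robots-Tag: noindex for /admin pages.
// Remove it, or narrow the source pattern, if it covers pages that should be indexed.
module.exports = {
  async headers() {
    return [
      {
        source: '/admin/:path*',
        headers: [{ key: 'X-Robots-Tag', value: 'noindex' }],
      },
    ]
  },
}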
After removing indexing blocks:
1. Re-run Lighthouse
The "Page is blocked from indexing" audit should pass.
2. Check multiple sources
Verify all three mechanisms are clear:
curl -s https://example.com/page | grep -i "robots"     # meta robots tag in the HTML
curl -I https://example.com/page | grep -i "x-robots"   # X-Robots-Tag response header
curl -s https://example.com/robots.txt                  # robots.txt rules
3. Google Search Console
Use URL Inspection to submit the page for indexing. The tool shows exactly what Google sees and whether indexing is allowed.
4. Wait and verify indexing
Search site:example.com/page-url after a few days. If the page appears, it's being indexed correctly.
unavailable_after: 01-Jan-2024 blocks indexing after that date. If you used this directive for temporary content, remove it when you want the page back in the index.
One misconfigured template can block thousands of pages. A robots.txt typo can exclude your entire site. CMS updates can reset SEO settings to defaults. Unlighthouse scans every URL on your site and identifies every page blocked from indexing, so you can catch issues that would otherwise remain invisible until you notice organic traffic dropping.
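If you want to run that kind of site-wide scan yourself, the Unlighthouse CLI can be started with a single command (a sketch; check the Unlighthouse docs for current flags):
npx unlighthouse --site https://example.com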
Crawlability issues often appear alongside:
HTTP Status Code - pages returning 4xx or 5xx HTTP status codes that prevent proper indexing.
Link Text - generic link text like "click here" instead of descriptive text that helps search engines understand your content.