A malformed robots.txt file can silently prevent Google from crawling your entire site. According to Google Search Console data, 23% of websites have robots.txt configuration errors that affect their search visibility.
Lighthouse flags "robots.txt is not valid" when your robots.txt file contains syntax errors, malformed directives, or structural problems that crawlers cannot parse correctly. When search engine bots encounter an invalid robots.txt, they may interpret your crawling instructions incorrectly or ignore them entirely.
The robots.txt file follows a strict specification. Each directive must be on its own line, use a colon separator, and follow specific formatting rules. Common errors include missing colons, invalid URL patterns, directives without a preceding User-agent declaration, and unrecognized directive names. These seem like minor issues, but they can cascade into major crawling problems.
The stakes are high: if Googlebot misinterprets your robots.txt, it might crawl pages you wanted blocked (wasting crawl budget and potentially indexing private content) or skip pages you wanted indexed (killing your search rankings). A single syntax error can flip the meaning of your entire file.
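To see how little it takes, compare the two groups below: they differ by a single character in the Disallow value, yet the first allows crawling of the entire site while the second blocks all of it.

```txt
# Allows everything: an empty Disallow value imposes no restrictions
User-agent: *
Disallow:

# Blocks everything: "/" matches every path on the site
User-agent: *
Disallow: /
```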
To confirm the problem, open https://your-site.com/robots.txt directly in your browser to inspect the raw file, then run a Lighthouse SEO audit. The "robots.txt is not valid" audit will fail and report the specific lines and parse errors it found.
Lighthouse also fails this audit when the robots.txt request returns a 5xx server error, indicating your server cannot reliably serve the file.
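If you prefer to script the check, the sketch below runs just the SEO category using the lighthouse and chrome-launcher npm packages and reads the result of the robots-txt audit. Treat it as a starting point rather than a canonical setup; for a one-off report, the CLI equivalent is `npx lighthouse https://your-site.com --only-categories=seo`.

```ts
// audit-robots.ts: run Lighthouse's SEO category and inspect the robots.txt audit.
// Assumes lighthouse and chrome-launcher are installed as dev dependencies.
import lighthouse from 'lighthouse'
import * as chromeLauncher from 'chrome-launcher'

async function checkRobotsTxtAudit(url: string): Promise<void> {
  const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] })
  try {
    const result = await lighthouse(url, {
      port: chrome.port,
      onlyCategories: ['seo'], // only the SEO category, which contains this audit
      logLevel: 'error',
    })
    const audit = result?.lhr.audits['robots-txt']
    // score is 1 when the file parses cleanly, 0 when Lighthouse finds errors
    console.log(`${audit?.title}: score ${audit?.score}`)
  } finally {
    await chrome.kill()
  }
}

checkRobotsTxtAudit('https://your-site.com')
```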
Every directive needs the format `Directive: value`, with a colon separating the name from the value:

```txt
# Invalid: missing colons
User-agent *
Disallow /admin

# Valid
User-agent: *
Disallow: /admin
```
Group member directives (Allow, Disallow) must always follow a User-agent declaration:
```txt
# Invalid: Disallow appears before any User-agent line
Disallow: /private/

# Valid
User-agent: *
Disallow: /private/
```
Allow and Disallow patterns must start with /, *, or be empty:
```txt
# Invalid: patterns must start with "/" or "*"
Disallow: admin/
Disallow: private

# Valid
Disallow: /admin/
Disallow: /private
Disallow: *private*
Disallow:            # Empty disallow (allows everything)
```
The $ wildcard is only valid at the end of a pattern:
```txt
# Invalid: "$" must be the last character of the pattern
Disallow: /page$.html

# Valid
Disallow: /page.html$
```
Sitemap directives require fully qualified URLs with valid protocols:
```txt
# Invalid
Sitemap: /sitemap.xml                     # relative URL
Sitemap: ftp://example.com/sitemap.xml    # not an http(s) URL

# Valid
Sitemap: https://example.com/sitemap.xml
```
Stick to universally supported directives. Unknown directives cause validation failures:
```txt
User-agent: *
Allow: /
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
Crawl-delay: 10    # widely recognized by parsers, but ignored by Google
```
Putting it all together, a complete and valid robots.txt might look like this:

```txt
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /private/
Disallow: /*.json$

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-blog.xml
```
In Next.js, put a static robots.txt in the public/ directory, which Next.js serves at the root path automatically, or use app/robots.ts for dynamic generation. For a robots.txt that changes with the environment, export a robots() function from app/robots.ts (a sketch appears at the end of this section).

In Nuxt, place robots.txt in the public/ directory, or use the @nuxtjs/robots module for dynamic generation. The module supports environment-based configuration and automatic sitemap URL injection via nuxt.config.ts (see the second sketch below).

After deploying a fix, open https://your-site.com/robots.txt and visually inspect it for errors, then re-run the Lighthouse SEO audit to confirm the issue is resolved (a small script for automating the basic checks is also sketched below).

A few mistakes to avoid: don't block your /css/ or /js/ directories, because Googlebot needs those resources to render your pages correctly and blocking them hurts your rankings. Don't rely on robots.txt to hide sensitive pages; use noindex meta tags or authentication for truly private content. And mind trailing slashes: `Disallow: /admin` matches every path that begins with /admin (including /admin/, /admin.html, and /administrator), while `Disallow: /admin/` matches only paths inside that directory, so be explicit about what you're blocking. Keep in mind that robots.txt issues often appear alongside other crawlability and indexing problems.
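Here is a minimal sketch of the dynamic Next.js route mentioned above. The rules mirror the complete example earlier; the domain, the blocked paths, and the VERCEL_ENV check are placeholders to adapt to your own project and hosting setup.

```ts
// app/robots.ts: Next.js generates /robots.txt from this file (App Router)
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  // Example gate: block all crawling on non-production deployments
  // (VERCEL_ENV is a Vercel-specific variable; substitute your own flag)
  if (process.env.VERCEL_ENV !== 'production') {
    return { rules: { userAgent: '*', disallow: '/' } }
  }

  return {
    rules: [
      { userAgent: '*', allow: '/', disallow: ['/admin/', '/api/', '/private/'] },
      { userAgent: 'GPTBot', disallow: '/' },
      { userAgent: 'CCBot', disallow: '/' },
    ],
    sitemap: [
      'https://example.com/sitemap.xml',
      'https://example.com/sitemap-blog.xml',
    ],
  }
}
```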
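For Nuxt, the module-based setup looks roughly like the sketch below. The option names under the robots key vary between @nuxtjs/robots versions, so treat these keys as illustrative assumptions and check the module documentation for the version you install.

```ts
// nuxt.config.ts: sketch only; option names differ across @nuxtjs/robots versions
export default defineNuxtConfig({
  modules: ['@nuxtjs/robots'],

  robots: {
    // Illustrative options: block private areas for every user agent
    disallow: ['/admin/', '/api/', '/private/'],
    // Recent module versions can inject the sitemap URL automatically when
    // @nuxtjs/sitemap is installed; it can also be set explicitly
    sitemap: 'https://example.com/sitemap.xml',
  },
})
```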
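And for the manual inspection step, this small script (a sketch assuming Node 18+ with the built-in fetch API) downloads your robots.txt and flags lines that break the basic rules covered above. It is not a full validator, so still re-run Lighthouse afterwards.

```ts
// check-robots.ts: rough pre-flight check for the syntax rules described above
const KNOWN_DIRECTIVES = new Set([
  'user-agent', 'allow', 'disallow', 'sitemap', 'crawl-delay',
])

async function checkRobots(origin: string): Promise<void> {
  const res = await fetch(new URL('/robots.txt', origin))
  if (!res.ok) {
    console.error(`robots.txt returned HTTP ${res.status}`)
    return
  }

  const lines = (await res.text()).split('\n')
  lines.forEach((raw, index) => {
    const line = raw.split('#')[0].trim() // strip comments and surrounding whitespace
    if (line === '') return               // blank or comment-only lines are fine

    const colon = line.indexOf(':')
    if (colon === -1) {
      console.warn(`Line ${index + 1}: missing ":" separator -> ${raw.trim()}`)
      return
    }

    const directive = line.slice(0, colon).trim().toLowerCase()
    const value = line.slice(colon + 1).trim()

    if (!KNOWN_DIRECTIVES.has(directive)) {
      console.warn(`Line ${index + 1}: unrecognized directive "${directive}"`)
    }
    if ((directive === 'allow' || directive === 'disallow') &&
        value !== '' && !value.startsWith('/') && !value.startsWith('*')) {
      console.warn(`Line ${index + 1}: pattern should start with "/" or "*"`)
    }
    if (directive === 'sitemap' && !/^https?:\/\//i.test(value)) {
      console.warn(`Line ${index + 1}: sitemap should be an absolute http(s) URL`)
    }
  })
}

checkRobots('https://your-site.com')
```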
A valid robots.txt is just the first step. Search engines still need to successfully crawl and index your pages. Run a comprehensive scan to verify your entire site is accessible and returns proper status codes.
Scan Your Site with Unlighthouse