---
title: "Fix Invalid robots.txt for Better SEO"
description: "Fix robots.txt validation errors that prevent search engines from properly crawling your site. Learn syntax rules, common errors, and proper configuration."
canonical_url: "https://unlighthouse.dev/learn-lighthouse/seo/robots-txt"
last_updated: "2025-01-18"
---

A malformed robots.txt file can silently prevent Google from crawling your entire site. According to Google Search Console data, 23% of websites have robots.txt configuration errors that affect their search visibility.

**Key limits to know:**

- [Max file size: 500 KiB](https://developers.google.com/search/docs/crawling-indexing/robots/intro) - Google ignores any content beyond this limit
- [Sitemap limits: 50,000 URLs or 50 MB uncompressed](https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap), whichever comes first
- [Google ignores `crawl-delay`](https://developers.google.com/search/blog/2019/07/a-note-on-unsupported-rules-in-robotstxt) - retired all code handling it on Sept 1, 2019
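
The file size limit above can be checked before deploying. The following is a minimal sketch, assuming the robots.txt content is already available as a string; the names `MAX_ROBOTS_TXT_BYTES`, `robotsTxtByteLength`, and `isWithinSizeLimit` are hypothetical helpers introduced for illustration.

```typescript
// Google's documented robots.txt size limit: 500 KiB.
const MAX_ROBOTS_TXT_BYTES = 500 * 1024;

// Returns the size of the content in bytes once encoded as UTF-8.
function robotsTxtByteLength(content: string): number {
  return new TextEncoder().encode(content).length;
}

// True when the content fits within the limit, so no trailing rules are dropped.
function isWithinSizeLimit(content: string): boolean {
  return robotsTxtByteLength(content) <= MAX_ROBOTS_TXT_BYTES;
}
```

The check measures encoded bytes rather than string length, because non-ASCII characters take more than one byte in UTF-8.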

## What's the Problem?

Lighthouse flags "robots.txt is not valid" when your robots.txt file contains syntax errors, malformed directives, or structural problems that crawlers cannot parse correctly. When search engine bots encounter an invalid robots.txt, they may interpret your crawling instructions incorrectly or ignore them entirely.

The robots.txt file follows a strict specification. Each directive must be on its own line, use a colon separator, and follow specific formatting rules. Common errors include missing colons, invalid URL patterns, directives without a preceding User-agent declaration, and unrecognized directive names. These seem like minor issues, but they can cascade into major crawling problems.

The stakes are high: if Googlebot misinterprets your robots.txt, it might crawl pages you wanted blocked (wasting crawl budget and potentially indexing private content) or skip pages you wanted indexed (killing your search rankings). A single syntax error can flip the meaning of your entire file.

## How to Identify This Issue

### Chrome DevTools

1. Navigate to `https://your-site.com/robots.txt` directly
2. Look for obvious syntax errors: missing colons, typos in directive names
3. Check that every Allow/Disallow directive has a User-agent above it
4. Verify sitemap URLs are fully qualified (include https://)

### Lighthouse

Run a Lighthouse SEO audit. The "robots.txt is not valid" audit will fail and display:

- The specific line number where errors occur
- The problematic content on that line
- A description of what's wrong (e.g., "Unknown directive", "No user-agent specified")

Lighthouse also fails this audit when the robots.txt request returns a 5xx server error, indicating your server cannot reliably serve the file.

## The Fix

### 1. Correct Basic Syntax Errors

Every directive needs the format `Directive: value` with a colon separator:

```txt
# Invalid: missing colon separator
User-agent *
Disallow /admin

# Valid
User-agent: *
Disallow: /admin
```

Group member directives (Allow, Disallow) must always follow a User-agent declaration:

```txt
# Invalid: no User-agent declared before the rule
Disallow: /private/

# Valid
User-agent: *
Disallow: /private/
```
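
The two rules above (a colon separator on every directive, and a User-agent before any Allow or Disallow) can be sketched as a small checker. This is an illustrative sketch and not Lighthouse's own implementation; `findBasicSyntaxIssues` and the "Missing colon separator" message are hypothetical.

```typescript
type SyntaxIssue = { line: number; message: string };

function findBasicSyntaxIssues(content: string): SyntaxIssue[] {
  const issues: SyntaxIssue[] = [];
  let seenUserAgent = false;

  content.split(/\r?\n/).forEach((rawLine, index) => {
    // Strip comments and surrounding whitespace; skip blank lines.
    const line = rawLine.replace(/#.*$/, "").trim();
    if (line === "") return;

    const colonAt = line.indexOf(":");
    if (colonAt === -1) {
      issues.push({ line: index + 1, message: "Missing colon separator" });
      return;
    }

    // Directive names are compared case-insensitively.
    const directive = line.slice(0, colonAt).trim().toLowerCase();
    if (directive === "user-agent") {
      seenUserAgent = true;
    } else if ((directive === "allow" || directive === "disallow") && !seenUserAgent) {
      issues.push({ line: index + 1, message: "No user-agent specified" });
    }
  });

  return issues;
}
```

Line numbers in the result are 1-based, so they can be compared directly with the line numbers a Lighthouse report shows.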

### 2. Fix URL Pattern Errors

Allow and Disallow patterns must start with `/`, `*`, or be empty:

```txt
# Invalid: patterns missing a leading "/" or "*"
Disallow: admin/
Disallow: private

# Valid
Disallow: /admin/
Disallow: /private
Disallow: *private*
Disallow:  # Empty disallow (allows everything)
```

The `$` wildcard is only valid at the end of a pattern:

```txt
# Invalid: "$" in the middle of the pattern
Disallow: /page$.html

# Valid: "$" anchors the end of the URL
Disallow: /page.html$
```
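
Both pattern rules in this section fit in one small function. A minimal sketch, assuming the value has already been split from its directive name and trimmed; `isValidPathPattern` is a hypothetical helper name.

```typescript
function isValidPathPattern(pattern: string): boolean {
  // An empty value is allowed (an empty Disallow permits everything).
  if (pattern === "") return true;

  // Non-empty patterns must begin with "/" or "*".
  if (!pattern.startsWith("/") && !pattern.startsWith("*")) return false;

  // "$" may only appear as the final character of the pattern.
  const dollarAt = pattern.indexOf("$");
  return dollarAt === -1 || dollarAt === pattern.length - 1;
}
```

For example, `isValidPathPattern("admin/")` returns `false` and `isValidPathPattern("/page.html$")` returns `true`.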

### 3. Validate Sitemap URLs

Sitemap directives require fully qualified URLs with valid protocols:

```txt
# Invalid: relative URL
Sitemap: /sitemap.xml

# Avoid: non-HTTP(S) protocol
Sitemap: ftp://example.com/sitemap.xml

# Valid: fully qualified https URL
Sitemap: https://example.com/sitemap.xml
```
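
The same check can be expressed with the standard `URL` constructor, which throws on relative values when no base is given. This is a sketch, not a definitive validator: `isValidSitemapUrl` is a hypothetical helper, and restricting the protocol to `http:` and `https:` is an assumption chosen to match the examples above.

```typescript
function isValidSitemapUrl(value: string): boolean {
  try {
    // Without a base URL, the constructor rejects relative values such as "/sitemap.xml".
    const { protocol } = new URL(value);
    // Assumption of this sketch: only http and https are accepted.
    return protocol === "https:" || protocol === "http:";
  } catch {
    return false;
  }
}
```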

### 4. Use Only Recognized Directives

Stick to universally supported directives. Unknown directives cause validation failures:

```txt
# Universally supported
User-agent: *
Allow: /
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml

# Ignored by Google; some other crawlers honor it
Crawl-delay: 10
```
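
To audit a file for anything outside the four universally supported directives, a sketch like the following can list the candidates for review. `UNIVERSAL_DIRECTIVES` and `findNonUniversalDirectives` are hypothetical names, and a reported directive is not necessarily an error: which additional directives a given validator or crawler tolerates varies.

```typescript
const UNIVERSAL_DIRECTIVES = new Set(["user-agent", "allow", "disallow", "sitemap"]);

function findNonUniversalDirectives(content: string): { line: number; directive: string }[] {
  const found: { line: number; directive: string }[] = [];

  content.split(/\r?\n/).forEach((rawLine, index) => {
    // Strip comments and whitespace before inspecting the line.
    const line = rawLine.replace(/#.*$/, "").trim();
    const colonAt = line.indexOf(":");
    // Blank lines and lines without a colon are out of scope for this check.
    if (line === "" || colonAt === -1) return;

    // Only the text before the first colon is the directive name.
    const directive = line.slice(0, colonAt).trim().toLowerCase();
    if (!UNIVERSAL_DIRECTIVES.has(directive)) {
      found.push({ line: index + 1, directive });
    }
  });

  return found;
}
```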

### Complete Valid Example

```txt
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /private/
Disallow: /*.json$

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-blog.xml
```

## Framework-Specific Solutions

<callout icon="i-logos-nextjs-icon">

**Next.js** - Create `public/robots.txt` for static content, or use `app/robots.ts` for dynamic generation. [Next.js](https://nextjs.org) serves files from `public/` at the root path automatically. For a dynamic robots.txt based on environment, default-export a function from `app/robots.ts` that returns a `MetadataRoute.Robots` object.

</callout>

<callout icon="i-logos-nuxt-icon">

**Nuxt** - Place `robots.txt` in the `public/` directory, or use the `@nuxtjs/robots` module for dynamic generation. The module supports environment-based configuration and automatic sitemap URL injection via `nuxt.config.ts`.

</callout>
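
Whichever framework serves the file, environment-based generation comes down to producing different text per environment. The following is a framework-agnostic sketch, not the API of Next.js or the Nuxt module: `renderRobotsTxt`, its parameters, and the `/sitemap.xml` path are assumptions for illustration, and wiring the result into a route or config is left to the framework.

```typescript
function renderRobotsTxt(isProduction: boolean, siteUrl: string): string {
  if (!isProduction) {
    // Ask all crawlers to stay away from staging and preview deployments.
    return ["User-agent: *", "Disallow: /", ""].join("\n");
  }

  // Resolve the sitemap against the site origin so the URL is fully qualified.
  const sitemapUrl = new URL("/sitemap.xml", siteUrl).href;

  return [
    "User-agent: *",
    "Allow: /",
    "Disallow: /admin/",
    "",
    `Sitemap: ${sitemapUrl}`,
    "",
  ].join("\n");
}
```

Keeping the generator a pure function of the environment makes it easy to validate the production output locally before deploying.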

## Verify the Fix

1. Navigate to `https://your-site.com/robots.txt` and visually inspect for errors
2. Open Google Search Console's robots.txt report (Settings > robots.txt), which replaced the retired robots.txt Tester
3. Run Lighthouse SEO audit and confirm the robots.txt audit passes
4. Test specific URLs with Google's URL Inspection tool to verify intended behavior
5. Check server logs to confirm that robots.txt consistently returns a 200 status

## Common Mistakes

- **Blocking CSS and JavaScript**: Don't block `/css/` or `/js/` directories. Googlebot needs these to render your pages correctly. Blocking render resources hurts your rankings.
- **Using robots.txt for sensitive content**: robots.txt is public and doesn't prevent indexing if pages are linked elsewhere. Use `noindex` meta tags or authentication for truly private content.
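- **Forgetting trailing slashes**: Rules are prefix matches. `Disallow: /admin` blocks every path that starts with `/admin`, including `/admin/`, `/admin.html`, and `/administrator`, while `Disallow: /admin/` blocks only paths inside the directory. Be explicit about what you're blocking.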
- **Testing only in staging or locally**: Many sites serve a different robots.txt in staging than in production. Validate your production file, not your local one.
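- **Unicode BOM at start of file**: Google ignores a byte-order mark at the start of robots.txt, but other parsers may read it as part of the first line and fail to recognize that directive. Save the file as UTF-8 without a BOM.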
- **robots.txt in subdirectory**: Invalid. Must be in domain root (`/robots.txt`). Bots won't find it anywhere else.
- **Path values not starting with `/` or `*`**: Directive values like `Disallow: admin/` (missing leading slash) are invalid and ignored.
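
The trailing-slash behavior is easiest to see by matching paths against rules directly. This is a minimal sketch of prefix matching with `*` and `$` as described in Google's robots.txt documentation; `ruleMatchesPath` is a hypothetical helper and does not handle precedence between competing Allow and Disallow rules.

```typescript
function ruleMatchesPath(pattern: string, path: string): boolean {
  // Escape regex metacharacters, then turn each escaped "*" into "match anything".
  const escaped = pattern
    .replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\\\*/g, ".*");

  // A trailing "$" anchors the end of the path; otherwise the rule is a prefix match.
  const body = escaped.endsWith("\\$") ? escaped.slice(0, -2) + "$" : escaped;

  return new RegExp("^" + body).test(path);
}
```

With this sketch, `ruleMatchesPath("/admin", "/administrator")` is `true` because `/admin` is a prefix, while `ruleMatchesPath("/admin/", "/admin")` is `false` because the path lacks the trailing slash.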

## Related Issues

Robots.txt issues often appear alongside:

- [Is Crawlable](/learn-lighthouse/seo/is-crawlable) - Robots.txt rules can block pages from being crawled
- [HTTP Status Code](/learn-lighthouse/seo/http-status-code) - Crawlers treat a 5xx response for robots.txt differently from a 404 (missing file)
- [Canonical](/learn-lighthouse/seo/canonical) - Don't block canonical URLs in robots.txt

## Test Your Entire Site

A valid robots.txt is the first step. Search engines still need to successfully crawl and index your pages. Run a complete scan to verify your entire site is accessible and returns proper status codes.

<u-button :trailing="true" icon="i-heroicons-rocket-launch" label="Scan Your Site with Unlighthouse" size="lg" to="/">



</u-button>
