Robots.txt Generator

Create compliant robots.txt files to control search engine crawler access. Define user‑agents, allow/disallow rules, sitemap locations, and crawl‑delay — all with instant preview and downloadable output.

Quick templates: Allow All · Block All · WordPress · SEO Optimized · Shopify · Custom

100% local: No data is sent to any server. Your configuration stays in your browser. We do not track or store your inputs.

Understanding robots.txt — The Gatekeeper of Search Engine Crawling

The robots.txt file implements the Robots Exclusion Protocol. It is a plain‑text file served from the root of a host (for example, https://example.com/robots.txt). It tells web crawlers such as Googlebot, Bingbot, and other search engine spiders which parts of your site they may and may not access. It is not a security mechanism: well-behaved crawlers honour it, but malicious bots can simply ignore it. It is, however, an essential tool for managing crawl budget, keeping crawlers out of low-value or private areas, and guiding search engines toward your most valuable pages.

The Robots Exclusion Protocol is standardised in RFC 9309 and is recognised by all major search engines, including Google, Bing, Yahoo, and Yandex.

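Crawlers do not download the file on every request; they cache it. Google generally caches robots.txt for up to 24 hours. RFC 9309 advises crawlers not to rely on a cached copy for longer than that unless the file is unreachable, so a change can take up to a day to take effect.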

How Search Engines Interpret robots.txt

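When a crawler arrives at your domain, it first requests /robots.txt and parses the file into groups. Each group starts with one or more User-agent lines, followed by Allow and Disallow rules. The crawler obeys only the group whose User-agent best matches its own name, falling back to the * group if no named group matches. It then checks each URL against that group's rules. When several rules match, the most specific one (the rule with the longest path) wins, and if an Allow rule and a Disallow rule are equally specific, Allow wins. If no rule matches a URL, the crawler may fetch it. Whether a fetched page is then indexed is a separate question, governed by noindex directives rather than by robots.txt.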
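The precedence rules just described can be sketched in Python. This is a minimal illustration, not any search engine's actual parser. It makes three assumptions: the crawler has already selected its user-agent group, `path` includes the query string, and the helper names `_to_regex` and `is_allowed` are hypothetical.

```python
import re


def _to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters (including none), and a trailing '$'
    anchors the pattern to the end of the URL. Everything else is literal,
    and matching always starts at the beginning of the path (prefix match).
    """
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(part) for part in core.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide whether `path` may be crawled under one group's rules.

    `rules` holds ("allow" | "disallow", pattern) pairs from the group the
    crawler selected. The longest matching pattern wins; on a tie, Allow wins.
    If no rule matches, crawling is allowed.
    """
    best_length = -1
    allowed = True
    for kind, pattern in rules:
        if not pattern:  # an empty rule such as "Disallow:" matches nothing
            continue
        if _to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_length or (length == best_length and kind == "allow"):
                best_length = length
                allowed = kind == "allow"
    return allowed


rules = [
    ("disallow", "/shop/"),
    ("allow", "/shop/sale/"),
    ("disallow", "/*.pdf$"),
]
print(is_allowed(rules, "/shop/cart"))           # False: only /shop/ matches
print(is_allowed(rules, "/shop/sale/shoes"))     # True: /shop/sale/ is longer
print(is_allowed(rules, "/docs/guide.pdf"))      # False: ends in .pdf
print(is_allowed(rules, "/docs/guide.pdf?v=2"))  # True: $ requires .pdf at the end
```

Because the longest match decides, the order of rules within a group does not matter in this model.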

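It is crucial to understand that robots.txt does not prevent a page from being indexed. If other sites link to a blocked URL, search engines can still list it, usually without a description, because robots.txt only controls crawling. To prevent indexing, use a noindex meta tag or an X‑Robots‑Tag HTTP header. Leave the page crawlable so that search engines can actually fetch it and see that directive. This distinction is one of the most common misconceptions in SEO.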
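As a sketch of the header-based approach, the hypothetical WSGI middleware below adds `X-Robots-Tag: noindex` to every response whose path ends in `.pdf`. It assumes a Python WSGI stack. The names `noindex_pdfs`, `site`, and `fake_start_response` are illustrative only, and a web server configuration could achieve the same result.

```python
def noindex_pdfs(app):
    """WSGI middleware that adds 'X-Robots-Tag: noindex' to responses for *.pdf URLs."""
    def wrapped(environ, start_response):
        def start(status, headers, exc_info=None):
            if environ.get("PATH_INFO", "").endswith(".pdf"):
                # The URL stays crawlable, so crawlers can fetch it and read this header.
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)
        return app(environ, start)
    return wrapped


# Usage with a stand-in application and a fake start_response (no server needed).
def site(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"example body"]


captured = {}


def fake_start_response(status, headers, exc_info=None):
    captured["headers"] = headers


app = noindex_pdfs(site)
app({"PATH_INFO": "/files/report.pdf"}, fake_start_response)
print(captured["headers"])
```

The key design point is that the PDF is not disallowed in robots.txt. If it were, crawlers would never see the header.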

Core Syntax — A Quick Reference

Directive	Purpose	Example
`User-agent`	Specifies which crawler the following rules apply to.	`User-agent: Googlebot`
`Disallow`	Prevents the specified crawler from accessing the given path.	`Disallow: /admin/`
`Allow`	Permits access to a path, overriding a less specific (shorter) `Disallow` rule.	`Allow: /public/`
`Sitemap`	Points to the absolute URL of your XML sitemap; not tied to any user-agent group.	`Sitemap: https://example.com/sitemap.xml`
`Crawl-delay`	Requests a delay (in seconds) between successive requests. Non-standard: Bing honours it, Google ignores it.	`Crawl-delay: 5`
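Under the hood, a generator like this one serialises groups of directives into plain text. The Python sketch below is hypothetical: the `Group` class and `build_robots_txt` function are illustrative names, not this tool's source code. It shows one reasonable layout:

* Allow lines come before Disallow lines within each group.
* A group with no rules gets an explicit empty `Disallow:`.
* Sitemap lines go at the end.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """One robots.txt group: the crawlers it targets and the rules they follow."""
    user_agents: list[str]
    allow: list[str] = field(default_factory=list)
    disallow: list[str] = field(default_factory=list)
    crawl_delay: Optional[int] = None  # non-standard; Google ignores it


def build_robots_txt(groups: list[Group], sitemaps: list[str]) -> str:
    """Serialise groups and sitemap URLs into robots.txt text."""
    lines: list[str] = []
    for group in groups:
        lines += [f"User-agent: {ua}" for ua in group.user_agents]
        lines += [f"Allow: {path}" for path in group.allow]
        lines += [f"Disallow: {path}" for path in group.disallow]
        if not group.allow and not group.disallow:
            lines.append("Disallow:")  # an empty Disallow means "allow everything"
        if group.crawl_delay is not None:
            lines.append(f"Crawl-delay: {group.crawl_delay}")
        lines.append("")  # blank line between groups, for readability
    # Sitemap lines are not tied to any group, so they go at the end.
    lines += [f"Sitemap: {url}" for url in sitemaps]
    return "\n".join(lines) + "\n"


shared = {"allow": ["/admin/public/"], "disallow": ["/admin/", "/cart/"]}
text = build_robots_txt(
    [
        Group(["*"], **shared),
        # Bingbot follows only its own group, so the shared rules are repeated here.
        Group(["Bingbot"], crawl_delay=5, **shared),
    ],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(text)
```

Listing Allow lines first is a defensive choice. Google's longest-match rule makes the order irrelevant, but simpler parsers read rules top to bottom and stop at the first match.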

Best Practices for an SEO‑Friendly robots.txt

Allow important content: Ensure your main pages (home, product, blog) are not accidentally disallowed.
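Reduce crawl waste: Use disallow rules for low-value parameterised URLs (such as sort and filter combinations) and print versions. Be careful with pagination: if paginated pages are blocked, crawlers cannot follow them to discover the items they list. Duplicate content is better handled with canonical tags.
Always include your sitemap: Even if you submit it via Search Console, listing it in robots.txt lets every crawler that reads the file find it, not just Google.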
Be cautious with wildcards: The * wildcard is widely supported but can be overly broad. Test thoroughly.
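Separate user‑agent sections: A User-agent line, or several consecutive ones, starts a new group. Rules within a group apply only to the crawlers it names. A crawler follows only the group that matches it and ignores all others, including the * group.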
Use comments for clarity: Add comments (#) to explain your logic, especially in larger files.
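Python's standard-library `urllib.robotparser` module is enough to illustrate how groups are selected. Keep in mind that it is only a rough stand-in for search-engine parsers. It applies rules in file order (first match wins) rather than by longest match, and it does not understand * or $ inside paths. The crawler names below are simply examples.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
User-agent: Bingbot
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot and Bingbot share the second group and ignore the * group entirely.
print(parser.can_fetch("Googlebot", "/private/report"))    # True
print(parser.can_fetch("Bingbot", "/drafts/post"))         # False
# Any other crawler falls back to the * group.
print(parser.can_fetch("DuckDuckBot", "/private/report"))  # False
```

The first result is the usual surprise. Because Googlebot has its own group, the `/private/` rule in the * group no longer applies to it.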

Case Study: E‑Commerce Site Optimisation

An online retailer with over 50,000 product pages noticed that Googlebot was spending excessive crawl time on faceted navigation filters (e.g., /products?color=red&size=large). This consumed crawl budget and delayed the discovery of new product pages. By implementing a strict Disallow: /products?* rule for all bots, while allowing the main product URLs (/product/) and the sitemap, they reduced crawl waste by 62% and saw a 17% increase in indexation of new products within two weeks. The key was to selectively block without hiding valuable content.

Common Mistakes That Harm SEO

Blocking CSS/JS: If you disallow CSS and JavaScript files, Googlebot cannot render your pages correctly, which can negatively impact mobile‑friendliness and page‑experience signals.
Using Disallow: / globally: This tells all crawlers to stay away from your entire site. Only use it for staging environments or when you deliberately want to stop all crawling. Note that it does not remove URLs that are already indexed.
Case sensitivity: Paths in robots.txt are case‑sensitive. /About and /about are treated differently.
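Missing trailing slashes: Disallow: /admin/ blocks /admin/login but not /admin itself, while Disallow: /admin also blocks /administrator. Be consistent with your URL structure.
Over‑reliance on robots.txt for security: The file is publicly readable, so listing a sensitive path in it actually advertises that path. Never use it to hide sensitive data; use authentication or IP restrictions instead.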
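The case-sensitivity and trailing-slash pitfalls can be checked quickly with Python's `urllib.robotparser`. For simple prefix rules like these, it behaves like the prefix matching described in RFC 9309. `blocked` is a hypothetical helper and `ExampleBot` is a made-up crawler name.

```python
from urllib.robotparser import RobotFileParser


def blocked(rules: str, path: str) -> bool:
    """Return True if the given rules, placed in a '*' group, block `path`."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", *rules.splitlines()])
    # "ExampleBot" is a hypothetical crawler name; it falls back to the * group.
    return not parser.can_fetch("ExampleBot", path)


# Paths are case-sensitive.
print(blocked("Disallow: /Images/", "/Images/logo.png"))  # True
print(blocked("Disallow: /Images/", "/images/logo.png"))  # False

# Trailing slashes change what a prefix rule covers.
print(blocked("Disallow: /admin/", "/admin"))         # False
print(blocked("Disallow: /admin/", "/admin/login"))   # True
print(blocked("Disallow: /admin", "/administrator"))  # True
```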

The Evolution of the Robots Exclusion Protocol

The original robots.txt convention was created in 1994 by Martijn Koster, a webmaster at Nexor, in collaboration with early search engine developers. It was never a formal standard, but it became a de‑facto convention adopted by virtually all crawlers. In 2022, the Internet Engineering Task Force (IETF) published RFC 9309, formally standardising the protocol. The RFC clarified ambiguous cases. It formally defines the Allow rule, longest-match precedence, and the * and $ special characters, and it sets out how crawlers should handle caching and fetch errors. Sitemap is not part of the RFC itself but remains a widely supported extension. Today, robots.txt remains one of the most foundational tools for website owners to manage search engine interaction.

Frequently Asked Questions

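What is the difference between robots.txt and noindex?

robots.txt controls crawling (whether a bot can access a URL), while noindex controls indexing (whether a page appears in search results). A page can be crawled but not indexed, or indexed without being crawled (if discovered through links). Do not combine the two on the same URL. If robots.txt blocks a page, crawlers never fetch it and so never see its noindex directive.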

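Do all search engines support the same directives?

Major search engines (Google, Bing, Yandex, Baidu) support the core directives: User-agent, Allow, Disallow, and Sitemap. Crawl-delay is the main exception: Bing honours it, but Google ignores it. To slow Googlebot down temporarily, Google recommends returning 500, 503, or 429 status codes rather than relying on robots.txt. Always test your robots.txt before and after deploying it.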

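Can I use wildcards in robots.txt?

Yes. RFC 9309 defines two special characters, supported by Google, Bing, and most other major crawlers:

* `*` matches any sequence of characters, including none.
* `$` marks the end of the URL.

For example, Disallow: /*.pdf blocks every URL containing .pdf, while Disallow: /*.pdf$ blocks only URLs that end in .pdf. Not all bots support wildcards, so use them with caution and test thoroughly.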
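Two things can be checked in a few lines of Python: the difference between /*.pdf and /*.pdf$, and what happens with a parser that lacks wildcard support. `rfc_match` is a hypothetical helper that follows RFC 9309's * and $ semantics. `urllib.robotparser` is Python's standard-library parser, which, at the time of writing, treats * literally. `ExampleBot` is a made-up crawler name.

```python
import re
from urllib.robotparser import RobotFileParser


def rfc_match(pattern: str, path: str) -> bool:
    """Wildcard-aware prefix match: '*' = any characters, trailing '$' = end of URL."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(map(re.escape, core.split("*"))) + ("$" if anchored else "")
    return re.match(regex, path) is not None


# '/*.pdf' matches '.pdf' anywhere in the URL; '/*.pdf$' only at the very end.
print(rfc_match("/*.pdf", "/files/report.pdf"))        # True
print(rfc_match("/*.pdf", "/files/report.pdf.html"))   # True
print(rfc_match("/*.pdf$", "/files/report.pdf.html"))  # False

# A parser without wildcard support treats '*' literally and misses the PDF.
legacy = RobotFileParser()
legacy.parse(["User-agent: *", "Disallow: /*.pdf"])
print(legacy.can_fetch("ExampleBot", "/files/report.pdf"))  # True (not blocked)
```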

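Is robots.txt case‑sensitive?

Yes, paths in robots.txt are case‑sensitive: /Images and /images are considered different. Directive names such as User-agent and Disallow are not case‑sensitive. Mismatched path case is a common source of errors, so always double‑check your URL paths.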

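How do I test my robots.txt before deploying?

Before deploying, use the robots.txt tester in Bing Webmaster Tools, Google's open-source robots.txt parser, or third‑party validators. After deploying, confirm that Google sees the live file in the robots.txt report in Google Search Console. Our generator provides a preview, but we always recommend verifying against the official tools for your target search engines.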
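Between manual checks, a short script can catch regressions, for instance in a CI pipeline. This sketch assumes a draft file that uses only simple prefix rules, because `urllib.robotparser` evaluates rules first-match and ignores wildcards. `DRAFT`, `EXPECTATIONS`, and `check` are hypothetical names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft file and expectations; replace with your own.
DRAFT = """\
User-agent: *
Disallow: /cart/
Disallow: /search

Sitemap: https://example.com/sitemap.xml
"""

# True = must stay crawlable, False = must be blocked.
EXPECTATIONS = {
    "/": True,
    "/product/blue-widget": True,
    "/cart/checkout": False,
    "/search?q=widgets": False,
}


def check(robots_txt: str, expectations: dict[str, bool], agent: str = "Googlebot") -> list[str]:
    """Return a description of every URL whose verdict differs from the expectation."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def label(allowed: bool) -> str:
        return "allowed" if allowed else "blocked"

    failures = []
    for path, expected in expectations.items():
        actual = parser.can_fetch(agent, path)
        if actual != expected:
            failures.append(f"{path}: expected {label(expected)}, got {label(actual)}")
    return failures


problems = check(DRAFT, EXPECTATIONS)
print("All checks passed" if not problems else "\n".join(problems))
```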

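What happens if my robots.txt is missing or inaccessible?

If a robots.txt file is missing (404), crawlers assume they are allowed to access the entire site; RFC 9309 treats any 4xx response this way. If the server returns a 5xx error or cannot be reached, crawlers must assume the entire site is disallowed until the file becomes accessible again. A persistent server error can therefore halt crawling completely. Always ensure your robots.txt is served with a 200 OK status and is readable.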
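The status-code handling above can be summarised as a small Python function. It is a sketch of RFC 9309's access-result rules; `robots_policy` and its return labels are hypothetical names. The optional flag reflects Google's documented choice to treat 429 (Too Many Requests) like a server error. That detail is worth re-checking against Google's current documentation.

```python
def robots_policy(status: int, treat_429_as_server_error: bool = False) -> str:
    """Map the HTTP status of a /robots.txt fetch to a crawl policy (per RFC 9309)."""
    if 200 <= status < 300:
        return "parse"            # use the rules in the file
    if 300 <= status < 400:
        return "follow-redirect"  # the RFC says to follow at least five consecutive redirects
    if status == 429 and treat_429_as_server_error:
        return "disallow-all"     # Google-style handling of rate limiting
    if 400 <= status < 500:
        return "allow-all"        # file "unavailable": any URL may be crawled
    if 500 <= status < 600:
        return "disallow-all"     # "unreachable": assume a complete disallow
    raise ValueError(f"unexpected status code: {status}")


for code in (200, 301, 404, 429, 503):
    print(code, robots_policy(code))
```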

Expert‑reviewed content — This guide and tool are built upon the Google Search Central documentation, the RFC 9309 specification, and industry best practices compiled by SEO professionals with over a decade of experience. The generator logic follows the parsing rules implemented by major search engines. Last reviewed and updated July 2026.

Further reading: Google robots.txt Guide · RFC 9309 – Robots Exclusion Protocol · Bing Webmaster robots.txt

