Robots.txt Generator

Create, validate and optimize robots.txt files to control search engine crawling. Improve SEO and protect sensitive content.

Your Robots.txt File
# Generated by GetZenQuery Robots.txt Generator
User-agent: *
Disallow: /private/
Disallow: /admin/
Allow: /public/
Sitemap: https://example.com/sitemap.xml
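A file like the one above can be assembled from the same inputs the generator asks for: user agents, disallowed paths, allowed paths, an optional crawl delay, and an optional sitemap URL. The following is a minimal Python sketch, not the tool's actual implementation; the function name `build_robots_txt` and its parameters are hypothetical.

```python
from typing import Iterable, Optional


def build_robots_txt(
    user_agents: Iterable[str] = ("*",),
    disallow: Iterable[str] = (),
    allow: Iterable[str] = (),
    sitemap: Optional[str] = None,
    crawl_delay: Optional[int] = None,
) -> str:
    """Assemble a robots.txt body from simple inputs (hypothetical helper)."""
    lines = []
    # One group: all User-agent lines first, then the rules that apply to them.
    for agent in user_agents:
        lines.append(f"User-agent: {agent}")
    for path in disallow:
        lines.append(f"Disallow: {path}")
    for path in allow:
        lines.append(f"Allow: {path}")
    if crawl_delay is not None:
        lines.append(f"Crawl-delay: {crawl_delay}")
    if sitemap:
        # Sitemap lines do not belong to a group, so separate them with a blank line.
        lines.append("")
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines) + "\n"


robots = build_robots_txt(
    disallow=["/private/", "/admin/"],
    allow=["/public/"],
    sitemap="https://example.com/sitemap.xml",
)
print(robots)
```

Keeping all rules for the selected agents in one group avoids repeating the same paths per crawler; a generator that needs different rules per crawler would emit one group per agent instead.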

SEO Impact of Robots.txt

Crawl Budget Optimization

Prevent search engines from wasting crawl budget on unimportant pages.

Content Protection

Block sensitive areas like admin panels and staging sites.

Server Load Reduction

Reduce server load by limiting unnecessary crawls.

Best Practice: Always include your sitemap location in robots.txt to help search engines discover your content.

Common Robots.txt Mistakes

Blocking CSS/JS Files

Blocking CSS and JavaScript files can prevent search engines from properly rendering your pages.

# Bad practice
Disallow: /assets/css/
Disallow: /js/
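A quick automated check can catch this mistake before a file goes live. The sketch below is only a heuristic: the function name `find_asset_blocks` and the pattern it uses (a `css` or `js` path segment, or a `.css`/`.js` extension) are assumptions chosen for illustration, and real sites may keep render-critical files elsewhere.

```python
import re

# Hypothetical heuristic: a "css" or "js" path segment, or a .css/.js extension.
ASSET_PATTERN = re.compile(r"(^|/)(css|js)(/|$)|\.(css|js)\b", re.IGNORECASE)


def find_asset_blocks(robots_txt: str) -> list[str]:
    """Return Disallow paths that look like they block CSS or JavaScript."""
    flagged = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        field, _, value = line.partition(":")
        path = value.strip()
        if field.strip().lower() == "disallow" and ASSET_PATTERN.search(path):
            flagged.append(path)
    return flagged


sample = "User-agent: *\nDisallow: /assets/css/\nDisallow: /js/\nDisallow: /private/\n"
flagged = find_asset_blocks(sample)
print(flagged)
```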
Using Comments Incorrectly

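Comments start with # and run to the end of the line. The robots.txt standard (RFC 9309) permits a comment after a directive, but a legacy or hand-rolled parser may not strip it correctly, so putting each comment on its own line is the safer choice.

# Risky with some parsers
Disallow: /private/ # Block private area

# Safer
# Block private area
Disallow: /private/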
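If an existing file already mixes directives and comments on the same line, the comments can be moved mechanically. The following is a small sketch under the assumption that every `#` starts a comment; the function name `move_trailing_comments` is hypothetical.

```python
def move_trailing_comments(robots_txt: str) -> str:
    """Put each trailing comment on its own line, above its directive."""
    out = []
    for line in robots_txt.splitlines():
        directive, sep, comment = line.partition("#")
        if sep and directive.strip():
            # Directive followed by a comment: emit the comment first.
            out.append("# " + comment.strip())
            out.append(directive.rstrip())
        else:
            # Blank line, plain directive, or a line that is only a comment.
            out.append(line)
    return "\n".join(out)


cleaned = move_trailing_comments("Disallow: /private/ # Block private area")
print(cleaned)
```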
Conflicting Directives

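When several rules match the same URL, crawlers that follow the current standard, including Googlebot, apply the rule with the longest matching path. Be careful with conflicting rules.

# This will block /category but allow /category/products
Disallow: /category
Allow: /category/products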
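The precedence rule can be sketched in a few lines of Python. This is a simplified illustration that handles plain path prefixes only (no `*` or `$` wildcards); the function name `is_allowed` and the tuple-based rule format are assumptions, and the tie-break in favor of Allow follows the recommendation in RFC 9309.

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Longest-match evaluation over (directive, path_prefix) pairs."""
    best_len = -1
    allowed = True  # a path that matches no rule is crawlable
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # A longer prefix is more specific; on a tie, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [("Disallow", "/category"), ("Allow", "/category/products")]
for path in ("/category/shoes", "/category/products/42", "/about"):
    print(path, is_allowed(path, rules))
```

Because the decision depends only on prefix length, the result is the same whichever order the two rules appear in.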
Blocking with Allow All

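Whether order matters depends on the parser. Crawlers that follow RFC 9309, including Googlebot, pick the longest matching rule regardless of order. Some older or simpler parsers apply the first rule that matches, so a general Allow: / placed first can cancel every Disallow after it. Listing specific rules before general ones gives the same result with both kinds of parser.

# Fragile - a first-match parser stops at Allow: / and never blocks /private
Allow: /
Disallow: /private

# Robust - /private is blocked under both first-match and longest-match parsing
Disallow: /private
Allow: /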
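How much order matters depends on which parser reads the file. Python's standard-library `urllib.robotparser` is one parser that applies rules in file order and uses the first match, so it makes a convenient way to see the effect; crawlers that follow RFC 9309 choose the longest match instead and would block /private in both cases. The helper name `can_fetch` and the example.com URLs below are illustrative.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Evaluate a URL against robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


general_first = "User-agent: *\nAllow: /\nDisallow: /private\n"
specific_first = "User-agent: *\nDisallow: /private\nAllow: /\n"

# First-match parsing: "Allow: /" matches every path, so the Disallow is never reached.
print(can_fetch(general_first, "https://example.com/private"))
# With the specific rule first, /private is blocked and everything else stays open.
print(can_fetch(specific_first, "https://example.com/private"))
print(can_fetch(specific_first, "https://example.com/public"))
```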

Real-World Robots.txt Examples

Google
User-agent: *
Allow: /search/about
Allow: /search/static
Allow: /search/howsearchworks
Disallow: /search
Disallow: /m/
Disallow: /u/
Allow: /$
Sitemap: https://www.google.com/sitemap.xml
Amazon
User-agent: *
Disallow: /exec/obidos/tg/browse/-/
Disallow: /gp/aw/
Disallow: /gp/offer-listing/
Disallow: /gp/pdp/
Disallow: /gp/help/
Disallow: /gp/cart/
Disallow: /gp/shipping-rates/
Disallow: /gp/prime/
Sitemap: https://www.amazon.com/sitemap.xml
Wikipedia
User-agent: *
Allow: /w/api.php?action=mobileview&
Allow: /w/api.php?action=parse&mobileformat=html&
Disallow: /w/
Disallow: /trap/
Disallow: /wiki/Special:Search
Disallow: /wiki/Special%3ASearch
Crawl-delay: 10
Sitemap: https://en.wikipedia.org/sitemap.xml

Implementation Guide

1. Upload to root directory: Place robots.txt in the root directory of your domain so that it is served at the top level (e.g., https://example.com/robots.txt). Crawlers do not look for it in subdirectories.

2. Test with Google Search Console: Use the robots.txt report in Google Search Console to confirm that Google can fetch and parse your file.

3. Monitor crawl errors: Regularly check for crawl errors in Search Console to identify URLs that are blocked unintentionally.

4. Update regularly: Review and update your robots.txt file as your site structure changes.

Important: Robots.txt is not a security measure. Compliance is voluntary, anyone can still request a blocked URL directly, and a blocked page can still appear in search results if other sites link to it. Use proper authentication for sensitive content.

Robots.txt Directives

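  • User-agent: Specifies which crawler the rules apply to
  • Disallow: Blocks crawling of paths that start with the given prefix
  • Allow: Permits crawling of specific paths (overrides a less specific Disallow)
  • Crawl-delay: Time in seconds between successive requests (non-standard; ignored by Googlebot)
  • Sitemap: Location of the XML sitemap, given as an absolute URL
  • Host: Preferred domain (non-standard, deprecated)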
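The directives above can be read back programmatically, which is useful for checking what a crawler would see. The sketch below uses Python's standard-library `urllib.robotparser` on an inline sample; the file contents and example.com URLs are made up for illustration, and `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /w/
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

delay = parser.crawl_delay("*")   # seconds, or None if no Crawl-delay applies
sitemaps = parser.site_maps()     # list of sitemap URLs, or None if absent
blocked = not parser.can_fetch("*", "https://example.com/w/index.php")

print(delay, sitemaps, blocked)
```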

Common User Agents

  • *: All user agents
  • Googlebot: Google's web crawler
  • Googlebot-Image: Google's image crawler
  • Bingbot: Microsoft Bing's crawler
  • Slurp: Yahoo's crawler
  • Baiduspider: Baidu's crawler
  • DuckDuckBot: DuckDuckGo's crawler

Robots.txt Best Practices

  • Always include your sitemap
  • Test with Google Search Console
  • Use Allow to override Disallow
  • Place in root directory
  • Use Crawl-delay to slow crawlers that honor it (Googlebot does not)
  • Don't block CSS/JS files
  • Update regularly