SitemapScan

Robots Signals

Robots Signals is a public archive of user-agent declarations found in robots.txt across recent sitemap checks. It helps separate search crawlers, AI bots, assistants, ads, publishers, commerce, security bots, and the long tail of other crawler families within the current all-time snapshot.

Snapshot window: All time.

What robots signals tell you

These pages do not show access-log traffic. Instead, they show which crawler families site owners explicitly mention in robots.txt. That makes the dataset useful for understanding intent: search indexing intent, AI access policy, syndication posture, monitoring behavior, platform verification, and the long tail of operational bot handling.
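Extracting these declarations is a matter of reading the User-agent lines a site owner wrote, not counting requests. A minimal sketch of that idea, in Python, might look like the following; the helper name and the sample robots.txt are illustrative assumptions, not SitemapScan's actual pipeline:

```python
# Hypothetical sketch: collect the user-agent tokens a robots.txt file
# explicitly declares. This reads intent (who the owner addressed),
# not traffic (who actually visited).

def declared_agents(robots_txt: str) -> list[str]:
    """Return user-agent tokens mentioned in robots.txt, in order, deduplicated."""
    agents: list[str] = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline comments
        if line.lower().startswith("user-agent:"):
            token = line.split(":", 1)[1].strip()
            if token and token not in agents:
                agents.append(token)
    return agents

sample = """\
User-agent: Googlebot
Disallow:

User-agent: GPTBot
Disallow: /

User-agent: *
Crawl-delay: 10
"""

print(declared_agents(sample))  # ['Googlebot', 'GPTBot', '*']
```

Note that the wildcard `*` is itself a declaration: it signals a default policy for unaddressed bots, which is why it shows up alongside named crawler families in aggregate views.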

How to use this archive

Use the overview page when you want to see the top families and the top raw user-agent lines. Use a subgroup page when you want a focused view of a single crawler family, such as search crawlers, AI crawlers, assistant bots, or security bots.

Why the all-time view matters

The all-time window is better for seeing durable long-tail bot patterns, and it gives broader coverage of the robots.txt taxonomy than shorter windows do.

FAQ

What are Robots Signals on SitemapScan?

Robots Signals is a public aggregation view of which user-agent families appear in robots.txt across recent sitemap checks. It groups raw user-agent lines into search, AI, assistants, ads, publishers, monitoring, security, commerce, and other bot families.
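The grouping step can be sketched as a keyword lookup from raw user-agent tokens to family labels. The table below is an assumption for illustration; the real taxonomy is broader and the matching rules may differ:

```python
# Hypothetical family taxonomy: map a raw user-agent token to one of the
# bot families Robots Signals reports. Keyword lists are illustrative.

FAMILY_KEYWORDS = {
    "search": ["googlebot", "bingbot", "duckduckbot"],
    "ai": ["gptbot", "ccbot", "claudebot"],
    "assistants": ["google-extended", "perplexitybot"],
    "security": ["censys", "shodan"],
}

def classify(agent: str) -> str:
    """Return the first matching family for a user-agent token, else 'other'."""
    token = agent.lower()
    for family, keywords in FAMILY_KEYWORDS.items():
        if any(keyword in token for keyword in keywords):
            return family
    return "other"

print(classify("GPTBot"))      # ai
print(classify("Bingbot"))     # search
print(classify("SomeNewBot"))  # other
```

Substring matching keeps the long tail manageable: versioned tokens like "Googlebot/2.1" still land in the search family without an exhaustive list of exact strings.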

Does Robots Signals measure real crawler traffic?

No. It measures robots.txt declarations found during public checks. It tells you which bots are mentioned, not how often they actually visited a site.

What does the all-time window change?

It changes the public archive window used for aggregation. A shorter window shows fresher behavior, while a longer one shows more stable patterns and a broader long tail of agents.