
Overview of Google crawlers and fetchers (user agents)

Google uses crawlers and fetchers to perform actions for its products, either automatically or triggered by a user request. Crawler (sometimes also called a "robot" or "spider") is a generic term for any program that is used to automatically discover and scan websites. Fetchers act like a program such as wget, typically making a single request on behalf of a user. Google's clients fall into three categories:

  • Common crawlers: The common crawlers used for Google's products (such as Googlebot). They always respect robots.txt rules for automatic crawls.
  • Special-case crawlers: Crawlers that are similar to common crawlers but are used by specific products where there's an agreement between the crawled site and the Google product about the crawl process. For example, AdsBot ignores the global robots.txt user agent (*) with the ad publisher's permission.
  • User-triggered fetchers: Fetchers that are part of tools and product functions where the end user triggers a fetch. For example, Google Site Verifier acts on the request of a user.

Technical properties of Google's crawlers and fetchers

Google's crawlers and fetchers are designed to run simultaneously on thousands of machines to improve performance and scale as the web grows. To optimize bandwidth usage, these clients are distributed across many datacenters around the world so they're located near the sites they might access. Therefore, your logs may show visits from several IP addresses. Google egresses primarily from IP addresses in the United States. If Google detects that a site is blocking requests from the United States, it may attempt to crawl from IP addresses located in other countries.

Supported transfer protocols

Google's crawlers and fetchers support HTTP/1.1 and HTTP/2. The crawlers use the protocol version that provides the best crawling performance and may switch protocols between crawling sessions depending on previous crawling statistics. The default protocol version used by Google's crawlers is HTTP/1.1. Crawling over HTTP/2 may save computing resources (for example, CPU, RAM) for your site and Googlebot, but otherwise there's no Google-product-specific benefit to the site (for example, no ranking boost in Google Search). To opt out of crawling over HTTP/2, instruct the server that's hosting your site to respond with a 421 HTTP status code when Google attempts to access your site over HTTP/2. If that's not feasible, you can send a message to the Crawling team (however, this solution is temporary).
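In practice, the 421 response is usually configured on the web server or CDN that terminates HTTP/2 rather than in application code. As a minimal illustrative sketch only (not Google-provided code, and assuming your site happens to be served by a Python ASGI server with HTTP/2 enabled, such as hypercorn), an application could inspect the negotiated protocol version and answer 421 Misdirected Request for HTTP/2 requests:

    # Minimal sketch: an ASGI app that refuses HTTP/2 with 421 so clients
    # (including Googlebot) fall back to HTTP/1.1.
    async def app(scope, receive, send):
        if scope["type"] != "http":
            return
        if scope.get("http_version") == "2":
            # 421 Misdirected Request: opts this site out of HTTP/2 crawling.
            await send({"type": "http.response.start", "status": 421, "headers": []})
            await send({"type": "http.response.body", "body": b""})
            return
        body = b"Served over HTTP/1.1"
        await send({
            "type": "http.response.start",
            "status": 200,
            "headers": [(b"content-type", b"text/plain")],
        })
        await send({"type": "http.response.body", "body": body})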

Google's crawler infrastructure also supports crawling through FTP (as defined by RFC 959 and its updates) and FTPS (as defined by RFC 4217 and its updates); however, crawling through these protocols is rare.

Supported content encodings

Google's crawlers and fetchers support the following content encodings (compressions): gzip, deflate, and Brotli (br). The content encodings supported by each Google user agent are advertised in the Accept-Encoding header of each request they make. For example, Accept-Encoding: gzip, deflate, br.
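As a rough sketch (not Google-provided code), a server could honor that header by compressing responses with one of the advertised encodings; gzip comes from the Python standard library, while the brotli package below is an assumed third-party dependency:

    import gzip

    def negotiate_encoding(accept_encoding: str, body: bytes):
        """Pick a compression the client advertised in Accept-Encoding."""
        # Tokens may carry quality values, e.g. "gzip;q=1.0"; keep only the name.
        offered = {token.split(";")[0].strip() for token in accept_encoding.split(",")}
        if "br" in offered:
            try:
                import brotli  # third-party package (assumption), not required by Google
                return "br", brotli.compress(body)
            except ImportError:
                pass
        if "gzip" in offered:
            return "gzip", gzip.compress(body)
        return None, body  # send uncompressed and omit Content-Encoding

For a crawler request carrying Accept-Encoding: gzip, deflate, br, this would return a Brotli- or gzip-compressed body along with the value to place in the Content-Encoding response header.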

Crawl rate and host load

Our goal is to crawl as many pages from your site as we can on each visit without overwhelming your server. If your site is having trouble keeping up with Google's crawling requests, you can reduce the crawl rate. Note that sending an inappropriate HTTP response code to Google's crawlers may affect how your site appears in Google products.

HTTP Caching

Google's crawling infrastructure supports heuristic HTTP caching as defined by the HTTP caching standard, specifically through the ETag response header and If-None-Match request header, and the Last-Modified response header and If-Modified-Since request header.

If both ETag and Last-Modified response header fields are present in the HTTP response, Google's crawlers use the ETag value as required by the HTTP standard. For Google's crawlers specifically, we recommend using ETag instead of the Last-Modified header to indicate caching preference as ETag doesn't have date formatting issues.

Other HTTP caching directives aren't supported.

Individual Google crawlers and fetchers may or may not make use of caching, depending on the needs of the product they're associated with. For example, Googlebot supports caching when re-crawling URLs for Google Search, and Storebot-Google only supports caching under certain conditions.

To implement HTTP caching for your site, get in touch with your hosting or content management system provider.

ETag and If-None-Match

Google's crawling infrastructure supports ETag and If-None-Match as defined by the HTTP Caching standard. Learn more about the ETag response header and its request header counterpart, If-None-Match.
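As an illustrative sketch only (not Google-provided code, and simplified to strong ETag comparison), a handler could derive an ETag from the response body and answer 304 Not Modified when the crawler's If-None-Match matches:

    import hashlib

    def etag_response(body: bytes, if_none_match=None):
        """Return (status, headers, body), honoring a conditional If-None-Match."""
        etag = '"%s"' % hashlib.sha256(body).hexdigest()[:32]
        headers = {"ETag": etag}
        if if_none_match:
            # If-None-Match may list several ETags, or be "*".
            candidates = [tag.strip() for tag in if_none_match.split(",")]
            if "*" in candidates or etag in candidates:
                return 304, headers, b""  # Not Modified: crawler reuses its cached copy
        return 200, headers, body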

Last-Modified and If-Modified-Since

Google's crawling infrastructure supports Last-Modified and If-Modified-Since as defined by the HTTP Caching standard with the following caveats:

  • The date in the Last-Modified header must be formatted according to the HTTP standard. To avoid parsing issues, we recommend using the following date format: "Weekday, DD Mon YYYY HH:MM:SS Timezone". For example, "Fri, 4 Sep 1998 19:15:56 GMT".
  • While not required, consider also setting the max-age field of the Cache-Control response header to help crawlers determine when to recrawl the specific URL. Set the value of the max-age field to the expected number of seconds the content will be unchanged. For example, Cache-Control: max-age=94043.

Learn more about the Last-Modified response header and its request header counterpart, If-Modified-Since.
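As a minimal sketch under the same caveats (not Google-provided code; the 86400-second max-age is an arbitrary placeholder), a handler could emit an HTTP-date formatted Last-Modified plus Cache-Control: max-age, and answer 304 when If-Modified-Since is not older than the content:

    from datetime import timezone
    from email.utils import format_datetime, parsedate_to_datetime

    def last_modified_response(last_modified, if_modified_since=None):
        """Return (status, headers), honoring a conditional If-Modified-Since."""
        last_modified = last_modified.replace(microsecond=0, tzinfo=timezone.utc)
        headers = {
            # format_datetime(..., usegmt=True) emits the HTTP-date format,
            # e.g. "Fri, 04 Sep 1998 19:15:56 GMT".
            "Last-Modified": format_datetime(last_modified, usegmt=True),
            # Placeholder freshness hint: content expected to stay unchanged ~1 day.
            "Cache-Control": "max-age=86400",
        }
        if if_modified_since:
            try:
                since = parsedate_to_datetime(if_modified_since)
                if since.tzinfo is None:
                    since = since.replace(tzinfo=timezone.utc)
                if last_modified <= since:
                    return 304, headers  # Not Modified
            except (TypeError, ValueError):
                pass  # unparseable date: fall through to a full 200 response
        return 200, headers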

Verifying Google's crawlers and fetchers

Google's crawlers identify themselves in three ways:

  1. The HTTP user-agent request header.
  2. The source IP address of the request.
  3. The reverse DNS hostname of the source IP.

Learn how to use these details to verify Google's crawlers and fetchers.
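As a rough sketch of that reverse-then-forward DNS check (not Google-provided code; the accepted hostname suffixes below are an assumption based on Google's published verification guidance and should be checked against the page linked above):

    import socket

    # Hostname suffixes assumed for Google's crawlers and fetchers
    # (verify against Google's crawler-verification documentation).
    GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")

    def is_google_crawler(ip: str) -> bool:
        """Reverse-resolve the IP, check the hostname, then forward-resolve it back."""
        try:
            host, _aliases, _addrs = socket.gethostbyaddr(ip)  # reverse DNS lookup
        except socket.herror:
            return False
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        try:
            _name, _aliases, forward_ips = socket.gethostbyname_ex(host)  # forward DNS
        except socket.gaierror:
            return False
        return ip in forward_ips

Note that socket.gethostbyname_ex only returns IPv4 addresses; for IPv6 clients you would need socket.getaddrinfo for the forward lookup instead.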


Last updated 2025-03-06 UTC.
