Using 403/404 Errors for Rate Limiting Googlebot? Not Recommended

Google has observed that some websites are incorrectly using response codes (403/404) to limit how often their website is crawled by Googlebot. 

This practice can harm a site’s search performance. After seeing an increase in this misuse among web publishers and content delivery networks, Google has released guidance to help website owners reduce Googlebot’s crawl rate the right way.

Rate Limiting Googlebot

Googlebot is the software Google uses to automatically crawl websites and download their content. Rate limiting means slowing down how often Googlebot visits a website. 

The speed at which Googlebot requests webpages is called Google’s crawl rate. Sometimes a website owner may want to slow down Googlebot because it’s using up too many server resources.

Google recommends using Google Search Console to limit how fast Googlebot crawls a website. A reduced crawl rate set there stays in effect for a period of 90 days. 

Another way to limit Google’s crawl rate is to use robots.txt to block Googlebot from crawling certain pages, categories, or the entire website.
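
For illustration, a minimal robots.txt placed at the root of the site could look like the sketch below. The disallowed paths are hypothetical examples, not recommendations from Google.

    # Hypothetical example: keep Googlebot out of low-value sections
    User-agent: Googlebot
    Disallow: /internal-search/
    Disallow: /print-versions/

    # All other crawlers may continue to crawl everything
    User-agent: *
    Disallow: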

Keep in mind that robots.txt only stops Google from crawling the blocked URLs; it does not prevent them from being indexed if they are linked from elsewhere. Using robots.txt may also affect Google’s crawl patterns in the long run. 

Therefore, it is better to use the Google Search Console for limiting the crawl rate.

Stop Rate Limiting With 403/404

Google has advised publishers not to use 4XX response codes (except for the 429 response) to limit Google’s crawl rate. 

This is because they have noticed an increase in publishers using 403 and 404 error response codes for this purpose. 

A 403 response code means the server is refusing to fulfill the request (the visitor is forbidden from accessing the page), while a 404 response code means the requested page could not be found. 

By contrast, the 429 response code means “too many requests” and is the one client-error code that is a valid signal for rate limiting.

If publishers continue to serve 403 and 404 responses for this purpose, Google may eventually remove those pages from its search index, so they will no longer appear in search results.

According to Google, in recent months it has observed an increase in website owners and content delivery networks using 404 and other 4xx client errors (excluding 429) to limit the crawl rate of Googlebot, and it does not recommend this approach. In short, Google is advising against the practice.

To limit Googlebot’s crawl rate, Google recommends using 500, 503, or 429 error response codes. The 500 response code means there was an internal server error, while the 503 response means that the server is currently unable to handle the request for a webpage. 

Google treats both of these responses as temporary errors, so it will come back later to check if the pages are available again. 

A 429 response tells the bot it is making too many requests; an optional Retry-After header can suggest how long it should wait before re-crawling. 
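
As a rough sketch of what this could look like on the server side, the following Python example (standard library only) returns a 429 with a Retry-After header to requests identifying as Googlebot once a hypothetical per-minute budget is exceeded. The thresholds, port, and user-agent check are assumptions for illustration; a real deployment would also verify that requests claiming to be Googlebot actually come from Google.

    import time
    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Hypothetical budget: max Googlebot requests allowed per 60-second window.
    MAX_REQUESTS_PER_WINDOW = 100
    WINDOW_SECONDS = 60

    window_start = time.monotonic()
    request_count = 0

    class RateLimitingHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            global window_start, request_count
            user_agent = self.headers.get("User-Agent", "")

            if "Googlebot" in user_agent:
                now = time.monotonic()
                if now - window_start > WINDOW_SECONDS:
                    # Start a new counting window.
                    window_start, request_count = now, 0
                request_count += 1

                if request_count > MAX_REQUESTS_PER_WINDOW:
                    # Signal "too many requests" and suggest when to retry.
                    self.send_response(429)
                    self.send_header("Retry-After", "120")
                    self.end_headers()
                    return

            # Normal response for everything under the limit.
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(b"<html><body>Page content</body></html>")

    if __name__ == "__main__":
        HTTPServer(("", 8000), RateLimitingHandler).serve_forever()

Because Google treats 429 (like 503) as a temporary signal, Googlebot slows down and retries later instead of dropping the URLs, which is the behavior described above.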

Google recommends checking its developer documentation for more information on rate limiting Googlebot and, again, advises against using 403s or 404s for this purpose.

Read Google’s blog post to learn more about it: Information source 
