
Find Resources Bigger Than 15 MB For Better Googlebot Crawling - Search Engine Journal

Last updated Wednesday, September 7, 2022 09:45 ET, Source: NewsService


Googlebot is an automatic and always-on web crawling system that keeps Google’s index refreshed.

The website worldwidewebsize.com estimates Google’s index to be more than 62 billion web pages.

Google’s search index is “well over 100,000,000 gigabytes in size.”

Googlebot and its variants (smartphone, news, images, etc.) operate under certain constraints, such as how often they render JavaScript or how much of a resource they will fetch.

Google uses crawling constraints to protect its own crawling resources and systems.

For instance, if a news website refreshes the recommended articles every 15 seconds, Googlebot might start to skip the frequently refreshed sections – since they won’t be relevant or valid after 15 seconds.

Years ago, Google announced that it does not crawl or use the portion of a resource beyond 15 MB.

On June 28, 2022, Google republished this guidance, stating that Googlebot crawls only the first 15 MB of a resource and does not use anything beyond that point.

To emphasize how rarely this limit comes into play, Google noted that the “median size of an HTML file is 500 times smaller” than 15 MB.

HTTPArchive.org data on median desktop and mobile HTML file sizes bears this out: most websites never come close to the 15 MB crawling constraint.
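If you want to verify where your own pages fall relative to this limit, a minimal sketch like the following can help. It assumes a simple fetch of the raw HTML with Python's standard library; the URL shown is a placeholder, and real audits would also account for redirects, compression, and other Googlebot variants.

from urllib.request import urlopen

# 15 MB, the per-file crawl limit Google documents for Googlebot.
GOOGLEBOT_LIMIT_BYTES = 15 * 1024 * 1024

def html_size_report(url: str) -> None:
    # Fetch the raw HTML bytes of the page (placeholder approach;
    # a production check might also inspect Content-Length headers).
    with urlopen(url) as response:
        body = response.read()
    size_mb = len(body) / (1024 * 1024)
    print(f"{url}: {size_mb:.2f} MB")
    if len(body) > GOOGLEBOT_LIMIT_BYTES:
        print("Warning: content beyond the first 15 MB will not be used for crawling.")
    else:
        print("Within Googlebot's 15 MB crawl limit.")

if __name__ == "__main__":
    # Hypothetical example URL; replace with a page you want to audit.
    html_size_report("https://example.com/")

For most sites the reported size will be a few hundred kilobytes at most, consistent with the median figures above.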

But the web is a big and chaotic place.

Understanding the nature of the...



Read Full Story: https://www.searchenginejournal.com/large-resources-googlebot-crawling/461937/
