Googlebot only crawls and indexes first 15MB of HTML content on page

Posted by Edith MacLeod on 27 Jun, 2022
Google has documented an existing policy. What are the SEO takeaways?

In an update to its Googlebot Help document, Google has specified that Googlebot will only crawl and index the first 15MB of an HTML file or supported text-based file.

"Googlebot can crawl the first 15MB of an HTML file or supported text-based file. Any resources referenced in the HTML such as images, videos, CSS, and JavaScript are fetched separately. After the first 15MB of the file, Googlebot stops crawling and only considers the first 15MB of the file for indexing. The file size limit is applied on the uncompressed data. Other crawlers may have different limits."

The update caused some head scratching among SEOs. For example, would images count towards the size limit, meaning that text placed below images would simply be ignored once the limit had been reached?

In response, Google’s John Mueller tweeted on 24 June to clarify that embedded resources or content with IMG tags would not count as part of the HTML file.

John Mueller also confirmed that this is not a change, just official documentation of an already existing policy.

SEO best practice

Google has now put on the record what the crawl cutoff is for Googlebot. 15MB is a large amount, however, so there’s no need for undue worry.

It’s good practice, editorially as well as for SEO, to place important content near the top of the page so that it isn’t missed and Google can rank your page appropriately.
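To make the cutoff concrete, here is a minimal Python sketch of how a 15MB limit on uncompressed HTML would split a page, and how you might check that a key piece of content sits inside the part that is considered. The function names are hypothetical, and reading "15MB" as 15 * 1024 * 1024 bytes is an assumption, as Google's wording does not say which definition it uses. This is an illustration, not a description of how Googlebot is implemented.

```python
# Assumption: "15MB" is read as 15 * 1024 * 1024 bytes.
# Google's documentation does not state which definition it uses.
LIMIT_BYTES = 15 * 1024 * 1024


def split_at_limit(uncompressed_html: bytes, limit: int = LIMIT_BYTES):
    """Return (considered, ignored) parts of an uncompressed HTML body."""
    return uncompressed_html[:limit], uncompressed_html[limit:]


def is_within_limit(uncompressed_html: bytes, marker: bytes,
                    limit: int = LIMIT_BYTES) -> bool:
    """True if the first occurrence of `marker` ends before the cutoff."""
    position = uncompressed_html.find(marker)
    return position != -1 and position + len(marker) <= limit


if __name__ == "__main__":
    # A tiny limit is used here so the effect is visible on a small page.
    page = b"<html>" + b"x" * 100 + b"<p>important</p></html>"
    considered, ignored = split_at_limit(page, limit=50)
    print(len(considered), len(ignored))
    print(is_within_limit(page, b"important", limit=50))
```

With a realistic page the limit is so far away that the check will almost always pass, which is the point Google makes about typical page sizes.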

It’s also a good idea to keep your web pages light. This is better both for users, who will just move on if your page takes too long to load, and for crawlers such as Googlebot. 

You can check your HTML page size with free tools such as sitechecker, and you can use the URL Inspection tool in Search Console to see which parts of the page Google renders.
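If you would rather check the size yourself, the following Python sketch measures the uncompressed size of an HTML response body, since Google says the limit applies to uncompressed data. It assumes you have already fetched the body with an HTTP client of your choice and know its Content-Encoding header. The helper names are hypothetical, only gzip is handled, and the 15 * 1024 * 1024 byte reading of "15MB" is an assumption.

```python
import gzip

# Assumption: "15MB" is read as 15 * 1024 * 1024 bytes.
LIMIT_BYTES = 15 * 1024 * 1024


def uncompressed_size(body: bytes, content_encoding: str = "") -> int:
    """Size in bytes of the body after undoing gzip, if it was applied."""
    if content_encoding.strip().lower() == "gzip":
        body = gzip.decompress(body)
    return len(body)


def size_report(body: bytes, content_encoding: str = "",
                limit: int = LIMIT_BYTES) -> str:
    """One-line summary of how the page compares with the limit."""
    size = uncompressed_size(body, content_encoding)
    share = size / limit * 100
    status = "over" if size > limit else "within"
    return f"{size} bytes uncompressed, {share:.2f}% of limit ({status})"


if __name__ == "__main__":
    raw = b"<html>" + b"a" * 1000 + b"</html>"
    packed = gzip.compress(raw)  # what a server might send over the wire
    print(len(packed), "bytes on the wire")
    print(size_report(packed, "gzip"))
```

The transfer size shown in browser developer tools is usually the compressed figure, so a page can look small on the wire and still be much larger once decompressed.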

Update: In light of the confusion caused by this documentation of the crawl limit, Google published a blog post clarifying the content the 15MB limit applies to.

The post reiterates that, with the median size of an HTML file being 30KB, the overwhelming majority of users will not be affected by this crawl limit. Google adds:

"However, if you are the owner of an HTML page that's over 15 MB, perhaps you could at least move some inline scripts and CSS dust to external files, pretty please."

Read full details in Google's Search Central blog.
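To act on Google's suggestion about inline scripts and CSS, it helps to know how much of a page they account for. The sketch below uses Python's standard html.parser module to total the bytes inside inline script and style elements. The class and function names are hypothetical, and it deliberately ignores style attributes and inline event handlers, so treat the result as a rough estimate.

```python
from html.parser import HTMLParser


class InlineAssetCounter(HTMLParser):
    """Totals the UTF-8 bytes found inside inline <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.inline_script_bytes = 0
        self.inline_style_bytes = 0
        self._current = None  # "script", "style" or None

    def handle_starttag(self, tag, attrs):
        # A <script> with a src attribute is external, so it is skipped.
        if tag == "script" and dict(attrs).get("src") is None:
            self._current = "script"
        elif tag == "style":
            self._current = "style"

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._current = None

    def handle_data(self, data):
        size = len(data.encode("utf-8"))
        if self._current == "script":
            self.inline_script_bytes += size
        elif self._current == "style":
            self.inline_style_bytes += size


def count_inline_assets(html: str) -> dict:
    """Return inline script and style byte totals for an HTML string."""
    parser = InlineAssetCounter()
    parser.feed(html)
    parser.close()
    return {"script": parser.inline_script_bytes,
            "style": parser.inline_style_bytes}


if __name__ == "__main__":
    sample = ('<html><head><style>p{color:red}</style>'
              '<script src="app.js"></script></head>'
              '<body><script>var a=1;</script><p>Hello</p></body></html>')
    print(count_inline_assets(sample))
```

If the totals make up a large share of the page, moving that code into external files reduces the HTML size, because referenced resources are fetched separately.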
