The latest episode of the ‘Search Off The Record’ podcast features the Google Search Relations team sharing valuable insights on managing Googlebot’s interactions with webpages.
Highlights
- Googlebot cannot be blocked from crawling specific sections of an HTML page.
- The data-nosnippet HTML attribute keeps marked text out of search snippets; an iframe whose source is blocked by robots.txt is a second, riskier workaround.
- Implementing a disallow rule in the robots.txt file or setting up firewall rules based on Googlebot’s IP addresses can block Googlebot from accessing a website.
During the latest episode of the ‘Search Off The Record’ podcast, Google’s Search Relations team addressed various inquiries concerning webpage indexing. The discussion centered around methods to block Googlebot from crawling specific sections of a page and how to prevent Googlebot from accessing a website entirely. Notably, the questions were answered by Google’s John Mueller and Gary Illyes.
Blocking Googlebot From Specific Web Page Sections
When asked about blocking Googlebot from crawling specific sections of webpages, such as “also bought” areas on product pages, Google’s John Mueller stated that it is impossible to do so.
He acknowledged that there are no direct solutions to this issue, but suggested two potential approaches, neither of which he deemed ideal. One option is to use the data-nosnippet HTML attribute to prevent the text from appearing in search snippets.
Another approach is to load the content through an iframe or JavaScript file whose source is blocked by robots.txt.
Mueller cautioned against this method, as a robotted iframe or JavaScript file can cause crawling and indexing issues that are difficult to troubleshoot.
He reassured listeners that if the content in question is duplicated across multiple pages, there is no need to prevent Googlebot from accessing it.
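To make the data-nosnippet option concrete, here is a minimal Python sketch. It is only an illustration of the idea, not how Google processes pages: the sample markup, the `SnippetTextExtractor` class, and the `snippet_eligible_text` helper are all hypothetical names invented for this example. The sketch wraps an "also bought" block in a `div` carrying `data-nosnippet` and then collects the text that sits outside any such element.

```python
from html.parser import HTMLParser

# Tags without a closing tag; ignored so depth tracking stays balanced.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class SnippetTextExtractor(HTMLParser):
    """Collects text that is NOT inside an element marked data-nosnippet."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a data-nosnippet element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth > 0:
            # Nested element inside an excluded block.
            self.skip_depth += 1
        elif any(name == "data-nosnippet" for name, _ in attrs):
            # Entering an excluded block.
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.skip_depth == 0 and text:
            self.chunks.append(text)


def snippet_eligible_text(html):
    """Return the text left after dropping data-nosnippet sections."""
    parser = SnippetTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical product page with an "also bought" section excluded.
PRODUCT_PAGE = """
<div>
  <p>Handmade ceramic mug, 350 ml.</p>
  <div data-nosnippet>
    <h2>Customers also bought</h2>
    <p>Matching saucer</p>
  </div>
  <p>Ships in two days.</p>
</div>
"""

print(snippet_eligible_text(PRODUCT_PAGE))
```

Note that the attribute only affects what can appear in a snippet. The page, including the marked section, is still crawled.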
Blocking Googlebot From Accessing A Website
Gary Illyes offered a straightforward solution when asked about preventing Googlebot from accessing any part of a website.
He recommended using the robots.txt file and adding a “disallow: /” directive specifically for the Googlebot user agent. As long as this rule stays in place, Googlebot will not crawl the site.
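The rule described above can be checked locally before it is deployed. The following is a small sketch using Python's standard `urllib.robotparser` module. The `can_crawl` helper, the example.com URL, and the "ExampleBot" user agent are assumptions made for illustration, and the standard-library parser may not match Googlebot's own robots.txt handling in every edge case.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content described above: block Googlebot from everything.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/products/mug"  # placeholder URL
print("Googlebot allowed:", can_crawl(ROBOTS_TXT, "Googlebot", url))
print("ExampleBot allowed:", can_crawl(ROBOTS_TXT, "ExampleBot", url))
```

Because the rule names only the Googlebot user agent, other crawlers that have no matching group in the file remain unaffected.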
For those in search of a more comprehensive solution, Gary Illyes presented an alternative method.
He suggested creating firewall rules that place Googlebot’s IP ranges in a deny rule, which blocks network access from Googlebot altogether. Unlike a robots.txt rule, this is enforced at the network level rather than relying on the crawler to honor it.
See Google’s official documentation for a list of Googlebot’s IP addresses.
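The matching logic behind such a deny rule can be sketched with Python's standard `ipaddress` module. This is an illustration only: the ranges below are reserved documentation prefixes used as placeholders, not Googlebot's real addresses, and `build_deny_list` and `is_denied` are hypothetical helper names. In practice the ranges would come from Google's published list, and the rule would live in the firewall itself rather than in application code.

```python
import ipaddress

# PLACEHOLDER ranges (reserved for documentation), NOT real Googlebot IPs.
# Replace them with the ranges from Google's official documentation.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def build_deny_list(cidrs):
    """Parse CIDR strings into network objects."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_denied(client_ip, deny_list):
    """Return True if client_ip falls inside any denied network."""
    addr = ipaddress.ip_address(client_ip)
    return any(
        addr.version == net.version and addr in net
        for net in deny_list
    )


deny_list = build_deny_list(PLACEHOLDER_RANGES)
for ip in ["192.0.2.17", "198.51.100.5", "2001:db8::1"]:
    print(ip, "denied" if is_denied(ip, deny_list) else "allowed")
```

Since published IP ranges can change over time, a deny list like this needs to be refreshed periodically to stay accurate.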
In Summary
While it is not possible to prevent Googlebot from accessing specific sections of an HTML page, there are techniques like using the data-nosnippet attribute that provide some control.
To block Googlebot from accessing your entire site, a disallow rule for the Googlebot user agent in the robots.txt file is effective. For stricter enforcement, you can add firewall rules that deny Googlebot’s IP ranges.