Discovering How Your Content Is (and Isn’t) Being Indexed

April 17, 2018

In the earlier days of SEO, things were pretty black and white: Google indexed your HTML content, and if you searched for that content, you’d see it in bold type within the search results; your JavaScript content, on the other hand, was completely invisible to search engines.

Things have changed a lot since then, to the point where Google is so confident about their ability to render JavaScript that they are no longer going to support the workaround they developed for crawling AJAX content. However, JavaScript crawling is still far from perfect. Even when Google is able to render the content, it can take significantly longer for the fully rendered content to be indexed.

Furthermore, even when your content is completely loaded within the on-page HTML, Google takes CSS and JavaScript into account, and some indexed content may be treated differently depending on how it is displayed on the page.

In the past, Google’s cache was a reliable way to verify how your content was being picked up, but these days, there are sometimes discrepancies between the cached and the actual indexed version of a page. For example, a text-only cache will only display content loaded within the HTML source.

In this article, we’ll go over different ways to determine if and how your content is being seen by Google.

 Tools

  • Google’s “info:” operator
  • Google’s “site:” operator
  • Chrome’s DevTools (“Elements” panel)
    • Alternate: Firefox’s “View Selection Source” feature
  • The Fetch as Google tool in Google Search Console

Is Your Page in the Index?

The first thing to determine is whether your page is even indexed. Fortunately, this is easy. Google’s “info:” operator will give you details about a particular page. If the page is not indexed, no information will be displayed.

Simply type “info:” followed by the page’s URL. There should be no space between the colon and the start of the URL. The HTTP/HTTPS protocol is optional.

Be sure to use the canonical version of the URL. If a page exists at multiple URLs, it’s most likely to be indexed at the URL referenced in the canonical tag.

There are several reasons a page may not be indexed, including:

  • Google hasn’t found/crawled it yet.
  • It is blocked in robots.txt.
  • It has a noindex tag.
    • When looking for a noindex tag on the page, be sure to use Chrome’s DevTools (“Elements” panel) to search within the code for the fully rendered DOM, rather than just the HTML source. 
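As a rough illustration of the noindex check, the sketch below scans a block of HTML for a robots meta tag. It assumes you have copied the fully rendered markup out of the “Elements” panel (for example, by copying the outer HTML of the root element) and pasted it into a string. The class and function names are hypothetical, and the check only covers meta tags, not HTTP response headers.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(rendered_html: str) -> bool:
    """Return True if the markup carries a noindex (or equivalent "none") directive."""
    finder = RobotsMetaFinder()
    finder.feed(rendered_html)
    return "noindex" in finder.directives or "none" in finder.directives


# Hypothetical example markup, standing in for HTML copied from DevTools.
sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(sample))
```

Running the same function against both the HTML source and the rendered DOM can reveal a noindex tag that is injected by JavaScript.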

Is Your On-Page Content Indexed?

Once we’re certain a page is indexed, we’ll want to determine whether Google is actually indexing all of the content on the page. The easiest way to do this is to use the “site:” operator and search for a snippet of text from the page, in quotes. If only some of the content on your page is loaded via JavaScript, be sure to test a snippet of that content to verify that it’s indexed.

Simply type “site:” followed by the page’s URL. Again, there should be no space between the colon and the start of the URL, and the HTTP/HTTPS protocol is optional. After the URL, include a space, followed by a short snippet, in quotes, of the content you want to verify is indexed.
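If you check many pages, it can help to generate the “info:” and “site:” queries rather than typing them by hand. The sketch below is one way to do that; the function names and the example.com URL are hypothetical, and the helper only builds query strings (and a search URL to paste into a browser) without contacting Google. Note that it uses straight quotes around the snippet.

```python
from urllib.parse import quote_plus


def info_query(url: str) -> str:
    """Build an "info:" query with no space between the colon and the URL."""
    return "info:" + url.strip()


def site_query(url: str, snippet: str) -> str:
    """Build a "site:" query followed by a space and the quoted snippet."""
    return f'site:{url.strip()} "{snippet.strip()}"'


def google_search_url(query: str) -> str:
    """URL-encode a query so it can be opened as a Google search."""
    return "https://www.google.com/search?q=" + quote_plus(query)


# Hypothetical page and snippet, for illustration only.
query = site_query("example.com/services/seo", "a short snippet from the page")
print(query)
print(google_search_url(query))
```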

For example, we can verify that the main text on The Search Agency’s Search Engine Optimization page is properly indexed:

If the page being investigated doesn’t appear in these results, this indicates the content wasn’t indexed.

“Hidden” Content

You’ll notice the content we searched for in the above example is bolded in Google’s search result. This indicates that Google knows this content is immediately visible to users on the page. That is, they don’t have to click a “Read More” button or scroll through a carousel to view the content.

If Google doesn’t think your content is immediately visible to users, they’ll still index this content, and the page will still appear in the results for this type of search. However, the content may not appear in the snippet for the result.

This is important because research has shown that content that isn’t immediately visible to users isn’t weighted as heavily as content that is. See Moz’s article on CSS and JavaScript “hidden text” for more information.

That said, Google has stated this won’t be the case when mobile-first indexing rolls out. Per Google’s John Mueller, “So with the mobile-first indexing will index the the mobile version of the page. And on the mobile version of the page it can be that you have these kind of tabs and folders and things like that, which we will still treat as normal content on the page even. Even if it is hidden on the initial view.” 

Is Google Technically Able to Render Your Content?

If the steps above show that some or all of the content on your pages isn’t being indexed, the next step is to find out if Google is technically able to render the content.

Fetch as Google

The easiest way to determine if Google is capable of rendering your content is to use the “Fetch as Google” tool available in Google Search Console. Just put in the URL of the page you want to check and click the “Fetch and Render” button.

It may take a minute to process, but once the render is complete, you can click on the URL to see an image of how Google was able to see the page. Here, you can verify that the content you want indexed appears in the screenshot.

Ideally, the screenshot under “This is how Googlebot saw the page” will match the screenshot under “This is how a visitor to your website would have seen the page.”

If the desired content isn’t included in Googlebot’s screenshot on the left, there may be an easy fix. Below the content shown in these screenshots, the tool tells you if any resources are being blocked by robots.txt. If any relevant JavaScript files are being blocked, this may prevent Google from being able to render the content. Any blocked scripts or CSS files with a medium or high severity should be unblocked to allow Google to fully render the page. Unblock them, then run “Fetch and Render” again.
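You can also approximate the blocked-resources check locally. The sketch below uses Python’s standard-library robots.txt parser to list which script and stylesheet URLs a given robots.txt would block for Googlebot. The function name, the sample rules, and the example.com URLs are hypothetical. Treat the result as an approximation: the standard-library parser does plain prefix matching and does not implement the wildcard patterns (“*” and “$”) that Google supports, so the tool’s own report remains the authority.

```python
from urllib.robotparser import RobotFileParser


def blocked_resources(robots_txt: str, resource_urls, user_agent: str = "Googlebot"):
    """Return the subset of resource_urls that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return [url for url in resource_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt contents and resource URLs.
robots_txt = """\
User-agent: *
Disallow: /assets/js/
"""
resources = [
    "https://example.com/assets/js/app.js",
    "https://example.com/assets/css/site.css",
]
print(blocked_resources(robots_txt, resources))
```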

If there are no blocked resources and the “Fetch as Google” tool still doesn’t display your content, there may be other technical issues on the page, or Google may not understand the framework you’re using. Further investigation is necessary.

If the “Fetch as Google” tool shows that Google is able to see your content, and yet your content still isn’t being indexed, read on.

How Quickly Is Your Content Being Indexed?

If your page is indexed and your content is rendered within the HTML source, chances are that content will be indexed at the time the page is indexed. JavaScript-loaded content, however, is indexed far less consistently.
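To tell which of the two cases a given piece of content falls into, you can check whether it appears in the raw HTML source at all. The sketch below is a rough test along those lines: it strips script and style blocks and tags from a page’s source, then looks for the snippet in the remaining text. The function name and sample markup are hypothetical, and the regular expressions are a simplification rather than a full HTML parser.

```python
import re
from html import unescape


def snippet_in_source(html_source: str, snippet: str) -> bool:
    """Return True if snippet appears in the visible text of the raw HTML source."""
    # Drop <script> and <style> blocks so text inside code does not count.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html_source)
    # Drop the remaining tags, then decode entities such as &amp;.
    text = unescape(re.sub(r"<[^>]+>", " ", text))

    def normalize(value: str) -> str:
        return " ".join(value.split()).lower()

    return normalize(snippet) in normalize(text)


# Hypothetical source: the heading is in the HTML, the rest is injected by a script.
source = (
    "<html><body><h1>Search Engine  Optimization</h1>"
    '<div id="app"></div><script>render("Loaded later")</script></body></html>'
)
print(snippet_in_source(source, "search engine optimization"))
print(snippet_in_source(source, "Loaded later"))
```

A snippet that is missing from the source but visible in the browser is being loaded via JavaScript, so it is the content most worth testing with the “site:” search.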

If a page is indexed and “Fetch as Google” shows that Google can see the content on the page, but the content isn’t being indexed, it may simply take more time (several days or more) for Google to fully render and index the page.

If you publish content frequently, you can get an idea of how many pieces of your recently published content are indexed (and which ones) by using the “site:” operator to do a search of your site, then selecting “Tools” and a span of time from the first drop-down that appears. For example, you can see all content indexed in the past week:

Any delays in getting your content indexed can mean lost traffic. If your content is particularly newsworthy or time sensitive, you may completely miss your chance to appear in the results for relevant searches.

If you’ve made your content fully accessible to Google (i.e., the page isn’t noindexed, and neither the page nor any of its resources is blocked via robots.txt) and you continue to see inconsistent or delayed indexation of on-page content, there are other options, such as:

  • Rebuilding the site to render all crucial content server-side
  • Using a different JavaScript framework that may be more SEO-friendly
  • Using a pre-render service to create HTML snapshots of your JavaScript-dependent pages

The alternative is simply waiting for Google to get even better at crawling all JavaScript-rendered content, but no one knows how long that will take. Can you afford to wait?
