The very excellent Google Search Console needs a bit of “real world” help.
For anyone who doesn’t know, Google Search Console (GSC) is a tool that helps website owners and webmasters understand how their sites perform on Google Search, and what they can do to improve their search appearance and bring more relevant traffic to their websites.
Google Search Console provides excellent “Learn more” tips and advice, but the reality is that not all data and websites are the same.
Page indexing – reasons why pages aren’t indexed (with the source GSC attributes to each):
- Excluded by ‘noindex’ tag
- Page with redirect (source: Website)
- Duplicate without user-selected canonical (source: Website)
- Soft 404 (source: Website)
- Not found (404) (source: Website)
- Crawled – currently not indexed (source: Google systems)
- Discovered – currently not indexed (source: Google systems)
Let’s take them one at a time and offer an alternative view.
- Excluded by ‘noindex’ tag (or blocked via robots.txt)
While we understand Google may simply be informing us of the exclusion, put simply: the owner does not want the page or folder indexed. Call it blocked if you like.
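As a quick reference, a deliberate exclusion is normally declared with a robots meta tag in the page itself (robots.txt, by contrast, only blocks crawling, which is a separate mechanism). A minimal sketch:

```html
<!-- In the page <head>: tells search engines not to index this page -->
<meta name="robots" content="noindex">
```

The same signal can also be sent via an `X-Robots-Tag: noindex` HTTP response header for non-HTML resources such as PDFs.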
- Duplicate without user-selected canonical
This one is very frustrating, largely because Google does not provide the address of the page it considers the duplicate.
Even with a clear canonical tag, Google reports the page as a duplicate and refuses to index it.
There are many instances where a page may appear to be a duplicate when, in fact, two pages with the same name and slug have different content.
Example: New Year’s Eve is the same date in multiple locations (states).
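For reference, a user-selected canonical is declared with a link element in the page head; a minimal sketch (the URL is illustrative):

```html
<!-- In the <head>: declares this URL as the preferred (canonical) version -->
<link rel="canonical" href="https://example.com/events/new-years-eve-sydney/">
```

Note that Google treats a declared canonical as a strong hint, not a directive, which is why pages like the ones above can still be reported as duplicates despite correct markup.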
- Events – Not found (404)
This is a big issue for us. The majority of reports are for “event” pages, where the event has taken place and the page deactivated.
It is NOT a 404 or “not found”; it is a deactivated event. The content is appropriately marked up (schema.org, e.g. class="event-date" with itemprop="startDate" and itemprop="endDate").
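As an illustration (the event name, dates, and class name here are hypothetical), schema.org Event microdata along those lines might look like:

```html
<!-- schema.org Event marked up with microdata; the dates signal the event has ended -->
<div itemscope itemtype="https://schema.org/Event">
  <h2 itemprop="name">New Year’s Eve Fireworks</h2>
  <p class="event-date">
    <time itemprop="startDate" datetime="2023-12-31T21:00">31 Dec 2023, 9:00pm</time> –
    <time itemprop="endDate" datetime="2024-01-01T00:30">1 Jan 2024, 12:30am</time>
  </p>
</div>
```

schema.org also defines an `eventStatus` property for cancelled or postponed events, and for pages that are permanently gone, serving HTTP 410 (Gone) rather than 404 is one way to tell crawlers the removal is intentional.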
- Soft 404 – These pages aren’t indexed or served on Google