So what does “canonical” mean … what is a Canonical URL?
A canonical URL is the URL of the specific page that Google considers to be the master document. The master is the most representative version of a set of duplicate or close copies of the same page. There should be only one master and, used correctly, a canonical approach will prevent Google becoming confused about similar content.
I don’t know about Google becoming confused … not a particularly useful statement if you don’t know what the words mean. Is that the sort of thing you were looking for?
I’m guessing that you’re here because you are just starting out. You’re at the point where you might not be ready for the deep-dive, but you would still like to understand the concepts. You want to know what “Canonical URL” means but you don’t want to be digging around in the techie details to find out. You can save that for later.
You don’t need to know all the details to understand the principles, but if you understand the principles, you can begin to improve the way you do stuff. If you know what a canonical URL is, what it does and why it’s needed, you can start tweaking your site for the better.
It’s all about maximising the potential of your visitors’ searches.
Here are the basics that you need to know so that you can have a conversation about Canonical URLs without having to resort to looking confused.
There’s enough here to get started but if you have some burning desire to extend your knowledge further into the complexities of canonical URLs, I have provided some links. If you want to go and do that now, here’s a YouTube video about Canonical URLs that tells you more than you probably need to know at the moment.
5 things you need to know about Canonical URLs
When I began, I knew virtually nothing about canonical URLs. I had that canonical look about me … errr! I didn’t even know what “canonical” meant. I had come across references to “canonical” across WordPress; I could spell “canonical” but I didn’t understand what it meant … and everything I googled looked a bit complicated.
But … being an analyst, I rolled up my sleeves and got stuck in. It seems to me that all of this canonical stuff is about managing different pages with similar content.
What happens if you have different pages with similar or duplicate content?
Managing duplicate or similar content looks, on the surface, to be fairly straightforward, but when search engines are crawling multiple URLs with similar content, they can get a bit confused. Without clear guidance, that confusion can cause problems for SEO.
Duplication of content across pages can dilute the ability of any single page to rank at the top end of the search results. Also, because the bots are following a plan, they can ignore stuff. A bot crawling through similar pages, and ignoring stuff as it goes, may miss one of your more interesting pearls of wisdom. And … if the bots can’t tell which page is the master, your readers, looking for the right answer, may end up on the wrong page.
According to Google support, a canonical page is the preferred version of a set of pages of highly similar content. There are legitimate reasons for having duplicate and similar content on a site so it’s not necessarily an issue, but it can mess up SEO because it can confuse the bots.
Google, and the other search engines, will have a go at managing any similar content, but it’s much better to manage the relationship yourself rather than leave it to the bots.
What is a Canonical Page?
The canonical page here is the top-level, “Master” page. The canonical URL (that of the “Master” page) is used to create a link from the non-canonical page to the canonical page. This is for the benefit of the search engine. It helps the search engine determine which version of the page to return in the search results.
The “duplicate” page (above) is a duplicate of the “Master” page but the search engine can’t tell.
Let’s say for the moment that it doesn’t know. The search engine could return either the Master or the Duplicate in the search results, depending on the circumstances of the search. This isn’t necessarily a problem for your visitors as the pages are duplicate, but that’s not the whole story.
The history of the searches is based on interactions with visitors. The history will relate to one or the other of the pages, and each page will have its own profile. Is that the right word? I’m trying to keep the language simple without losing the detail. There is no particular issue with there being two profiles, but since the content is the same, each profile is effectively diluting the total value.
For example …
If an external site links to the Master and another site links to the Duplicate, there are two links to the same content but the profiles record one link for each page. If the relationship between the Duplicate and the Master is made visible to the search engine, both external links can be associated with the same content. This is why there is a canonical link. The canonical link associates the two pages, aggregating their individual values and raising the profile of the content. Each page by itself has a value of one but collectively, the content has a value of two.
With more pages, the effect is more pronounced.
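To make the arithmetic concrete, here is a toy sketch in Python. It is only an illustration of the counting idea described above, not how any search engine actually scores pages. The URLs, the `canonical_of` mapping and the `count_links` function are all made up for this example.

```python
from collections import Counter

# Hypothetical data: the canonical (master) URL that each page declares.
# The master points to itself.
canonical_of = {
    "https://example.com/master/": "https://example.com/master/",
    "https://example.com/duplicate/": "https://example.com/master/",
}

# Hypothetical external links: one site links to the Master,
# another site links to the Duplicate.
inbound_links = [
    "https://example.com/master/",
    "https://example.com/duplicate/",
]


def count_links(links, canonical_of=None):
    """Count inbound links per page, optionally rolled up to the canonical URL."""
    counts = Counter()
    for url in links:
        # Without a canonical mapping, each URL keeps its own separate profile.
        target = canonical_of.get(url, url) if canonical_of else url
        counts[target] += 1
    return dict(counts)


separate = count_links(inbound_links)                # no canonical link
combined = count_links(inbound_links, canonical_of)  # with canonical link

print(separate)
print(combined)
```

Without the mapping, each page is credited with one link. With it, both links are counted against the master, which ends up with two.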
What is a Canonical Link?
The canonical link connects two pages of similar content. In the canonical link diagrams, the canonical link links the bottom end of the arrow with the pointy end of the arrow. The link tells the search engine that the pages at both ends of the arrow should be treated as a single piece of content, with the page at the pointy end being the master.
Using a canonical tag in the right place will help prevent any issues that might be caused by the confused indexing of similar or duplicate content. The bots can and will have a go but ultimately the bots don’t know.
What is a Canonical tag?
This is an illustration rather than an instruction. This post is about helping beginners to understand the concept and the terminology; it’s not about implementing it. That being said, the canonical link looks a bit like this:
<link rel="canonical" href="https://example.com/bloogerme/start-here/" />
It is added to the <head> section of the non-canonical page and effectively merges the two pages into one, for the search engine. The external, and internal, links to both URLs (pages) are now counted against a single, canonical version of the URL. The value of those links is therefore aggregated.
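If you are curious how a program might read that tag, here is a minimal sketch using Python’s built-in HTML parser. It is my own illustration, not the way any real crawler works. The sample page, the `CanonicalFinder` class and the `find_canonical` function are invented for the example.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only the first canonical one is kept.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the page, or None if there isn't one."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# A made-up non-canonical page with the canonical tag in its <head>.
duplicate_page = """
<html>
  <head>
    <title>Start here (duplicate)</title>
    <link rel="canonical" href="https://example.com/bloogerme/start-here/" />
  </head>
  <body><p>Same content as the master page.</p></body>
</html>
"""

print(find_canonical(duplicate_page))
```

For the sample page this prints the master URL from the tag. A page with no canonical tag gives `None`.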
What is a Similar Page?
A similar page is a page that contains pretty much the same content as another page, regardless of the circumstances. You may have a desktop version of a page and a mobile version. They are formatted differently but they contain the same information. The same is also true for pages that are largely similar but are not duplicates.
Google sees the desktop and mobile versions of a page as duplicates.
Google sees the same page with multiple URLs as duplicates.
Google also sees archives and close copies of pages as duplicates.
It may be that you have an archived version of a page that is similar to a current version. I have published a snapshot of the beginner.blooger.me site as it stood at three weeks old. It was quite difficult to do and, at the time, I didn’t know about the impact of publishing similar pages without any control or oversight. The impact for me isn’t going to be huge because there aren’t that many pages. It was only three weeks of effort …
I now need to go back and review the relationships between those “3-week” pages and the current, similar pages.
I am going to have to identify all the “3-week” pages that are potential duplicates of current content. I also need to be looking for extremely similar content. Once I have done that, I need to identify the current pages that relate to similar and duplicate content because these are the relationships that will need to be identified by the canonical links.
Have a look at my canonical diagram. It shows the simple relationships between similar content and the master. The relationships could be more complex, but that’s too much for today.
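As a rough sketch of how that matching exercise could be done, the Python below compares the text of each archived page with the text of each current page and keeps the closest match when it is similar enough. Everything in it is an assumption for illustration: the page paths, the sample text and the 0.9 threshold are made up, and `difflib` is just a handy standard-library way of scoring similarity, not what search engines use.

```python
from difflib import SequenceMatcher

# Assumed cut-off: pages scoring at or above this are treated as near-duplicates.
SIMILARITY_THRESHOLD = 0.9


def similarity(text_a, text_b):
    """Return a score between 0.0 (nothing in common) and 1.0 (identical)."""
    return SequenceMatcher(None, text_a, text_b).ratio()


def match_archive_to_current(archived, current, threshold=SIMILARITY_THRESHOLD):
    """Map each archived page to its most similar current page, if similar enough."""
    matches = {}
    for old_url, old_text in archived.items():
        best_url, best_score = None, 0.0
        for new_url, new_text in current.items():
            score = similarity(old_text, new_text)
            if score > best_score:
                best_url, best_score = new_url, score
        if best_score >= threshold:
            matches[old_url] = best_url
    return matches


# Hypothetical "3-week" snapshot pages and current pages.
archived_pages = {
    "/3-week/start-here/": "Welcome to my blog about learning to blog.",
    "/3-week/old-news/": "Something that no longer exists on the site.",
}
current_pages = {
    "/start-here/": "Welcome to my blog about learning to blog.",
    "/about/": "Who I am and why I write.",
}

print(match_archive_to_current(archived_pages, current_pages))
```

Each pair that comes back is a candidate for a canonical link from the archived page to the current page. Anything that doesn’t come back has no close match and can be left alone.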
What is a Master Page?
I guess the definition of Master Page should be the easy bit; the clue is in the title. The master page is the page that represents all the pages that are canonically linked to it.
Everything the search engine knows about the canonical group is attributed to the master. If the search engine follows an inward link to one of the non-canonical pages, the canonical link will point the search engine in the direction of the master. Simplistically, everything the crawler does is attributed to the master, which also means that the master is going to be returned in any search results.
A canonical, or master, page is the preferred version of a set of pages of highly similar content. It does not need to be the definitive page but it probably makes sense if it is.
The canonical relationships you set up are going to tell the search engine which page it should be indexing. The bots can do it, but you can do it better.
Finally, the master page can include a self-referential link: a canonical link from the Master back to the Master. This leaves the search engine in no doubt as to the canonical status of the master page.
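Here is a small sketch of that idea in Python: given a page’s own URL and the canonical URL it declares, work out which role the page is playing. The `canonical_status` function and the example URLs are my own inventions, and the comparison is a plain exact match, which is a simplification.

```python
def canonical_status(page_url, canonical_href):
    """Describe a page's role from its own URL and the canonical URL it declares."""
    if canonical_href is None:
        # No canonical tag: the search engine is left to decide for itself.
        return "no canonical declared"
    if page_url == canonical_href:
        # The page points at itself, so it is claiming to be the master.
        return "master (self-referential)"
    # The page points somewhere else, so it is a non-canonical page.
    return "points to master: " + canonical_href


master = "https://example.com/master/"
duplicate = "https://example.com/duplicate/"

print(canonical_status(master, master))
print(canonical_status(duplicate, master))
print(canonical_status(duplicate, None))
```

The first call is the self-referential case: the Master declares itself as the canonical URL. The second is an ordinary duplicate pointing at the Master, and the third is a page with no canonical link at all.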
What have we learned?
Having similar or duplicate pages is not an error.
The canonical page is the most representative page of a set of similar pages.
A page can be self-referential, indicating that it is the original, or master, version.
Pages do not need to be fully identical to be considered duplicates.
The canonical URL is the one we want the bots to index.
Don’t leave it to the Bots!