Have you found duplicate content on your site? No worries, it’s all right.
There are valid reasons why a website may have several URLs that lead to the same page, or the same content available at different URLs. This is common practice for online stores and a normal result of typical e-commerce functionality.
Although the situation is common and will not by itself hurt your rankings, you should still eliminate the problem. Here are a few reasons why.
Can’t wait to get down to business? Do all the basic settings with our Magento 2 SEO extension.
- Why work out the duplicate content issue?
- What are the types and most common examples of duplicate content in Magento?
- What is a canonical URL?
- How does Googlebot choose the canonical URL?
- How to learn what URL is canonical?
- Additional methods on how to specify a canonical page
- How to set canonical URLs in Magento 2?
- What are the other options to deal with duplicate content?
- Partial duplicates in Magento
- Product sorting
- Variations of the same product
- Useful add-ons
Why work out the duplicate content issue?
- To point out a URL that you want users to see in search results;
- To help search engines assign all the duplicate URLs (from your and other websites) to a canonical one;
- To simplify product metrics by consolidating the duplicate product pages under a single piece of content;
- To manage the syndicated content posted on other resources and assign ranking to a preferred URL;
- To exclude the duplicate pages from crawling and let Google spend its time crawling new pages, so that you get the most out of your site.
So, how do you solve the duplicate content issue for good? The easiest and most powerful way is to set canonical URLs in your Magento store.
What are the types and most common examples of duplicate content in Magento?
Partial duplicates are the case when only a minor part of the content or its layout is unique.
The most common examples of partial duplicates in Magento stores are product sorting, pagination, and variations of the same product.
Full duplicates are the case when the content on two or more pages is identical.
The most common example of full duplicates in Magento is:
- The same product in different categories
Are you ready? Then let’s get down to the details!
What is a canonical URL?
A canonical URL is the address chosen as the ‘preferred’ one for search engine indexation. You may also hear the term ‘canonical tag’, which refers to the rel="canonical" link element that you add to website pages to tell search engines which URL should receive the search value.
Canonicalization is necessary for pages with duplicate or very similar content. It lets you indicate which page is the main one.
E.g.: Let’s say that you have these pages for the same product:
If you don’t configure anything in your Magento store, search engines will automatically choose one of them as canonical, treating it as the most relevant. This means you won’t be able to control the choice until you make changes.
Therefore, you need to tell search engines which one is canonical and set the 301 redirect for all the rest.
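To make the idea concrete, here is a minimal Python sketch of the link element that each duplicate page would carry. The URLs and the function name are assumptions made up for illustration; Magento generates this element itself once canonical tags are enabled.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Return the <link rel="canonical"> element pointing at the preferred URL."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'


# Hypothetical duplicate URLs of a single product page.
duplicates = [
    "https://example.com/necklace.html",
    "https://example.com/jewelry/necklace.html",
    "https://example.com/sale/necklace.html",
]
preferred = duplicates[0]

# Every duplicate carries the same tag, pointing at the preferred URL.
for url in duplicates:
    print(url, "->", canonical_link_tag(preferred))
```

Each duplicate declares the same preferred address, which is what lets search engines consolidate them.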
How does Googlebot choose the canonical URL?
Googlebot uses multiple signals to choose canonical URLs. It analyzes the main content of the pages and compares them for duplication. Between two very similar pages, it will choose the one that is more complete and useful for users.
Besides content, Google considers whether the page is served over HTTPS, whether it is listed in a sitemap, and any rel="canonical" annotations.
The page that Google chooses as canonical will be crawled more often and will have a better position in the SERP. Google search results usually contain canonical URLs. The only exception is when a duplicate suits the user better: for example, Google will show the https://m.example.com/news/ page to mobile users even if the canonical URL is https://example.com/news/. Moreover, even if you mark the page you want as canonical, Google may still choose another one.
If your Magento 2 website has multiple languages, pages with the same content in different languages will not be considered as duplicates. But the page has to be fully translated, including header, footer, body, and other text.
How to learn what URL is canonical?
Google has the URL Inspection tool that can provide you with information about your URL. Here are a few notes that you should know about this test:
- You have to own the URL that you want to test. Please make sure that you are using the right account.
- If the tested page has duplicates, the report will show information about the canonical URL, but only if that URL also belongs to you.
- It is possible to test both AMP and non-AMP URLs.
For more information about this tool, please visit the official page.
Sometimes the canonical URL turns out to be in a property that you don’t own. Here are some reasons why this can happen:
- Mistakes in site content localization. In this case, you need to check the official localization guidelines.
- Incorrect canonical tags. See the section below on how to set canonical URLs in Magento 2.
- Incorrect server settings. Contact your hosting to solve this problem.
- Hacker attack. Attackers sometimes use a 301 redirect or insert a cross-domain rel="canonical" link into the HTML <head> to mark a malicious URL as canonical.
- External websites copy your content. If you are sure that a third-party website is hosting your content, please submit a request to Google.
Additional methods on how to specify a canonical page
In addition to the methods described above, there are several other options for how to mark a link as canonical:
- rel=canonical <link> tag – Add this tag with the canonical link in the code for duplicate pages.
- rel=canonical HTTP header – Send a rel=canonical header in your page response.
- Sitemap – Define canonical URLs in a sitemap.
- 301 redirect – Set up a 301 redirect to indicate the canonical page to Googlebot if the duplicate page is out of date.
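Of the options above, the HTTP header is the least visible one, so a short illustration may help. The Python sketch below only builds the header name and value in the standard Link header syntax; the function name and the URL are assumptions, and how the header is actually attached depends on your server or framework.

```python
def canonical_link_header(canonical_url: str) -> tuple[str, str]:
    """Return the (name, value) pair for a rel="canonical" HTTP Link header."""
    return ("Link", f'<{canonical_url}>; rel="canonical"')


# Hypothetical example: a downloadable file that has no HTML <head> to put a tag in.
name, value = canonical_link_header("https://example.com/downloads/size-guide.pdf")
print(f"{name}: {value}")
```

This form is mostly useful for non-HTML resources, where a link tag cannot be added to the page code.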
How to set canonical URLs in Magento 2?
- Log in to the Admin Panel, go to Stores>Settings>Configuration:
- Expand the Catalog drop-down menu and choose Catalog. Then open the Search Engine Optimization section:
- Make the following changes:
If you need Google (or any other search engine) to index the pages with complete category URL path only, make the changes:
Use Canonical Link Meta Tag for Categories – ‘Yes’;
Use Canonical Link Meta Tag for Products – ‘No’;
If you want Google (or any other search engine) to index the product pages only, apply the following settings:
Use Canonical Link Meta Tag for Categories – ‘No’;
Use Canonical Link Meta Tag for Products – ‘Yes’;
If you want Google (or any other search engine) to index categories and products, enable both the options:
Use Canonical Link Meta Tag for Categories – ‘Yes’;
Use Canonical Link Meta Tag for Products – ‘Yes’;
Don’t forget to save the changes and clear the cache when you are done.
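Once the cache is cleared, it is worth confirming that the pages really output the tag. The following Python sketch extracts the canonical URL from a page's HTML using only the standard library; the sample markup and names are assumptions, and in practice you would feed it the HTML of a real page from your store.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page source.
sample = '<html><head><link rel="canonical" href="https://example.com/necklace.html" /></head></html>'
print(find_canonical(sample))
```

If the function returns None for a product or category page, the corresponding setting is probably still disabled or the cache has not been refreshed.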
What are the other options to deal with duplicate content?
You may have one product that can be found in two or more categories. For example:
There’s only one necklace but 3 different URLs! When a product has duplicate URLs in Magento, no matter how great the product is, Google will consider it thin content. It’s so unfair! So let your great products be great to Google by making their URLs unique.
Remove category from URL
Alternatively, you can remove the category path from the URL, so that each product has only one address no matter how many categories it appears in:
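As an illustration of the resulting URL shape only, this Python sketch drops everything between the domain and the last path segment. The URLs and the function name are assumptions; this is not how Magento builds its URL rewrites internally.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_category_path(product_url: str) -> str:
    """Keep only the last path segment (the product URL key) of a product URL."""
    parts = urlsplit(product_url)
    url_key = parts.path.rstrip("/").rsplit("/", 1)[-1]
    return urlunsplit((parts.scheme, parts.netloc, "/" + url_key, parts.query, parts.fragment))


# Hypothetical category-based URLs of the same product.
for url in (
    "https://example.com/jewelry/necklaces/silver-necklace.html",
    "https://example.com/gifts/silver-necklace.html",
):
    print(strip_category_path(url))
```

Both inputs collapse to the same address, which is the point of removing the category path.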
Leave only one category path in a product URL
If you have a red T-Shirt in 2 categories at once, T-Shirts and New, you can choose which category to use in the URL: either the longest one (T-Shirts) or the shortest one (New). This is possible with the Unique Product URL extension.
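The choice between the longest and the shortest path can be pictured with a small Python sketch. It is a hypothetical illustration with made-up URLs and names, not the extension's actual logic.

```python
from urllib.parse import urlsplit


def pick_product_url(candidate_urls, prefer="shortest"):
    """Pick the candidate URL with the shortest or the longest path."""
    if prefer not in ("shortest", "longest"):
        raise ValueError("prefer must be 'shortest' or 'longest'")
    chooser = min if prefer == "shortest" else max
    return chooser(candidate_urls, key=lambda url: len(urlsplit(url).path))


# Hypothetical URLs of one red T-Shirt listed in two categories.
candidates = [
    "https://example.com/t-shirts/red-t-shirt.html",
    "https://example.com/new/red-t-shirt.html",
]
print(pick_product_url(candidates, prefer="shortest"))
print(pick_product_url(candidates, prefer="longest"))
```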
Partial duplicates in Magento
As mentioned above, duplicates can be partial or full. Both types can be resolved with canonicalization, but there are other options for dealing with them, depending on where the issue appears.
Product sorting
It’s great when users can sort the products in your store by bestsellers, by newest, by price, by number of reviews, etc. It’s even better if people can decide how many products should be displayed on the page: 20? 50? 100? But all these sorting options create pages with different characters (?, =, |) in the URLs:
The problem arises when sorting pages get indexed and even cached by Google. Imagine how many such pages can exist! Thousands! And Google crawlers spend time indexing them when they could concentrate their resources on more important pages of your site: categories, products, etc.
How to find product sorting pages
First, go to a product listing page and sort the products by any option. Now you can see the parameters added to the URL after sorting (e.g., dir, sortby). Go to Google and search for site:yourdomain.com inurl:dir
Most likely you’ll see this:
Just click to include the omitted results and you’ll see the pages in your store containing “dir” in the URLs. It’s bad when these pages with parameters are indexed.
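If you have a list of your store's URLs (from a crawl or a sitemap export, for example), the same check can be run offline. The Python sketch below flags URLs that carry sorting parameters and shows the clean URL they duplicate. The parameter names in SORT_PARAMS are assumptions based on the examples above, so replace them with the ones your store actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed parameter names; adjust to match your own store.
SORT_PARAMS = {"dir", "sortby", "limit"}


def has_sort_params(url: str) -> bool:
    """Return True if the URL query contains any sorting parameter."""
    query = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return any(name in SORT_PARAMS for name, _ in query)


def strip_sort_params(url: str) -> str:
    """Return the URL with sorting parameters removed and other parameters kept."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in SORT_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical URLs.
for url in (
    "https://example.com/mugs.html?dir=asc&sortby=price",
    "https://example.com/mugs.html?color=red",
):
    print(url, has_sort_params(url), strip_sort_params(url))
```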
How to remove product sorting duplicates
Go to Google Webmaster Tools => Crawl => URL Parameters. Here you will see the parameters Google has found in the URLs of your store and how it crawls them. “Let Googlebot decide” is the default option there.
But when it comes to crawling your Magento store, it’s you, not Google, who should decide which pages get indexed, right? So if you haven’t decided this yet, it’s high time you did! Click “edit”, choose “Yes” in the dropdown menu and then “No URLs”.
You can also add parameters that are not listed in GWT and set crawling options for Google. But be careful and check twice (or even three times) before blocking the URLs with these parameters.
Your Magento store is big because you have lots of great products there, right? But even if you have only a few products (that are also great!), they are still placed on pages with pagination options.
How to find paginated duplicates
To find paginated pages in your Magento store, go to Google and search for site:yoursite.com inurl:page. This search returns all the pages containing “page” in the address within your site.
How to remove duplicates
Pagination with rel=”next” and rel=”prev”
This option was specifically created by Google to fight duplicate paginated results. The idea is that all paginated pages are connected like links in a chain:
If we take these pages as an example:
Then we should put the prev/next instructions in the following way:
<link rel="next" href="http://www.site.com/category1.htm?page=2" />
in the <head> of http://www.site.com/category1.htm.
<link rel="prev" href="http://www.site.com/category1.htm" />
<link rel="next" href="http://www.site.com/category1.htm?page=3" />
in the <head> of http://www.site.com/category1.htm?page=2.
<link rel="prev" href="http://www.site.com/category1.htm?page=2" />
in the <head> of http://www.site.com/category1.htm?page=3 (let’s imagine this is the last page).
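The chain above follows a simple rule that can be sketched in Python. The function name is an assumption, and the sketch assumes that page 1 has no ?page= parameter, as in the example URLs.

```python
def pagination_links(base_url: str, page: int, last_page: int) -> list[str]:
    """Return the rel="prev"/rel="next" link tags for one page of a paginated series."""
    if not 1 <= page <= last_page:
        raise ValueError("page must be between 1 and last_page")

    def page_url(number: int) -> str:
        # Assumption: the first page is the bare category URL.
        return base_url if number == 1 else f"{base_url}?page={number}"

    tags = []
    if page > 1:
        tags.append(f'<link rel="prev" href="{page_url(page - 1)}" />')
    if page < last_page:
        tags.append(f'<link rel="next" href="{page_url(page + 1)}" />')
    return tags


# Reproduce the three-page example.
for number in (1, 2, 3):
    print(number, pagination_links("http://www.site.com/category1.htm", number, 3))
```

The first page gets only a next link, the last page only a prev link, and every page in between gets both.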
Pagination with robots meta tag
If you don’t want Google to index all the paginated pages, you can add a robots meta tag of the form <meta name="value" content="value"> within the <head>…</head> of a duplicate content page. E.g.: <meta name="robots" content="noindex,follow">. With the follow value, search engines won’t ignore the links on the duplicate page, whereas nofollow would tell them not to follow those links. The name value robots can be replaced with the name of a specific search engine crawler, depending on your needs.
Besides, make sure not to block these URLs with the robots.txt file. Some Magento users list URLs that contain duplicate content in robots.txt to block the pages from crawling. However, the Panda update disregards practices that block crawlers, and bots still treat the pages as unique ones. What is even worse, other websites can still link to the blocked pages, and you won’t get any SEO benefit from those backlinks. Therefore, it’s better to mark the pages with duplicate content with the ‘canonical’ tag.
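To see why blocking works against you, consider what a Disallow rule does: a compliant crawler never fetches the page, so it never reads a canonical or robots meta tag placed on it. The Python sketch below checks this with the standard library's robots.txt parser. The rule and the URLs are assumptions invented for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt content lets the user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks a directory of duplicate pages.
ROBOTS_TXT = """User-agent: *
Disallow: /catalog/duplicate/
"""

print(can_crawl(ROBOTS_TXT, "https://example.com/catalog/duplicate/mug.html"))
print(can_crawl(ROBOTS_TXT, "https://example.com/mug.html"))
```

A page that cannot be fetched cannot pass its signals to the canonical URL, which is why the canonical tag is the safer choice here.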
There are a few things to remember:
- The first paginated page should contain only rel="next"
- The last paginated page should contain only rel="prev"
- Google allows using both canonicals and rel="next" and rel="prev" on one page
Variations of the same product
Imagine you sell mugs (or maybe you really do?) and have landing pages for each color:
The characteristics are the same, the description is the same, the layout is the same… So what’s new? Just the picture! Unfortunately, that’s too little for Google to treat such pages as unique. This means that all product variations found on different pages are partial duplicates that act like a magnet for Google Panda.
How to find product variations
As a Magento shop owner, you probably know all your products and can make a list of their variations. Alternatively, you can search in Google for:
site:yoursite.com “here comes a short excerpt from the product description”
This way you will find all the pages of your site containing this very excerpt.
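If you can export your product URLs together with their descriptions, a similar check can be done locally. The Python sketch below groups pages whose descriptions match after whitespace and case are normalized. The sample data and names are assumptions.

```python
from collections import defaultdict


def find_variation_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose descriptions are identical after normalization."""
    groups = defaultdict(list)
    for url, description in pages.items():
        normalized = " ".join(description.lower().split())
        groups[normalized].append(url)
    # Only groups with more than one URL are potential duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical export: URL -> product description.
pages = {
    "https://example.com/red-mug.html": "A sturdy ceramic mug, 300 ml.",
    "https://example.com/blue-mug.html": "A sturdy  ceramic mug, 300 ml.",
    "https://example.com/teapot.html": "A cast-iron teapot, 1 l.",
}
print(find_variation_duplicates(pages))
```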
How to remove duplicates
One page for all variations
You can create a single page for a particular product and list all its variations there.
This way you have one unique page instead of several duplicate ones.
Make each variation page unique
The hardest way to solve duplication issues with product variations is to make each variation page unique. You will have to add different product descriptions and meta info. Yes, this is extremely time consuming. So be sure to grab a cup of coffee before starting this long journey!
When running several localized Magento store views in different languages, you can apply the hreflang attribute on rel="alternate" links to show search engines how to choose the right version of the content. E.g.: If you have two store views in English and Italian, you should add a tag such as <link rel="alternate" href="https://example.com" hreflang="it" /> for the Italian store view, with href pointing to that store view’s URL. The same method should be applied to all the other localized store views.
What does this achieve? It helps you avoid the risk of search engines treating the localized content as duplicate.
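For several store views, the set of tags can be generated from a simple mapping, as in the Python sketch below. The language codes and URLs are assumptions. Each localized page would normally carry the full set of tags, including the one that points to itself.

```python
def hreflang_tags(store_views: dict[str, str]) -> list[str]:
    """Return one rel="alternate" hreflang link tag per store view."""
    return [
        f'<link rel="alternate" href="{url}" hreflang="{code}" />'
        for code, url in store_views.items()
    ]


# Hypothetical store views: hreflang code -> store view URL.
views = {
    "en": "https://example.com/en/",
    "it": "https://example.com/it/",
}
for tag in hreflang_tags(views):
    print(tag)
```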
To work with duplicate content and canonicalization, you need a solid understanding of SEO. Otherwise, you may lose the rankings you’ve already achieved.
If you want to make the work clearer, less time-consuming and more efficient, you may try our ready-made solutions for Magento 2.
With Magento 2 AJAX Layered Navigation, for instance, you get special possibilities, such as:
- flexible canonical modes;
- handy detached Canonical URL grid;
- and the ability to set structure of canonical URLs for ‘key’-related and category pages:
In addition to AJAX Layered Navigation, Magento solutions such as SEO Toolkit for Magento 2 offer a complete list of SEO features that extend the basic settings of the platform. With the toolkit, you can work on several areas of optimization at the same time:
- Unique product URLs;
- XML Sitemap;
- HTML Sitemap;
- Pager Optimization;
- Google Rich Snippets;
- Relevant Meta Tags;
- No duplicate content;
- Cross Linking.