An Extensive Guide on Magento Duplicate Content

Nothing is created equal, they say. They are wrong. And if you own an online shop, you know that perfectly well, as many pages on your site are literally identical, aren’t they? Our extensive guide will help you solve the problem of Magento duplicate content and keep it from coming back.

So, you may ask: ‘Why do I need this if Google doesn’t ban websites with duplicate content?’ The reasons are more than autosuggestion and overreaction:

  • Google can’t tell that all the duplicate Magento product URLs point to the same content. As a result, link authority is split across the URL variants and no single page earns the authority you expected;
  • Users will see the version of the URL that Google considers most relevant, not the one you want them to see;
  • You risk wasting crawl cycles: while Google bots are busy with duplicates, they may not get to your fresh content.

If you’re still not sure whether this guide is for you, search Google for a keyword you rank for and look at the results. If you see a URL that isn’t user-friendly, or several URLs leading to pages with the same content, you have duplicate content on your website. Also, log in to your Google Webmaster Tools (Search Console) account to see alerts about duplicate content. Examine the crawl metrics (GWT account > Crawl > Crawl Stats) to see how many pages have already been crawled and indexed, then compare that figure with the real number of pages. If far more pages are crawled and indexed than actually exist, read on: you most likely have duplicate content problems.

 

What are the types and most common examples of duplicate content in Magento?

Partial duplicates are pages where only a minor part of the content or layout is unique.

partial duplicates in Magento

Here are the most common examples of partial duplicates in Magento stores: product sorting, pagination, and variations of the same product.

Full duplicates are two or more pages whose content is identical.

Full duplicates in Magento

The most common example of full duplicates in Magento is the same product appearing in different categories.

Are you ready? Then let’s get down to the details!

Partial Duplicates in Magento

 

1. Product Sorting

It’s great when users can sort the products in your store by bestsellers, newest, price, number of reviews, etc. It’s even better if people can decide how many products should be displayed on the page: 20? 50? 100? But all these sorting options create pages with extra parameters (note the ?, = and | characters) in the URLs:

http://site.co.uk/category/products.htm?sortby=total_reviews|desc
http://site.co.uk/category/products.htm?sortby=total_reviews|asc
http://site.co.uk/category/products.htm?sortby=relevance|desc

The problem arises when sorting pages get indexed and even cached by Google. Imagine how many such pages can exist! Thousands! And Google crawlers spend time indexing them when they could be concentrating their resources on the more important pages of your site: categories, products, etc.

1.2. How to find product sorting pages

First, go to a category page and sort the products by any option. You will see the parameters added to the URL after sorting (e.g., dir, sortby). Then go to Google and search for site:yourdomain.com inurl:dir

Most likely you’ll see this:

Google Supplemental Index

Just click to include the omitted results and you’ll see the pages in your store containing “dir” in the URLs. It’s bad when these pages with parameters are indexed.

1.3. How to remove product sorting duplicates

 

1.3.1. Google Webmaster Tools

Go to Google Webmaster Tools => Crawl => URL Parameters. Here you will see the parameters Google has found in the URLs of your store and how it crawls them. “Let Googlebot decide” is the default option there.

URL Parameters in Google Webmaster Tools

But when it comes to crawling your Magento store, it’s you, not Google, who should decide which pages get indexed, right? So if you haven’t decided this yet, it’s high time you did! Click “edit”, choose “Yes” in the dropdown menu and then select “No URLs”.

Block indexing of URLs with parameters in GWT

You can also add parameters that are not listed in GWT and set crawling options for Google. But be careful and check twice (or even three times) before blocking the URLs with these parameters.

Pro Tip 1: Be patient. It takes Google a long time to de-index URLs with parameters once they have been indexed. You can also remove them from the index manually via Google Webmaster Tools (Google Index > Remove URLs).

 

1.3.2. Rel=canonical

You can also use canonicalization for the sorting pages in your Magento store. This way they stay accessible to users, while crawlers are pointed to the pages without parameters.

Canonicals? Great! But how do I implement them?

You should add this code to the sorting pages:

<link href="CategoryURL" rel="canonical" />

where CategoryURL is the address of the same category page without parameters. For example, the following pages:

http://site.co.uk/category/products.htm?sortby=total_reviews|desc 
http://site.co.uk/category/products.htm?sortby=total_reviews|asc 
http://site.co.uk/category/products.htm?sortby=relevance|desc

 

should canonicalize to this page:

http://site.co.uk/category/products.htm
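
If you generate these tags yourself, the mapping from a sorting URL to its canonical address can be sketched in a few lines of Python. This is only an illustration, not Magento code: the parameter names in SORTING_PARAMS and the helper names are assumptions, so swap in the parameters your own store really adds.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these are the parameters that only reorder or resize the listing.
# Replace them with the ones your own store actually adds.
SORTING_PARAMS = {"sortby", "dir", "order", "limit"}


def canonical_url(url, drop=SORTING_PARAMS):
    """Return the URL with the sorting parameters (and any fragment) removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in drop
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url):
    """Build the <link> element to place in the <head> of the sorting page."""
    return '<link rel="canonical" href="%s" />' % escape(canonical_url(url))


for sorted_page in (
    "http://site.co.uk/category/products.htm?sortby=total_reviews|desc",
    "http://site.co.uk/category/products.htm?sortby=relevance|desc",
):
    print(canonical_tag(sorted_page))
```

Parameters that are not in the set (a real filter, for instance) are kept, so only the ordering variants collapse into one address.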

 

After adding the code, give Google some time to re-index the pages and follow the new instructions; this usually takes a few days. Once canonicalization is set up correctly, you’ll see the cache of the canonicalized page (http://site.co.uk/category/products.htm) even when you check the cache of a sorting page (cache:http://site.co.uk/category/products.htm?sortby=relevance|desc).

Pro Tip 2: Use canonicalization before GWT. You can either block the sorting page duplicates in GWT or canonicalize them to the category pages. If you choose to do both, set up canonicalization first, so that Google crawlers can still crawl the pages and follow the instructions. If you block the pages before implementing canonicalization, the latter might not work.

 

2. Pagination

Your Magento store is big because you have lots of great products there, right? But even if you have only a few products (that are also great!), they are still placed on pages with pagination options.

For example:

http://www.site.com/category1.htm?page=2
http://www.site.com/category1.htm?page=3

2.1. How to find paginated duplicates

To find paginated pages in your Magento store, go to Google and search for site:yoursite.com inurl:page. This search returns all the pages containing “page” in the address within your site.

2.2. How to remove duplicates

 

2.2.1. Canonicalization

You already know a lot about canonicalizing pages in your store, right? Just make rel=canonical tags on those paginated pages:

http://www.site.com/category1.htm?page=2
http://www.site.com/category1.htm?page=3
http://www.site.com/category1.htm?page=4

 

point to the category

http://www.site.com/category1.htm

Pro Tip 3: Use a single canonical tag for each page. Make sure each page contains only one rel=canonical tag. Otherwise, you’ll send Google contradictory signals, which will either be ignored or leave Google to decide how to act. Learn more about canonical URLs in Magento 1 and Magento 2.
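
To check a page for stray canonical tags, you can parse its HTML and count them. The sketch below uses only Python’s standard html.parser; the class name, function name and sample markup are made up for illustration, and fetching the page is left out.

```python
from html.parser import HTMLParser


class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> found in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags are routed here as well.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.hrefs.append(attributes.get("href"))


def canonical_hrefs(html):
    """Return the list of canonical targets declared in the given HTML."""
    collector = CanonicalCollector()
    collector.feed(html)
    return collector.hrefs


# Hypothetical page source with two conflicting canonical tags.
page = """
<html><head>
<link rel="canonical" href="http://www.site.com/category1.htm" />
<link rel="canonical" href="http://www.site.com/category1.htm?page=2" />
</head><body></body></html>
"""

found = canonical_hrefs(page)
if len(found) != 1:
    print("Expected exactly one canonical tag, found %d: %s" % (len(found), found))
```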

 

2.2.2. Pagination with rel=”next” and rel=”prev”

This option was specifically created by Google to fight duplicate paginated results. The idea is that all paginated pages are connected like links in a chain:

how pages are connected by rel=prev and rel=next

If we take these pages as an example:

http://www.site.com/category1.htm
http://www.site.com/category1.htm?page=2

Then we should put the prev/next instructions in the following way:

<link rel="next" href="http://www.site.com/category1.htm?page=2" />

in the <head> of http://www.site.com/category1.htm.

<link rel="prev" href="http://www.site.com/category1.htm" />
<link rel="next" href="http://www.site.com/category1.htm?page=3" />

in the <head> of http://www.site.com/category1.htm?page=2.

<link rel="prev" href="http://www.site.com/category1.htm?page=2" />

in the <head> of http://www.site.com/category1.htm?page=3 (let’s imagine this is the last page).
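
The same chain can be produced programmatically. The following Python sketch is one possible way to do it, not the way any particular module works; it assumes the category URL has no query string of its own and that pages are addressed with a ?page=N parameter, as in the example above.

```python
from html import escape


def page_url(category_url, page):
    """Page 1 is the bare category URL; later pages add ?page=N (assumed scheme)."""
    return category_url if page == 1 else "%s?page=%d" % (category_url, page)


def pagination_links(category_url, page, last_page):
    """Return the rel="prev"/rel="next" tags for one page of the series."""
    if not 1 <= page <= last_page:
        raise ValueError("page must be between 1 and last_page")
    tags = []
    if page > 1:  # every page except the first points back
        previous = escape(page_url(category_url, page - 1))
        tags.append('<link rel="prev" href="%s" />' % previous)
    if page < last_page:  # every page except the last points forward
        following = escape(page_url(category_url, page + 1))
        tags.append('<link rel="next" href="%s" />' % following)
    return tags


for number in (1, 2, 3):
    print(number, pagination_links("http://www.site.com/category1.htm", number, 3))
```

The first page gets only a “next” tag and the last page only a “prev” tag, which matches the example above.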

None of this is too complicated, so you can do it yourself or use a special module (our Improved Layered Navigation has such functionality) to implement rel=”next” and rel=”prev”. If you run your online store on Magento 2, try our Magento 2 Layered Navigation.

 

2.2.3. Pagination with Robots meta tag

If you don’t want Google to index all the paginated pages, you can use the robots meta tag, which has the form <meta name="value" content="value"> and goes inside <head>…</head> of the duplicate content page. E.g.: <meta name="robots" content="noindex,follow">. With “follow” in place, search engines keep the page out of the index but won’t ignore the links on it. The name “robots” addresses all crawlers; it can be replaced with the name of a specific search engine’s crawler (for example, googlebot) depending on your needs.
Besides, make sure not to block these URLs in the robots.txt file. Some Magento users list the URLs with duplicate content in robots.txt to keep them from being crawled. However, a crawler that cannot fetch a page cannot read the canonical or robots meta tag on it, so the block does not tell search engines that the page is a duplicate. What is even worse, other websites can still link to the blocked pages, and you won’t get any SEO benefit from those backlinks. Therefore, it’s better to mark the pages with duplicate content with the ‘canonical’ tag.
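
To see whether any of your duplicate URLs are currently blocked, you can run them against the robots.txt rules. Below is a rough Python sketch using the standard urllib.robotparser module. The rules and URLs are hypothetical, and this parser only does simple prefix matching (it does not implement the wildcard patterns Google understands), so treat the result as a first check rather than the final word.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text forbids for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt that blocks one of the category paths.
robots_txt = "\n".join([
    "User-agent: *",
    "Disallow: /gifts/",
])

duplicates = [
    "http://www.site.com/gifts/necklace.html",
    "http://www.site.com/jewellery/necklace.html",
]

for url in blocked_urls(robots_txt, duplicates):
    print("Blocked, so its canonical tag cannot be read:", url)
```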

There are a few things to remember when using rel=”next” and rel=”prev”:

  1. The first paginated page should contain only rel=”next”
  2. The last paginated page should contain only rel=”prev”
  3. Google allows using both canonicals and rel=”next” and rel=”prev” on one page

Pro Tip 4: Make a plan. Before implementing rel=”next” and rel=”prev”, create an outline of all your paginated pages and note the next and previous page for each of them.

 

3. Variations of the same product

Imagine you sell mugs (or maybe you really do?) and have landing pages for each color:

variations of one product

The characteristics are the same, the description is the same, the layout is the same… So what’s new? Just the picture! Unfortunately, that’s too little for Google to treat such pages as unique. This means that all product variations found on different pages are partial duplicates that act like a magnet for Google Panda.

3.1. How to find product variations

As a Magento shop owner, you probably know all your products and can make a list of their variations. Alternatively, you can search in Google for:

site:yoursite.com “here comes a short excerpt from the product description”

This way you will find all the pages of your site containing this very excerpt.
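
If you can export your product URLs together with their descriptions, the same comparison can be done offline. The sketch below groups pages whose descriptions are identical after normalizing case and whitespace. The export format (a URL-to-description mapping) and the sample data are assumptions, and it only catches exact matches, not near-duplicates.

```python
import hashlib
from collections import defaultdict


def fingerprint(text):
    """Hash the description after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def duplicate_groups(pages):
    """pages maps URL -> description; return groups of URLs sharing one description."""
    groups = defaultdict(list)
    for url, description in pages.items():
        groups[fingerprint(description)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical export: two mug colors share one description.
pages = {
    "http://www.site.com/mug-red.html": "A sturdy ceramic mug.  Dishwasher safe.",
    "http://www.site.com/mug-blue.html": "A sturdy ceramic mug. Dishwasher safe.",
    "http://www.site.com/teapot.html": "A cast-iron teapot with a strainer.",
}

for group in duplicate_groups(pages):
    print("Same description:", group)
```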

3.2. How to remove duplicates

 

3.2.1. One page for all variations

You can create a single page for a particular product and list all its variations there.

product variations on one page

This way you have one unique page instead of several duplicate ones.

3.2.2. Rel=canonical

You can also use rel=canonical. Just choose one variation page and point the canonical tags of the other variations to it. This way users will still see all the content, but Google will keep a copy of only the one page you’ve chosen.

3.2.3. Make each variation page unique

The hardest way to solve duplication issues with product variations is to make each variation page unique. You will have to write different product descriptions and meta info for each one. Yes, this is extremely time-consuming. So be sure to grab a cup of coffee before starting this long journey!

3.2.4. Rel=hreflang

When running several localized Magento store views in different languages, you can use the hreflang attribute to tell search engines which version of the content to show to which audience. E.g.: if you have two store views, English and Italian, add the tag <link rel="alternate" href="https://example.com" hreflang="it" />, where href is the address of the Italian store view. Add a matching tag for each of the other localized store views.
What does this give you? It lowers the risk that search engines treat the localized content as duplicated.
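
Search engines expect every language version of a page to list all the alternates, itself included, so the full set of tags is the same on each store view. Here is a small Python sketch that builds that set; the store view URLs are placeholders and the function name is made up for the example.

```python
from html import escape


def hreflang_tags(store_views):
    """store_views maps a language code to that store view's URL for one page."""
    return [
        '<link rel="alternate" hreflang="%s" href="%s" />' % (escape(code), escape(url))
        for code, url in sorted(store_views.items())
    ]


# Placeholder URLs for an English and an Italian store view of the same page.
store_views = {
    "en": "https://example.com/en/",
    "it": "https://example.com/it/",
}

# The same block of tags goes into the <head> of both store views.
print("\n".join(hreflang_tags(store_views)))
```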

Full Duplicates in Magento

Now that you’ve found all the partial duplicates, it’s time to look for full ones. Ready, steady, go!

The same product in different categories

You may have one product that can be found in two or more categories. For example:

http://www.site.com/jewellery/necklace.html
http://www.site.com/for-her/necklace.html 
http://www.site.com/gifts/necklace.html

There’s only one necklace but 3 different URLs! With duplicate Magento product URLs like these, no matter how great your product is, Google will consider it thin content. It’s so unfair! So let your great products be great to Google by making them unique.

1.1. Rel=canonical

Just like with the product variations discussed above, you can choose one URL to show your product (the necklace in the example) and have the other pages canonicalize to it.

1.2. Remove category from URL

Alternatively, you can remove the category path from the URL, so that each product has only one address no matter how many categories it appears in:

http://www.site.com/necklace.html
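
To find out which products are reachable under several category paths, you can group a list of product URLs by the address they would have without the category. The Python sketch below assumes the product’s URL key is the last path segment, as in the necklace example; the function names are illustrative.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def without_category_path(url):
    """Keep only the last path segment (assumed to be the product URL key)."""
    parts = urlsplit(url)
    url_key = parts.path.rsplit("/", 1)[-1]
    return urlunsplit((parts.scheme, parts.netloc, "/" + url_key, "", ""))


def products_with_several_urls(urls):
    """Map each category-free address to its duplicates, if it has more than one."""
    groups = defaultdict(list)
    for url in urls:
        groups[without_category_path(url)].append(url)
    return {target: found for target, found in groups.items() if len(found) > 1}


urls = [
    "http://www.site.com/jewellery/necklace.html",
    "http://www.site.com/for-her/necklace.html",
    "http://www.site.com/gifts/necklace.html",
    "http://www.site.com/gifts/bracelet.html",
]

for target, found in products_with_several_urls(urls).items():
    print(target, "is reachable via", len(found), "category URLs")
```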

 

Don’t know how to do that in Magento? There was a good tip in one of our posts on SEO in Magento. Just go to System => Configuration => Catalog => Search Engine Optimization, switch the “Use Categories Path for Product URLs” field to “No” and set both “Canonical Link Meta Tag” fields to “Yes”.

remove category from URL in Magento

1.3. Leave only one category path in a product URL

If you have a red T-Shirt in 2 categories at once, T-Shirts and New, you can choose which category to use in the URL: either the longest one (T-Shirts) or the shortest one (New). This is possible with the Unique Product URL extension.

Summary

Now you know how Google sees duplicate content in your store. Don’t let it decide how to treat your pages; suggest the best way possible instead. Take control of how your site is crawled!