We have all heard it before – Content is king!
With the push to please the king and develop as much content as possible, websites sometimes engage in practices that end up hurting them. Duplicating content is one such black hat technique that webmasters fall into, sometimes without even realizing it.
In this article we discuss what duplicate content is, how it can impact SEO performance, and ways to monitor duplicate content on your website.
What is duplicate content?
Duplicate content is any content that appears in more than one place on the internet. This can include content that is exactly the same or that is very similar. Moreover, duplicate content can be an entire page of content or even just an excerpt from a particular page.
Duplicate content is closely related to plagiarism, in which a third party takes someone else’s content and tries to claim it as their own. Just as plagiarism can be damaging in the academic world, duplicate content can be damaging for websites online.
To reiterate, duplicate content is content that is copied word for word and published at a new third-party URL, or even at another URL on the same website.
Now that we know what duplicate content is, let’s discuss how it can impact the SEO performance of a website.
How does duplicate content impact SEO?
When it comes to crawling, indexing and ranking pages, search engines do not want to spend resources looking through duplicate content, because exact copies of existing content add no value for their users.
Google and other search engines want to index and rank content that is unique and distinct, content that adds a new perspective or new information on a specific topic. Google dislikes duplicate content so much that it has published extensive documentation on the subject on Google Search Central.
Google’s documentation states:
“Google tries hard to index and show pages with distinct information.”
And in regards to SEO and ranking, Google says:
“As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results.”
If Google is releasing statements like these and has devoted a whole page of documentation to the subject, you know it must be important.
Therefore, let’s dive a little deeper into the effects of duplicate content on your website.
Less Organic Traffic and fewer Indexed Pages:
As you can see in Google’s statements above, they want to provide and show ‘distinct information’ to their search engine users. This means that if they discover duplicate content, either on your own website, or on a third party website, you are forcing Google to choose which one is the original.
And making that choice can be difficult for Google sometimes.
For example, if there are three different pages with the same information on them (whether because you copied and pasted content, or because your server generated a duplicate page for whatever reason), Google has to decide which one is the original, more valuable piece of content to index and rank.
Sometimes, though, Google doesn’t choose correctly, and the sub-optimal page ends up being displayed on the SERPs. When this happens, you will have a difficult time ranking well and driving organic traffic to your website.
To go a little deeper into the issues of indexing let’s look at three issues that could take place:
- Search engines will struggle to identify which version(s) to include/exclude from their indices
- They will have a difficult time knowing what to do with the various signals and link metrics the web page is receiving (trust, authority, anchor text, etc.), and whether they should direct them to one page or split them between multiple versions of the page
- The engines won’t know which version or versions of the page they should rank for particular queries
Penalty (Extremely Rare):
Another issue, although rare, that can occur is the website being penalized, which can result in the website being removed from the index entirely.
This outcome is quite rare, though, and is usually reserved for websites that scrape content, copying it directly from another site and placing it on their own with no citations or references back to the original page.
Penalties like this can be disastrous for websites, as they result in your website being wiped clean from search engines’ indexes. Once this happens, it is nearly impossible to drive any sort of traffic to your website, or ever get the website back in good standing with search engines.
Now that we know how duplicate content affects a website’s SEO performance, let us discuss a few different types of duplicate content.
Examples of duplicate content
In this section we will look at three common duplicate content examples. Two of the three may go unnoticed, or be created unintentionally by webmasters, because certain settings or parameters end up generating a duplicate piece of content.
However, that doesn’t mean website owners never create duplicate content on purpose. A study by Raven Tools found that around 29% of websites currently have, or have had, some form of deliberately duplicated content.
To ensure your website stays out of that 29%, let’s take a look at three examples of duplicate content.
1. URL variations
URL parameters, such as click tracking and some analytics code, can cause duplicate content issues. The problem can come not only from the parameters themselves, but also from the order in which those parameters appear in the URL.
- One example is when parameters assign a different session ID to each user. The session ID is then stored in the URL, making the URL different while it still serves the same content as before.
- Another way URL variations cause problems is with printer-friendly versions of content, where one URL contains the word ‘print’ as an extension and the other doesn’t, yet both serve the same page.
To avoid these common issues, best practice is to avoid URL parameters where possible, as they can easily result in duplicate content. Instead, try using scripts to pass the information along.
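If parameters can’t be avoided entirely, one option is to normalize URLs before they are logged or linked internally, so equivalent addresses collapse to a single form. Below is a minimal Python sketch of that idea; the parameter names in TRACKING_PARAMS are only examples and should be adjusted to whatever tracking or session parameters your own setup actually adds.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Example tracking/session parameters to strip; adjust to your own setup.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}

def normalize_url(url: str) -> str:
    """Drop tracking parameters and sort the rest so that
    equivalent URLs collapse to a single canonical form."""
    parts = urlparse(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    query.sort()  # parameter order no longer creates a "new" URL
    return urlunparse(parts._replace(query=urlencode(query)))

print(normalize_url("https://example.com/page?utm_source=ads&id=42"))
# -> https://example.com/page?id=42
```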
2. HTTP vs. HTTPS
With HTTPS now the standard for web page security, HTTP versions of pages sometimes still get created. When this happens, the website is hosting two separate versions of the same page: if one URL uses the prefix http:// and the other uses https://, and both are live on your website and discoverable by search engines, you have created duplicate content. This confuses search engines, as they now have to pick which of the two pages to rank.
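A quick way to spot this on your own site is to request the HTTP version of a page and see whether it permanently redirects to HTTPS, which is the usual way to avoid serving both versions. The sketch below is only an illustration: it uses the third-party requests library and a placeholder example.com domain.

```python
import requests  # third-party library: pip install requests

def check_https_redirect(host: str, path: str = "/") -> None:
    """Report whether the http:// version of a page redirects to https://."""
    response = requests.get(f"http://{host}{path}", allow_redirects=False, timeout=10)
    location = response.headers.get("Location", "")
    if response.status_code in (301, 308) and location.startswith("https://"):
        print(f"OK: http://{host}{path} permanently redirects to {location}")
    else:
        print(f"Warning: http://{host}{path} returned {response.status_code} "
              "without a permanent redirect to HTTPS - both versions may be crawlable.")

# Placeholder domain - replace with your own.
check_https_redirect("example.com")
```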
3. Scraped or copied content
Content that is copied or ‘scraped’ from a website is one of the leading issues when it comes to duplicate content. With this method, websites and content writers select a keyword and go through the top-ranking pages for that specific query. They then copy all of the content and either rewrite/repurpose it or copy it outright and try to pass it off as their own.
This method can be done for any type of content including blog posts, web pages, editorials, product pages and news.
Copying content is bad for both your website visitors and for search engines. First, by copying content, a website is not providing any new value to the industry or community. It is simply copying and pasting content without bringing any new ideas, conversations or statistics to the overall discussion.
Search engines, for their part, will ultimately notice that the content has been scraped and plagiarized. And if they notice this often enough, they will place penalties on your website, which will cut off all organic traffic to it.
How to monitor duplicate content:
Even if you don’t believe you are creating duplicate content, it is good practice to check regularly. There are a couple of quick and easy methods we at Clear Door SEO like to use.
- Site audit tools: This is probably the easiest way to check whether you have mistakenly created multiple URLs for the same content. To check for duplicate URLs, you can use Ahrefs’ free site audit tool. By running a site crawl, you can see which pages have been duplicated and which ones you may want to add a canonical tag to.
- Website analysis tools: While site audit tools will help you find issues with duplicate pages, Siteliner will help you check the percentage of duplicate content within your own site. It crawls through all of your pages and looks for similarities in the content. Duplicate content on your own site will also damage your performance, as you will have multiple pages trying to rank for the same keywords. (A simple do-it-yourself version of this check is sketched after this list.)
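If you want a rough spot check before reaching for a tool, the Python sketch below compares the text of a handful of your own pages pairwise and flags any pair above an arbitrary 85% similarity threshold. It is only an illustration: the URLs are placeholders, and it strips HTML crudely rather than extracting the main content the way a dedicated crawler would.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations
from urllib.request import urlopen

def page_text(url: str) -> str:
    """Fetch a page and crudely strip tags/whitespace to get comparable text."""
    html = urlopen(url, timeout=10).read().decode("utf-8", errors="ignore")
    return re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", html)).strip()

def report_near_duplicates(urls: list[str], threshold: float = 0.85) -> None:
    """Print every pair of pages whose text similarity exceeds the threshold."""
    texts = {url: page_text(url) for url in urls}
    for a, b in combinations(urls, 2):
        ratio = SequenceMatcher(None, texts[a], texts[b]).ratio()
        if ratio >= threshold:
            print(f"{ratio:.0%} similar: {a} <-> {b}")

# Placeholder URLs - replace with pages from your own site.
report_near_duplicates([
    "https://example.com/page-a",
    "https://example.com/page-b",
])
```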
Key takeaways: Duplicate content
It is important to monitor and check your website’s content for duplication. As we have discussed above, duplicate content hurts your SEO performance and damages your reputation with your visitors.
With damaged reputations and Google-enforced penalties at stake, it is best to steer clear of duplicating other websites’ content. Remember, your website is your brand and your company; if people believe that you don’t provide value, or that you are trying to manipulate the system in any way, they will make sure other people know in your reviews and on social media.
Even though content is king and we are trying to please search engines with new content, it will always be better in the long run to produce content that is unique and that adds new ideas to the industry. Take your time, and set out to make unique and high-quality content for your audience.