Duplicate Content: Copy Cats Don't Win
More often than not, duplicate content is not intentional. Most small to medium business owners are not well versed in search engine algorithms, so they do not realize why their pages just aren't showing up. Many times, the culprit is duplicate content. This doesn't always mean plagiarism, as the title suggests, but it can. Sometimes, for the sake of efficiency, a business has simply used the same product descriptions on another site, or throughout the pages of the same site. There are many reasons why duplicate content may be an issue, and I thought I'd go into that a little bit. This is a pretty easily avoided problem, and one that in the worst cases can ultimately lead to the search engines dropping your site from their index altogether.
Google gives these examples of non-malicious duplicate content:
Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
Store items shown or linked via multiple distinct URLs
Printer-only versions of web pages
The suggested means of handling these cases is to tell the search engines which URL you prefer to have indexed. This is known as "canonicalization," and it makes a great deal of sense if you are selling products and you don't want to rewrite the same information across several URLs. You basically pick the URL you prefer to have indexed and let the search engines know. How you let the search engines know is a little involved.
First, you can specify a canonical link on each of the different pages. (Link: http://wiki.whatwg.org/wiki/RelExtensions) In short, this code tells the search engine that they all relate to one page. Second, you may want to simply include the relevant pages in your sitemap and then set your preferred domain. Finally, with Google at least, you can use something known as parameter handling to point out the URL parameters you'd rather it ignored, so that you don't get filtered for duplicate content.
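To make the canonical link idea a bit more concrete, here is a minimal Python sketch of how a site might work out the preferred URL for a page and produce the matching tag. The parameter names in IGNORED_PARAMS and the example.com address are assumptions made up for illustration; which parameters are safe to drop depends entirely on your own site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters assumed NOT to change what the page shows.
# Replace these with the ones your own site actually uses.
IGNORED_PARAMS = {"sessionid", "sort", "ref", "print"}


def canonical_url(url: str) -> str:
    """Return the URL with content-neutral parameters and fragments removed."""
    parts = urlsplit(url)
    # Keep only the parameters that are not on the ignore list.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", urlencode(kept), "")
    )


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


if __name__ == "__main__":
    # Two addresses for the same store item end up with the same tag.
    print(canonical_link_tag("http://www.example.com/widgets/blue?sessionid=123&sort=price"))
    print(canonical_link_tag("http://www.example.com/widgets/blue?print=1"))
```

Every variation of the page would carry the same tag in its head section, so the search engine sees one preferred address no matter which URL it crawled.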
Google recommends that you do all of these things, though even if you do, there are no guarantees. Still, this is the most intelligent way to handle non-malicious duplicate content where the search engines are concerned. Generally speaking, you won't be penalized in cases like this, because the search engines can usually tell the difference, but your site is better optimized and gets crawled faster if you avoid these issues.
When it comes to the malicious variety of duplicate content, that which IS intentionally misleading, the search engines have another policy altogether. The good thing here is that once you have made the changes the search engines request, you can appeal, get your site back in their good graces, and start trying to climb the ranks again. Having to do that is a big waste of time, though. It's really just better to do it right the first time, so you can avoid this entirely.