How Google wants you to optimize your site - Part II

In part one of this article, we looked at how Google feels about SEO and the preferred structure of URLs, dispelled the myth of TrustRank, and examined how Google identifies paid links.

In part II of this article we will take a closer look at how Google handles duplicate content.

Google’s Susan Moskwa, a Webmaster Trends Analyst, stated: “Let’s put this to bed once and for all, folks: There’s no such thing as a ‘duplicate content penalty.’ At least, not in the way most people mean when they say that.”  But what exactly does that mean?

Generally speaking, duplicate content refers to blocks of content that either exactly match other content or are noticeably similar.  Duplicate content comes in two forms: duplication within your own website, and duplication of your content across multiple domains.

When the duplication happens inside your own domain, it affects how Google filters content in different ways.  For example, if your website has both a regular and a print version of a page and neither is blocked in robots.txt or with a “noindex” meta tag, Google will simply pick one version to index and eliminate the other.  E-commerce sites face a different situation: because they are likely to have the same store items shown and linked via multiple distinct URLs, Google will group the duplicate URLs and display the one it considers best in the search results.  The downside is that you are likely to dilute your link popularity, and that search results may display long URLs with tracking IDs, which is bad for site branding.
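The two blocking mechanisms mentioned above can be sketched as follows. This is a minimal illustration, not a recommendation for any particular site layout; the /print/ path is a hypothetical example.

```text
# Option 1 - robots.txt: block crawling of the print versions entirely
User-agent: *
Disallow: /print/

# Option 2 - a "noindex" meta tag inside the <head> of each print page,
# which lets Google crawl the page but keeps it out of the index:
<meta name="robots" content="noindex">
```

Note that these are alternatives, not a pair: a “noindex” tag only works if the page remains crawlable, so a page blocked in robots.txt will never have its meta tag seen.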

When the duplication happens across more than one domain, you have an issue dreaded by web developers known as “content scraping”.  Someone could be copying your content to pass off as their own, or a web proxy could index the pages it accesses through the proxy.  Google claims this is not an issue to be concerned about, although many experienced web publishers disagree.  These publishers claim to have experienced situations where the “scraper” outranks the original, which seems like a penalty in and of itself.

Google claims that it can tell the difference between the original and the duplicate.  Sven Naumann from Google’s search quality team suggests that you “check if your content is still accessible to our crawlers.  You might unintentionally have blocked access to parts of your content in your robots.txt file.  You can look in your Sitemap file to see if you made changes for the particular content which has been scraped.” He also said to “check if your site is in line with our webmaster guidelines.”
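Naumann’s first check — whether your content is still accessible to crawlers — can be approximated locally with Python’s standard `urllib.robotparser` module. This is a minimal sketch; the rules and URLs below are hypothetical stand-ins for your own site.

```python
from urllib import robotparser

# Hypothetical robots.txt rules; in practice you would call
# rp.set_url("https://yoursite.example/robots.txt") and rp.read()
# to fetch and parse your live file instead.
rules = [
    "User-agent: *",
    "Disallow: /print/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# The print version is blocked from crawling; the regular page is not.
print(rp.can_fetch("Googlebot", "https://yoursite.example/print/article.html"))
print(rp.can_fetch("Googlebot", "https://yoursite.example/article.html"))
```

Running a check like this against every template on your site is a quick way to catch a robots.txt rule that accidentally blocks content you want indexed.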

According to Naumann, “To conclude, I’d like to point out that in the majority of cases, having duplicate content does not have negative effects on your site’s presence in the Google index.  It simply gets filtered out.”

Last time I checked, being filtered out was indeed a penalty.  So, in conclusion, avoid duplicate content whenever possible to steer clear of these ranking-eating algorithms altogether.