On March 14, 2010, Eric Enge of Stone Temple Consulting interviewed Matt Cutts of Google and uncovered some interesting revelations and confirmations about duplicate content and many other issues. Here are a few highlights from the interview, but please take time to read the entire article for yourself.
Duplicate Content Pages Might Not Get Crawled
“Imagine we crawl three pages from a site, and then we discover that the two other pages were duplicates of the third page. We’ll drop two out of the three pages and keep only one, and that’s why it looks like [your site] has less good content. So we might tend to not crawl quite as much from that site.”
…”Typically, duplicate content is not the largest factor on how many pages will be crawled, but it can be a factor.”
In other words, duplicate content on the same domain can result in Google not crawling as many pages from your site. According to Matt, you have a certain “crawl budget” – an allotment of pages Google is willing to crawl within your domain. Having the same content on multiple pages of your website means Google will likely crawl fewer pages of your site.
Do Affiliate Links Pointing to Your Site Create Duplicate Content?
“Duplicate content can happen. If you are operating something like a co-brand, where the only difference in the pages is a logo, then that’s the sort of thing that users look at as essentially the same page. Search engines are typically pretty good about trying to merge those sorts of things together, but other scenarios certainly can cause duplicate content issues.”
What About Ecommerce Product Pages that are Almost Identical?
Matt says the canonical tag is one answer. “There are a couple of things to remember here. If you can reduce your duplicate content using site architecture, that’s preferable. The pages you combine don’t have to be complete duplicates, but they really should be conceptual duplicates of the same product, or things that are closely related. People can now do cross-domain rel=canonical, which we announced last December.”
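To illustrate what Matt is describing (the URLs here are made-up examples), a rel=canonical tag is a single link element placed in the head of the near-duplicate page, pointing at the version you want search engines to treat as the original. Since the December 2009 announcement he mentions, the target can even live on a different domain:

```html
<!-- In the <head> of a near-duplicate page, e.g. a color variant
     of a product page, pointing to the preferred product URL -->
<link rel="canonical" href="https://www.example.com/products/widget" />

<!-- Cross-domain rel=canonical: a page on a partner or syndication
     site pointing back to the original article on your own domain -->
<link rel="canonical" href="https://www.example.com/articles/original-article" />
```

Note this is a hint to search engines rather than a directive, which is why Matt still prefers consolidating duplicates through site architecture where you can.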
I must admit to being a little scared of the google spider (what female LIKES spiders?). I diligently ensure that my articles are not duplicated in any way. This is hard going at times and I have to admit to spinning a few articles in order to utilise backlinking etc. How does Google go with spun articles? I haven’t had a problem so far so assume they are ok so long as they are done properly.
Hi Jay. Spinning articles probably (and I haven’t done any research on it) doesn’t have much of an impact on Google. I don’t know that their algo is sophisticated enough to be able to detect spun articles. However, whether or not Google can or can’t isn’t really the issue. Why would you want to serve your site visitors spun content? If someone searches for your name in order to find articles by you and all they get is a list of the same material re-written 10 different ways they’ll immediately lose respect and interest. The damage of spinning content far outweighs the benefits, imo.
I don’t think you meant what you wrote. It is not about different content on the same URL; it is about the same content on different URLs. When we have different content on the same URL, that can be cloaking. I believe that Google indexes one piece of content per URL. If you serve different content for the same URL, then Google sees something different, and that is, by definition, cloaking. I came across your site because I wanted to know for sure whether Google really indexes one page per URL. For example, will Google index different pages for the same URL when IP delivery adapts the content to the visitor’s country?
You are correct, Dominic. I should have said domain, not URL. I’ll go back and fix that. I’m not sure I understand your question about page vs. URL.
Sorry for the late reply. I am reading your article again and it flows well. Perhaps you changed ‘URL’ to ‘domain’ somewhere and that fixed the issue. The point was that if you had written “duplicate content for a given URL,” I could have understood it as “different content for a given URL.” That is a completely different situation. Still, nowadays it happens all the time, say with IP delivery. In that different context, my question is this: could it be that when Google detects the same URL more than once, say serving different content by IP (or other criteria), it indexes that URL more than once, say in different databases?