For some reason, international websites love to produce a specific kind of duplicate content.
Take the cat.com case. Caterpillar is a large industrial S&P 500 company with a huge website catering to customers all around the world. In particular, it has the following sub-websites:
- http://www.cat.com/en_GB.html for customers in Europe speaking English (though the URL implies it’s specifically targeted to Great Britain)
- http://www.cat.com/en_US.html for customers in North America speaking English (though again, the URL implies US)
- http://www.cat.com/en_MX.html for customers in Latin America speaking English
Turns out, all three websites have almost identical content. Although there are a few differences (an Investor page on the GB version that is not present on the MX version), most pages have the exact same text.
And no, swapping two navigation items and switching the units of measurement from inches to meters doesn’t give a website different content.
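One rough way to put a number on "almost identical" is to compare the word sequences of two pages. The following is only a minimal sketch, assuming you already have the visible text of each page as a string. The function names and the sample sentences are made up for illustration, and this is not how any search engine actually detects duplicates.

```python
from __future__ import annotations

import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of two texts: shared shingles / all shingles."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts count as identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical page snippets that differ only in the unit of measurement.
page_us = "The hypothetical excavator has a bucket width of 24 inch and a proven engine"
page_mx = "The hypothetical excavator has a bucket width of 61 centimeter and a proven engine"

similarity = jaccard(page_us, page_mx)
print(f"similarity: {similarity:.2f}")
```

On full-length pages, a unit swap touches only a handful of shingles, so the score stays close to 1. That matches the point above: such a change does not make the content different.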
To make this explicit, here is a sketch of the web architecture:
Google Doesn’t Like Duplicate Content
If you search on google.com.mx (using English as the language and a Mexican IP), you won’t be able to find the en_MX website on the first page.
Here is an excerpt from the search results.
Let’s just say, Google doesn’t consider the pages below cat.com/en_MX/ to be relevant to English-speaking Mexicans.
And that is, from a human standpoint, easy to understand.
There is plenty of English-language information about the products out there, probably more frequently visited, on other websites with identical text (the en_US version or the en_AU version).
Canonical Links Won’t Fix This
Canonical links, geo meta tags and so on are not there to fix a broken information architecture.
If the website is confusing for a human visitor (and this one is, unless one assumes that every visitor comes through the “front door”), then behavioral data will always reveal this messiness to Google.
Search bots will run into the same confusion, and no matter how well thought out the canonical link structure is, the website will keep getting mis-indexed and mis-ranked.
Why Companies Produce Such Duplicate Content
For some reason, whenever companies build, relaunch, or remodel their international websites, the ones targeted to visitors from different countries speaking different languages, they very often make the same mistake.
I can usually identify one of the following four reasons behind that decision:
1. The company has different product portfolios in different countries that share the same language (for instance Australia and the US).
2. The company has different contact people in different countries (countries that either share a language or for which the company can only produce one language version, e.g. an English one for Africa and an English one for the US).
3. The company has different business units or subsidiaries in different countries managing the distribution (and possibly the website).
4. The company actually has very different content on the websites in different countries with the same language.
Reasons 3 and 4 usually go hand in hand. If your company really has different content on its websites, then there is no duplicate content problem.
If, on the other hand, you only want to show people the correct contact person (reason 2), then producing this kind of massive duplicate content won’t help. In fact, as you can see above, no one who Googles for Caterpillar on google.com.mx will get the (right) Mexican contact person.
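To illustrate the contact-person case: the contact can be looked up per country while the page text exists only once. This is a minimal sketch under my own assumptions, not a description of how cat.com works. The contact names, the example.com addresses, and the way the visitor’s country code is obtained are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contact:
    name: str
    email: str


# Hypothetical contact data; names and addresses are placeholders.
CONTACTS = {
    "MX": Contact("Mexico sales desk", "mx-sales@example.com"),
    "US": Contact("US sales desk", "us-sales@example.com"),
}
DEFAULT_COUNTRY = "US"  # assumed fallback for unknown or missing countries


def contact_for(country_code: Optional[str]) -> Contact:
    """Pick the contact for a visitor's country, falling back to the default."""
    code = (country_code or "").strip().upper()
    return CONTACTS.get(code, CONTACTS[DEFAULT_COUNTRY])


def render_product_page(product_text: str, country_code: Optional[str]) -> str:
    """Combine the single shared product text with the country's contact."""
    contact = contact_for(country_code)
    return f"{product_text}\n\nContact: {contact.name} <{contact.email}>"


print(render_product_page("One shared product description.", "mx"))
```

The product text lives in one place, and only the small contact block varies by country, so there is no second or third copy of the same pages to compete in the search results.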
If your company has different product portfolios in different countries, then the case is a little bit trickier but still manageable. See the next blog post for a couple of good examples.
But the mantra is simple: If your company can’t handle targeted content for a specific country (i.e. unique content, not duplicate content), then it shouldn’t bother creating a separate website.