What do I mean by matching URLs? Suppose multiple people on your team work on your website (run on WordPress). One product page might have the URL “www.yourcompany.com/amazing-product/”. Sooner or later, people will mistakenly start creating variations of this URL. For instance, you will see links from other pages to:
- https://www.yourcompany.com/amazing-product/ (SSL problem)
- http://www.yourcompany.com/amazing-product (Trailing slash problem)
- http://www.yourcompany.com/Amazing-product/ (Capitalisation problem)
These variations cause several problems:
- We have no control over whether Google always picks what we see as the right version of the web page to be displayed in the SERPs.
- In Google Analytics, one page might appear as multiple pages, which makes it harder to analyze even the total traffic to that page.
- In other tools, like Google Data Studio, you will run into problems if you try, for instance, to filter for certain page categories. You will always have to be on the lookout for potential problems (for instance, that the default RegEx in Google is case-sensitive).
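To make the reporting problem concrete, here is a small Python sketch. The pageview numbers are made up for illustration, and Python’s `re` module only stands in for whatever filter your analytics tool offers. It shows how a case-sensitive filter silently undercounts a page whose URL exists in several variants.

```python
import re

# Made-up pageview rows, as an analytics export might list them:
# the same page shows up under three different paths.
rows = [
    ("/amazing-product/", 120),
    ("/amazing-product", 30),
    ("/Amazing-product/", 15),
]


def total_views(pattern: str, flags: int = 0) -> int:
    """Sum the views of every row whose path matches the pattern."""
    regex = re.compile(pattern, flags)
    return sum(views for path, views in rows if regex.search(path))


# The optional trailing slash is handled by "/?" in the pattern.
case_sensitive = total_views(r"^/amazing-product/?$")
case_insensitive = total_views(r"^/amazing-product/?$", re.IGNORECASE)

print(case_sensitive)    # 150, the capitalised variant is missed
print(case_insensitive)  # 165, all three variants are counted
```

Every filter you write has to anticipate every variant, which is exactly the complexity the rules in this post are meant to remove.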
To reduce the complexity, you have to fix a set of rules to match those URLs together. Then you have to properly redirect them, following the three best practices for redirects.
Rule 1, Don’t capitalize anything in your URL.
Example: You have a product called Amazing Big-O Product. The URL to a landing page is “company.com/amazing-big-O-product/”. You might argue the big “O” is important for branding purposes.
I argue that you simply shouldn’t do that. Actually, I’ve written a complete blog post on why this is a bad idea.
Rule 2, Deal with your SSL version.
In all likelihood, you already have an SSL version. Google likes SSL and will soon start to mark everything else as “not safe”. However, if your non-SSL website version is still available, it’s a good idea to 301-redirect it to the SSL version.
Rule 3, Decide on a trailing slash and then force it.
Do you want a trailing slash? Do you want it only for pages which appear as “directories”? Or do you not want it at all?
If I try to visit https://www.ge.com/investor-relations/overview/ I get redirected (301) to https://www.ge.com/investor-relations/overview. That is a basic way of enforcing one consistent trailing-slash policy. Even if someone on the team puts a link with a trailing slash in the newsletter, it will still get redirected and never cause any trouble in Google Analytics or other tools.
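The three rules can be sketched as one small normalization function. This is only an illustration in Python, not how WordPress or any particular server does it. The function name `canonical_url` is made up, and the sketch assumes you decided against trailing slashes (as in the GE example); if you prefer trailing slashes, flip that step.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Map a URL variant to the one version we want to keep."""
    parts = urlsplit(url)

    # Rule 1: no capital letters in host or path.
    # The query string is left alone, since parameter values
    # may be case-sensitive.
    host = parts.netloc.lower()
    path = parts.path.lower()

    # Rule 3 (assumed policy: no trailing slash), keeping the root "/".
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/")
    if not path:
        path = "/"

    # Rule 2: always use the SSL version.
    return urlunsplit(("https", host, path, parts.query, parts.fragment))


variants = [
    "https://www.yourcompany.com/amazing-product/",
    "http://www.yourcompany.com/amazing-product",
    "http://www.yourcompany.com/Amazing-product/",
]
for variant in variants:
    # Each line prints https://www.yourcompany.com/amazing-product
    print(canonical_url(variant))
```

Whenever the requested URL differs from its canonical form, the server should answer with a redirect to the canonical form.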
Best Practice 1, Redirect permanently with 301.
If you want to tell Google that https://www.ge.com/investor-relations/overview is the permanent and right version of a web page (and the only one), do so by using a 301 redirect for everything else. Don’t try to use a 302 or any other redirect.
Best Practice 2, Don’t (accidentally) remove your parameters.
If you do redirect, make sure you don’t lose any parameters. When you write your rules, make sure https://www.ge.com/investor-relations/overview/?utm=… redirects to https://www.ge.com/investor-relations/overview?utm=…
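Here is a minimal Python sketch of a trailing-slash rule that keeps the parameters. The function name `redirect_for` and the value `utm_source=newsletter` are made up for illustration; a real setup would usually express this as a web server or WordPress rewrite rule instead.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def redirect_for(url: str) -> Optional[Tuple[int, str]]:
    """Return (301, target) if the URL needs a redirect, else None."""
    parts = urlsplit(url)
    path = parts.path

    if len(path) > 1 and path.endswith("/"):
        new_path = path.rstrip("/") or "/"
        # Only the path changes. The query string (and fragment)
        # are passed through untouched, so tracking parameters survive.
        target = urlunsplit(
            (parts.scheme, parts.netloc, new_path, parts.query, parts.fragment)
        )
        return 301, target

    return None  # already in the wanted form, serve the page


# Hypothetical campaign link with a trailing slash:
print(redirect_for(
    "https://www.ge.com/investor-relations/overview/?utm_source=newsletter"
))
# (301, 'https://www.ge.com/investor-relations/overview?utm_source=newsletter')
```

The common mistake is to build the target from the path alone, which silently drops everything after the question mark.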
Best Practice 3, Implement a proper 404 page.
What if things go wrong? You have to have a proper 404 page. By that I mean two things.
One, it actually has to serve a 404 error code. Otherwise, you won’t be able to tell the number of errors your visitors really get. If it does serve a 404 code, you can use any SEO site crawler to check for dead links.
Two, it has to be useful. Please don’t just write “Error 404” on a plain page without even the main navigation. Trust me, I’ve seen this innumerable times.
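The first point matters because a crawler can only recognise a dead link by its status code. As a rough sketch of what such a tool does (the function names are made up, and real SEO crawlers do far more), this Python snippet reports every link that answers with a 404. An error page that answers with 200 would never show up in the result.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for a URL (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx answers; the code is what we are after.
        return error.code


def dead_links(
    urls: Iterable[str],
    get_status: Callable[[str], int] = fetch_status,
) -> List[str]:
    """Return the URLs that answer with a real 404 status code."""
    return [url for url in urls if get_status(url) == 404]
```

Calling `dead_links` with the list of links found on your pages gives you the ones to fix. The `get_status` parameter exists only so the check can be tried out without network access.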
Example: Check out http://www.shell.com/arg (nonexistent URL). It redirects to http://www.shell.com/arg.html, since they always force URLs without a trailing slash to end in .html, and serves the error page. It’s populated with popular topics and the main navigation, and it mentions the search function.
By the way, redirecting to http://www.shell.com/arg.html and marking it as the error page is obviously a little bit weird. This will, of course, produce many variants of the 404 page, which means duplicate content. But it will enable them to spot dead links right from Google Analytics.