
After I went through the list of ways to stop spam in my last article, I decided to try to implement something even better.

So in this post we will tackle the problem a little differently. We will stop the spam as early as possible through three different techniques.

Currently, the first method alone is enough to stop all spam hitting my account.

I’ll implement things with Google Tag Manager, but any other tag management system (TMS), such as Tealium or Adobe, will work just as well.

The three methods are:

  1. A custom dimension to stop spammers from sending hits to random Google Analytics accounts.
  2. A hostname filter which gets rid of fake hostname spam. This should always be implemented.
  3. An IP filter to exclude malicious IP addresses.

Method 1: Custom Dimension with Filter

As you may know, Google Analytics spammers don’t really have to program a bot to visit your site. Instead they can just scrape your ID off your site or simply cycle through IDs at random. In that case, the bot won’t execute the complete tracking code on your website.

The solution to this kind of spam is simple: set up a custom dimension to mark hits sent by actual visitors. Their browsers execute the complete tracking code and thus also set the custom dimension in their hits.

The specifics are simple.

Step 1 (in Google Analytics)

Create a custom dimension called “trackedviaGTM” and activate it.

Step 2 (in Google Tag Manager)

Create a variable to hold your key, which you will filter for in the custom dimension. I’d recommend using a variable for two reasons:

  1. It makes it easy to change the value in case spammers adapt to this (and somehow scrape the right value off something).
  2. The “References to this Variable” section gives you a place to check whether all your tags are using the custom dimension.

Step 3 (in Google Tag Manager)

Change the tags that send hits to Google Analytics so that they also send the custom dimension with your value. To do this in your tags of type “Universal Analytics” (I assume), set

“Custom Dimensions” > Index “1” (in my case, or whatever your index is) to “{{specialKey}}”

Step 4 (in Google Analytics)

In Google Analytics you can now create a view with a new filter: set the filter type to “Custom”, choose “Include”, set “Filter Field” to “trackedviaGTM” and set the “Filter Pattern” to your secret value.
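To make the mechanics concrete, here is a minimal TypeScript sketch of what the include filter effectively does. It assumes the custom dimension has index 1, which Universal Analytics sends as the Measurement Protocol parameter `cd1`. The key value, the tracking ID, the hostnames and the `passesIncludeFilter` helper are made up for illustration and are not part of any Google API.

```typescript
// Illustrative model of a Universal Analytics hit as Measurement Protocol parameters.
// Assumption: the custom dimension "trackedviaGTM" has index 1, hence "cd1".
type Hit = Record<string, string>;

// Hypothetical secret value stored in the GTM variable {{specialKey}}.
const SPECIAL_KEY = "my-secret-value";

// A hit sent by a real visitor whose browser ran the GTM tag.
const visitorHit: Hit = {
  v: "1",
  tid: "UA-XXXXX-Y",
  cid: "555",
  t: "pageview",
  dh: "www.domain.com",
  dp: "/",
  cd1: SPECIAL_KEY,
};

// A hit sent straight to Google Analytics by a spammer who only guessed the ID.
// The tracking code never ran, so the custom dimension is missing.
const spamHit: Hit = {
  v: "1",
  tid: "UA-XXXXX-Y",
  cid: "666",
  t: "pageview",
  dh: "spam.example",
  dp: "/",
};

// Mimics the "Include" view filter: keep only hits whose custom dimension
// equals the key. (The real filter pattern is a regular expression, so an
// exact match is a simplification.)
function passesIncludeFilter(hit: Hit, key: string): boolean {
  return hit["cd1"] === key;
}

const kept = [visitorHit, spamHit].filter((hit) => passesIncludeFilter(hit, SPECIAL_KEY));
console.log(kept.length); // only the visitor hit remains
```

The point of the sketch is that the spammer would need to know both the dimension index and the current key, and you can change the key at any time.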

Method 2: Hostname Filter

Hostname filters can also be implemented in the TMS. They get rid of any kind of fake hostname, including in particular the latest wave of “Vote for Trump” spam.

The hostname is the domain the data is sent from. That should usually be your website and only that. If you use the tracking code on a micro site, this is also a possible hostname. But nothing else should be on that list.

If you’re unsure whether you get that kind of spam, open the hostname report in Analytics to find out (just select the dimension “hostname” somewhere). But regardless of whether you get this kind of spam, you should always implement a hostname filter.

And it’s dead simple: always use a lookup table variable to hold your tracking ID, keyed by hostname, and put some kind of default value in for everything else. That way, all data with bad hostnames is simply not sent. Do this even if you have just one ID and one website.

In Google Tag Manager that looks like this:

Don’t forget to deal with whether you use www.domain.com or domain.com. If your site is accessible through both URLs, you should enter both into the table or, preferably, rewrite one to the other. Also, the table is case-sensitive, so don’t enter your hostname in uppercase.
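The lookup logic can be sketched in a few lines of TypeScript. This is only a model of how such a table behaves, not Google Tag Manager code: the rows, the tracking ID, the default value and the `trackingIdFor` helper are all placeholders I made up for illustration.

```typescript
// Illustrative model of a lookup table variable: each row is [input, output],
// here [hostname, tracking ID]. All values are placeholders.
const LOOKUP_TABLE: Array<[string, string]> = [
  ["www.domain.com", "UA-XXXXX-Y"],
  ["domain.com", "UA-XXXXX-Y"],
];

// Hypothetical default for every hostname that is not in the table.
// Hits carrying this dummy value never end up in your real property.
const DEFAULT_ID = "UA-000000-0";

// Returns the tracking ID for a hostname, or the default if no row matches.
// The comparison is exact and case-sensitive, like the table described above.
function trackingIdFor(hostname: string): string {
  for (const [input, output] of LOOKUP_TABLE) {
    if (input === hostname) {
      return output;
    }
  }
  return DEFAULT_ID;
}

console.log(trackingIdFor("www.domain.com")); // the real ID
console.log(trackingIdFor("spam.example"));   // the dummy default
```

Note that a hostname written in uppercase would fall through to the default here, which is why both spellings of your domain (with and without www) need their own row.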

Method 3: IP Filters

What kinds of spam do you get rid of with this method? Any bot that visits your website and is on some kind of blacklist. Of course we don’t get rid of botnets this way.

Simo Ahava wrote an explanation of how to exclude internal IPs (https://www.simoahava.com/analytics/block-internal-traffic-gtm/). We can simply use that guide to exclude spam IPs as well.
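As a rough idea of what the check itself could look like, here is a small TypeScript sketch. It assumes the visitor’s IP address is already available as a string; how you obtain it is outside this sketch. The blacklist entries are reserved documentation addresses, not real spam sources, and `isBlockedIp` and `shouldSendHit` are hypothetical helper names.

```typescript
// Placeholder blacklist built from reserved documentation addresses (RFC 5737).
const BLOCKED_IPS: string[] = ["203.0.113.7", "198.51.100.23"];

// Prefixes block a whole range, e.g. every address starting with "192.0.2.".
const BLOCKED_PREFIXES: string[] = ["192.0.2."];

// True if the IP is listed exactly or starts with one of the blocked prefixes.
function isBlockedIp(ip: string): boolean {
  if (BLOCKED_IPS.indexOf(ip) !== -1) {
    return true;
  }
  return BLOCKED_PREFIXES.some((prefix) => ip.indexOf(prefix) === 0);
}

// Only hits from IPs that are not on the blacklist should be sent.
function shouldSendHit(ip: string): boolean {
  return !isBlockedIp(ip);
}

console.log(shouldSendHit("203.0.113.7")); // false, listed exactly
console.log(shouldSendHit("192.0.2.55"));  // false, matches a blocked prefix
console.log(shouldSendHit("198.51.100.1")); // true, not on the list
```

Keeping the prefixes with their trailing dot avoids accidental matches, for example “192.0.2.” will not block 192.0.25.1.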

Is There an Automated Way to do this?

If you come across one, tell me. What I’ve found are services like https://referrerspamblocker.com/. The caveat: they work at the Google Analytics level, not at the data layer, and they only block referrer spam.

In my opinion, as with Akismet for WordPress, the cloud approach is just the right way to handle such things. Indeed, I think the best place to rid data of spam is at the data layer level. Thus tag management systems like Google Tag Manager (unlikely…), Tealium or Adobe should get their hands dirty or integrate other apps to do this.
