How to clean up spam traffic from your website analytics

Have you noticed strange traffic patterns in your analytics reporting? Perhaps there are massive spikes in traffic that don’t relate to any of your marketing activity, or you’re left wondering why what appear to be thousands of visitors aren’t translating into new business.

It could well be that all isn’t as it seems and that you’re suffering from ‘Referral Spam’.

Referral Spam is effectively ‘ghost traffic’: rather than real site visitors, it’s traffic generated by computers, which in some cases never even touches your web server.

Whilst it’s normally fairly harmless, it can prove a massive annoyance for people looking to get meaningful information from their website analytics, and mislead website owners into thinking they’re getting more traffic than they really are.

Referral Spam has become more and more prevalent over the last year or so to the point that most websites will be affected by it in some way.

Read on to find out what it is, how to spot it, and more importantly – how to correctly strip it out of your analytics reporting.

What is Referral Spam?

Referral spam is the footprint left by web ‘bots’ (computer programs) that scan the web looking for targets to attack. These bots scan thousands of websites, sending a fake request for a page to each website’s server. Each web server records the details of these requests in its access logs.

Alternatively, other bots just target your Google Analytics ID, leaving behind a fake set of data about traffic that’s visited your site.
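For the first kind of bot, the one that does reach your server, you can get a rough picture straight from the access logs. The following is a minimal Python sketch, assuming your server writes logs in the common ‘Combined Log Format’ (your own log format may differ). The function name, the sample lines and the spam.example domain are all made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed layout (Combined Log Format):
# ... "request" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def referrer_hosts(lines):
    """Count referrer hostnames across access-log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        host = urlparse(match.group("referrer")).hostname
        if host:  # a "-" referrer (none sent) has no hostname
            counts[host] += 1
    return counts

# Made-up sample lines, for illustration only
sample = [
    '203.0.113.5 - - [10/Oct/2015:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "http://spam.example/" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Oct/2015:13:55:37 +0000] "GET /about HTTP/1.1" 200 512 "http://spam.example/page" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2015:14:01:02 +0000] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
]
counts = referrer_hosts(sample)
for host, hits in counts.most_common():
    print(host, hits)
```

Referrers you don’t recognise near the top of the list are worth a closer look. Bots that only target your Google Analytics ID won’t show up here at all, because they never contact your server.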

If you look in Google Analytics for your website, you may well be able to see some of these requests (look in the report at Acquisition > All Traffic > Referrals).

Referral Sources Report showing Referral Spam

In the column of referral sources, you may see websites that you’d expect to send traffic to you, but you’re also likely to see names like:

    • semalt.semalt .com
    • buttons-for-website .com
    • anticrawler.org

These are referral spammers.

Why do they do it?

Referral Spam tends to happen for one of two reasons:

1. Spammers want to promote a website and are trying to get you to visit the site or use a search engine to search for it.

They’re hoping that you will notice these strange URLs in your analytics or weblogs and visit the pages to see what they are about. They may simply want to drive traffic to pages selling something, but sometimes these pages contain malware or viruses, and the goal is to infect as many people as possible.

2. They want to boost their rank on Google search engine results pages.

Some websites’ access logs are publicly available via a webpage so, when spammers make their fake requests to websites, they are hoping that this will be publicly recorded along with their URL. When search engines later scan these pages, they’ll see these links to the spam site which will potentially improve the spammer’s SEO rankings.

Can I stop Referral Spam?

The short answer is not really.

It’s nigh on impossible to block all the ghost traffic to your website while being sure you’re not affecting real users, but you can make sure that it doesn’t affect your reporting.

There’s lots of advice on how to exclude spam traffic from your analytics reporting. However, some of it (such as filtering out individual sites) is like playing whack-a-mole, and some (such as using the referral exclusion setting*) is just wrong.

The most effective approach is to create a data view that filters out all traffic that doesn’t physically visit your website – and it’s fairly simple to set up.

Filtering Referral Spam from Analytics

We’ll show you the steps to follow in Google Analytics, as that’s the most commonly used system out there, but similar approaches can be followed on most other analytics platforms.

We’ll be creating a filter on your analytics data.

You can apply a filter to your main view in Google Analytics, but if you do that you’re permanently excluding this information from your reports, which means you’ve no way of seeing what you’re filtering out.

When creating filters, we always suggest using a new ‘view’ of your data. That way, if something goes wrong, you’ve still got a complete set of data that you can compare back to.

So, first of all, let’s create a new view…

Log into Google Analytics and click on Admin in the top navigation. This will take you to the Administration screen where information on your Account, Properties and Views is displayed.

Make sure the correct website property is selected, then click on the ‘View’ drop-down. You’ll see an option to ‘Create new view’.

Google Analytics Screenshot - Create New View

You’ll then be asked to enter a ‘Reporting View Name’ and set the ‘Time Zone’. We’re going to call this view “Spam Free” and set the Time Zone to “United Kingdom”, “London” time. Once that’s done, click the ‘Create View’ button.

Google Analytics Screenshot - New View Settings

You’ll then be taken back to the Admin page with your new view selected.

Google Analytics Screenshot - Admin Menu

First off, click on ‘View Settings’. That will take you to a screen where you can set your default currency and enable ‘Bot Filtering’. Whilst this function (excluding all hits from known bots and spiders) sounds great, the reality is that it’s not all that effective. Still, every little helps!

Google Analytics Screenshot - View Settings

Next, click on the ‘Filters’ tab on the left navigation. Here you’ll see a red button with an option to ‘Add Filter’.

Google Analytics Screenshot - Add Filter

Clicking on the ‘Add Filter’ button brings up a series of options.

Google Analytics Screenshot - Filter Options

To set up the correct filter, you’ll need to enter a name – we’ve called this one Exclude Referral Spam.

Then, leaving the ‘Filter Type’ as ‘Predefined’, select the following from the three drop-down boxes:

‘Include only’  ‘traffic to the hostname’  ‘that contains’

Then, in the text field, type the main part of your web domain. It’s very important you don’t include the http or the www.

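This instruction tells Google Analytics to only include traffic in this view when the hostname recorded against it contains your domain. Ghost traffic that never touched your website typically won’t carry your hostname, so it gets left out.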
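To make the logic of the filter concrete, here’s a small Python sketch of what an ‘Include only / traffic to the hostname / that contains’ rule does. It is an illustration only, not how Google Analytics is implemented: the function name, the dictionary layout and the example.com domain are all assumptions made for the example.

```python
def include_only_hostname(hits, domain):
    """Keep only the hits whose hostname contains `domain`."""
    # Comparing in lower case is a choice made for this sketch
    domain = domain.lower()
    return [hit for hit in hits if domain in hit.get("hostname", "").lower()]

# Made-up hits, for illustration only
hits = [
    {"hostname": "www.example.com", "referrer": "google.com"},
    {"hostname": "example.com", "referrer": "(direct)"},
    {"hostname": "(not set)", "referrer": "spam.example"},
    {"hostname": "some-other-site.example", "referrer": "spam.example"},
]

clean = include_only_hostname(hits, "example.com")
for hit in clean:
    print(hit["hostname"], hit["referrer"])
```

Notice that the rule never looks at the referrer. It keeps a hit purely because the hostname matches, which is why you don’t need to maintain a list of spam sites. Leaving off the www means both www.example.com and example.com are kept.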

Google Analytics Screenshot - Filter Settings for Hostname Filter

Finally, click the blue ‘Save’ button and you’re done.

This filter should now exclude the majority of spam from your analytics view – you just need to remember to select the ‘Spam Free’ view when you log on.
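Because you’ve kept your original view, you can get a rough feel for how much traffic the filter is removing by comparing session counts from the two views over the same date range (one that starts after the filter was created). A minimal sketch of the arithmetic in Python follows. The function name and the session numbers are made up for illustration, and not everything the filter removes is necessarily spam.

```python
def filtered_share(all_sessions, filtered_sessions):
    """Fraction of sessions removed by the filter, from 0.0 to 1.0."""
    if all_sessions <= 0:
        return 0.0
    removed = max(all_sessions - filtered_sessions, 0)
    return removed / all_sessions

# Hypothetical session counts read off the two views for the same dates
share = filtered_share(all_sessions=1200, filtered_sessions=900)
print(f"{share:.0%} of sessions were filtered out")
```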

If you’ve set up Goals or Ecommerce Tracking on any other view, you’ll need to copy those onto the ‘Spam Free’ view if you want to be able to use them in your reports.

Also remember that Google Analytics filters only affect reports from the date they are set up onward, so they can’t and won’t clean up historical data.

Let us know how you get on, and also if you know of any other ways to improve handling of referral spam in analytics reporting.

*Although the name sounds appropriate, referral exclusions are designed to ensure traffic being passed between a company’s domains is reported correctly – using this to tackle referral spam will make any data problems worse.
