Don’t Let Your Google Analytics Data Get Polluted – 3 Things To Watch

Aug 20, 2015
Guest Post

When I take a look at data from a website for the first time, it’s with a specific goal in mind.
Usually that goal is improving conversions: increasing the number of people who buy a product, make a phone call, or leave their contact information.

The thing I look for most is anomalies. Did they have a traffic spike after launching an advertising campaign? Are all of their sales on a Sunday? Maybe (like Hasselhoff) they’re huge in Germany. All of these things are important to know, as they guide me in deciding how and where to start optimizing.

I have to be very careful though, because sometimes anomalies are caused by things that aren’t interesting. Often these can be filtered out with common sense (it’s pretty unlikely that my local Seattle-area plumber is attracting a ton of legitimate traffic from Russia); other times it requires digging a little deeper into the actual data.

Why is this important? Using Google Analytics is like using a telescope to try to see the Martian landscape. If you point it in the right direction and focus it properly, you’ll be able to see hills and valleys, rivers and ocean beds.

Analytics Data is like a landscape

It’s amazing what you can see when everything is put together properly!

The first thing you have to do is point the telescope at Mars (direct GA to your website) and then focus it properly (implement the tracking code). But that isn’t enough.

You also have to make sure that your lens is clean, and there isn’t a bat flying between you and the sky. Using fuzzy data to make marketing and business decisions is like trying to plan a rover journey on Mars with a dirty telescope. A waste of time at best, and with the potential to actually be harmful!

Let’s take a look at some of the more common issues in Google Analytics.

Data Spikes

(Things in front of your telescope)


If we see something unexpected in our analytics data, the first thing we want to do is make sure it actually is something we’re interested in. We want to see the Martian landscape, not a bat flying by or, worse yet, some kids deliberately throwing a paper airplane in front of our telescope.

Obscuring Data

I don’t think that shadow is actually part of the landscape…

The Problem

We’ll start with benign bots and spiders. What are bots or spiders? Basically, these are computer programs that automatically crawl your pages. Search engines use them to index your site. There are a lot of other reasons why these programs do that as well, but I won’t go into them here. Bots may or may not influence your analytics data, depending on whether they run JavaScript: the Google Analytics tracking code is JavaScript, so a bot that doesn’t execute it never shows up in your reports. Either way, they will register as a hit on your server (so if you’re comparing server logs to Google Analytics reports, this will account for some of the discrepancy).

The Solution

There is a very simple way to deal with this traffic built right into Google Analytics. In the Reporting View Settings page, simply check the box labeled “Bot Filtering” (as demonstrated in the picture below). Will this filter out all bots? Of course not, but it filters them out based upon the IAB/ABC International Spiders & Bots List, which should catch the majority of them, and the list is updated monthly.



Those Pesky Kids

Now let’s learn to deal with those kids throwing paper airplanes in front of your telescope. This is known as referrer spam, and it comes in two main types: Ghost and Crawler.

The Problem – Ghosts

Ghost referrals are called that because they never actually visit your site. Instead, they show up in your Google Analytics data by sending hits directly to the Google Analytics servers. Because they never visit your site, blocking access to it does nothing to stop them.

The Solution

You can block Ghost Spam from interfering with your Google Analytics data by setting up filters to block invalid hostnames. Ghost spam uses fake hostnames, because the spammers usually aren’t even aware of which site they’re spamming! We can take advantage of this and simply set up a filter that includes only the hostnames we actually want to see. How to do this is described in more detail here or here.


This is a filter I set up to block Ghost Spam on our property. Easy peasy.
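If you’d like to test a hostname pattern before trusting it with your data, here is a minimal Python sketch of the same idea. It assumes a made-up site at example.com with a shop.example.com subdomain; the pattern, the function names and the sample hits are all hypothetical, so swap in the hostnames that actually serve your tracking code.

```python
import re

# Hypothetical pattern: matches example.com, www.example.com and shop.example.com.
# The ^ and $ anchors matter; without them "example.com.spam.xyz" would also match.
VALID_HOSTNAME_PATTERN = re.compile(r"^((www|shop)\.)?example\.com$", re.IGNORECASE)


def is_valid_hostname(hostname):
    """Return True if the hostname is one we expect to see in our reports."""
    return VALID_HOSTNAME_PATTERN.search(hostname) is not None


def split_hits(hits):
    """Split hits (dicts with a 'hostname' key) into those an include filter keeps and drops."""
    kept = [hit for hit in hits if is_valid_hostname(hit["hostname"])]
    dropped = [hit for hit in hits if not is_valid_hostname(hit["hostname"])]
    return kept, dropped


if __name__ == "__main__":
    sample_hits = [
        {"hostname": "www.example.com"},
        {"hostname": "(not set)"},
        {"hostname": "free-traffic.example"},
    ]
    kept, dropped = split_hits(sample_hits)
    print("kept:", [hit["hostname"] for hit in kept])
    print("dropped:", [hit["hostname"] for hit in dropped])
```

Run it against the hostnames you currently see in your reports. If a hostname you care about lands in the dropped list, fix the pattern before you turn it into a filter.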

The Problem – Crawlers

Crawler spam operates more like traditional search engine bots. One of the main differences is that they’re “rude”. They ignore the site rules you post in your robots.txt file and just crawl over your entire site. These bots are not only polluting your Google Analytics data, but they’re also costing you money, as they register as hits on your server.

The Solution

There are two common ways to deal with this problem. One way is to make a filter very similar to the filter you made to block Ghost Spam, detailed here. This method has the advantage of not requiring any coding or uploading anything to your website, but it doesn’t stop the bots from hitting your servers, and you have to manually update the filter as you find new bots.

The second method is slightly more technical. Basically, you set up a trap for spambots by adding a link to a part of your website and then telling all bots in your robots.txt that the link is forbidden. Good bots will obey your robots.txt; bad bots will ignore it. Once a bot has registered a hit on the forbidden page, its IP address is banned and it can no longer access your site. Instructions and downloads are available here.

Capture spam bots with traps

It’s no danger at all if you just follow the signs and walk around
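To make the trap idea a bit more concrete, here is a rough Python sketch of the logic. It is not the downloadable script mentioned above, just an illustration under some assumptions: the /bot-trap/ path, the BotTrap class and the IP addresses are all invented, and a real setup would enforce the ban in your web server or firewall rather than in a toy class.

```python
from urllib.robotparser import RobotFileParser

TRAP_PATH = "/bot-trap/"  # hypothetical path that no human visitor has a reason to open

# The robots.txt rule that tells every well-behaved bot to stay out of the trap.
ROBOTS_TXT = f"""User-agent: *
Disallow: {TRAP_PATH}
"""


def polite_bot_may_fetch(path):
    """Check whether a bot that obeys ROBOTS_TXT is allowed to request the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch("*", path)


class BotTrap:
    """Remember which IPs have touched the trap and refuse them from then on."""

    def __init__(self, trap_path=TRAP_PATH):
        self.trap_path = trap_path
        self.banned_ips = set()

    def handle(self, ip, path):
        """Return the HTTP status code we would send for this request."""
        if ip in self.banned_ips:
            return 403  # already caught earlier
        if path.startswith(self.trap_path):
            self.banned_ips.add(ip)  # ignored robots.txt, so ban it
            return 403
        return 200


if __name__ == "__main__":
    trap = BotTrap()
    print(polite_bot_may_fetch(TRAP_PATH))            # good bots stay away
    print(trap.handle("203.0.113.9", "/products/"))   # normal request
    print(trap.handle("203.0.113.9", TRAP_PATH))      # walks into the trap
    print(trap.handle("203.0.113.9", "/products/"))   # now refused
```

The design choice worth noticing is that the ban is triggered by behaviour (requesting a page robots.txt forbids) rather than by a list of known bad bots, so there is nothing to update by hand when a new crawler shows up.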

Not Seeing Data

(Dirt on your lens)


The Problem – GIGO

Google Analytics, like all computer programs, operates on the basis of GIGO: Garbage In, Garbage Out. If you change something on your site, are trying to track something new, or are setting up Google Analytics for the first time, it’s very possible that you messed up somewhere. As an example, one of our clients redesigned their entire website (including their purchase funnel) and we noticed in GA that suddenly they were having zero conversions! We went through everything and found that it was because the goals were pointing to a purchase confirmation page that no longer existed. Once we figured that out, it was very simple to update the goals and start tracking conversions accurately again.

The Solution

There is no silver bullet to fix this problem. That being said, there are some steps you can take to minimise it as much as possible. First and foremost, make sure that you’re implementing Google Analytics properly (see our guide on how to do this here). After that, keep doing sanity checks on your data, and if something doesn’t make sense, dig in a little and find out why. If you just made a change to your website, make sure all the relevant URLs are up to date.
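One sanity check you can automate after a redesign is comparing the pages your goals point at with the pages that still exist. The Python sketch below assumes you have typed both lists out by hand (for example from your goal settings and your sitemap); the function names, goal names and paths are invented for illustration and nothing here talks to Google Analytics itself.

```python
def normalize(path):
    """Treat '/thanks' and '/thanks/' as the same page."""
    return path.rstrip("/") or "/"


def find_broken_goals(goal_destinations, live_paths):
    """Return the goals whose destination page is missing from the live site.

    goal_destinations maps a goal name to its destination path;
    live_paths is any iterable of paths that currently exist.
    """
    live = {normalize(path) for path in live_paths}
    return {
        name: path
        for name, path in goal_destinations.items()
        if normalize(path) not in live
    }


if __name__ == "__main__":
    # Hypothetical data: the redesign moved the purchase confirmation page.
    goals = {
        "Purchase": "/checkout/thank-you",
        "Contact form": "/contact/sent/",
    }
    pages = ["/", "/contact/sent", "/order/confirmation"]
    print(find_broken_goals(goals, pages))
```

Any goal this prints is one that can never convert, which is exactly the situation our client was in.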

The Problem – Filters

Filters are a very powerful tool in Google Analytics. They let you see exactly the data you’re looking for, and exclude extraneous information. Of course, like any powerful tool, the potential to injure yourself is ever present. Filters can behave unpredictably (or at least in ways we didn’t predict; computer programs are never actually unpredictable).

The Solution

Always, always, always leave one view unfiltered. This view will be your “reset button” if you mess up implementing a filter. And when you make a new filter, let it run for at least a week before you start analysing the data from it. That will give you some time to spot any kinks that need to be worked out.
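During that first week, the unfiltered view gives you something to compare against. Here is a small Python sketch of that comparison. The session counts are typed in by hand from the two views, and the 50% threshold is purely my own assumption; pick whatever drop you would consider surprising for your site.

```python
def filter_impact(unfiltered_sessions, filtered_sessions):
    """Return the fraction of sessions the filtered view is missing."""
    if unfiltered_sessions == 0:
        return 0.0
    return (unfiltered_sessions - filtered_sessions) / unfiltered_sessions


def looks_suspicious(unfiltered_sessions, filtered_sessions, max_expected_drop=0.5):
    """Flag a filter that removes more than expected, or somehow adds sessions."""
    drop = filter_impact(unfiltered_sessions, filtered_sessions)
    return drop < 0 or drop > max_expected_drop


if __name__ == "__main__":
    # Hypothetical weekly totals read off the unfiltered and filtered views.
    print(filter_impact(1000, 900))
    print(looks_suspicious(1000, 900))
    print(looks_suspicious(1000, 0))
```

A filtered view that suddenly shows zero sessions, or more sessions than the unfiltered one, is a strong hint that the filter is not doing what you meant.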

The Problem – Size

Google Analytics only records 10 million hits in a single month, and one visit can generate multiple hits. Not only that, but when you have more than 500,000 sessions, Google Analytics starts to sample your data.

The Solution

If you find that you’re hitting this wall, consider upgrading to Google Analytics Premium.
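To get a feel for how close you are to that wall, you can do the arithmetic yourself. The Python sketch below uses the two limits quoted above; the average number of hits per session is something you have to estimate for your own site, and the example figures are made up.

```python
MONTHLY_HIT_LIMIT = 10_000_000        # hits per month, as quoted above
SAMPLING_SESSION_THRESHOLD = 500_000  # sessions before sampling starts, as quoted above


def check_limits(monthly_sessions, avg_hits_per_session):
    """Estimate monthly hits and report which limits are exceeded.

    Assumes you are looking at one month of data at a time.
    """
    estimated_hits = monthly_sessions * avg_hits_per_session
    return {
        "estimated_hits": estimated_hits,
        "over_hit_limit": estimated_hits > MONTHLY_HIT_LIMIT,
        "sampling_likely": monthly_sessions > SAMPLING_SESSION_THRESHOLD,
    }


if __name__ == "__main__":
    # Hypothetical site: 600,000 sessions a month, about 20 hits per session.
    print(check_limits(600_000, 20))
    # Hypothetical smaller site: 100,000 sessions, about 8 hits per session.
    print(check_limits(100_000, 8))
```

Remember that every pageview, event and transaction counts as a hit, so a site with heavy event tracking reaches the limit with far fewer sessions than you might expect.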

Misinterpreting Data

(Seeing UFOs)

UFO Cloud

An alien ship that has travelled hundreds of light years? Or a cloud?

The final thing to keep in mind is that analysis is only useful if you have a goal in mind. Anyone can look at data and analyse it until the sun comes up, but if they aren’t analysing it with a specific purpose in mind then they are just wasting your time as well as their own! That’s why it’s very important to fully implement Google Analytics with Events, Goals and, if you’re an Ecommerce site, Ecommerce tracking and Enhanced Ecommerce reports. Some of these (especially that last one) take a little bit of know-how and time to set up, but if you’re not tracking them you’re missing out on valuable data.

Another important thing to do is make sure that all your Events and Goals have values attached to
them. Unless your site is purely a vanity project, you have some sort of bottom line; you should
be able to figure out exactly how much each view of a page is worth to you. Calculating ROI and
making sure you’re not wasting your money will be impossible otherwise.
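If you’re not sure what value to attach to a goal that isn’t a sale, one common approach is to multiply how often that action eventually leads to a sale by what an average sale is worth. The Python sketch below shows that arithmetic; the close rate, sale amount and traffic numbers are invented, and the function names are mine.

```python
def goal_value(close_rate, average_sale):
    """Value of one goal completion: chance it becomes a sale times the average sale."""
    return close_rate * average_sale


def value_per_page_view(conversions, value_per_conversion, page_views):
    """Spread the value of all conversions over the page views that produced them."""
    if page_views == 0:
        return 0.0
    return conversions * value_per_conversion / page_views


if __name__ == "__main__":
    # Hypothetical: 10% of contact-form leads become customers worth $500 each.
    lead_value = goal_value(0.10, 500)
    print(f"Each lead is worth about ${lead_value:.2f}")

    # Hypothetical: 40 leads from 10,000 views of the landing page.
    per_view = value_per_page_view(40, lead_value, 10_000)
    print(f"Each page view is worth about ${per_view:.2f}")
```

With numbers like these in place, you can compare the value of a page against what you spend sending traffic to it.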

A Cool Trick!

One more cool trick that’s often overlooked. In Google Analytics it’s possible to track your users across different devices! This feature can really give you a whole new perspective on your returning visitors and it’s really simple to set up. Here is a guide to setting it up on your webpage. Give it a shot and see how people interact with your site across all of their devices!

Take It Away!

Keeping all of that in mind when you’re looking at your data on Google Analytics should help you see a much clearer picture. And if any of that sounds a bit overwhelming, don’t worry! You can get in touch with us and we’ll help you get crystal clear images of Mars!
