Google Analytics (GA) is probably the best-known web analytics tool. With GA and its companion, Tag Manager, it's possible to build web analytics for free with little or no coding.

The challenge with this straightforward low-code approach to web analytics is that you will be blind to the share of traffic that is blocked by ad blockers. Ad blockers are becoming more popular, with some estimates saying that as many as 47% of users globally use one. Whatever the share is among your users, GA will not show you that part of the traffic on your site.

This is not really a GA-specific problem, though. Any low-code analytics solution that works by injecting a script into your website suffers from the same problem once it becomes popular enough for ad blocker vendors to notice it. The only sure way to avoid this is to implement reporting programmatically. For instance, Google offers an API called the Measurement Protocol for this purpose. The downside of this approach is that it requires coding, which complicates adoption. Google plans to mitigate this problem with machine learning, but before we go into that, let's look at some real figures from a SaaS project where I implemented a Measurement Protocol integration with GA alongside the standard low-code integration.
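To give an idea of what "programmatically" means here, below is a minimal sketch of building a single event for the Measurement Protocol in Python. It assumes the GA4 version of the protocol, which may differ from the version used in the project described in this post. The function name, the event name `signup_completed`, its parameters, and all IDs are made up for illustration. The request is only built, not sent.

```python
import json
import urllib.parse
import urllib.request

# Collection endpoint of the GA4 Measurement Protocol.
MP_ENDPOINT = "https://www.google-analytics.com/mp/collect"


def build_event_request(measurement_id, api_secret, client_id, event_name,
                        params=None, user_id=None):
    """Build (but do not send) a POST request carrying one event.

    Hypothetical helper for illustration; not part of any Google library.
    """
    query = urllib.parse.urlencode(
        {"measurement_id": measurement_id, "api_secret": api_secret}
    )
    payload = {
        "client_id": client_id,
        "events": [{"name": event_name, "params": params or {}}],
    }
    # Attaching the application's own user id lets GA tie hits from
    # different devices to the same user.
    if user_id is not None:
        payload["user_id"] = user_id
    return urllib.request.Request(
        f"{MP_ENDPOINT}?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_event_request(
        measurement_id="G-XXXXXXXXXX",   # placeholder, not a real property
        api_secret="YOUR_API_SECRET",    # placeholder
        client_id="555.123",             # made-up client id
        event_name="signup_completed",   # made-up custom event
        params={"plan": "trial"},
        user_id="saas-user-42",          # made-up application user id
    )
    # Sending is left out on purpose; urllib.request.urlopen(request)
    # would perform the actual call.
    print(request.get_method(), request.full_url)
```

A request built this way is sent by your own code rather than by an injected analytics script, which is the coding effort the low-code approach avoids.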

What is the ad blocker impact on GA figures?

In this section I compare figures from a standard script-based Google Analytics integration with figures from an integration implemented with the GA Measurement Protocol in a SaaS product. I had both integrations enabled for the application for one month to see the difference in traffic figures.

Sessions is a standard GA metric that gives the number of distinct visits to your site by new or returning visitors. The standard GA integration reported 58% fewer sessions than the Measurement Protocol version. This number seems to be in line with estimates of the proportion of users using ad blockers. In this case it seems that roughly 40% of all actual sessions were missed by the standard GA integration. Note that this number is slightly affected by how sessions are implemented with the Measurement Protocol, but the effect shouldn't be large. Some sites may also stop sending data to GA if you decline their cookie consent request.
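When reading percentages like these, it matters which count is used as the baseline. The sketch below shows the two common ways of expressing the same gap between two session counts. The function names and the counts are hypothetical and are not the figures from this project.

```python
def missed_share(standard_sessions, full_sessions):
    """Share of all sessions that the script-based integration did not see."""
    return 1 - standard_sessions / full_sessions


def relative_excess(standard_sessions, full_sessions):
    """How much larger the full count is, relative to the script-based count."""
    return full_sessions / standard_sessions - 1


if __name__ == "__main__":
    # Hypothetical counts: script-based integration vs. programmatic one.
    standard, full = 600, 1000
    print(f"missed share of all sessions: {missed_share(standard, full):.0%}")
    print(f"full count larger by: {relative_excess(standard, full):.0%}")
```

With these made-up counts, 40% of all sessions are missed, while the full count is about 67% larger than the script-based one. Both statements describe the same gap.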

Another standard GA metric is users. It differs from sessions in that it aims to estimate the number of distinct users of your site (so visits from the same user are counted only once). The standard integration's estimate of the number of users was 84% larger than that of the Measurement Protocol version, which was configured to use the actual SaaS user IDs. This sounds like a very large difference. What it means, however, is simply that the same user visiting the site multiple times was counted as multiple users. The reason is that GA cannot know for certain that I'm the same user if I first visit the site on my mobile phone and later on a computer. I may use several devices. I may clear my cookies in between. There are many reasons why tracking cookies and IP addresses can differ for the same user. And because they differ, the same user looks like many different users to GA, and the estimate gets inflated.

I think that in our case the number was more inflated than usual because of active development and testing on the site during that time. Nevertheless, you can expect your actual number of users to be smaller than what the standard GA integration reports if you are not providing the user ID.
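The inflation effect can be illustrated with a toy example. The sketch below counts distinct users in two ways: by a cookie-style client ID only, and by the application's own user ID when one is known. It is a simplified illustration with made-up data and hypothetical names, not a description of how GA computes the users metric internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hit:
    client_id: str           # cookie/device identifier (hypothetical field)
    user_id: Optional[str]   # application login id, None if unknown


def count_users_by_client_id(hits):
    """Every distinct cookie/device identifier counts as one user."""
    return len({hit.client_id for hit in hits})


def count_users_by_user_id(hits):
    """Prefer the application's user id; fall back to the client id."""
    keys = set()
    for hit in hits:
        if hit.user_id is not None:
            keys.add(("user", hit.user_id))
        else:
            keys.add(("client", hit.client_id))
    return len(keys)


if __name__ == "__main__":
    # Made-up data: one person on three client ids, another on one.
    hits = [
        Hit(client_id="phone-1", user_id="alice"),
        Hit(client_id="laptop-1", user_id="alice"),
        Hit(client_id="laptop-1-after-cookie-reset", user_id="alice"),
        Hit(client_id="laptop-2", user_id="bob"),
    ]
    print("users by client id:", count_users_by_client_id(hits))
    print("users by user id:", count_users_by_user_id(hits))
```

In this toy data, two people appear as four users when only client IDs are available, and as two users once the application's own user ID is attached to each hit.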

Is this a problem?

It depends. If you are already receiving a lot of traffic, you will probably be able to see the important trends from the users who don't use ad blockers. Depending on your industry, the segment using ad blockers may behave differently from the segment that doesn't. I think, however, that the issue is more severe if you are just starting out. When you are starting out, it's crucial to understand how your site is being used in order to know how to improve it. And when your total number of visitors is still small, it hurts to lose half of that already scarce usage data. In those cases it may pay off to implement the integration programmatically to ensure you have all the data available. With a programmatic approach you can also easily choose analytics tools other than GA.

Google is well aware of this problem too. Their newly released version of Google Analytics, called Google Analytics 4, aims to leverage machine learning to fill in the gaps in the data. How good a solution this is remains to be seen. Google is among the world leaders in machine learning and can most probably improve substantially on the current state. Personally, I would still be a bit cautious about basing business decisions on data that was partly modeled, especially if you are just starting out and don't yet have much evidence.