Why Google Analytics is flawed

GOOGLE ANALYTICS – FLAWED AND MISLEADING?

As the resident SEO and SEM expert at WebSight Design, Inc., I am responsible for managing the search engine optimization efforts for all of our more than 600 web sites. At WSD, our clients range from the smallest sole proprietors to Carlos Santana, Sammy Hagar, and the band Tool. The vast majority of our clients, over 90%, either can’t afford full-on SEO work or don’t yet understand the importance of this aspect of business marketing strategy. As we continue to help educate our clients, we provide at least basic SEO services: keyword research, proper “under-the-hood” and on-page content seeding, and more recently, sitemap.xml files (to help Google, Yahoo, and soon MSN better index entire web sites).

One thing that more and more clients are asking for is Google Analytics. We previously provided WebTrends site activity log crunching and reporting. WebTrends, in our experience and in the experience of many other industry professionals, is too much of a drain on precious shared-hosting server resources. In addition, server-based analytics solutions typically don’t filter out non-human site activity. This can include search engine spiders and web crawlers (bots): automated software designed to scan web pages, either for legitimate or, quite often, illegal purposes (such as hackers looking for vulnerabilities that can be exploited).

So like many other web hosting providers, this past year we began setting up Google Analytics (GA) on many of our clients’ sites. Its tracking code resides right on the web pages, thus dramatically reducing server drain. It’s free, which our small business clients appreciate, and it’s provided by one of the best-known brand names.

While less than ideal in its reporting and analysis methods (a topic I will discuss in an upcoming blog post), the user interface offers what has become a quite popular experience for many small business owners. Big charts and graphs, large fonts, and bright bold colors. Everyone loves all things Google. Well, I don’t anymore – read on to learn why.

______________________________________________________

For a few months I have been faced with a new challenge in the SEO arena: why is Google Analytics consistently showing dramatically less site traffic than WebTrends or my own web site visitor log analysis software? My initial research found that Google filters out spiders, bots and web crawlers from its statistics, in an effort to show only actual human visitor activity.

While this is a noble goal, it cannot explain the vast difference I’ve seen consistently, regardless of the size or scope of the web site or its market focus. For example, on one site, WebTrends was reporting over 100,000 unique visitors, while for the same period, GA showed just over 6,000!

Why am I so upset? Isn’t Google providing reliable information? Why would Google provide a flawed solution to millions of business owners?
______________________________________________________

EXPECTED AND TRUSTED BY ADVERTISERS

Some of our clients reach market segments that big name advertisers want to reach. So they sell ad space on their sites – typically big Flash banners. Those advertisers won’t bother placing ads on web sites that don’t garner what they consider to be a large enough viewership. They want to put their marketing and advertising money where it will do the most good. So they rely heavily upon marketing metrics: statistics such as how many unique page views a site garners, how many unique visitors it gets, or the average time spent on a page, to name just three such gauges.

Because these major corporations spend big money, they need to find a way to get reliable, trustworthy information. It’s too easy for someone to generate phony, beefed up statistics, and there are hundreds of analytics tools out on the market.

Google has done such a great job at marketing their brand (have you seen their stock charts?) that they’re the darling of the advertising and public relations industry. Everybody wants to be the next Google, or they have Google on a solutions provider pedestal. Kind of like Microsoft was in the late ’80s and ’90s.

So here we have a situation where WebTrends was reporting ten, twenty, even 100 times as many visits to our client sites as Google. If GA filters out non-human activity, then it’s easy to assume GA is going to be more trustworthy when the advertisers determine whether a site is worthy of their ad spending dollars.

______________________________________________________

RELIED UPON BY PROSPECTIVE INVESTORS

The other major area where GA can have an effect, positive or negative, is with prospective investors. As a web site garners more and more visitors, a time might come when the site owner seeks investment money in order to grow or expand. Prospective investors depend on reliable and trustworthy market metrics in their “due diligence” process when deciding whether to put their money into the hands of web site owners.

For many of the same reasons as advertisers, and most especially because of how much money Google has generated in profits for early investors, a start-up’s prospective investors tend to view GA statistics as reliable and good enough for their decision-making processes.
______________________________________________________

BUT WHAT IF GOOGLE’S NUMBERS ARE WRONG?
Because this issue has been bugging me for a few months, I’d already done some baseline research on where the discrepancies come from. However, I’m a busy guy: there are only so many hours in the day, and oddly enough, I have a life outside of my work. Until recently, I hadn’t had enough motive to figure out the nitty-gritty details. As a result, I had my intuitive and preliminary beliefs, but not a fact-based professional opinion.

That changed recently when a client came to us in a panic: they had just switched to GA, right at a time when they were seeking major investor money. So how can they go to prospective investors and say that the numbers they were seeing through WebTrends were severely inflated, and that GA doesn’t justify investor money? If GA were truly accurate, they would need to abandon investor opportunities. Their hopes and dreams, and all the hard work they put into their venture until now would either be lost forever, or at the very least, their business growth would be put on hold indefinitely.

This prompted me to really dig. As deep as I could. Casting as wide a net as possible to find what others have experienced, to learn more about how GA works, and once and for all, either confirm my suspicions and preliminary findings, or to back off and be in the position of needing to break the bad news to our clients. And if it turned out that I was right, I would need to be willing to find another solution – one that I would have the confidence to communicate to my clients that we were providing them a truly reliable and trustworthy analytics solution.
______________________________________________________

DOING THE HOMEWORK – AND WHAT I FOUND

The Google Analytics tracking code is a small snippet of code that is inserted into the body of an HTML page. When the HTML page is loaded, the tracking code contacts the Google Analytics server and logs a pageview for that page, and it also captures information about the visit and non-identifying information about the visitor.

If there is a disconnect between the site and Google when a page is loaded, that page view is not tracked.

If Google’s system is down, no visits during this time can be tracked. This happens from time to time. How often? Google claims it’s rare, and they also claim that even when their system is down, they’re still tracking – I don’t believe that’s 100% accurate at all.

If a site visitor has JavaScript turned off, that page view is not tracked.

If you are in a different timezone than where the server is located, GA will report you visited the site at a different time, or even on a different day than you really did!

Google Analytics only records pageview requests with a status code of 200. If any of your site’s pages give a status code other than this, the page views won’t be recorded even though someone did view those pages.

Something the Google help files don’t discuss is on-page JavaScript conflicts. I got hold of a live Google rep, and they confirmed that if there is other JavaScript on a page, that code can potentially conflict with Google Analytics, and a tracking error can occur.
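To make the failure modes above concrete, here is a minimal sketch in Python. It is a toy model, not Google’s actual logic: the `PageView` fields and the function names are hypothetical, and it covers only three of the conditions described in this section (JavaScript turned off, a disconnect between the page and the analytics server, and an on-page script conflict).

```python
from dataclasses import dataclass


@dataclass
class PageView:
    """One real page view, with hypothetical flags for what can go wrong."""
    javascript_enabled: bool  # the visitor's browser runs JavaScript
    connection_ok: bool       # the tracking request could reach the analytics server
    script_conflict: bool     # another on-page script broke the tracking code


def is_recorded(view: PageView) -> bool:
    """A view is counted only if every condition lines up."""
    return view.javascript_enabled and view.connection_ok and not view.script_conflict


def recorded_count(views) -> int:
    """Count how many of the real page views a tag-based tool would log."""
    return sum(1 for view in views if is_recorded(view))


views = [
    PageView(True, True, False),   # normal visit: counted
    PageView(False, True, False),  # JavaScript off: missed
    PageView(True, False, False),  # disconnect or outage: missed
    PageView(True, True, True),    # script conflict: missed
]

print(recorded_count(views), "of", len(views), "page views recorded")
```

The point of the sketch is only that each condition removes real page views from the report, so a tag-based count can only undershoot the true number, never overshoot it.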

______________________________________________________

SO WHAT ABOUT SOLUTIONS LIKE NetRatings, Omniture, CoreMetrics?

Everyone has a solution. They all claim to be the best, the most accurate. Whether they build their statistics from code embedded directly on the site or from the server’s log files, the accuracy of the information depends on how well they filter out the noise of bots and spiders, and none are perfect.

If they rely on their own pool of members (people who have agreed to put that company’s code on their computer so that the company can track their web surfing), then that company will create a baseline trending system. For example, if ten thousand of their members visited site X, the company will claim that 100,000 or 10,000,000 people around the globe visited that site during the same period.

How can they claim that? Well, they claim that their members are a “fair” or “reasonable” sampling of web users from enough different consumer groups, and that statistically it’s an accurate measure.

Personally, I say Bull! Too many sites are in too many niche-focuses – too many people surf the web for different reasons at different times… Having many years ago been a crime statistician, I know too well that such extrapolation is horse-hockey.
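To show why panel extrapolation worries me for niche sites, here is a rough sketch in Python. The panel size and population are made-up numbers for illustration, not figures from any real vendor, and the function name is hypothetical. It simply scales a panel count up to a whole population.

```python
def extrapolate(panel_visitors: int, panel_size: int, population: int) -> float:
    """Scale the number of panel members who visited a site up to the population."""
    return panel_visitors * population / panel_size


PANEL_SIZE = 100_000        # hypothetical number of panel members
POPULATION = 100_000_000    # hypothetical number of web users overall

# Each panel member "stands in" for POPULATION / PANEL_SIZE people.
for seen in (0, 1, 2, 10):
    estimate = int(extrapolate(seen, PANEL_SIZE, POPULATION))
    print(seen, "panel visitors ->", estimate, "estimated visitors")
```

With these assumed numbers, every single panel member is worth 1,000 visitors in the estimate. A niche site with a few hundred real visitors will be reported as 0 or as 1,000 depending on whether one panel member happened to stop by.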

______________________________________________________

THE BOTTOM LINE – GOOD FOR TRENDING, BAD FOR FACT STATING

Okay – so Google Analytics isn’t perfect. No web visitor tracking solution is.

You cannot say for a fact that the number of people supposedly visiting your site is really what they report.

But you can use it to get trends.

If three months in a row Google or one of the other JavaScript or log file crunching solutions says 80% of your visitors go to a particular page or section of the site, that’s pretty much going to be accurate. If in that same three months nobody apparently went to that great FAQ page you spent fifty hours on, then maybe you don’t need to spend that extra 20 hours this month adding to the FAQ page.

In addition to having Google Analytics, it can’t hurt to have the site’s actual log files processed through a reporting solution that does a decent job at filtering out spiders and bots. There are many such solutions out on the market. I happen to use a program called WebLog Expert. The professional version filters out spiders and bots.
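For readers curious what “filtering out spiders and bots” from a log file looks like, here is a minimal sketch in Python. It is not how WebLog Expert or any other product works internally. It assumes the server writes the common “combined” log format, and the `BOT_HINTS` keyword list, the function names, and the sample log lines are all made up for illustration. Real tools use far larger signature lists.

```python
import re

# Combined log format: ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

# Hypothetical, deliberately short list of substrings that suggest a bot.
BOT_HINTS = ("bot", "crawler", "spider", "slurp")


def is_bot(user_agent: str) -> bool:
    """Guess whether a user-agent string belongs to an automated visitor."""
    agent = user_agent.lower()
    return any(hint in agent for hint in BOT_HINTS)


def human_hits(lines):
    """Keep only well-formed log lines whose user agent does not look like a bot."""
    kept = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        # Lines that do not parse are dropped rather than guessed at.
        if match and not is_bot(match.group("agent")):
            kept.append(line)
    return kept


sample = [
    '203.0.113.5 - - [10/Dec/2007:13:55:36 -0800] "GET /faq.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/2.0"',
    '198.51.100.7 - - [10/Dec/2007:13:55:40 -0800] "GET /faq.html HTTP/1.1" 200 2326 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

print(len(human_hits(sample)), "of", len(sample), "hits look human")
```

A filter like this only catches bots that identify themselves honestly. Anything that pretends to be a normal browser sails straight through, which is one reason log-based numbers tend to run high.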

To what degree and how accurately Google or WebLog Expert or any analytics and site visitor tracking program filters out the noise is anyone’s guess. But if you use multiple solutions that each handle things in their own way, you can at least get a much better handle on general trends.
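One simple way to compare tools that disagree wildly on raw counts is to look at each page’s share of the total instead. Here is a small sketch in Python. The page names and counts are invented for illustration, and `page_shares` is a hypothetical helper, not part of any analytics product.

```python
def page_shares(pageviews: dict) -> dict:
    """Convert raw page view counts into each page's fraction of the total."""
    total = sum(pageviews.values())
    if total == 0:
        return {page: 0.0 for page in pageviews}
    return {page: count / total for page, count in pageviews.items()}


# Hypothetical counts for the same period from two different kinds of tool.
log_tool = {"/products": 80_000, "/about": 15_000, "/faq": 5_000}
tag_tool = {"/products": 4_900, "/about": 800, "/faq": 300}

log_shares = page_shares(log_tool)
tag_shares = page_shares(tag_tool)

for page in log_tool:
    print(page, f"log: {log_shares[page]:.0%}", f"tag: {tag_shares[page]:.0%}")
```

In this made-up example the raw totals differ by more than a factor of ten, yet both tools put the products page at roughly 80% of traffic and the FAQ page at about 5%. That agreement on proportions is the trend you can act on, even when the absolute numbers can’t be trusted.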

About Alan Bleiweiss

Just another guy. Who happens to have a lot of experience living, breathing and sleeping organic SEO. So that's my primary focus - high end SEO audits and consulting for sites ranging from thousands to tens of millions of pages. In my spare time I blog, rant, write eBooks, and speak at industry conferences.


6 Comments

  1. Caleb says:

    Thanks Alan! I found this article extremely helpful and insightful.

  2. Daan says:

    Thanks for that, we’ve found inconsistencies in our statistics too; your article was very insightful and helped us better our service. Thanks!

  3. [...] Google Analytics is the way the information is sent to the server, as outlined in the blog entry “Google Analytics – Flawed and Misleading?”. One of the reasons for this inaccuracy is due to the use of javascript. If multiple scripts are [...]

  4. The concept that Google Analytics reports trends instead of 100% accurate numbers is important. Equally important is the fact that *all* major web analytics tools report trends. It’s important not to confuse them with hit counters.

    It’s worth clarifying a few points and correcting some incorrect assertions in this post, though:
    - Google Analytics requests a file called __utm.gif from Google’s servers. Appended to this file is a query string with all the relevant information about the visitor, information in their tracking cookies, the page they are on, where they came from, etc. Google Analytics only processes these gif requests in their log files. It will process for any gif requests with status codes of 2xx, 302 and 304. (This is how Urchin processes, and Google Analytics is the hosted version of Urchin.)
    - The time of a visit is based on the timestamp in the log file. It has nothing to do with where the visitor is located in relation to the server. It will report a visit based on the time zone that you specify in your Google Analytics profile.
    - If a visitor has first-party cookies or JavaScript disabled, they won’t be tracked.
    - All major web analytics tools *try* to track every single visitor (unless you specify not to), but because of the fluid nature of the Web, they end up with samples instead. These samples tend to be very high (say, 90%-95%). But the visit counts in Google Analytics are not artificially inflated based on an assumed sampling rate.
    - As to the point about JavaScript conflicts, I have never seen this happen. It could conceivably happen. But this is the risk with any analytics tool that uses any method other than IP-tracking to count unique visitors. IP-tracking analysis is so blatantly flawed that it’s not worth going into. That’s why all the major tools by default use some type of tagging.
    - Finally, the concern that Google’s servers going down will artificially deflate visit counts is absurd. The chances of Google’s servers going down (all of their redundant servers located around the world for millions of GA accounts), is next to zero. At any rate, the chances are much greater that a website owner’s own server will go down. In which case, it goes without saying that there will be no traffic recorded for that period. Google Analytics itself can go down periodically, but there won’t be any data loss unless the servers with the log files go down. I’ve never heard of that happening.

    Good post, but I think the concerns are exaggerated.

  5. Actual Metrics,

    Thank you for taking the time to provide your insights. When it comes specifically to Google Analytics, you’ve actually re-asserted a number of my points as to why GA or any tracking solution is not 100% accurate. That’s the whole point of my having written this blog post. The vast majority of my readers are small business owners who initially assume otherwise. So the concerns are not exaggerated since the intent was to help them understand that they should not expect 100% validity to numbers.

    As far as “Google going down”, I did mis-state in my article. I was referring to the Google Analytics system going down. Note however that the article was written in December of 2007. If you do the research, you’ll find that in 2007 there were a few back to back outages that lasted more than 24 hours.

    In that time-frame, I personally saw GA reports that showed zero activity on my company site which directly correlated to the outages. Like every day for an entire month, an average of 150 – 200 visits, then BAM – one day with 10 visits and the next with 40. Then the next, back up to 200…

    So no matter how high and mighty anyone might place Google on the “impossible that they would go down” pedestal, my experience in 2007 was otherwise. And not to be antagonistic about such a notion, I’ve been in the business for more than 14 years.

    Over the years, I’ve seen sites as big as eBay, AOL and others brought down for extended periods of time. Because that too, is the nature of the web. Heck, more than once, entire regions of the web have come to a grinding halt. Redundancy mitigates this for the most part, yet to claim that any single solution provider is invulnerable is really what’s exaggerated.

    At least that’s my opinion anyhow…

  6. Tim Rowe says:

    I have 1and1 which is reporting 1000 hits to my site a day and GA is reporting 8 and this led me to believe they are probably both flawed but I do not trust GA’s accuracy.
