Why Google Analytics is flawed
GOOGLE ANALYTICS – FLAWED AND MISLEADING?
As the resident SEO and SEM expert at WebSight Design, Inc., I am responsible for managing the search engine optimization efforts for all of our more than 600 web sites. At WSD, our clients range from the smallest sole proprietors to Carlos Santana, Sammy Hagar, and the band Tool. The vast majority, over 90% of our clients, either can’t afford full-on SEO work or don’t yet understand the importance of this aspect of business marketing strategy. As we continue to help educate our clients, we provide, at the very least, basic SEO services – keyword research, proper “under-the-hood” and on-page content seeding, and more recently, sitemap.xml files (to help Google, Yahoo, and soon MSN better index entire web sites).
One thing that more and more clients are asking for is Google Analytics. We previously provided WebTrends site activity log crunching and reporting. WebTrends, in our experience and in the experience of many other industry professionals, is too much of a drain on precious shared-hosting server resources. In addition, server-based analytics solutions typically don’t filter out non-human site activity. This can include search engine spiders and web crawlers (bots) – automated software designed to scan web pages, either for legitimate or, quite often, illegal purposes (such as hackers looking for vulnerabilities that can be exploited).
So like many other web hosting providers, this past year we began setting up Google Analytics (GA) in many of our clients’ sites. The GA code resides right on the web pages, thus dramatically reducing server drain. It’s free, which our small business clients appreciate, and it’s provided by one of the best-known brand names.
While less than ideal in its reporting and analysis methods (a topic I will discuss in an upcoming blog post), the user interface offers what has become a quite popular experience for many small business owners. Big charts and graphs, large fonts, and bright bold colors. Everyone loves all things Google. Well, I don’t anymore – read on to learn why.
For a few months I have been faced with a new challenge in the SEO arena – why Google Analytics consistently shows dramatically less site traffic than WebTrends or my own web site visitor log analysis software. My initial research showed that Google filters out spiders, bots and web crawlers from its statistics, in an effort to show only actual human visitor activity.
While this is a noble goal, it cannot explain the vast difference I’ve seen consistently, regardless of the size or scope of the web site or what its market focus is. For example, on one site, WebTrends was reporting over 100,000 unique visitors, while for the same period, GA showed just over 6,000!
Why am I so upset? Isn’t Google providing reliable information? Why would Google provide a flawed solution to millions of business owners?
EXPECTED AND TRUSTED BY ADVERTISERS
Some of our clients reach market segments that big name advertisers want to reach. So they sell ad space on their sites – typically big Flash banners. Those advertisers won’t bother placing ads on web sites that don’t garner what they consider to be a large enough viewership. They want to put their marketing and advertising money where it will do the most good. So they rely heavily upon marketing metrics (statistics such as how many unique page views a site garners, how many unique visitors it gets, or the average time spent on a page, to name just three such gauges).
Because these major corporations spend big money, they need to find a way to get reliable, trustworthy information. It’s too easy for someone to generate phony, beefed up statistics, and there are hundreds of analytics tools out on the market.
Google has done such a great job at marketing their brand (have you seen their stock charts?) that they’re the darling of the advertising and public relations industry. Everybody wants to be the next Google, or they have Google on a solutions provider pedestal. Kind of like Microsoft was in the late ’80s and ’90s.
So here we have a situation where WebTrends was reporting ten, twenty, even 100 times as many visits to our client sites as compared to Google. If GA filters out non-human activity, then it’s easy to assume GA is going to be more trustworthy when the advertisers determine whether a site is worthy of their ad spending dollars.
RELIED UPON BY PROSPECTIVE INVESTORS
The other major area where GA can play a major role, positively or negatively, is with prospective investors. As a web site garners more and more visitors, a time might come when the site owner seeks out investment money so they can grow or expand. Prospective investors depend on reliable and trustworthy market metrics to help in their “due diligence” process when deciding whether to put their money into the hands of web site owners.
For many of the same reasons as advertisers, and most especially because of how much money Google has generated in profits for early-on investors, a start-up’s prospective investors tend to view GA statistics as reliable and good enough for their decision making processes.
BUT WHAT IF GOOGLE’S NUMBERS ARE WRONG?
Because this issue has been bugging me for a few months, I’d already done some base-line research on where the discrepancies come from. However, I’m a busy guy – there are only so many hours in the day, and oddly enough, I have a life outside of my work. Until recently, I hadn’t had enough motive to figure out the nitty-gritty details, and as a result, I had my intuitive and preliminary beliefs but not a fact-based professional opinion.
That changed recently when a client came to us in a panic – they had recently switched to GA, right at a time when they are seeking major investor money. So how can they go to prospective investors and say that the numbers they were seeing through WebTrends were severely inflated, and that GA doesn’t justify investor money? If GA were truly accurate, they would need to abandon investor opportunities. Their hopes and dreams, and all the hard work they put into their venture until now would either be lost forever, or at the very least, their business growth would be put on hold indefinitely.
This prompted me to really dig. As deep as I could. Casting as wide a net as possible to find what others have experienced, to learn more about how GA works, and once and for all, either confirm my suspicions and preliminary findings, or to back off and be in the position of needing to break the bad news to our clients. And if it turned out that I was right, I would need to be willing to find another solution – one I could confidently tell my clients was a truly reliable and trustworthy analytics solution.
DOING THE HOMEWORK – AND WHAT I FOUND
The Google Analytics tracking code is a small snippet of code that is inserted into the body of an HTML page. When the HTML page is loaded, the tracking code contacts the Google Analytics server and logs a pageview for that page, as well as captures information about the visit and non-identifying information about the visitor.
If there is a disconnect between the site and Google when a page is loaded, that page view is not tracked.
If Google’s system is down, no visits during this time can be tracked. This happens from time to time. How often? Google claims it’s rare, and they also claim that even when their system is down, they’re still tracking – I don’t believe that’s 100% accurate at all.
If you are in a different timezone than where the server is located, GA will report you visited the site at a different time, or even on a different day than you really did!
Google Analytics only records pageview requests with a status code of 200. If any of your site’s pages give a status code other than this, the page views won’t be recorded even though someone did view those pages.
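To make the time zone point above concrete, here is a minimal sketch in Python. It is not GA's code. The two fixed UTC offsets, the visit date, and the function name reported_day are all hypothetical, chosen only to show how the same visit can land on a different calendar day once it is converted to the reporting time zone.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offsets for illustration only; real time zones
# also shift with daylight saving time.
VISITOR_TZ = timezone(timedelta(hours=-8))  # e.g. a visitor on US Pacific standard time
REPORT_TZ = timezone(timedelta(hours=1))    # e.g. reports kept on Central European time


def reported_day(visit: datetime, report_tz: timezone) -> str:
    """Return the calendar date the visit falls on in the reporting time zone."""
    return visit.astimezone(report_tz).date().isoformat()


# A made-up visit at 6:30 in the evening, visitor's local time.
visit = datetime(2008, 3, 14, 18, 30, tzinfo=VISITOR_TZ)

print("Visitor's own calendar day:", visit.date().isoformat())
print("Day shown in the report:   ", reported_day(visit, REPORT_TZ))
```

The visitor experiences the visit on the 14th, but a report kept nine hours ahead files it under the 15th, so daily totals from two tools on different clocks will not line up even when both counted the visit.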
SO WHAT ABOUT SOLUTIONS LIKE NetRatings, Omniture, CoreMetrics?
Everyone has a solution. They all claim to be the best, the most accurate. If they do their statistics from code directly embedded on the site or from the server’s log files, the information is only as good as their filtering of the noise from bots and spiders – and none are perfect.
If they rely on their own pool of members – people who have agreed to put that company’s code on their computer so that the company can track their web surfing – then that company will create a baseline trending system. For example, if ten thousand of their members visited site X, the company will claim that 100,000 or 10,000,000 people around the globe visited that site during the same period.
How can they claim that? Well, they claim that their members are a “fair” or “reasonable” sampling of web users from enough different consumer groups, and that statistically it’s an accurate measure.
Personally, I say Bull! Too many sites have too narrow a niche focus, and too many people surf the web for different reasons at different times. Having many years ago been a crime statistician, I know too well that such extrapolation is horse-hockey.
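The arithmetic behind that kind of panel extrapolation can be sketched in a few lines of Python. This is only an illustration under assumed numbers, not any vendor's actual method. The panel size, the population, the join rate, and both function names are hypothetical.

```python
def extrapolate(panel_visitors: int, panel_size: int, population: int) -> float:
    """Scale a panel count up to the whole population.

    This is only valid if the panel is a representative sample.
    """
    return panel_visitors * population / panel_size


def corrected_for_bias(estimate: float, relative_join_rate: float) -> float:
    """Adjust an estimate when a site's audience joins the panel at a
    different rate than the average web user (1.0 means no bias)."""
    return estimate / relative_join_rate


# Hypothetical figures: a panel of one million standing in for
# one hundred million web users.
PANEL_SIZE = 1_000_000
POPULATION = 100_000_000

estimate = extrapolate(10_000, PANEL_SIZE, POPULATION)
print("Claimed visitors:", estimate)

# If a niche site's audience is only half as likely to be in the panel,
# the real figure would be double the claim.
print("With a 0.5 join rate:", corrected_for_bias(estimate, 0.5))
```

Every panel member is multiplied by 100 here, so any skew in who joins the panel is multiplied by 100 as well. Nobody outside the company knows the join rate for a given niche, which is exactly why I don't trust the result as a hard number.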
THE BOTTOM LINE – GOOD FOR TRENDING, BAD FOR FACT STATING
Okay – so Google Analytics isn’t perfect. No web visitor tracking solution is.
You cannot say for a fact that the number of people supposedly visiting your site is really what they report.
But you can use it to get trends.
In addition to having Google Analytics, it can’t hurt to have the site’s actual log files processed through a reporting solution that does a decent job at filtering out spiders and bots. There are many such solutions out on the market. I happen to use a program called WebLog Expert. The professional version filters out spiders and bots.
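For readers curious what that kind of filtering looks like, here is a rough sketch in Python. It is not how WebLog Expert or any other product works internally. It assumes the common Apache "combined" log format, treats each distinct IP address as one visitor (a big simplification), and flags bots with a short, hypothetical list of user-agent keywords. The sample log lines are made up.

```python
import re

# Assumed layout: Apache "combined" log format.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

# Hypothetical keyword list; real bot detection needs far more than this.
BOT_TOKENS = ("bot", "spider", "crawl", "slurp")


def is_bot(user_agent: str) -> bool:
    """Guess whether a user-agent string belongs to an automated client."""
    agent = user_agent.lower()
    return any(token in agent for token in BOT_TOKENS)


def unique_visitors(lines, exclude_bots: bool) -> int:
    """Count distinct IP addresses, optionally skipping likely bots."""
    ips = set()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that don't fit the assumed format
        if exclude_bots and is_bot(match.group("agent")):
            continue
        ips.add(match.group("ip"))
    return len(ips)


# Made-up sample entries.
SAMPLE_LOG = [
    '203.0.113.5 - - [14/Mar/2008:18:30:00 -0800] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/2.0"',
    '66.249.66.1 - - [14/Mar/2008:18:30:05 -0800] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [14/Mar/2008:18:31:00 -0800] "GET /about.html HTTP/1.1" 200 2048 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"',
    '203.0.113.5 - - [14/Mar/2008:18:32:00 -0800] "GET /contact.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/2.0"',
]

print("All visitors:     ", unique_visitors(SAMPLE_LOG, exclude_bots=False))
print("Likely humans only:", unique_visitors(SAMPLE_LOG, exclude_bots=True))
```

A keyword list like this only catches bots that identify themselves honestly. Anything that pretends to be a regular browser sails right through, which is one reason log-based numbers tend to run high.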
To what degree and how accurately Google or WebLog Expert or any analytics and site visitor tracking program filters out the noise is anyone’s guess. But if you use multiple solutions that each handle things in their own way, you can at least get a much better handle on general trends.
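Here is one simple way to compare trends across two tools, sketched in Python. The monthly visitor counts are invented for illustration, and the function names are my own. The idea is to ignore the absolute numbers and look only at whether the month-to-month changes point in the same direction.

```python
def percent_changes(series):
    """Month-over-month change in percent, rounded to one decimal place."""
    return [round((b - a) / a * 100, 1) for a, b in zip(series, series[1:])]


def same_direction(changes_a, changes_b) -> bool:
    """True if both tools agree on up versus down for every period."""
    return all((a > 0) == (b > 0) for a, b in zip(changes_a, changes_b))


# Invented monthly unique-visitor counts from two tools for the same site.
ga_visitors = [6000, 6600, 6270]
log_visitors = [100000, 109000, 104000]

ga_trend = percent_changes(ga_visitors)
log_trend = percent_changes(log_visitors)

print("GA trend (%): ", ga_trend)
print("Log trend (%):", log_trend)
print("Tools agree on direction:", same_direction(ga_trend, log_trend))
```

In this made-up example the two tools disagree wildly on the totals, yet both show growth in the second month and a dip in the third. That agreement is the part I would be willing to rely on.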