Tuesday, November 12, 2013

Google Analytics Demographic Info Now Available, But Do We Care?

So, in case you haven't navigated to your 'Audience' report section in Google Analytics recently, I'm here to let you know that they have added a new 'Demographics' tab, as well as an 'Interests' tab.  You have seen the word "demographics" off to the left before, but it used to just hold information like geolocation, while now it contains true demo information, like age and gender, with some caveats.



Caveat 1:  It doesn't work by default.  You have to go in and tweak your JavaScript so that your GA account can use DoubleClick display data (and the other way around).  The change itself is slight: you edit the line in the tag that checks the page protocol so that it loads the DoubleClick-hosted version of the tracking script.

Caveat 1a:  This only works with ga.js tracking, so it won't work with that new Universal Analytics tracking profile you have been working so hard on.

Caveat 1b:  You are supposed to update your site's privacy policy, since this type of display tracking is used for remarketing and involves extra cookie data, but I am not sure how many people are doing that.  If you have done any remarketing or GDN advertising in the past, your policy might already be OK.
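For anyone who wants to see what the tweak in Caveat 1 looks like, here is a minimal sketch.  The two script hosts are the ones used by the standard async ga.js snippet and by its DoubleClick variant as I understand them; the helper name trackerScriptUrl is my own invention for illustration, so check Google's current instructions before pasting anything into production.

```javascript
// Sketch of the script-source switch in the classic async ga.js snippet.
// trackerScriptUrl is a hypothetical helper, not part of any Google library.
function trackerScriptUrl(protocol, useDisplayFeatures) {
  var secure = protocol === 'https:';
  if (useDisplayFeatures) {
    // DoubleClick-hosted script, needed for the demographic and interest reports
    return (secure ? 'https://' : 'http://') + 'stats.g.doubleclick.net/dc.js';
  }
  // Standard ga.js host
  return (secure ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
}

// In a browser, the usual async loader would then pick up that URL.
// The guard keeps this block harmless when run outside a page.
if (typeof document !== 'undefined') {
  var ga = document.createElement('script');
  ga.type = 'text/javascript';
  ga.async = true;
  ga.src = trackerScriptUrl(document.location.protocol, true);
  var s = document.getElementsByTagName('script')[0];
  s.parentNode.insertBefore(ga, s);
}
```

In the real snippet there is no helper function; you simply swap the one line that sets ga.src.  The rest of the tag (your account ID, the _trackPageview call) stays as it is.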

Caveat 2:  This data is likely to be only sort of accurate, even in aggregate.  The issue is in how they are determining the age and gender of your visitors, which is basically a hodge-podge, "whatever we've got" approach.  ('Interests,' though more difficult to pin down in real life, is, paradoxically, more likely to be accurate in browser data.)

To figure out information on a particular "user" (in this case, anyone who shares the same browser on a device), Google is using one of the following: data entered in a Google profile (if they are signed in to Google properties), data pulled from an associated social media platform that they log into in the same browser that logged the site visit, or demographic data based on visits to other sites (most common).



That last one is the one that we want to look at a little more closely.  What is happening here is that Google is using aggregated demographic estimates for other sites to assume the demographics of a single individual on your site.  This is really bad, for the reason I will illustrate below.

Let's say that, based on comScore information (which is already panel-based and aggregated, so to be taken with a grain of salt), Google knows that ESPN.com is mainly viewed by men, say, 70% or more (made up).

Now let's pretend that a lady who loves the Red Sox goes to ESPN to read up on the latest rumors regarding Mike Napoli, then comes to my site, where I just implemented the new demographic tracking in GA.  Analytics looks at her browsing history during the session, and labels her a man.

That's not an aggregate; that is a single user, marked incorrectly, who will be combined with many others to later form an aggregate.  For visits, which you probably have thousands of, it's not the end of the world.  But what if she purchases something?  That sample size is smaller, so each error (which by now rests on assumptions made by several different sites and data aggregation sources) is magnified.
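To put some rough numbers on that, here is a toy model.  Everything in it is made up for illustration: the 60% true share, the 30% mislabeling rate, the assumption that men and women are mislabeled at the same rate, and the function names.  It is not how Google computes anything, just a sketch of why wrong labels skew the aggregate and why small samples make it worse.

```javascript
// Toy model: each visitor's gender label is wrong with probability errorRate.
// Returns the female share the report would show, given the true share.
function observedFemaleShare(trueFemaleShare, errorRate) {
  // correctly labeled women + men mislabeled as women
  return trueFemaleShare * (1 - errorRate) + (1 - trueFemaleShare) * errorRate;
}

// Rough 95% margin of error for a share measured on n visitors
// (normal approximation; this is sampling noise on top of the bias above).
function marginOfError(share, n) {
  return 1.96 * Math.sqrt(share * (1 - share) / n);
}

var trueShare = 0.6;  // made up: 60% of your buyers are women
var errorRate = 0.3;  // made up: 30% of visitors get the wrong label
var observed = observedFemaleShare(trueShare, errorRate);

console.log('reported female share: ' + observed.toFixed(2));
console.log('margin on 10,000 visits: ' + marginOfError(observed, 10000).toFixed(3));
console.log('margin on 50 purchases: ' + marginOfError(observed, 50).toFixed(3));
```

With these made-up inputs, a true 60% female share gets reported as roughly 54%, and that skew does not go away no matter how much traffic you have.  The margin of error, meanwhile, is many times wider on 50 purchases than on 10,000 visits, which is the small-sample problem stacked on top.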

Sometimes I look stuff up for my girlfriend on my devices.  God knows, Netflix has no idea what I like, since I let her use my account.  Sometimes we both use the same device for things.  Sometimes, other sites don't have a good grasp on their user demographics, and those assumptions underlie everything GA is using to determine your users.  See how many places this can go wrong?

So basically, I wouldn't use this demographic data to make too many actual decisions about my site's users, at least in terms of age or gender.  Just don't do it.

There is good news, though, and it is the 'Interest' category.  While you will still have issues with multiple users on the same browser/device, you eliminate all of the (often wrong) assumptions about the kinds of people who consume certain kinds of content (which can't help but fall prey to bias around sexism, ageism, and racism).  If someone visits ESPN.com, they probably like sports.  Someone who goes to Amazon.com probably does some online shopping, and someone who goes to Stack Overflow is probably pretty tech savvy.  We (and Google Analytics) don't need to make many assumptions here that aren't on stable footing.

With all of that said, why would I bother going to the trouble of getting this working, if it has the potential to be so misleading?  Well, for one thing, it is almost always better to have some information rather than none, as long as you are careful with what you do with it.  If you just want general impressions, or to confirm something you already have a pretty good idea of, assumptions and aggregations are OK for that, if not ideal.

The real benefit, though, both to us as Analytics users and to Google (and thus the real reason for this feature), is that it can tie this information directly to your AdWords account through the DoubleClick tag and use the data you are collecting to automatically adjust the targeting of your GDN advertising.  You get hands-free optimization of what has traditionally been a very blind and inefficient channel, and Google gets a way to encourage huge numbers of customers to invest more money in its display network.

For the effort that you have to put into it, it's worth doing, though it's hard to think that this is anything but a short-term fix as Google is desperate to increase revenues in the face of falling CPCs and the shift to mobile search.  Look for an update when I see if there is any notable change in performance of our GDN campaigns.