
Wednesday, April 13, 2016

Multi-Channel Attribution and Understanding Interaction

I'm no cosmologist, but this post is going to rely on a concept well known to astrophysicists, who often have something in common with today's marketers (as much as they might be loath to admit it). So what is it that links marketing analytics to one of the coolest and most 'pure' sciences known to man?

I'll give you a hint: it has to do with such awesome topics as black holes, distant planets, and dark matter.

The answer? It has to do with measuring the impact of things that we can't actually see directly, but that still make their presence felt. This is common practice for scientists who study the universe, and yet it's not nearly common enough among marketers and the people who evaluate media spend and results. Like physics, marketing analytics has progressed in stages, but we have the advantage of coming into a much more mature field, and thus of avoiding the mistakes of earlier times.

Marketing analytics over the years, and the assumptions each stage created:

  • Overall Business Results (e.g. revenue): if good, marketing is working!
  • Reach/Audience Measures (e.g. GRPs/TRPs): more eyeballs = better marketing!
  • Last-click Attribution (e.g. click conversions): put more money into paid search!
  • Path-based Attribution (e.g. weighted conversions): I can track a linear path to purchase!
  • Model-based Attribution (e.g. beta coefficients): marketing is a complex web of influences!

So what does this last one mean, and how does it relate to space? When trying to find objects in the distant regions of the cosmos, scientists often rely on indirect means of locating and measuring their targets, because those targets can't be observed directly. For instance, we can't see planets orbiting distant stars even with our best telescopes. However, based on things like the bend in the light emitted from a star and the composition of gases detected, we can 'know' that there is a planet of a certain size and density in orbit, one that is affecting the measurements we would otherwise expect to get from that star. Similarly, we can't see black holes, but we can detect the characteristic radiation signature created when gases under their immense gravitational force give off X-rays.

This is basically what a good media mix/attribution model is attempting to do, and it's why regression models can work so well. You are trying to isolate the effect of a particular marketing channel or effort, not in a vacuum, but in the overall context of the consumer environment. The first white papers I remember seeing on this were mainly about measuring brand lift due to exposure to TV or display ads, but those were usually simple linear regression problems, connecting a single predictor variable to a response, or done as a chi-square style hypothesis test. Outside of a controlled experiment, that method simply won't give you an accurate picture of your marketing ecosystem, one that takes the whole customer journey into account.
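To make that concrete, here is a minimal R sketch, using simulated data and made-up channel names, of the difference between looking at one channel in isolation and estimating every channel's contribution in the same model:

# A minimal sketch (simulated data, hypothetical channels) of moving from a
# single-predictor view to a multi-channel regression of weekly conversions.
set.seed(42)
n <- 104  # two years of weekly observations

ppc      <- runif(n, 5000, 15000)   # weekly PPC spend
facebook <- runif(n, 2000, 8000)    # weekly Facebook spend
email    <- rpois(n, 40) * 250      # weekly emails sent (scaled)

# The "true" data-generating process is multi-channel; the analyst doesn't know it.
conversions <- 50 + 0.012 * ppc + 0.008 * facebook + 0.004 * email +
  rnorm(n, sd = 25)

dat <- data.frame(conversions, ppc, facebook, email)

# The "last-click" style view: one channel in isolation.
summary(lm(conversions ~ ppc, data = dat))

# The holistic view: each channel's contribution estimated in the context of the others.
summary(lm(conversions ~ ppc + facebook + email, data = dat))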

As a marketer, you've surely been asked at some point, "What's the ROI of x channel?" or "How many sales did x advertisement drive?" And perhaps, once upon a time, you would have been content to pull a quick conversion number out of your web analytics platform and call it a day. However, any company that still does things this way isn't just getting a completely incorrect (and therefore useless) answer; it isn't even asking the right question.

Modern marketing models tell us that channels can't be evaluated in isolation, even when you can make a reasonably accurate attempt to isolate a specific channel's contribution to overall marketing outcomes within a particular holistic context.

Why does that last part matter? Because even if you can build a great, highly predictive model out of clean data, all of the 'contribution' you are measuring is conditional on the other variables.

So for example, if you determine that PPC is responsible for 15% of all conversions, Facebook for 9%, and email for 6%, and then back into an ROI value based on the cost of each channel and the value of the conversions, you still have to be very careful with what you do with that information. The nature of many common predictive modeling methods is such that if your boss says, "Well, based on your model PPC has the best ROI and Facebook has the worst, so take the Facebook budget and put it into PPC," you have no reason to think that your results will improve, or even change in the way you assume.

Why not? Because hidden interactivity between channels is baked into the models: some of the value that PPC is providing in your initial model (as well as any error term) is based on the levels of Facebook activity that were measured during your sample period.
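Here is a toy illustration of that point (again simulated data and hypothetical channels): once an interaction term is in the model, PPC's marginal value isn't a single number, it depends on the Facebook levels you happened to observe.

# Sketch: when channels interact, the "value" of one depends on the level of the other.
set.seed(7)
n <- 104
ppc      <- runif(n, 5000, 15000)
facebook <- runif(n, 2000, 8000)

# Simulate a true synergy: PPC works better when Facebook activity is high.
conversions <- 40 + 0.010 * ppc + 0.005 * facebook +
  0.000002 * ppc * facebook + rnorm(n, sd = 25)

dat <- data.frame(conversions, ppc, facebook)

fit <- lm(conversions ~ ppc * facebook, data = dat)  # main effects + interaction
summary(fit)

# The marginal effect of an extra PPC dollar shifts with the Facebook level
# observed during the sample period.
coef(fit)["ppc"] + coef(fit)["ppc:facebook"] * quantile(dat$facebook, c(.1, .5, .9))

If you gut the Facebook budget, you are pushing the model into a region it never observed, and the PPC coefficient you fell in love with may no longer apply.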

It's a subtle distinction, but an important one. If you truly want to have an accurate understanding of the real world that your marketing takes place in, be ready to do a few things:
  1. Ask slightly different questions; look at overall marketing ROI with the current channel mix, and how each channel contributes, taking into account interaction
  2. Use that information to make incremental changes to your budget allocations and marketing strategies, while continuously updating your models to make sure they still predict out-of-sample data accurately
  3. If you are testing something across channels or running a new campaign, try adding it as a binary categorical variable to your model, or a split in your decision tree
Just remember, ROI is a top-level metric, and shouldn't necessarily be applied at the channel level the way people are used to. Say something like this to your boss: "The marketing ROI, given our current/recent marketing mix, is xxxxxxx, with relative attribution between the channels being yyyyyyy. Knowing that, I would recommend increasing/decreasing investment in channel (variable) A for a few weeks, which according to the model should increase conversions by Z; then we can see whether that prediction holds." Re-run the model, check assumptions, rinse, repeat.
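A rough sketch of that predict-then-verify loop, with simulated data and made-up budget numbers, might look like this:

# Use the fitted model to forecast what a budget shift should do, then compare
# against what actually happens before reallocating further.
set.seed(99)
n <- 104
ppc      <- runif(n, 5000, 15000)
facebook <- runif(n, 2000, 8000)
conversions <- 40 + 0.012 * ppc + 0.006 * facebook + rnorm(n, sd = 25)
fit <- lm(conversions ~ ppc + facebook, data = data.frame(conversions, ppc, facebook))

current  <- data.frame(ppc = 10000, facebook = 5000)
proposed <- data.frame(ppc = 12000, facebook = 3000)  # shift $2k from Facebook to PPC

predict(fit, newdata = rbind(current, proposed))
# If next month's actuals land far from the second prediction, the model (or the
# assumption that the channels don't interact) needs revisiting.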


Friday, April 8, 2016

Own Your Data (Or at least house a copy)

There is a common trope among data analysts that 80% of your time is spent collecting, cleaning, and organizing your data, and just 20% on actually analyzing or modeling it. While the exact numbers may vary, if you work with data much you have probably heard something like that, and found it to be more or less true, if exaggerated. As society has moved to collecting exponentially more information, we have of course seen a proliferation in the types, formats, and structures of that information, and thus in the technologies we use to store it. Moreover, very often you want to build a model that incorporates data someone else is holding, according to their own methods, which may or may not mesh well with your own.

For something as seemingly simple as a marketing channel attribution model, you might be starting with a flat file containing upwards of 50-100 variables pulled from 10+ sources. I recently went through this process to update two years' worth of weekly data, and it no joke took days and days of data prep just to get a CSV that could be imported into R or SAS for the actual analysis and modeling steps. Facebook, Twitter, AdWords, Bing, LinkedIn, YouTube, Google Analytics, a whole host of display providers... the list goes on and on. All of them using different date formats, calling similar or identical variables by different names, limiting data exports to 30 or 90 days at a time, etc.
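For a flavor of what that harmonization looks like, here is a tiny R sketch, with hypothetical column names and date formats standing in for two of those exports:

# Hypothetical miniature exports illustrating the problem: same concept
# ("spend by day"), different date formats and column names per source.
fb   <- data.frame(Date = c("04/13/2016", "04/14/2016"), Amount.Spent = c(250.0, 310.5))
gads <- data.frame(Day  = c("2016-04-13", "2016-04-14"), Cost         = c(480.2, 455.9))

# Normalize each source to a common schema before any modeling can happen.
fb_clean <- data.frame(
  date    = as.Date(fb$Date, format = "%m/%d/%Y"),
  channel = "facebook",
  spend   = fb$Amount.Spent
)
gads_clean <- data.frame(
  date    = as.Date(gads$Day, format = "%Y-%m-%d"),
  channel = "adwords",
  spend   = gads$Cost
)

combined <- rbind(fb_clean, gads_clean)

# Roll daily rows up to the weekly grain used for the attribution model.
combined$week <- cut(combined$date, breaks = "week")
aggregate(spend ~ week + channel, data = combined, FUN = sum)

Multiply that by a dozen sources and a hundred variables and the "days and days" part starts to make sense.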

Obviously, that's worth the effort for a big one-time study, but what about actually building a model for production? What about wanting to update your data set periodically to make sure the coefficients haven't changed too much? When dealing with in-house data (customer behavior, revenue forecasting, lead scoring, etc.) we often get spoiled by our databases, because we can just bang out a SQL query to return whatever information we want, in whatever shape we want. Plus, most tools like Tableau or R will plug right into a database, so you don't even have to transfer files manually.
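That workflow looks something like this sketch (hypothetical host, credentials, and table; since Redshift speaks the Postgres wire protocol, DBI plus RPostgres is one reasonable way to connect from R):

# Querying a warehouse directly from R instead of juggling CSV exports.
library(DBI)
library(RPostgres)

con <- dbConnect(
  Postgres(),
  host     = "analytics-cluster.example.com",  # hypothetical endpoint
  port     = 5439,
  dbname   = "marketing",
  user     = Sys.getenv("DW_USER"),
  password = Sys.getenv("DW_PASS")
)

# Pull the data already shaped for modeling (hypothetical table and columns).
weekly_spend <- dbGetQuery(con, "
  SELECT date_trunc('week', activity_date) AS week,
         channel,
         SUM(spend)       AS spend,
         SUM(conversions) AS conversions
  FROM   channel_performance
  GROUP  BY 1, 2
  ORDER  BY 1, 2
")

head(weekly_spend)
dbDisconnect(con)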

At day's end, it quickly became apparent to me that having elements of our data, from CRM to Social to Advertising, live in an environment that I can't query or code against is just not compatible with solving the kinds of problems we want to solve. So of course, the next call I made was to our superstar data pipeline architect, an all-around genius who was building services for the Dev team running off our many AWS instances. I ask him to start thinking about how we should implement a data warehouse and connections to all of these sources if I can hook up the APIs, and he of course says he has already not only thought of it, but started building it. Turns out, he had a Redshift database up and was running Hadoop MapReduce jobs to populate it from our internal MongoDB!

So with that box checked, I started listing out the API calls we would want and all of the fields we should pull in, and figuring out the hookups to the third-party access points. Of course, since we have an agency partner for a lot of our paid media, that became the biggest remaining roadblock to my data heaven. I schedule a call with the rep in charge of their display & programmatic trade desk unit, just so we can chat about the best way to hook up and siphon off all of our daily ad traffic data from their proprietary system. After some back and forth, we finally arrive at a mostly satisfying strategy (with a few gaps due to how they calculate costs and the risk of exposing other clients' data to us), but here is the kicker:

As we are trying to figure this out, he says that we are the first client to even ask about this.

I was so worried about being late to the game, that it didn't even occur to me that we would have to blaze this trail for them.

The takeaway? In an age of virtually limitless cheap cloud storage, and DevOps tools to automate API calls and database jobs, there is no reason data analysts shouldn't have consistent access to a large, refreshing data lake (pun fully intended). The old model created a problem where we spent too much time gathering and pre-processing data, but the same technological advances that threaten to compound the problem can also solve it. JSON, relational, unstructured, and every other kind of data can live together, extracted, blended, and loaded by Hadoop jobs into a temporary cloud instance as needed.

The old 80/20 time model existed, and exists, because doing the right thing is harder, and takes more up-front work, but I'm pretty excited to take this journey and see how much time it saves over the long run.

(Famous last words before a 6 month project that ultimately fails to deliver on expectations; hope springs eternal)

What do you think? Have you tried to pull outside data into your own warehouse structure? Already solved this issue, or run into problems along the way? Share your experience in the comments!

Wednesday, March 25, 2015

The Trouble With Stats (or the people who pretend to use them)

My father also used to enjoy telling me "figures don't lie, but liars figure." He was essentially trying to explain why people often don't trust those who use numbers to explain things that don't naturally appeal to our (highly flawed) instincts. There is a Simpsons quote along similar lines, but for those of you who are fans, you already know it, and for anyone else, it would be a waste of time.

There is another, larger problem, however, that also damages the reputation of statistical analysis and the things it can tell us about our world. It's when people know/do just enough to be dangerous, and without basic rigour produce findings that are unsupportable. When they spread those findings, people who don't have a strong understanding of the analytical process tend to take them at face value, and then end up being let down later. Rather than blaming the particular culprit who practiced the bad stats, they just blame stats in general. These bad actors aren't always guilty of malice, but it doesn't absolve them of crimes against science, and there is a perfect example on ESPN.com today.

Obligatory picture of Nomar, 'cause baseball!


Check out the article here (it's an Insider article, so if you aren't already behind the pay wall, you won't be able to read it). To summarize, Mr. Keating is trumpeting the value of a simple composite stat, runs per hit (R/H from now on), as some sort of gem in terms of offensive value, in both real and fantasy terms. He begins, ironically enough, by pointing out that "many sabermetricians [statisticians who study baseball] barely glance at stats like runs and RBIs, which depend heavily on a player's offensive context" (which is true), and then immediately goes on to make a vague declaration about the other skills a player with a high R/H must have (not true).

Keating proceeds to provide a handful of historical examples of other players with a high R/H, in fun anecdotal fashion. Fully three of the seven paragraphs in the article don't talk about the particular player it focuses on, Brian Dozier of the Twins, and even the ones that mention him hardly focus on him. The examples provided vary widely across run environments, from the height of the power-infused steroid era to pre-WWII baseball, with no accounting for how different the offensive numbers looked at those times.

Finally he cherry-picks a few contemporary data points that support his assertion, none of which are particularly relevant to the point that he is making (and in some cases, contradictory).

At no point does he cover anything like the methodology he used to arrive at his conclusions, or even imply that there was any methodology, which might be more disturbing. Here is one thing I know about people who appropriately use statistics: they love to share the gory details.

Now, I will admit, I had a little prior experience with this particular stat, because I had looked into it a few years ago and ultimately rejected it as not being useful. However, context always matters, so it was worth another look. One of the first things that Keating does in the article is subtly undermine sabermetricians and complex statistics, which is curious for a guy whose bio at the bottom of the page says he covers statistical subjects as a senior reporter. Basically, what he says is that R/H is a stat "...that is so simple you can calculate it in your head and it'll tip you off to hidden fantasy values" (I challenge both parts of that statement).

He mentions BABIP later on without any explanation, so he is assuming a certain amount of sophistication on the part of his readers, which is understandable since most fantasy enthusiasts have more exposure to statistical analysis than an average baseball fan. Certainly, anyone interested in "new" composite stats would also be familiar with OPS, and quite possibly wOBA and wRC as well, so the assumption has to be that R/H offers something that these other stats don't.

This is where the whole thing falls apart under scrutiny. Keating says that R/H is a good proxy for identifying other skills like speed, walk rate, and power, but all of the other stats I just mentioned do the same thing; the difference is, they actually work. Depending on your analytical preference (or which site you get your baseball fix from), you might choose to believe in OPS, wOBA, or wRC (or Base Runs, etc.), but the reality is they are all pretty good, and so they correlate very highly with one another (r values above .95 across the board). R/H, on the other hand, correlates with none of them, not even close.
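If you want to sanity-check that kind of claim yourself, here is a rough R sketch using the Lahman package (an assumption: you have it installed and it includes the season in question; it doesn't carry wOBA or wRC, so OPS stands in here as the "stat that works"):

# Crude check: how well does R/H track a context-neutral offensive stat?
library(Lahman)

b <- subset(Batting, yearID == 2014 & AB >= 400)  # rough stand-in for "qualified"

tb  <- with(b, H + X2B + 2 * X3B + 3 * HR)                 # total bases
obp <- with(b, (H + BB + HBP) / (AB + BB + HBP + SF))      # on-base percentage
slg <- tb / b$AB                                           # slugging

b$OPS          <- obp + slg
b$runs_per_hit <- b$R / b$H

cor(b$runs_per_hit, b$OPS, use = "complete.obs")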

Of course, those are all stats meant to capture real-world offensive value in a context-neutral way, and maybe R/H only works for fantasy value. So the obvious thing to do would be to look at R/H against ESPN fantasy player values. Unfortunately, once again, R/H has almost no correlation with the player rater values (whereas the fantasy rater correlates really well with those other stats).

"Wait," you might (and should) say, "the fantasy player rater is going to be heavily weighted towards counting stats, and thus might not be appropriate when comparing against a weight stat like R/H." Sadly, you, like Mr. Keating, would be wrong again, as Dozier was 7th in the majors in plate appearances (PA) among qualified batters, so he actually has an unfair advantage in this case.

The truth is, of the players who had enough plate appearances to qualify for the batting title last year, 4 of the top 5 by R/H were not in the top 30 fantasy players, and only 4 of the top 15 made the cut. Most of them were not even guys with a balanced speed/power/discipline profile like Dozier (which is what Keating says the stat should identify), but power hitters like Donaldson, Rizzo, and Bautista. If you look at wRC+ instead, every single one of the top 15 players by that stat was in the top 40 by fantasy value, and wRC+ has an r value twice as high as R/H (.71 vs. .31).

A common way to look at how much noise might be in a particular stat is year-over-year correlation, and I will say that for players with at least 350 PA in both 2013 and 2014 there was a slight correlation. At just below an r value of .5, however, it wasn't particularly strong, highlighting just how much luck this formula bakes in by including runs, and frankly, this number would probably change by looking at more years (I admit to not going deeper here, but I'll point out the limitation in case someone wants to look into it).
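For anyone who does want to go deeper, the year-over-year check looks roughly like this in R (again leaning on the Lahman package as an assumed data source, and using an at-bat cutoff as a crude stand-in for the 350 PA filter):

# Year-over-year stability of R/H: correlate each player's 2013 value with his 2014 value.
library(Lahman)

rh_by_year <- function(yr, min_ab = 350) {
  b <- subset(Batting, yearID == yr)
  b <- aggregate(cbind(AB, R, H) ~ playerID, data = b, FUN = sum)  # combine stints
  b <- subset(b, AB >= min_ab)                                     # AB as a rough PA proxy
  data.frame(playerID = b$playerID, rh = b$R / b$H)
}

both <- merge(rh_by_year(2013), rh_by_year(2014), by = "playerID",
              suffixes = c("_2013", "_2014"))

cor(both$rh_2013, both$rh_2014)  # the post found this landing just under .5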

Bottom line: runs per hit is not a great stat overall, and it is not a great stat in the context of fantasy baseball value. I won't venture to say that Keating did this analysis and decided to hide the results from his readers, nor will I say that he was too lazy to do any analysis, but I do know that it took me all of thirty seconds to start finding problems with the assertion. As someone who plays fantasy baseball, I have no interest in giving my competition an advantage, but I also can't figure out how this kind of misleading "analysis" does any good for the field.

The issue isn't just that the number doesn't say what Keating claims it does, because it will, by the nature of its component parts and some of the reasons he gives, have some relationship with the skills he is trying to identify. The issues are that it does so very inefficiently due to flaws in the components used, and that we already have much better stats for highlighting the same information. The last, biggest issue is that despite being at least somewhat aware of these flaws, Keating is touting this approach nonetheless, under the guise of statistical analysis, without providing any evidence in support.

Dozier is a good player, and has fantasy value, but anyone using R/H to evaluate players is going to be disappointed in their fantasy season. It is neither predictive nor descriptive of a player's skill, and shouldn't be used that way. The problem is that by placing this article behind the pay wall of an authority like ESPN, the false premise is given a credence that it doesn't warrant, and undermines statistical analysis as a whole.

Tuesday, January 13, 2015

Putting the 'Fun' in 'Logistic FUNctions!'



So, this weekend I was playing around with R to graph some data sets and get the variables that go into the formula for logistic modeling.  While trying to figure out some f(t) or t-values given the inputs, I was annoyed at calculating the results by hand, even though I was able to get the variables from R.  It's pretty normal math for this kind of work, and it's good to know how to do it manually step-by-step, but eventually the fun wore off and I just wanted to get it done.
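For reference, a minimal R version of those calculations might look like this (assuming the standard three-parameter logistic form f(t) = L / (1 + exp(-k * (t - t0))), since the post doesn't spell out the exact parameterization):

logistic_value <- function(t, L, k, t0) {
  # f(t): value of the curve at time t
  L / (1 + exp(-k * (t - t0)))
}

logistic_time <- function(f, L, k, t0) {
  # Inverse: the t at which the curve reaches value f (requires 0 < f < L)
  t0 - log(L / f - 1) / k
}

# Example with made-up parameters: ceiling 1000, growth rate 0.3, midpoint at t = 10.
logistic_value(t = 12, L = 1000, k = 0.3, t0 = 10)   # ~ 645.7
logistic_time(f = 900, L = 1000, k = 0.3, t0 = 10)   # ~ 17.3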

Anyhow, I built a "Logistic Function Solver" aka calculator in Excel, and figured I would share it on the off-chance anyone else needs such a thing and doesn't feel like taking the time to build one.  It's already been useful at work.

Here it is

Anyhow, you should be able to access the file on Google Drive with that link, and then make a copy for yourself.  Let me know in the comments if this doesn't work for you.

Basically, cells with the light yellow background and green text are ones in which you should enter your values, and then your desired output will be in the green background with red text.  (Leave the other cells alone, they have formulas)

Enjoy!

Tuesday, May 27, 2014

Top 5 Skills for the Modern Marketer/Data Analyst




[Skip to the bottom if you just want the top 5 list]

Over the years that I have been in digital marketing and analysis, I have been constantly shocked by the gaps and deficiencies that I have found in not only my own, but the entire industry's skill set.  When I first began as a lowly search specialist, I came in with nothing more than a decent understanding of how organic search engines worked, a basic familiarity with Excel, and a passionate, though amateurish, interest in statistical theory.  Within three months, my relevant knowledge base had expanded exponentially, but still I felt that I lacked useful skills, and frankly, that most people in the industry did as well.  I have been trying to rectify that situation ever since.

Right off the bat, I was amazed at how little rigorous statistical analysis was being applied to SEM and other digital media buying and planning channels, given the volume of available data that was being (or could be) collected.  This was manifest not only in the proportion of data analysts to account team members (which was very low at the time), but also in the absence of fundamental conceptual understanding of statistics held by the marketers themselves.  It was naive of me to think that every account team would have a dedicated analyst (though I had assumed as much before my first day), but even at the time I thought that some rudimentary education on the theory and practice of utilizing data sets should be a prerequisite for a digital marketer.

Even more simply, I realized quickly that my Excel proficiency was not where it needed to be, or at least not at a point where my work couldn't be substantially improved by getting better with spreadsheet applications.  What I thought I knew about Excel (still one of my all-time favorite human inventions) was a drop in the bucket compared to what I ended up needing, but as I developed those skills I was once again shocked by their conspicuous absence from the average marketer's tool kit.  The number of people in our office who really knew Excel, and could maximize the efficiency of its capabilities, was limited to single digits, even though it is the bread and butter of any search marketer.  From there, overly large data sets led me to MS Access, a program which even fewer people were qualified to use, causing all kinds of missed opportunities and bottlenecks.  While most people in every office I have ever worked in tend to just seek out those who have that knowledge when they need it, very few companies require, or even encourage, widespread acquisition of information and skills that are borderline critical to the work their employees do.

When tagging and tracking issues came up (and they always do), I found myself frustrated by the gate-keepers and communication disconnects that exist between marketers and IT/website maintenance teams, so I realized that I would have to understand, at least at a rudimentary level, HTML and then JavaScript.  I had to learn more principles of SEO at times, which also required understanding of those basic web development languages.  I had to understand other marketing channels to really see interactions, and I had to understand offline sales processes to gain insights into lead generation marketing, which meant that I had to first learn about CRM pipelines, and then CRM platforms like Salesforce and HubSpot.  As the lines between social, paid social, content, and SEO blurred, I had to approach each subject in turn; in order to understand any one of them I had to understand all of them.  To understand what my data meant I needed to know all of the data that was collected, so I had to learn about databases.  In order to make use of the databases, I had to learn SQL.  I'm so far from where I started, and yet still so much further from where I need to be.  I will never have enough knowledge and understanding to do my job as well as I think I should.

But at every step in my career I have been surprised to see just how many people in the industry lack not only the skills that I have been seeking, but even an awareness of the roles those skills should play, within the agency world and without.  For so many years everything was so siloed in terms of the division of labor that marketers (and really, everyone in business) came to believe that the world outside of their specific responsibility was segmented this way as well.  There is a common theme in the industry today that those walls are finally breaking down, that channels are at long last interacting, and that the ecosystem has finally become diverse and highly interdependent, but this is a false notion.  The ecosystem has always been complex, and the fact that we are finally starting to recognize it doesn't excuse us from responsibility for the gaps of the past, nor from the continuing specialization of skills moving forward.

A search marketer can't get away with simply knowing the AdWords and Bing platforms anymore, or at least shouldn't be able to in your workplace.  Would you want someone in charge of a campaign who doesn't understand how the tracking codes work in a jQuery library?  Do you want someone presenting to clients or superiors not just raw information, but conclusions and insights, who doesn't understand sampling concepts, or how to differentiate between correlation and causation?  How can a marketer assess the value of a user action without understanding the offline sales process, or the difference in the consumer journey for B2C versus B2B?

For so long digital marketers were like Oz: we claimed to be wizards and got away with it because no one looked behind the curtain.  People finally looked behind the curtain and found that, in fact, it was all done with machines, and they were actually fine with that, because we said we were running the machines expertly.  The problem is that now digital marketers are often shown to simply be the people standing next to the machine, with no more understanding of how it works than those who were on the other side of the curtain.  In order to stay relevant, we all need to be able not only to read the outputs, but to understand and interact with the inputs as well.  The world is changing fast, and education, in any form, is the only path to relevance.

So to sum this up into a top-five list (because that's what the internet wants), here we go:

Top 5 Skills for Every Data-Driven Marketer

1.) Microsoft Excel (custom sorting, formulas, pivot tables)

2.) Basic Statistical Theory (sample size & significance, correlation vs. causation, variance & standard deviation)

3.) CRM Process/Offsite Interaction (digital is not a separate realm, it is part of the broader business we engage in)

4.) Minimal HTML, JavaScript knowledge (metadata tags, H1s, how API calls work, tagging intricacies & common problems)

5.) SQL/RDB Querying (pick one: MySQL, PostgreSQL, even NoSQL, it doesn't matter; maybe learn R or Hadoop if you want to get fancy)

Monday, January 30, 2012

Is Marketing Really "Data-Driven?" Pt. 1

No matter what you call it, the clear trend in marketing today is towards a model that depends on consumer data collected digitally to inform both online and offline media strategy.  Terms like “data-driven” and “fast-moving data” are bandied about, conjuring up an image of an agile, precise campaign that links brands to individuals, rather than demographics.  Marketers know that the shift from art to science is already in progress, and I should say that I wholeheartedly agree with this approach.

The problem is that there is danger in a job only half done, and at times I fear that we as an industry talk about “data-driven marketing” like experts while bringing no rigor to the approach.  Additionally, using digital data to inform traditional media, both in terms of planning and creative, when the same statistical approach isn’t applied to those channels, will return misleading results.  No matter how cleverly you apply your digital learnings to traditional performance, if the metrics by which we measure TV are inaccurate, or not properly tied to business goals, then we risk just painting a picture that is different but no more insightful.

To truly claim a data-driven approach, you need to collect data at every step of the marketing process systematically, and analyze it methodically, adhering to sound statistical procedure.  Just as importantly, you need to know what data to gather, and how it helps you achieve your goal.

Let’s look at an example of how this can affect measurement at every level of a campaign.  Starting with the broadest question: what is the goal of marketing?  To increase the client’s sales, or the services it provides.  How is that measured?  Brand loyalty?  Market share?  Sales in dollars?  Profit?  Units sold?  The first thing an advertising agency has to do (ideally) is identify what the client’s goals are, and frankly, the media agency should be the one that determines the goal, as doing so is part of the marketing process.

Why is that?  Let’s look at the list of client goals I mentioned above, all of which at first blush appear to be totally normal, reasonable ways to judge a marketing agency, but all of which have some issues from a statistical and/or business standpoint.
  • Brand Loyalty: This is probably the worst measure, for a number of reasons over and above the fact that it is a vague concept.  Anything that is survey or panel based can be looked at, but the methodology and sampling issues make it less scientific.
  • Market Share: Better than brand loyalty, but because the information has to come from a number of outside sources, gathering this data is ponderous, and more importantly there is a long time lag in reporting.
  • Sales (dollars): On its own this number is somewhat useful because it is an absolute 1-to-1 value, but it really should be adjusted to account for the market environment of the client’s particular category, rather than taken raw.
  • Profit: Terrible.  I don’t think anyone would actually measure a company’s marketing success based on profits, but it is something clients think about, and a useful illustration of what stats you don’t want.  Too many uncontrolled variables go into profit and revenue numbers.  If a company sells more product but the cost of raw materials increases as well, that cost increase shouldn’t be factored into any measure of advertising.
  • Sales (Units): This is probably the best way to measure overall advertising success over a long period of time, once again normalizing the number to the broader market conditions.  By using sales in units you remove some of the variables around pricing and the competitive environment (aside from the ones that can be adjusted for).
The key to any good statistical measure of success or ability is removing as many uncontrolled variables as possible, and not crediting/blaming advertising for things it can’t control.
Since I often use baseball as a source of examples for statistics and how to use them, the perfect analogue here is using ‘wins’ to judge a pitcher.  Conventionally, people looked at wins to determine how good a pitcher is, but that number is quickly falling out of favor, because it has very little statistical relevance to how well a pitcher performs.  Think about it: if a pitcher gives up 5 runs but his team scores 8, he gets a win.  If another pitcher gives up 2 runs but his team is shut out, he gets a loss.  Who did a better job?
The lesson is not that we shouldn’t measure things and use the data as much as possible, but that not all stats are created equal, and that we need to make sure that what we are collecting is telling us what we think it is.  Right now I would say that marketing is getting good at amassing data, but still extremely infantile in terms of manipulating it properly.  We are still at the stage of evaluating pitchers based on wins, as it were.
Some of this comes down to assumptions, and how many of them rest on traditionally held marketing beliefs that we take for granted despite never seeing empirical evidence for them.  Every marketer should be a gadfly.  Poke holes in theories or justifications that don't make sense.  If you see a test that doesn't account for uncontrolled variables in the results, point it out.  If a conversation is centered around an idea that everyone accepts but no one has proved, ask why.
Next up, it might be worth looking at TV, and the relationship between digital/social and offline, in order to challenge some of the preconceived notions.

Monday, January 23, 2012

We Need to Make Digital Measurement Easier


[Editor’s Note:  Sorry for the long layoff, I am going to be better about posting starting today]
To be clear, I don’t mean that we need to make it easier for us digital marketers, but that we need to make it easier for the brand representatives that we have to report to.
When I go through plans and recaps for marketing programs, the problem becomes very clear.  People who have been dealing with traditional marketers for a long time expect just a few things from TV, print, and radio: Reach and Frequency.  These are estimates, and they are provided by the people selling the media, so they don’t need to be calculated by those buying it.
It’s simple, and clean, though it does nothing to tell you about the effectiveness of the channel after the fact.  How many people saw the ad, and how often.  Move on.
Sometimes you will see a brand lift study built into a buy, which basically just consists of polling some consumers to see how the ad made them feel (with varying degrees of scientific rigour).
Then we get to digital, and suddenly the performance metrics increase exponentially.  The breadth and depth of data that we have available to us in the digital space is both a blessing and a curse in that sense.
First of all, we subdivide “digital” into myriad channels of increasing specificity.  There is display, search, social, in-text, and more.  Each of these sub-channels has multiple ad unit types, and in turn, each ad type has multiple statistics that can be tracked.
(For instance, display ads can be static units or interactive units (and static units can be further broken down by size, so there are standard banners, skyscrapers, etc.), and so you have reach in terms of unique users, then interaction rates, time spent in the ad unit for rich media, click-through rate, video plays in unit, and more.)
You can measure attributes of the ads themselves, like click-through rate, impressions, cost per impression/click, etc., and you can also measure on-site actions and behavior, like conversions, bounce rate, time on site, and so on.
We haven’t even talked about social metrics like Facebook likes, tweets, +1s, ‘conversations,’ and additional followers/friends.
The upside of all of this is that the data obviously gives us visibility and optimization options that traditional marketers can only dream of.  The downside is that we are actually held to performance standards, unlike traditional offline media channels, and moreover, that the people we report to get lost in all of these metrics.
Traditional media channels don’t provide brands with much in the way of data or measurement options, and maybe the answer is that they should be forced to come up with better ways to justify their value.  More likely however, we as digital marketers need to find ways to simplify our reporting.
This may mean actually giving brands less raw data, though it’s possible that Pandora’s box has been opened and it is too late for that.  However, I think the only workable answer is the creation of a weighted composite number, based on an equation that takes into account a variety of metrics across digital channels, pegged to an index.  The million-dollar problem is figuring out how to do it, but you can bet that I will be working on it, as I am sure others are.
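Just to make the idea concrete, here is one naive shape such a composite could take in R; the metrics and weights below are entirely hypothetical, and picking defensible weights is exactly the million-dollar problem:

# Index each metric against a baseline period, then blend with weights into one number.
make_composite <- function(current, baseline, weights) {
  idx <- current / baseline * 100        # each metric as an index vs. its baseline
  sum(idx * weights) / sum(weights)      # weighted blend, itself an index
}

# Hypothetical metrics; "bounce_inv" inverts bounce rate so that higher = better.
baseline <- c(ctr = 0.8, conv_rate = 2.0, social_engagements = 5000, bounce_inv = 1 / 0.55)
current  <- c(ctr = 1.0, conv_rate = 2.3, social_engagements = 6500, bounce_inv = 1 / 0.50)
weights  <- c(ctr = 1, conv_rate = 3, social_engagements = 1, bounce_inv = 2)

make_composite(current, baseline, weights)  # > 100 means better than baseline overall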
Expect a 'part two' of this entry in the future.

Monday, October 3, 2011

Is Data a Big Deal in Advertising?

How does this still qualify as a question when it has been answered?  Enough to warrant a "big prediction" and a panel discussion?  Apparently, Omnicom thinks so.

The idea that it is going out on a limb to say that "everyone in the advertising industry will be 'partly a data scientist'" is frustrating, because it should be that way already, and should have been for years.

Obviously, I did most of my ranting on this a few posts back, but seeing it in print just gets my hackles up all over again.  When I got into this business, it was basically because I spent a lot of free time on sabermetrics and baseball, and I wanted to work in a field where data analytics was the job.  I honestly assumed that every SEM account team would have a dedicated stats person, rather than 2-3 analytics people for an entire office, if you are lucky. 

On one hand, I should be glad, because every single advertising person who hasn't completely embraced the statistical analysis side of the industry is making me look better in comparison.  As a search person, you would think this would be a huge advantage over traditional media, and would thus cause money and favor to flow to my part of the business.

The reality is though, while it seems like a no-brainer to some, the very fact that there is an article like this means that the boat is being missed as we speak.  If you saw a newspaper article in 1943 headlined "FDR Concerned that Hitler Might Cause Some Trouble," wouldn't you worry about what the heck FDR had been up to since 1936?

Is it worse to be stating the obvious, or stating the obvious years after it should have been obvious?

On the other hand, apparently we have a bright future to look forward to, with numbers and stuff.