This is the second part of the sharing of information on what we can do in these times of information overload - and it's going to get worse.
This is a continuation of a blog post by a respected business associate.
Again, it's not the data but how you use it that's important.
Making Better Decisions with Evidence - Part 2
My last post, about using data and evidence to make better management decisions, ended with a question.
A book club did an A/B test showing that by making a welcome call to new members, first-year spending could be increased by 8% and first-year retention by 6%.
So the question I posed at the end of that post, without providing an answer, had to do with what this company should now expect as they roll their “welcome call” program out to all their new members. Should they expect the same results as they had in their test, or worse results, or better results?
If you didn't yet read the last post, what do you think?
There were a lot of answers ventured in the comments section, but unfortunately the vast majority were wrong. The correct answer was (wait for it)...
The book club should expect worse results from its roll-out.
More than two thirds of those who guessed any specific answer thought that the results would be the same, and several people suggested that there was insufficient information to tell.
Only two people thought the book club should expect the roll-out to have worse results, which is indeed the correct answer, but no one suggested the correct reason for expecting worse results (although kudos to Sam Walker, who suggested regression to the mean, which was certainly close).
No, the actual reason the roll-out of a successful A/B test should be expected to under-perform the test result has to do with the sample of tests we choose to roll out, which is inherently biased. We only roll out successful tests, right? No one would roll out an unsuccessful test; why would they?
But this introduces a statistical bias. Every test, no matter how big or small, will have elements of randomness to it. There is always a chance that the results of a test on a sample of customers will give results that are significantly better or worse than would be achieved for the whole population of customers, based purely on the random selection of test participants.
When you do an A/B test, if your random choice of test participants just happens to include a few very enthusiastic or valuable customers, for example, then your test result will be better than the average result you would get for all your customers. Or, if your test sample just happens to include some very lackluster customers, its result will be worse than the average for all customers.
These variances occur purely on account of the random selection of participants. The smaller the sample size you use for your test, the bigger the variance is likely to be, but there will be some variance in every sample, no matter how large.
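A quick simulation makes this concrete. The numbers below are hypothetical, not the book club's data: we invent a large customer population, draw random test samples of different sizes, and watch how far each sample's average strays from the true population average.

```python
import random

random.seed(42)

# Hypothetical population: 100,000 customers with varying first-year spend.
population = [random.gauss(100, 40) for _ in range(100_000)]
true_mean = sum(population) / len(population)

typical_dev = {}
for n in (100, 1_000, 10_000):
    # Draw many random test samples of size n and record how far their
    # averages stray from the true population average.
    deviations = []
    for _ in range(200):
        sample = random.sample(population, n)
        sample_mean = sum(sample) / n
        deviations.append(abs(sample_mean - true_mean))
    typical_dev[n] = sum(deviations) / len(deviations)
    print(f"sample size {n:>6}: typical deviation from true mean = {typical_dev[n]:.2f}")
```

Bigger samples wander less from the truth, but even the 10,000-customer samples still wander a little - exactly the residual randomness the argument depends on.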
What this means is that by choosing to roll out only tests that show positive results, we are screening out the randomly negative results and keeping the randomly positive ones. That is, there is always a chance that the result of any test will be significantly better or worse than the average over all our customers, but when that random variation happens to be negative, we don't roll the program out.
So, on average, we should expect more roll-outs to under-perform their tests than to over-perform them.
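The selection-bias argument itself can be simulated in a few lines. In this sketch (again, hypothetical numbers, not the book club's data) every program truly lifts spending by exactly 3%, but each test measures that lift with random noise. If we roll out only the tests that looked successful, the average measured lift among the rolled-out tests is higher than the true lift - so the roll-outs, which deliver only the true lift, under-perform their tests on average.

```python
import random

random.seed(7)

TRUE_LIFT = 3.0   # true effect of every program, in percent
NOISE_SD = 4.0    # sampling noise in a single test's measured lift

# Measured lift for 100,000 simulated A/B tests of identical programs.
measured = [random.gauss(TRUE_LIFT, NOISE_SD) for _ in range(100_000)]

# We only roll out tests whose measured lift was positive.
rolled_out = [m for m in measured if m > 0]
avg_test_result = sum(rolled_out) / len(rolled_out)

print(f"true lift of every program:          {TRUE_LIFT:.2f}%")
print(f"average measured lift (rolled out):  {avg_test_result:.2f}%")
# The rolled-out tests look better than the truth, purely because the
# randomly unlucky tests were never rolled out.
```

Nothing about the programs differs; the gap between the test results and the true lift comes entirely from discarding the unlucky draws.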
My purpose in putting this and the previous post up (and future posts that I plan in this series) is not to provide an academic course in statistics. No calculations or equations were needed to make the argument I just made. You don’t need to add or subtract numbers to understand the logic.
My purpose is to call your attention to the fact that even though, as managers, we are all now inundated with data, and even though we have immense computational resources at our fingertips, most of us are not yet skilled enough in our reasoning to put these data to good use. If we want the quality of our decisions to improve along with the quality and availability of data and computational capability, then we first need to improve our statistical reasoning.
We need to improve our skill at using evidence to make good decisions.
A couple of comments on the previous post made reference to the fact that the book club I mentioned would also have to ensure that the data from its test results are not contaminated or biased in some way, and this is certainly true. Before you can rely on data at all, you have to be sure you can trust it.
So in my next post on this topic, I’ll discuss the conditions under which you can trust the data you encounter.