It’s about a 4 min. read.
Dear Dr. Jay,
The debate over how and when to test for statistical significance comes up nearly every engagement. Why wouldn’t we just test everything?
-M.O. in Chicago
You’re not alone. Many clients want all sorts of things stat tested. Some things can be tested while others can’t. But for what can be tested, as market researchers we need to be mindful of two potential errors in hypothesis testing. Type I errors are when we reject a true null hypothesis. For example, if we accept the claim that Coke tastes better than Pepsi, it’s erroneous because in fact, it’s not true.
A type II error occurs when we accept the null hypothesis when in fact it is false. This part is safe to install and then the plane crashes. We choose the probability of committing a type I error when we choose alpha (say .05). The probability of a type II error is a function of power. We seldom take this side of the equation into account for good reason. Most decisions we make in market research don’t come with a huge price tag if we’re wrong. Hardly anyone ever dies if the results of the study are wrong. The goal in any research is to minimize both types of errors. The best way to do that is to use a larger sample.
This conundrum perfectly illustrates my “Life is a conjoint” mantra. While testing we’re always trading off between the accuracy of the results with the cost of executing a study with a larger sample. Further, we also tend to violate the true nature of hypothesis testing. More often than not, we don’t formally state a hypothesis. Rather, we statistically test everything and then report the statistical differences.
Consider this: when we compare two scores, we accept that we might get a statistical difference of 5% of the time simply by chance (a=.05). This could be the difference in concept acceptance between males and females.
In fact, that’s not really what we do, we perform hundreds of tests in most every study. Let’s say we have five segments and we want to test them for differences in concept acceptance. That’s 10 t-tests. Now we have a 29% chance of flagging a difference simply due to chance. That’s in every row of our tables. The better test would be to run an analysis of variance on the table to determine if any cell might be different. Then build a hypothesis and test them one at a time. But we don’t do this because it takes too much time. I realize I’m not going to change the way our industry does things (I’ve been trying for years), but maybe, just maybe you’ll pause for a moment when looking at your tables to decide if this “statistical” significance is really worth reporting—are the results valid and are they useful?.
Got a burning research question? You can send your questions to DearDrJay@cmbinfo.com or submit anonymously here:Ask Dr. Jay!