Friday, January 18, 2008

Did Babies Build Roads in Europe? Presumed Causation Between Correlated Variables

A frequent problem in interpreting data in the social sciences and business research is presuming causation between correlated variables. Two variables can exhibit perfect linear correlation yet not be in a cause-and-effect relationship. Generally, we need to satisfy at least three stipulations to argue for a cause-and-effect relationship:

  1. Temporal precedence -- the cause happens before the effect.
  2. Association between the independent variable (i.e., cause) and dependent variable (i.e., effect) -- a linear, geometric, exponential, logarithmic, or some other covariation exists.
  3. No reasonable alternatives -- upon careful inspection, no other reasonable explanation accounts for the association between the presumed cause and the effect.

For example, between 1945 and 1962, there were dramatic increases in the number of new roads built in Europe and the number of live births in the United States. (Note that I read this comparison somewhere but do not recall where; I use it frequently when teaching undergraduate statistics, because the sheer absurdity of the comparison makes the lesson easy for students to remember.) Were babies building roads in Europe? Not likely. Were roads in Europe making it possible for more babies to be born in the good ol' U.S.A.? Not likely. See, variables can be strongly, even perfectly, correlated and still be causally unrelated. That is, there is no direct relationship between those variables; instead, a confounding third variable could be related to both of the correlated variables, which in this case we might attribute to the drastic social upheaval that occurred during World War II.
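To see how a confounder can manufacture a correlation, here is a minimal sketch in Python. It uses purely synthetic numbers, not real road or birth counts: the variable names, the slopes of 2.0 and 3.0, the noise levels, and the sample size are all assumptions made up for illustration. A hypothetical "upheaval" variable drives both series, and neither series influences the other.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, zs):
    """What is left of ys after subtracting its least-squares line on zs."""
    n = len(ys)
    mean_y, mean_z = sum(ys) / n, sum(zs) / n
    slope = (sum((z - mean_z) * (y - mean_y) for y, z in zip(ys, zs))
             / sum((z - mean_z) ** 2 for z in zs))
    return [(y - mean_y) - slope * (z - mean_z) for y, z in zip(ys, zs)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 1000                # number of simulated observations (assumed)

# Hypothetical confounder: it feeds both series below.
upheaval = [rng.uniform(0, 10) for _ in range(n)]

# Each series depends on the confounder plus its own independent noise.
# Neither series appears in the other's formula.
roads = [2.0 * z + rng.gauss(0, 1) for z in upheaval]
births = [3.0 * z + rng.gauss(0, 1) for z in upheaval]

# Raw correlation between the two series.
raw_r = pearson(roads, births)

# Partial correlation: correlate what remains after removing the
# confounder's linear contribution from each series.
partial_r = pearson(residuals(roads, upheaval), residuals(births, upheaval))

print("raw correlation:    ", round(raw_r, 3))
print("partial correlation:", round(partial_r, 3))
```

In this simulation the raw correlation comes out strong, while the partial correlation, computed after the confounder's linear contribution is removed from both series, falls close to zero. That only works here because the confounder was measured; in real research it often is not.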

In business research, causation and correlation are frequently confused as well. For example, are the dollars invested in showroom inventory the cause of sales revenue at a retail furniture store? Or are the dollars of sales revenue generating investments in new showroom inventory? Or is a third, confounding variable, such as consumer demand, somehow affecting both? In many cases, the actual causal variable is not being measured, but at a minimum the three stipulations above must be satisfied to argue for causation between any two known variables.
