Friday, November 7, 2008

Mastering Basic Statistics

Trust me. You can do statistical analysis. The basics of statistics can be mastered... Forget about those mind-numbing textbooks for a second. Descriptive Statistics are all about what the data looks like. Inferential Statistics are all about whether two sets of data are different or if two sets of data have a relationship.

Descriptive Statistics are a summary of what the sample data looks like, such as measures of central tendency (e.g., the mean for interval data) and measures of dispersion (e.g., the standard deviation (SD) for interval data). Data that is spread symmetrically about the mean in a bell shape is normally distributed (i.e., about 68.27% of values fall within 1 SD of the mean, 95.45% within 2 SD, and 99.73% within 3 SD). A randomly drawn sample is best but rarely possible, so a non-random or convenience sample can be used with justification.
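Here is a rough sketch of those descriptive statistics in Python, using only the standard library. The scores are made up for illustration, and the helper names (`share_within`, `observed_within`) are my own, not from any textbook. It computes the mean and SD of a sample, then compares the share of values within 1, 2, and 3 SD against what a perfect bell curve would give (to within rounding).

```python
import statistics
from statistics import NormalDist

# Hypothetical sample of interval data (e.g., test scores); values are made up.
scores = [62, 68, 70, 71, 73, 75, 75, 78, 80, 84, 91]

mean = statistics.mean(scores)   # measure of central tendency
sd = statistics.stdev(scores)    # measure of dispersion (sample SD, n - 1 denominator)


def share_within(k):
    """Theoretical share of a normal distribution within k SDs of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)


def observed_within(data, k):
    """Share of the actual data points that fall within k sample SDs of the sample mean."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return sum(1 for x in data if abs(x - m) <= k * s) / len(data)


print(f"mean = {mean:.2f}, SD = {sd:.2f}")
for k in (1, 2, 3):
    print(f"within {k} SD: normal curve {share_within(k):.2%}, "
          f"this sample {observed_within(scores, k):.2%}")
```

With a sample this small the observed shares will only loosely track the bell curve, which is part of the point: the descriptive statistics tell you how far to trust the normality assumption.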

When compiling descriptive statistics, you need to know whether the level of measurement of the sample data is nominal (yes, no, or a label), ordinal (in some kind of order, such as doneness of meat: rare, medium rare, medium, or well done), or interval (a number that has order and whose value means something, such as "that movie is an 8 on a scale of 10"). You also need to know the unit of analysis, such as the individual, group, organization, or society. Descriptive Statistics tell us what Inferential Statistics we can safely use to draw conclusions.
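To make the level-of-measurement idea concrete, here is a small sketch that picks a measure of central tendency based on the level. The pairing used (mode for nominal, median for ordinal, mean for interval) is the common textbook convention, and the function name `central_tendency` and the sample values are assumptions for illustration only.

```python
import statistics

# Ordered categories for the ordinal example (doneness of meat).
DONENESS = ["rare", "medium rare", "medium", "well done"]


def central_tendency(values, level, order=None):
    """Return a typical value suited to the level of measurement.

    nominal  -> mode (most frequent label)
    ordinal  -> median category (needs `order`, the categories from low to high)
    interval -> arithmetic mean
    """
    if level == "nominal":
        return statistics.mode(values)
    if level == "ordinal":
        if order is None:
            raise ValueError("ordinal data needs the category order")
        # Convert labels to rank positions, take the middle rank, map back to a label.
        ranks = [order.index(v) for v in values]
        return order[statistics.median_low(ranks)]
    if level == "interval":
        return statistics.mean(values)
    raise ValueError(f"unknown level of measurement: {level}")


print(central_tendency(["yes", "no", "yes"], "nominal"))
print(central_tendency(["rare", "medium", "medium", "well done", "medium rare"],
                       "ordinal", order=DONENESS))
print(central_tendency([8, 6, 7], "interval"))
```

Note that averaging labels like "rare" and "well done" makes no sense, which is why the level of measurement has to be settled before any number is computed.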

Inferential Statistics are how we use probability theory to make a decision about the POPULATION, guided by what the Descriptive Statistics have told us about the SAMPLE data. There are two types of decisions: Measures of Difference and Measures of Association. Measures of Difference (z, t, F, etc.) test differences between a fixed value and a sample, between two samples, or among more than two samples. Measures of Association (the correlation coefficient r, regression) test whether variables move together and possibly whether there is some causal relationship. (Causal relationships are tricky to prove, so be careful about saying X causes Y.)
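A minimal sketch of one measure of each type, written out by hand so the arithmetic is visible. For the measure of difference I have assumed Welch's version of the two-sample t statistic (which does not require equal variances); for the measure of association, Pearson's r. The function names and all data values are made up for illustration.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Two-sample t statistic (Welch's form): difference in means over its standard error."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


def pearson_r(xs, ys):
    """Pearson correlation coefficient: how closely two variables move together (-1 to 1)."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


# Measure of Difference: do two (hypothetical) stores differ in average daily sales?
store_a = [20, 22, 19, 24, 25]
store_b = [28, 27, 30, 26, 29]
print(f"t = {welch_t(store_a, store_b):.2f}")

# Measure of Association: do ad spend and sales move together? (made-up numbers)
ad_spend = [1, 2, 3, 4, 5]
sales = [12, 15, 19, 22, 28]
print(f"r = {pearson_r(ad_spend, sales):.3f}")
```

A large t (in absolute value) suggests the two samples differ; an r near 1 or -1 suggests the variables move together. Neither number by itself says anything about cause.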

When applying Inferential Statistics, the types of measures of difference or measures of association that can be used are governed by the level of measurement, the number of samples you are comparing, whether the samples are random and independent, and whether the data is approximately normally distributed about the mean. When you are comparing samples, you have to make sure that the unit of analysis in each sample aligns with the other samples and with your research question (e.g., students in a classroom vs. a classroom of students: can a single student be judged by being in a particular class, or should the particular class be judged by a single student?). In hypothesis testing, a test statistic is calculated from the sample data and compared with a critical value looked up in a distribution (probability) table. A low p value (conventionally below 0.05) means the sample result would be unlikely if there were really no difference or relationship, so you can reject that "no effect" assumption; it does not by itself tell you the effect is large or important.
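The compare-the-test-statistic-to-the-critical-value step can be sketched with a one-sample z test. This assumes the population SD is known and the data is roughly normal, which is a simplification; the function name `z_test` and the numbers are hypothetical. Python's `statistics.NormalDist` stands in for the printed z table.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, hypothesized_mean, population_sd, n, alpha=0.05):
    """Two-sided one-sample z test.

    Returns (z, critical_value, p_value, reject_null).
    """
    standard_normal = NormalDist()  # plays the role of the z table
    # Test statistic: how many standard errors the sample mean is from the hypothesized mean.
    z = (sample_mean - hypothesized_mean) / (population_sd / math.sqrt(n))
    # Critical value: the z beyond which only alpha/2 of the curve lies in each tail.
    critical_value = standard_normal.inv_cdf(1 - alpha / 2)
    # p value: chance of a z at least this extreme if the hypothesized mean were true.
    p_value = 2 * (1 - standard_normal.cdf(abs(z)))
    return z, critical_value, p_value, abs(z) > critical_value


# Hypothetical example: 25 customers average 52 minutes; is that different from 50?
z, critical, p, reject = z_test(sample_mean=52, hypothesized_mean=50,
                                population_sd=5, n=25)
print(f"z = {z:.2f}, critical value = {critical:.2f}, p = {p:.4f}, reject null: {reject}")
```

Comparing |z| with the critical value and comparing p with alpha are two views of the same decision, so they always agree.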

All good quantitative research uses variations of the above to boil the research question down to a testable hypothesis for a large sample, whether the research is descriptive, exploratory, or causal/experimental. All good research articles explain how construct validity (i.e., the theory or practical problem), external validity (i.e., how and why the sample was chosen), internal validity (i.e., why the authors think what they saw is what they saw) and conclusion validity (i.e., how the descriptive and inferential statistics support the discussion) are achieved. Qualitative research with a sample size of one might use ethnography, action research or other methods to build a case study or a foundation for quantitative research.

That's it. That's about all a business manager or MBA must know about statistics. Of course, there is a lot more that you could know, but the basics can be mastered.
