Ascribing Causes to Events
The post hoc fallacy describes a major problem with reaching an inductive conclusion that covariation between two variables exists when the variables cannot be manipulated to verify the conclusion. Causal inferences are only predictive and presumptive, not absolute. Conclusions based on causal inferences may therefore be only tentative -- they must be revised if other causal inferences are found to make stronger predictions. In other words, serious problems can arise when variables not under study may actually be responsible for effects we have ascribed to the particular variable being examined.
Major Sources of Measurement Errors
The four major sources of measurement error are the respondent, situational factors, undue influence by the measurer, and the instrument used to record the response. The respondent can be a source of error when, for example, in an evening home telephone interview about political candidates, he or she gives imprecise answers just to end the interview and get off the telephone. Situational factors include the presence of an additional person, such as a spouse, during an interview about attitudes toward a new car model, which may influence the responses given. The measurer introduces error when an interviewer in a face-to-face interview inadvertently leads the respondent to certain answers through a particular tone of voice. Finally, the instrument itself is a source of error: a long survey containing many questions worded at a higher level of vocabulary than the respondent is accustomed to will cut response rates.
Measurement Scale Validity More Important than Reliability
Scale validity is more important to the measurement process than reliability. Reliability is concerned with freedom from the random error or instability that can be present in a measurement device. A measurement device can be reliable but not valid. Validity is more critical than reliability because it refers to whether what we wish to measure is actually being measured. A measurement device cannot, however, be valid if it is not reliable.
Difficulty of Determining Content Validity
Content validity of the measurement scale items is not the most difficult type of validity to determine, although its evaluation requires judgment and intuition, which some may find difficult to apply. Criterion-related validity is not simple either, but it can be determined by correlating scale scores with the criterion. Construct validity is the most difficult to determine because one must consider whether the construct fits (i.e., supports or refutes) the theory in question.
Reliable Measures May Not Be Valid
A valid measurement is reliable, but a reliable measurement may not be valid. Once again, a measurement instrument can be reliable but not valid. For example, a person may wish to measure a room. Believing one's foot to be twelve inches in length, one steps across the room placing feet end to end and counts twenty foot-lengths; repeating the process gives the same count every time. The conclusion might therefore be that the room is 20 x 12 = 240 inches long. Later it is learned that the foot is only 11.5 inches in length, so the room is actually about 20 x 11.5 = 230 inches. The person's foot is a reliable measure of the length of the room -- it yields twenty foot-lengths every time -- but it is not a valid measurement of the room in standard inches.
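To make the arithmetic concrete, here is a minimal Python sketch of the same example (the numbers are simply those from the scenario above): the measurement repeats perfectly, so it is reliable, yet it is systematically off, so it is not valid.

# Minimal sketch of a reliable-but-invalid measurement, using the
# hypothetical foot-length numbers from the example above.
ASSUMED_FOOT_INCHES = 12.0   # what the measurer believes
ACTUAL_FOOT_INCHES = 11.5    # the foot's true length
FOOT_LENGTHS = 20            # paces counted on every trial

# Every repetition of the measurement yields the same reading: reliable.
readings = [FOOT_LENGTHS * ASSUMED_FOOT_INCHES for _ in range(5)]
print(readings)              # [240.0, 240.0, 240.0, 240.0, 240.0]

# But the true room length differs: the measure is not valid.
true_length = FOOT_LENGTHS * ACTUAL_FOOT_INCHES
print(true_length)           # 230.0 inches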
Instrument Stability and Equivalence
Stability and equivalence are not identical terms. Both are facets of reliability, but they refer to different ways in which reliability can be reduced. Stability concerns fluctuations over time in the items being observed, while equivalence concerns variations in how -- and by whom -- they are observed.
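As an illustrative sketch (not from Cooper and Schindler), stability is commonly checked with a test-retest correlation and equivalence with a parallel-forms correlation; the scores below are invented for the purpose:

import numpy as np

# Hypothetical scores for six respondents.
test = np.array([4, 7, 6, 9, 5, 8])      # first administration
retest = np.array([5, 7, 6, 8, 5, 9])    # same form, two weeks later
form_b = np.array([4, 6, 7, 9, 4, 8])    # an alternative (parallel) form

# Stability: does the instrument give consistent results over time?
print(np.corrcoef(test, retest)[0, 1])

# Equivalence: do two forms (or observers) of the instrument agree?
print(np.corrcoef(test, form_b)[0, 1])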
Rating and Ranking Scales
Rating scales can easily be used to judge properties of objects against specified criteria without comparison to other objects, whereas ranking scales classify objects by requiring a choice between them. Rating scales can be time-consuming to construct, and the procedure for ranking objects can be difficult to administer. Ratings are often distorted by poor or careless judgments on the part of the rater -- the halo effect, leniency, and central tendency errors noted by Cooper and Schindler (2003) can all be found in the normal personnel review process, for example.
Ranking objects against one another may eliminate the need to develop an absolute set of criteria to judge by, as rating scales require, but judging more than two objects at a time may lead to misinterpretation of the exact level of attitude expressed toward any one object. This is especially true when some of the items are equally matched in the positive attitude they evoke from the respondent. A simple example of this vote-splitting problem was the 1992 U.S. Presidential election, in which Bill Clinton, George Bush, and Ross Perot received roughly 43%, 37%, and 19% of the popular vote, respectively. One interpretation is that Clinton was the most favored candidate (i.e., had a mandate); another is that Bush and Perot were similar enough in ideals that they were campaigning for the same votes, so that, had one of them not run, the remaining one would have been elected handily.
Likert and Differential Scales
Likert scales (i.e., summated scales) can typically be created more easily than differential scales such as the Thurstone Differential Scale. This advantage may stem from the Likert scale being constructed through item analysis, whereas the differential scale is created through consensus. Differential scales are complicated and expensive, so methods like the Likert scale are often preferable for business research. For some types of research, however, the expense of having many knowledgeable judges agree on the rating of the items included on a differential scale may be justified, because it can produce better results than the pre-testing process of a summated scale.
Unidimensional and Multidimensional Scales
Unidimensional (cumulative) scales measure attitudes along a single continuum from less extreme to more extreme, so it is possible to tell from the total score which individual items the respondent judged positively or negatively. However, not all concepts and constructs can be adequately assessed this way, because the items being studied may be correlated in more than one way; that is, they are multidimensional. Construction of a scale using the semantic differential method may indeed reveal additional dimensions, and so a measurement instrument must be narrowly focused if it is to measure a concept unidimensionally. Intangible concepts such as organizational image or brand image may be difficult to assess with a cumulative (i.e., unidimensional) scale.
Methods of Survey Measurement Scale Construction
The five methods of scale construction are the arbitrary approach, the consensus approach, item analysis, the cumulative approach, and factor scales (Cooper & Schindler, 2003). The arbitrary approach is a commonly used method in which the scale is constructed as the measurement instrument is being developed. Responses are scored based on the subjective judgment of the researcher, which may or may not be sound. This quick and inexpensive method can be very powerful in the hands of an experienced researcher.
The consensus approach involves scale construction by a panel of judges (i.e., presumably knowledgeable) who weigh each item for relevance, clarity, and level of attitude it expresses. The panel of judges can produce a better scale for the measurement instrument, but it does so at the expense of time and money.
The item analysis approach to scale construction analyzes how well the items included in the measurement instrument discriminate between the indicants of interest. The values assigned to the individual items can then be summed to produce the respondent's total score. The Likert scale is one common and very effective item analysis (i.e., summated) scale.
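A minimal sketch of summated scoring, assuming five-point agreement items and one hypothetical reverse-keyed item:

# Hypothetical responses to four 5-point Likert items (1 = strongly
# disagree ... 5 = strongly agree). Item 4 is worded negatively, so it
# is reverse-coded before summing.
responses = {"item1": 4, "item2": 5, "item3": 3, "item4": 2}
reverse_keyed = {"item4"}

def summated_score(resp, reverse, scale_max=5, scale_min=1):
    # Reverse-coded items are flipped: 2 on a 1-5 scale becomes 4.
    return sum((scale_max + scale_min - v) if k in reverse else v
               for k, v in resp.items())

print(summated_score(responses, reverse_keyed))  # 4 + 5 + 3 + 4 = 16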
The cumulative approach to scale construction ranks items by the degree to which they represent a certain position on the item of measurement. In particular, the Guttman scalogram attempts to measure unidimensionality, that is, whether the responses fall into a pattern in which endorsing the most extreme position also implies endorsing all less extreme positions.
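A minimal sketch of that cumulative pattern check, assuming items are already ordered from least to most extreme (1 = endorsed, 0 = not endorsed):

def is_guttman_pattern(responses):
    # A perfect cumulative pattern is all 1s followed by all 0s,
    # e.g. [1, 1, 1, 0] fits but [1, 0, 1, 0] does not.
    return sorted(responses, reverse=True) == list(responses)

print(is_guttman_pattern([1, 1, 1, 0]))  # True: consistent with one dimension
print(is_guttman_pattern([1, 0, 1, 0]))  # False: an "error" in scalogram terms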
The factor scale is based on the correlation between items and the common factors they share. Factor scales address the problem that an attitude toward an item may have more than one dimension, and that there may be further dimensions not yet known. The appropriateness of each of the five methods of scale construction depends on the research objective, the type of measurement instrument, and similar considerations. Scale construction through the arbitrary and item analysis methods may be less expensive and completely adequate for some topics. Scale construction through the consensus, cumulative, and factor scaling methods is more time-consuming and expensive, and is more appropriate for measurements involving complex judgments.
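As a hedged illustration of the factor approach, the sketch below uses scikit-learn's FactorAnalysis on invented data in which two hidden attitude dimensions drive six items; the two-factor choice and all figures are assumptions for illustration only:

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 200
# Two hypothetical latent attitude dimensions drive six observed items.
latent = rng.normal(size=(n, 2))
loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],
                     [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])
items = latent @ loadings.T + 0.3 * rng.normal(size=(n, 6))

fa = FactorAnalysis(n_components=2).fit(items)
# Recovered loadings (up to sign/rotation) show which items share a factor.
print(fa.components_.round(2))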
The choice of scale construction technique can matter as much to the scaling design as the information the research hopes to derive. The expense of a given construction method may be too costly or time-consuming, or the method may not fit the kinds of judgments the respondent is being asked to make. The selection of the scale construction technique is therefore important.
Probability Sampling and Nonprobability Sampling
A probability sample is necessary when a true cross section of the population is needed to properly achieve the research objectives. The sampling phase of a project requiring a probability sample will most likely need dedicated funding, because of the expense of identifying the population members from which to draw, the personnel involved, and the larger sample size needed to achieve the desired degree of confidence.
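As a hedged aside on why such samples grow large, the standard sample-size formula for estimating a proportion is n = z^2 p(1 - p) / e^2; the 95% confidence level and margins below are assumed for illustration:

import math

def sample_size(z=1.96, p=0.5, e=0.05):
    # z: z-score for the confidence level (1.96 ~ 95%)
    # p: assumed population proportion (0.5 is the conservative worst case)
    # e: tolerable margin of error
    return math.ceil(z**2 * p * (1 - p) / e**2)

print(sample_size())        # 385 respondents for +/-5% at 95% confidence
print(sample_size(e=0.03))  # 1068 for a tighter +/-3% margin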
A nonprobability sample is sufficient when it is not necessary to know precisely how each item is represented in the whole population. If the researcher is seeking only to gain a "feel" for the level of presence of certain items within a population, a nonprobability technique such as judgment or quota sampling will be less expensive than probability sampling and will probably suffice. A good example is sampling for exploratory research.
Random, Cluster, and Stratified Samples
A simple random sample is most appropriate when a list of the population elements is known and can be easily randomized. The simplicity with which the sampling procedure can be established and executed is a major advantage. Cluster sampling is most appropriate when the expense of a simple random sample exceeds the budget and clusters can be identified that are internally heterogeneous and externally homogeneous; in other words, it may be easiest to obtain a list of population elements that are naturally grouped into such clusters and to sample the clusters. A stratified sample is appropriate if a complete population list that would facilitate a simple random sample is unavailable, and preferable if the population can be stratified on the primary variable being studied. A stratified sample can also improve statistical efficiency if the elements within each stratum are more like each other (i.e., homogeneous) and different from the other strata (i.e., heterogeneous across strata).
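A minimal sketch of proportionate stratified sampling, with a hypothetical frame of three strata invented for illustration:

import random

random.seed(42)
# Hypothetical frame: population elements grouped by stratum.
frame = {"managers": list(range(100)),
         "staff": list(range(100, 700)),
         "contractors": list(range(700, 800))}

def proportionate_stratified(frame, n):
    total = sum(len(v) for v in frame.values())
    sample = []
    for stratum, members in frame.items():
        # Each stratum contributes in proportion to its share of the frame.
        k = round(n * len(members) / total)
        sample.extend(random.sample(members, k))
    return sample

print(len(proportionate_stratified(frame, 80)))  # 80 elements across strata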
Finite Population Adjustment Factor
The finite population adjustment factor applies when the sample size is five percent or more of the total population; it can be used to reduce the sample size required to produce a desired level of precision (i.e., confidence). If the size of the sample is a budget concern, adjusting it with the finite population adjustment factor may be appropriate.
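A hedged sketch of the adjustment, using the standard finite population correction n' = n / (1 + (n - 1)/N); the population sizes are invented:

import math

def fpc_adjusted_n(n0, N):
    # n0: sample size computed for an effectively infinite population
    # N:  actual (finite) population size
    return math.ceil(n0 / (1 + (n0 - 1) / N))

# 385 is the usual n for +/-5% error at 95% confidence (see above).
print(fpc_adjusted_n(385, 2000))    # 323: 385 would be ~19% of N, so adjust
print(fpc_adjusted_n(385, 100000))  # 384: N is huge, adjustment negligible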
Disproportionate Stratified Probability Sample
The statistical efficiency of the entire sample can sometimes be increased by taking a larger sample within one of the strata. This may be a good idea if the stratum is larger, is more variable internally, and is less expensive to sample.
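This is the idea behind classical optimum (Neyman-style) allocation, where each stratum h is sampled in proportion to N_h * S_h / sqrt(c_h) (stratum size times standard deviation, divided by the square root of per-unit cost); the sketch below uses invented figures:

# Hypothetical strata: (size, standard deviation, cost per interview).
strata = {"urban": (6000, 12.0, 5.0),
          "suburban": (3000, 4.0, 5.0),
          "rural": (1000, 4.0, 20.0)}

def optimum_allocation(strata, n):
    # Larger, more variable, cheaper-to-sample strata get bigger shares.
    weight = {h: N * S / c**0.5 for h, (N, S, c) in strata.items()}
    total = sum(weight.values())
    return {h: round(n * w / total) for h, w in weight.items()}

print(optimum_allocation(strata, 400))
# {'urban': 335, 'suburban': 56, 'rural': 9} -- a disproportionate allocation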
Reference
Cooper, D. R., & Schindler, P. S. (2003). Business research methods (8th ed.). Boston, MA: McGraw-Hill.