Statistical association (or correlation)
- measures the strength (or weakness) of the relationship between two variables
- for scatter plots in this course, we will always use the Pearson's r correlation coefficient (which ranges from -1 through 0 to +1)
- for cross tabulations, we will always use the Cramer's V correlation coefficient (which ranges from 0 to +1)
- a positive (or direct) correlation indicates that as the values of one variable increase, so do the values of the other variable
- a negative (or inverse) correlation indicates that as the values of one variable increase, the values of the other variable decrease
- the closer either correlation coefficient is to zero, the weaker the relationship
- in social-science research, the following classification of both correlation coefficients is commonly used:
- statistical association is not necessarily the same
thing as causation
- i.e., a weak, moderate, or strong level of statistical association between two variables does not necessarily mean that changes in one variable cause the changes observed in the other variable
- statistical association just means that the values of one variable change consistently with changes in the values of the other variable
- it may be a third variable that accounts for (or causes) the changes in the two variables that you are measuring
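As a rough illustration of the two coefficients described above, the following Python sketch computes Pearson's r for paired values and Cramer's V for a cross tabulation. The function names and the small data sets are hypothetical, made up for illustration; they do not come from the course materials or from any survey.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers (ranges from -1 to +1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def chi_square(table):
    """Chi-square statistic for a cross tabulation given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def cramers_v(table):
    """Cramer's V for a cross tabulation (ranges from 0 to +1)."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))  # smaller of the row and column counts
    return math.sqrt(chi_square(table) / (n * (k - 1)))

# Hypothetical scatter-plot data: a perfect positive and a perfect negative relationship.
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # 1.0
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # -1.0

# Hypothetical 2x2 cross tabulations: no relationship, then a perfect relationship.
print(cramers_v([[10, 10], [10, 10]]))   # 0.0
print(cramers_v([[20, 0], [0, 20]]))     # 1.0
```

This sketch assumes every row and column of the table has at least one case; a row or column of zeros would make an expected count zero and cause a division error.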
Statistical significance
- measures the probability of random-sampling error in survey data
- random-sampling error causes the patterns that we observe in sample data to be unrepresentative of the actual pattern that we would see if we had looked at the whole population from which the sample was randomly drawn
- the good news with random sampling is that we can accurately estimate population patterns by looking at only a relatively small sample randomly drawn from a much larger population
- the bad news is that any randomly drawn sample can be unrepresentative of the population from which it is drawn
- fortunately, the probability of having an unrepresentative sample can be calculated (this is what the Chi-square statistic tells us)
- in this course, we will use the Chi-square statistic to determine the probability of random-sampling error
- in social-science research, the following classification of Chi-square probability is commonly used:
- if the Chi-square probability of random-sampling error is less than 0.05, then the sample results are assumed to be statistically significant (i.e., there is less than a 5% chance that our observed sample results are unrepresentative of the population results)
- if the Chi-square probability of random-sampling error is equal to or greater than 0.05, then the sample results are assumed to be not statistically significant (i.e., there is a 5% chance or more that our observed sample results are misleading us)
- the Chi-square probability is reported in one of two ways:
- sometimes an asterisk is used
- if there is an asterisk next to the correlation coefficient, then the sample results are considered statistically significant
- if there is no asterisk, then the sample results are considered not statistically significant
- or, sometimes the actual Chi-square probability is given
- if the Chi-square probability is less than 0.05, then the sample results are considered statistically significant
- if the Chi-square probability is equal to or greater than 0.05, then the sample results are considered not statistically significant
- statistical significance is not the same thing
as substantive significance:
- for example, national-sample surveys repeatedly show that there is
no statistically significant difference between
Protestants (as a group) and Catholics (as a group) in their attitudes
toward legalizing abortion
- as we can see in the 2004 NES survey results in the table below, roughly the same proportion of Catholics as Protestants support legalizing abortion
- the relationship is weak (the Cramer's V is less than 0.10)
- and any sample differences are not statistically significant
(there is no asterisk, indicating that the Chi-square probability of
random sampling error is equal to or greater than 5% or 0.05 -- the
Chi-square probability for this cross tabulation is actually 0.06)
- such a finding (a weak correlation that is not statistically
significant) is nevertheless substantively
significant, given the strong anti-abortion stand taken by the
Catholic Church's leadership.
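The 0.05 decision rule and the asterisk convention described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the probability calculation covers only the special case of a 2x2 cross tabulation (1 degree of freedom), and the coefficient values 0.08 and 0.30 are made up. Only the 0.06 probability comes from the abortion example in these notes.

```python
import math

def chi_square_probability_df1(chi_square):
    """Chi-square probability for a 2x2 table (1 degree of freedom).

    With 1 degree of freedom, the chance of a chi-square value this large
    or larger is erfc(sqrt(chi_square / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2))

def significance_label(p, cutoff=0.05):
    """Apply the rule: less than 0.05 is significant, 0.05 or more is not."""
    if p < cutoff:
        return "statistically significant"
    return "not statistically significant"

def with_asterisk(coefficient, p, cutoff=0.05):
    """Format a correlation coefficient, adding an asterisk when p < cutoff."""
    mark = "*" if p < cutoff else ""
    return f"{coefficient:.2f}{mark}"

# The Chi-square probability reported for the abortion cross tabulation is 0.06.
print(significance_label(0.06))     # not statistically significant

# Hypothetical coefficients, to show the asterisk convention.
print(with_asterisk(0.08, 0.06))    # 0.08
print(with_asterisk(0.30, 0.01))    # 0.30*

# A hypothetical 2x2 table with a chi-square value of 4.0.
p = chi_square_probability_df1(4.0)
print(round(p, 3), significance_label(p))   # 0.046 statistically significant
```

Note that the cutoff is strict: a probability of exactly 0.05 counts as not statistically significant, matching the "equal to or greater than 0.05" wording of the rule.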