Purpose
By entering
frequencies into a 2 x 2 table, you can calculate various statistics to evaluate
the relationship between two dichotomous variables. Thus, the 2 x 2 option can
be used as an alternative to correlation when the two variables of interest are
dichotomous.
Preparations
Select a 2 x 2 range of cells and run the Statistics → Nonparametric
Statistics → 2x2 Tables command.
Results
The Pearson Chi-square is the most common test for significance of the
relationship between categorical variables. This measure is based on the fact
that we can compute the expected frequencies in a two-way table (i.e.,
frequencies that we would expect if there were no relationship between the
variables). For example, suppose we ask 20 males and 20 females to choose
between two brands of soda pop (brands A and B). If there is no relationship
between preference and gender, then we would expect the choices to split
between brand A and brand B in about the same proportions for each sex. The
Chi-square test becomes increasingly significant as the observed numbers
deviate further from this expected pattern; that is, the more the pattern of
choices for males differs from that for females.
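The soda pop example can be sketched in Python as follows. The counts are hypothetical and chosen only for illustration, and the function name `pearson_chi_square` is an assumption of this sketch, not part of the software. The p-value uses the fact that a Chi-square with one degree of freedom (the 2 x 2 case) is the square of a standard normal variable.

```python
from math import erfc, sqrt

def pearson_chi_square(table):
    """Return (chi2, p_value, expected) for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    # Expected count under "no relationship": row total * column total / n.
    expected = [[r * col / n for col in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    # Upper-tail probability of a Chi-square with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value, expected

# Hypothetical counts: rows = males, females; columns = brand A, brand B.
observed = [[15, 5], [8, 12]]
chi2, p_value, expected = pearson_chi_square(observed)
print(f"expected = {expected}")
print(f"chi-square = {chi2:.4f}, p = {p_value:.4f}")
```

With these made-up counts, each sex has an expected 11.5 choices of brand A and 8.5 of brand B, and the statistic grows as the observed rows move away from that shared split.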
The value of the Chi-square and its significance level
depend on the overall number of observations and the number of cells in the
table. Consistent with the principles discussed in Elementary concepts,
relatively small deviations of the relative frequencies across cells from the
expected pattern will prove significant if the number of observations is large.
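A small sketch of this sample-size effect, using hypothetical counts: the two tables below have identical relative frequencies, but the second has ten times as many observations. The shortcut formula for the 2 x 2 case and the helper name `chi_square_2x2` are assumptions of this example.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson Chi-square for a 2 x 2 table via the shortcut formula."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_1df(chi2):
    """Upper-tail probability of a Chi-square with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))

# Same 55% / 45% split in both tables; only the sample size differs.
small = chi_square_2x2(11, 9, 9, 11)
large = chi_square_2x2(110, 90, 90, 110)

print(f"n = 40:  chi-square = {small:.2f}, p = {p_value_1df(small):.3f}")
print(f"n = 400: chi-square = {large:.2f}, p = {p_value_1df(large):.3f}")
```

Multiplying every cell by ten multiplies the statistic by ten, so the same modest deviation moves from clearly non-significant to below the conventional 0.05 level.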
The only assumption underlying the use of the Chi-square
(other than random selection of the sample) is that the expected frequencies are
not very small. The reason is that the Chi-square inherently tests the
underlying probabilities in each cell; and when the expected cell frequencies
fall, for example, below 5, those probabilities cannot be estimated with
sufficient precision. For further discussion of this issue refer to Everitt
(1977), Hays (1988), or Kendall and Stuart (1979).
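One way to check this assumption before relying on the Chi-square is to compute the expected frequencies and flag any that fall below a threshold. The sketch below uses the value 5 mentioned above as a default; the function names and the counts are hypothetical.

```python
def expected_frequencies(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def small_expected_cells(table, threshold=5):
    """Return (row, column, expected) for cells with expected count below threshold."""
    expected = expected_frequencies(table)
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < threshold
    ]

# Hypothetical table with a sparse first column.
counts = [[3, 7], [2, 18]]
flagged = small_expected_cells(counts)
for i, j, e in flagged:
    print(f"cell ({i}, {j}) has expected frequency {e:.2f}")
```

If any cells are flagged, the corrected statistic or the exact test described below may be a safer choice.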
Yates' corrected Chi-square. The
approximation of the Chi-square
statistic in small 2 x 2 tables can be improved by reducing the absolute value
of differences between expected and observed frequencies by 0.5 before squaring
(Yates' correction). This correction, which makes the estimation more
conservative, is usually applied when the table contains only small observed
frequencies, so that some expected frequencies become less than 10 (for further
discussion of this correction, see Conover, 1974; Everitt, 1977; Hays, 1988;
Kendall & Stuart, 1979; and Mantel, 1974).
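As a rough sketch, the correction amounts to subtracting 0.5 from each absolute difference before squaring. The counts are hypothetical, the name `yates_chi_square` is an assumption, and clamping the adjusted difference at zero is a choice made here for safety rather than something stated in the text.

```python
from math import erfc, sqrt

def yates_chi_square(table):
    """Yates' corrected Chi-square and its p-value for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            # Shrink |observed - expected| by 0.5, never below zero (assumption).
            diff = max(abs(obs - exp) - 0.5, 0.0)
            chi2 += diff ** 2 / exp
    # Upper-tail probability of a Chi-square with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))

# Hypothetical counts: rows = males, females; columns = brand A, brand B.
observed = [[15, 5], [8, 12]]
corrected, p_corrected = yates_chi_square(observed)
print(f"Yates chi-square = {corrected:.4f}, p = {p_corrected:.4f}")
```

For this table the corrected statistic is noticeably smaller than the uncorrected Pearson value (about 5.01), which illustrates why the correction is described as conservative.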
Phi-square. The Phi-square
is a measure of correlation between the two categorical variables in the table.
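The text does not give a formula, so the sketch below assumes the usual definition: Phi-square is the Pearson Chi-square divided by the total number of observations, which for a 2 x 2 table ranges from 0 (no relationship) to 1 (perfect relationship). The counts and the function name are hypothetical.

```python
def phi_square(table):
    """Phi-square for a 2 x 2 table: Pearson Chi-square divided by n."""
    (a, b), (c, d) = table
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2 / n

# Hypothetical counts: rows = males, females; columns = brand A, brand B.
weak = phi_square([[15, 5], [8, 12]])
perfect = phi_square([[20, 0], [0, 20]])
print(f"phi-square = {weak:.4f} (moderate association)")
print(f"phi-square = {perfect:.4f} (perfect association)")
```

Because the division by n removes the dependence on sample size, Phi-square describes the strength of the relationship rather than its statistical significance.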
Fisher exact test. Given the marginal
frequencies in the table, and assuming that in the population the two factors in
the table are not related, how likely is it to obtain cell frequencies as uneven
or worse than the ones that were observed? For small n, this probability can be
computed exactly by counting all possible tables that can be constructed based
on the marginal frequencies. This is the underlying rationale for the Fisher
exact test. It computes the exact probability under the null hypothesis of
obtaining the current distribution of frequencies across cells, or one that is
more uneven.
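The counting argument can be sketched with the hypergeometric distribution: with the margins fixed, a 2 x 2 table is determined by its top-left cell, so every possible table can be enumerated. The function names and counts are hypothetical, and the two-sided rule used here (sum the probabilities of all tables no more likely than the observed one) is one common convention, assumed rather than taken from the text.

```python
from math import comb

def table_probability(a, row1, row2, col1):
    """Probability of a table with top-left cell a, given fixed margins."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)

def fisher_exact_2x2(table):
    """Return (one_sided_p, two_sided_p) for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    # Range of top-left values compatible with the margins.
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    probs = {
        x: table_probability(x, row1, row2, col1)
        for x in range(lowest, highest + 1)
    }
    p_observed = probs[a]
    upper = sum(p for x, p in probs.items() if x >= a)
    lower = sum(p for x, p in probs.items() if x <= a)
    one_sided = min(upper, lower)
    # Small tolerance guards against floating-point ties.
    two_sided = sum(p for p in probs.values() if p <= p_observed * (1 + 1e-9))
    return one_sided, two_sided

# Hypothetical small table.
one_sided, two_sided = fisher_exact_2x2([[3, 1], [1, 3]])
print(f"one-sided p = {one_sided:.4f}, two-sided p = {two_sided:.4f}")
```

For this table there are only five possible arrangements given the margins, out of 70 equally likely assignments, so the probabilities can be checked by hand: 17/70 one-sided and 34/70 two-sided.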