Cross Tabulation and Chi-square

The Cross Tabulation command displays the joint distribution of two or more variables (*multivariate frequency distribution*) in a matrix format and thus allows the relationship between the variables to be compared. A cross tabulation table (also known as a contingency table or crosstab) is generated for each distinct value of the optional layer variable and contains counts and percentages. The chi-square test is used to check whether the results of a cross tabulation are statistically significant. To run the chi-square test on data that is already tabulated, use the Chi-square test (summarized data) command from the Nonparametric Statistics menu (v6.4+).

# How To

Run the Statistics → Basic Statistics → Cross Tabulation and Chi-square command.

Select a row variable (containing the categories that define the rows of the table) and a column variable (containing the categories that define the columns of the table).

Optionally, select a frequency variable. The frequency variable specifies the number of observations that each row represents. When it is omitted, each row represents a single observation.
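The effect of the frequency variable can be illustrated with a minimal Python sketch. This is not the application's own code: the function name `observed_counts` and the sample data are hypothetical, and the sketch only assumes that each data row contributes its frequency value (or 1 when no frequency variable is given) to the matching cell.

```python
from collections import defaultdict

def observed_counts(rows, cols, freq=None):
    """Tally a two-way table of counts; each data row counts once if freq is None."""
    if freq is None:
        freq = [1] * len(rows)
    table = defaultdict(int)
    for r, c, f in zip(rows, cols, freq):
        table[(r, c)] += f
    return dict(table)

# Hypothetical summarized data: each data row stands for `count` observations.
gender = ["F", "F", "M", "M"]
answer = ["yes", "no", "yes", "no"]
count = [30, 20, 15, 35]

weighted = observed_counts(gender, answer, count)
unweighted = observed_counts(gender, answer)

print(weighted[("F", "yes")])    # 30
print(unweighted[("F", "yes")])  # 1
```

With the frequency variable the four data rows describe 100 observations; without it they describe only four.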

Optionally, select a layer variable. A separate table is generated for each distinct level (value) of the layer variable. The layer variable is also called the break variable, control variable, or filter variable.
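As a rough illustration of what a layer variable does, the following Python sketch builds one table of counts per layer level. The function name `tables_by_layer` and the data are hypothetical and are not taken from the application.

```python
from collections import defaultdict

def tables_by_layer(rows, cols, layers):
    """Return {layer_value: {(row, col): count}}, one table per layer level."""
    tables = defaultdict(lambda: defaultdict(int))
    for r, c, layer in zip(rows, cols, layers):
        tables[layer][(r, c)] += 1
    return {layer: dict(t) for layer, t in tables.items()}

# Hypothetical raw data: one observation per position.
smoker = ["yes", "no", "yes", "no", "yes", "no"]
disease = ["yes", "no", "no", "no", "yes", "yes"]
region = ["north", "north", "north", "south", "south", "south"]  # layer variable

result = tables_by_layer(smoker, disease, region)
for level in sorted(result):
    print(level, result[level])
```

Each layer level ("north", "south") gets its own smoker-by-disease table, and each table would then be analyzed separately.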

Optionally, in the advanced options, select a value for the Print tables option, which controls which additional tables are printed. The chi-square test summary and three tables (observed frequencies, expected frequencies, and chi-squared values) are printed whichever value is chosen. The available values are listed below.

o None: No additional tables are printed.

o Combined frequency table: The contingency table (combined frequency table) with counts and cell percentages is printed.

o Separate percentage tables: Marginal proportion tables (row proportions, column proportions) and the proportion-of-total table are printed in place of the combined frequency table.

o All: The three proportion tables and the contingency table are printed.

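**Casewise** deletion is used to remove missing values: a case (row) is excluded from the analysis if any of the selected variables has a missing value in that case.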
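Casewise deletion can be sketched in a few lines of Python. The helper `casewise_delete` is hypothetical, and the sketch assumes that missing values are encoded as `None` or an empty string; how the application itself encodes missing cells is not stated here.

```python
def casewise_delete(*variables, missing=(None, "")):
    """Keep only the cases where every variable has a non-missing value."""
    kept = [case for case in zip(*variables)
            if all(value not in missing for value in case)]
    if not kept:
        return [[] for _ in variables]
    # Transpose back to one list per variable.
    return [list(column) for column in zip(*kept)]

# Hypothetical data with gaps (assumed encoding: None or "").
row_var = ["a", "b", None, "a"]
col_var = ["x", "", "y", "y"]

clean_rows, clean_cols = casewise_delete(row_var, col_var)
print(clean_rows)  # ['a', 'a']
print(clean_cols)  # ['x', 'y']
```

The second and third cases are dropped entirely, even though each has one valid value, because the other variable is missing.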

# Results

In a two-way frequency table, the entries are frequency counts. Entries in the "*Total*" row and "*Total*" column are called marginal totals.
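A short Python sketch shows how the marginal totals follow from the cell counts. The 2×2 table is hypothetical example data.

```python
# Hypothetical 2x2 table of observed counts (rows x columns).
observed = [
    [30, 20],
    [15, 35],
]

row_totals = [sum(row) for row in observed]        # "Total" column
col_totals = [sum(col) for col in zip(*observed)]  # "Total" row
grand_total = sum(row_totals)

print(row_totals)   # [50, 50]
print(col_totals)   # [45, 55]
print(grand_total)  # 100
```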

Observed Frequencies table

Observed frequency is the number of times that a particular combination
of categories occurred.

Expected Frequencies table

The expected frequency is the number of observations that would be expected for a particular combination of categories if the null hypothesis were true (that is, if the combination occurred by chance alone). The formula for the expected frequency in the *i*^{th} row and *j*^{th} column is:

$$E_{ij} = \frac{R_i \, C_j}{N}$$

where $R_i$ is the total in the *i*^{th} row, $C_j$ is the total in the *j*^{th} column and *N* is the table grand total.
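The expected frequency of a cell is its row total multiplied by its column total, divided by the grand total. A minimal Python sketch follows; the function name `expected_frequencies` and the table are hypothetical.

```python
def expected_frequencies(observed):
    """Expected count per cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical 2x2 table of observed counts.
observed = [
    [30, 20],
    [15, 35],
]

expected = expected_frequencies(observed)
print(expected)  # [[22.5, 27.5], [22.5, 27.5]]
```

The expected table has the same marginal totals as the observed table; only the distribution of counts within the cells differs.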

Cross-tab table

Table entries consist of frequency, row and column percentages, and
total percentage (denominator is the total number of observations in the table).
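The four entries of each cross-tab cell can be sketched in Python as follows. The function `cell_percentages` and its dictionary keys are hypothetical names chosen for this illustration, not the application's output labels.

```python
def cell_percentages(observed):
    """For each cell, return the count plus row, column and total percentages."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [
        [
            {
                "count": value,
                "row_pct": 100 * value / row_totals[i],
                "col_pct": 100 * value / col_totals[j],
                "total_pct": 100 * value / n,  # denominator: all observations
            }
            for j, value in enumerate(row)
        ]
        for i, row in enumerate(observed)
    ]

# Hypothetical 2x2 table of observed counts.
observed = [
    [30, 20],
    [15, 35],
]

cells = cell_percentages(observed)
first = cells[0][0]
print(first["row_pct"])            # 60.0
print(round(first["col_pct"], 1))  # 66.7
print(first["total_pct"])          # 30.0
```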

Chi-square test summary

The chi-square statistic is a measure of how close the observed frequencies are to the expected frequencies. It is defined as $\chi^2 = \sum \frac{(O - E)^2}{E}$, where *O* is an observed frequency, *E* is an expected frequency, and the sum is taken across all cells.

d.f. – degrees of freedom. The number of degrees of freedom is defined as $d.f. = (r - 1)(c - 1)$, where *r* is the number of rows and *c* is the number of columns.
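The statistic and its degrees of freedom can be computed with a short Python sketch. It implements the plain Pearson chi-square (sum over cells of squared difference divided by expected count) with no continuity correction. Whether the application applies any correction is not stated here, and the function name `chi_square` and the table are hypothetical.

```python
def chi_square(observed):
    """Return (Pearson chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical 2x2 table of observed counts.
observed = [
    [30, 20],
    [15, 35],
]

stat, df = chi_square(observed)
print(round(stat, 4), df)  # 9.0909 1
```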

The null hypothesis H_{0} states that the row and column variables are independent. If the p-level (p-level > X) is less than the selected significance level (0.05), the test is significant and the null hypothesis is rejected; it can then be concluded that there is an association (dependence) between the row variable and the column variable.
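The decision rule can be sketched in Python using only the standard library. The helper `chi2_sf` is a hypothetical name; it computes the upper-tail probability of the chi-square distribution for integer degrees of freedom by starting from the closed forms for 1 and 2 degrees of freedom and applying the standard recurrence for the regularized incomplete gamma function. It is an illustration under those assumptions, not the application's own routine.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half_x = x / 2.0
    if df % 2 == 1:
        p, k = math.erfc(math.sqrt(half_x)), 1  # closed form for df = 1
    else:
        p, k = math.exp(-half_x), 2             # closed form for df = 2
    # Step up two degrees of freedom at a time until k equals df.
    while k < df:
        p += half_x ** (k / 2.0) * math.exp(-half_x) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return p

alpha = 0.05                     # selected significance level
stat, df = 100 / 11, 1           # hypothetical test result for a 2x2 table
p_value = chi2_sf(stat, df)
reject_h0 = p_value < alpha
print(reject_h0)  # True
```

In this hypothetical case the p-value is below 0.05, so the hypothesis of independence would be rejected.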
