Cross Tabulation and Chi-square

The Cross Tabulation command displays the joint distribution of two or more variables (a multivariate frequency distribution) in matrix format, which makes it possible to examine the relationship between the variables. A cross tabulation table (also known as a contingency table or crosstab) contains counts and percentages; when an optional layer variable is selected, a separate table is generated for each distinct value of that variable. The chi-square test is used to check whether the association shown in a cross tabulation is statistically significant. To run the chi-square test on data that are already tabulated, use the Chi-square test (summarized data) command from the Nonparametric Statistics menu (v6.4+).

How To

Run the Statistics → Basic Statistics → Cross Tabulation and Chi-square command.

Select a row variable (containing the categories that define the rows of the table) and a column variable (containing the categories that define the columns of the table).

Optionally, select a frequency variable. The frequency variable specifies the number of observations that each row represents. When it is omitted, each row represents a single observation.

Optionally, select a layer variable. A separate table is generated for each distinct level (value) of the layer variable. The layer variable is also called the break variable, control variable or filter variable.

Optionally, in the advanced options, select a value for the Print tables option. This option determines which additional tables are printed. The chi-square test summary and three tables (observed frequencies, expected frequencies and chi-square values) are printed regardless of the value selected. The available values are listed below.

o   None: No additional tables are printed.

o   Combined frequency table: The contingency table (combined frequency table) with counts and cell percentages is printed.

o   Separate percentage tables: The marginal proportion tables (row proportions, column proportions) and the proportion of total table are printed in place of the combined frequency table.

o   All: The three proportion tables and the contingency table are printed.

Missing values are handled by casewise deletion.
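The roles of the row, column, frequency and layer variables can be illustrated outside the application. The following is a minimal Python sketch, not the program's own implementation: the function name cross_tabulate, the record layout and the sample data are all hypothetical. It assumes that casewise deletion means a record is skipped whenever any of the selected variables is missing.

```python
from collections import defaultdict


def cross_tabulate(records, row, col, freq=None, layer=None):
    """Count observations per (row, column) cell, one table per layer value.

    Hypothetical helper for illustration only.
    """
    selected = [name for name in (row, col, freq, layer) if name is not None]
    tables = defaultdict(lambda: defaultdict(int))
    for rec in records:
        # Assumed meaning of casewise deletion: drop the whole record
        # if any selected variable is missing (None).
        if any(rec.get(name) is None for name in selected):
            continue
        # Without a frequency variable, each record is one observation.
        weight = rec[freq] if freq is not None else 1
        # Without a layer variable, everything goes into a single table (key None).
        layer_value = rec[layer] if layer is not None else None
        tables[layer_value][(rec[row], rec[col])] += weight
    return {key: dict(cells) for key, cells in tables.items()}


# Made-up sample data: "n" is the frequency variable, "site" the layer variable.
records = [
    {"sex": "F", "smoker": "yes", "n": 3, "site": "A"},
    {"sex": "F", "smoker": "no", "n": 7, "site": "A"},
    {"sex": "M", "smoker": "yes", "n": 5, "site": "A"},
    {"sex": "M", "smoker": None, "n": 2, "site": "A"},  # missing value, dropped
    {"sex": "M", "smoker": "no", "n": 4, "site": "B"},
]

tables = cross_tabulate(records, "sex", "smoker", freq="n", layer="site")
```

Here tables holds one dictionary of cell counts per distinct value of the layer variable; calling the function without freq and layer counts each record once and returns a single table.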

Results

In a two-way frequency table, the entries are frequency counts. The entries in the "Total" row and "Total" column are called marginal totals.

Observed Frequencies table
The observed frequency is the number of times that a particular combination of categories occurred.

Expected Frequencies table
The expected frequency is the number of observations that would be expected for a particular combination of categories if the null hypothesis were true (that is, if the combination occurred by chance alone). The formula for the expected frequency in the ith row and jth column is:

$$E_{ij} = \frac{R_i \times C_j}{N}$$

where $R_i$ is the total in the ith row, $C_j$ is the total in the jth column and $N$ is the table grand total.
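As a worked illustration, the expected count of each cell is the product of its row total and column total divided by the grand total. The sketch below is a minimal Python example with made-up counts; the function name expected_frequencies is hypothetical and is not part of the application.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [
        [r_total * c_total / grand_total for c_total in col_totals]
        for r_total in row_totals
    ]


# Made-up 2 x 3 table of observed counts.
observed = [
    [10, 20, 30],
    [20, 20, 20],
]
expected = expected_frequencies(observed)
# Row totals are 60 and 60, column totals are 30, 40 and 50, grand total is 120,
# so both rows of expected are [15.0, 20.0, 25.0].
```

Note that the expected table has the same marginal totals as the observed table.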

Cross-tab table
Each table entry consists of the frequency, the row percentage, the column percentage and the total percentage (whose denominator is the total number of observations in the table).
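The three percentages differ only in their denominator: the row total, the column total or the grand total. A minimal Python sketch with made-up counts follows; the function name cell_percentages and the dictionary keys are hypothetical, and the sketch assumes that no row or column total is zero.

```python
def cell_percentages(observed):
    """Return count, row %, column % and total % for every cell of a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    cells = []
    for i, row in enumerate(observed):
        cells.append([
            {
                "count": count,
                "row_pct": 100 * count / row_totals[i],    # share of the row total
                "col_pct": 100 * count / col_totals[j],    # share of the column total
                "total_pct": 100 * count / grand_total,    # share of all observations
            }
            for j, count in enumerate(row)
        ])
    return cells


# Made-up 2 x 2 table of observed counts.
observed = [
    [10, 30],
    [20, 40],
]
cells = cell_percentages(observed)
# The first cell holds 10 of the 40 observations in its row, 10 of the 30 in its
# column and 10 of the 100 in the table.
```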


Chi-square test summary

The chi-square statistic is a measure of how close the observed frequencies are to the expected frequencies. It is defined as $\chi^2 = \sum \frac{(O - E)^2}{E}$, where $O$ is an observed frequency, $E$ is an expected frequency, and the sum is taken across all cells.

d.f. – degrees of freedom. The number of degrees of freedom is defined as $d.f. = (r - 1)(c - 1)$, where $r$ is the number of rows and $c$ is the number of columns.

The null hypothesis H0 states that the row and column variables are independent. If the p-level (the probability of obtaining a chi-square value greater than the observed statistic X when the null hypothesis is true) is less than the selected significance level $\alpha$ (0.05), the test is significant and the null hypothesis is rejected; it can then be concluded that there is an association (dependence) between the row variable and the column variable.
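The whole test can be traced by hand on a small table. The following minimal Python sketch uses only the standard library; the function names chi_square_sf and chi_square_test and the sample counts are hypothetical, and the code is not the application's implementation. It assumes the plain Pearson statistic with no continuity correction and nonzero expected counts, and it computes the upper-tail chi-square probability for integer degrees of freedom by a standard recurrence.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(chi-square > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting values: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        q, nu = math.exp(-half), 2
    else:
        q, nu = math.erfc(math.sqrt(half)), 1
    # Recurrence: Q(x | nu + 2) = Q(x | nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        q += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return q


def chi_square_test(observed, alpha=0.05):
    """Return (statistic, degrees of freedom, p-value, reject H0?) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            statistic += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    p_value = chi_square_sf(statistic, df)
    return statistic, df, p_value, p_value < alpha


# Made-up 2 x 2 table: every expected count is 25, so the statistic is 4 * (5^2 / 25) = 4.0.
statistic, df, p_value, reject = chi_square_test([[20, 30], [30, 20]])
```

For this table the statistic is 4.0 with 1 degree of freedom and the p-value is about 0.0455, so the null hypothesis of independence would be rejected at the 0.05 level.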
