The Chi-squared test can be used for the following:
- To test the hypothesis that, for a single classification factor (e.g. gender), all classification levels have the same frequency.
- To test the relationship between two classification factors (e.g. gender and profession).
How to enter data
In the following example we have two categorical variables. For the variable OUTCOME, code 1 is entered for a positive outcome and code 0 for a negative outcome. For the variable SMOKING, code 1 is used for subjects who smoke and code 0 for subjects who do not smoke. The data for each case are entered on one row of the spreadsheet.
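The one-case-per-row layout described above can be turned into a frequency table by counting code pairs. The following is a minimal sketch in Python, not MedCalc's own procedure; the `cases` list and the function name `cross_tabulate` are hypothetical and serve only as illustration.

```python
from collections import Counter

# Hypothetical case-per-row data: (OUTCOME, SMOKING), coded 1/0 as in the text.
cases = [(1, 1), (0, 0), (1, 0), (0, 0), (1, 1), (0, 1)]

def cross_tabulate(rows):
    """Count how often each (outcome, smoking) code pair occurs."""
    counts = Counter(rows)
    outcome_levels = sorted({o for o, _ in rows})
    smoking_levels = sorted({s for _, s in rows})
    # One table row per OUTCOME level, one column per SMOKING level.
    table = [[counts[(o, s)] for s in smoking_levels] for o in outcome_levels]
    return outcome_levels, smoking_levels, table

outcome_levels, smoking_levels, table = cross_tabulate(cases)
```

Here `table[i][j]` holds the number of cases with the i-th OUTCOME code and the j-th SMOKING code.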
In the Chi-squared test dialog box, one or two discrete variables with the classification data must be identified. Classification data may be either numeric or alphanumeric (string) values. If required, you can convert a continuous variable into a discrete variable using the IF function (see elsewhere).
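Converting a continuous variable into a discrete one amounts to applying a cut-off to each value. The sketch below shows the idea in Python rather than with the spreadsheet's IF function; the variable `ages`, the cut-off of 50 and the function name `dichotomize` are assumptions made for illustration.

```python
def dichotomize(values, cutoff):
    """Return code 1 for values at or above the cutoff, otherwise code 0."""
    return [1 if v >= cutoff else 0 for v in values]

ages = [34, 61, 50, 47]            # hypothetical continuous variable
age_group = dichotomize(ages, 50)  # hypothetical cut-off
```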
After you have completed the dialog box, click OK to obtain the frequency table with the relevant statistics.
When you select the option Show all percentages in the results window, all percentages are shown in the table as follows:
In this example the number 42 in the upper left cell (for both Codes X and Codes Y equal to 0) is 67.7% of the row total of 62 cases, 75% of the column total of 56 cases, and 42% of the grand total of 100 cases.
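The three percentages can be checked by hand. The sketch below uses the 2x2 table implied by the totals quoted above (62 - 42 = 20, 56 - 42 = 14, 100 - 62 - 14 = 24); which variable sits on the rows is an assumption, and the function name `cell_percentages` is hypothetical.

```python
# Cell counts implied by the quoted totals: row total 62, column total 56,
# grand total 100. The row/column orientation is an assumption.
table = [[42, 20],
         [14, 24]]

def cell_percentages(table, i, j):
    """Return the cell count as a percentage of its row, column and grand total."""
    row_total = sum(table[i])
    col_total = sum(row[j] for row in table)
    grand_total = sum(sum(row) for row in table)
    cell = table[i][j]
    return (100 * cell / row_total,
            100 * cell / col_total,
            100 * cell / grand_total)

row_pct, col_pct, total_pct = cell_percentages(table, 0, 0)
```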
Single classification factor
When you want to test the hypothesis that, for a single classification factor (e.g. gender), all classification levels have the same frequency, identify only one discrete variable in the dialog box. In this case the null hypothesis is that all classification levels have the same frequency. If the calculated P-value is low (P<0.05), you reject the null hypothesis and accept the alternative hypothesis that the frequencies of the classification levels differ significantly.
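As an illustration of the equal-frequency test, the sketch below computes the statistic as the sum of (observed - expected)^2 / expected, with expected = total / number of levels and degrees of freedom = number of levels - 1. It is a minimal Python sketch, not MedCalc's implementation; the counts are hypothetical, and `chi2_sf` is a helper written here that obtains the upper-tail probability from the recurrence for the regularized incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)                # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))     # Q(1/2, h)
    # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    for _ in range((df - 1) // 2):
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

def equal_frequency_test(observed):
    """Chi-squared test of the hypothesis that all levels are equally frequent."""
    n = sum(observed)
    expected = n / len(observed)
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    df = len(observed) - 1
    return chi2, df, chi2_sf(chi2, df)

# Hypothetical counts for three classification levels.
chi2, df, p = equal_frequency_test([30, 50, 40])
```

A P-value below 0.05 from `equal_frequency_test` corresponds to rejecting the hypothesis of equal frequencies.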
In a single classification table the mode of the observations is the most common observation or category (the observation with the highest frequency). A unimodal distribution has one mode; a bimodal distribution, two modes.
Two classification factors
When you want to study the relationship between two classification factors (e.g. gender and profession), identify the two discrete variables in the dialog box. In this case the null hypothesis is that the two factors are independent. If the calculated P-value is low (P<0.05), you reject the null hypothesis and accept the alternative hypothesis that there is a relationship between the two factors.
Note that when the number of degrees of freedom is equal to 1, as in the case of a 2x2 table, MedCalc uses Yates' correction for continuity.
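To show what the continuity correction does, the sketch below computes the chi-squared statistic for a 2x2 table with and without Yates' correction, which subtracts 0.5 from each absolute difference between observed and expected counts. This is a Python sketch under stated assumptions, not MedCalc's code: the function name is hypothetical, and the table is the one implied by the totals in the percentage example earlier in this section.

```python
import math

def chi2_independence_2x2(table, yates=True):
    """Chi-squared test of independence for a 2x2 table (1 degree of freedom)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(0.0, diff - 0.5)   # Yates' continuity correction
            chi2 += diff ** 2 / expected
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

table = [[42, 20],
         [14, 24]]
chi2_corrected, p_corrected = chi2_independence_2x2(table, yates=True)
chi2_plain, p_plain = chi2_independence_2x2(table, yates=False)
```

The corrected statistic is always the smaller of the two, so the corrected P-value is the more conservative one.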
Chi-squared test for trend
If the table has two columns and three or more rows (or two rows and three or more columns), and the categories can be quantified, MedCalc will also perform the Chi-squared test for trend. The Cochran-Armitage test for trend (Armitage, 1955) tests whether there is a linear trend between the row (or column) number and the fraction of subjects in the left column (or top row). When the categories are ordered and the trend is roughly linear, it is more powerful than the unordered test of independence described above.
If there is no meaningful order in the row (or column) categories, then you should ignore this calculation.
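The trend statistic can be sketched as follows, using the commonly quoted form of the Cochran-Armitage test: T = sum of x_i (r_i - n_i p), divided by its variance p(1 - p)[sum n_i x_i^2 - (sum n_i x_i)^2 / N], giving a chi-squared value with 1 degree of freedom. This is an assumption about the formula, not a statement of MedCalc's exact implementation; the counts are hypothetical and the scores default to the row number.

```python
import math

def cochran_armitage_trend(successes, totals, scores=None):
    """Chi-squared test for linear trend in proportions (1 degree of freedom)."""
    if scores is None:
        scores = list(range(1, len(totals) + 1))   # row number used as score
    n = sum(totals)
    p_bar = sum(successes) / n                      # overall proportion
    # Score-weighted deviation of observed from expected successes.
    t = sum(x * (r - m * p_bar) for x, r, m in zip(scores, successes, totals))
    # Weighted sum of squares of the scores around their mean.
    sxx = (sum(m * x * x for x, m in zip(scores, totals))
           - sum(m * x for x, m in zip(scores, totals)) ** 2 / n)
    chi2 = t * t / (p_bar * (1.0 - p_bar) * sxx)
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Hypothetical data: subjects in the left column out of each row total.
chi2_trend, p_trend = cochran_armitage_trend([5, 10, 15], [20, 20, 20])
```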
Analysis of 2x2 table
- When the expected frequencies in the 2x2 table are low (as is the case when the total number of observations is less than 20), the table should be tested using Fisher's exact test.
- When the two classification factors are not independent, or when you want to test the difference between proportions in related or paired observations (e.g. in studies in which patients serve as their own control), you must use the McNemar test.
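Both alternatives can be sketched in a few lines of Python. The sketch assumes a two-sided Fisher's exact test that sums the hypergeometric probabilities of all tables no more likely than the observed one, and a McNemar test in its chi-squared form with continuity correction on the discordant pairs `b` and `c`. MedCalc's own procedures may differ in detail, and the function names and example counts are hypothetical.

```python
import math

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the upper left cell, margins fixed.
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # Sum all tables that are at most as probable as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def mcnemar_chi2(b, c):
    """McNemar chi-squared with continuity correction; requires b + c > 0."""
    chi2 = max(0, abs(b - c) - 1) ** 2 / (b + c)
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))   # 1 degree of freedom

p_fisher = fisher_exact_two_sided(3, 1, 1, 3)       # hypothetical small table
chi2_mcnemar, p_mcnemar = mcnemar_chi2(15, 5)       # hypothetical discordant pairs
```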
References
- Altman DG (1991) Practical statistics for medical research. London: Chapman and Hall.
- Armitage P (1955) Tests for linear trends in proportions and frequencies. Biometrics 11:375-386.