# Chi-Square

This post explains the chi-square test.

Chapter 19 Witte

# Chi-Square ($\chi^2$) Test for Qualitative (Nominal) Data

• Use the chi-square test when data are qualitative with nominal measurement (unordered categories, as opposed to ordinal data).
• The chi-square test focuses on discrepancies between the observed frequencies and the corresponding set of expected frequencies, which are derived from the null hypothesis.
• One-variable $\chi^2$ - Goodness of fit
• When data are distributed along a single qualitative variable, the one-variable $\chi^2$ test evaluates these discrepancies as a test for “goodness of fit.”
• Two-variable $\chi^2$ - Test for independence
• When data are cross-classified along two qualitative variables, the two-variable $\chi^2$ test evaluates these discrepancies as a “test of independence” or a lack of predictability between the two qualitative variables.

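$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$

Here $O_i$ and $E_i$ are the observed and expected frequencies for category (or cell) $i$, and $n$ is the number of categories (or cells).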
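
The formula translates almost directly into code. The following is a minimal sketch in plain Python; the function name `chi_square_statistic` and the sample counts are illustrative assumptions, not something taken from the textbook.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, chosen only to show the arithmetic
observed = [18, 22, 20]
expected = [20, 20, 20]
stat = chi_square_statistic(observed, expected)
print(round(stat, 2))  # 0.4
```

The larger the statistic, the further the observed counts are from what the null hypothesis predicts.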

## One-variable $\chi^2$ - Goodness of fit

### Example - Blood Types

• Your blood belongs to one of four genetically determined types: O, A, B, or AB.
• A bulletin issued by a large blood bank claims that these four blood types are distributed according to the following proportions:
• $P_O = 0.44;~P_A = 0.41;~P_B = 0.10;~P_{AB} = 0.05$
• Population proportions must always sum to $1.00$.
• Let’s treat this claim as a null hypothesis to be tested with a random sample of 100 students from a large university.

| Frequency | O | A | B | AB | Total |
|---|---|---|---|---|---|
| $O_i$ (Sample) | 38 | 38 | 20 | 4 | 100 |
| $E_i$ (as per claim) | 44 | 41 | 10 | 5 | 100 |

• Evaluating Discrepancies
• The crucial question is whether the discrepancies between observed and expected frequencies are small enough to be regarded as a common outcome, given that the null hypothesis is true. If so, the null hypothesis is retained.
• Otherwise, if the discrepancies are large enough to qualify as a rare outcome, the null hypothesis is rejected.
• $\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = 11.24$
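• $df = k - 1 = 4 - 1 = 3$, where $k$ is the number of categories; $\chi^2 = 11.24$
• Right-tail chi-square table: https://www.statisticshowto.com/tables/chi-squared-table-right-tail/
• $p \approx 0.01 < 0.05$

• Reject the null hypothesis.
• The distribution of blood types in the student population differs from the proportions claimed by the blood bank.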
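
The blood-type calculation can be reproduced with a short script. This is a sketch using only the standard library; the variable names are illustrative, and the p-value line uses a closed-form right-tail expression that holds only for $df = 3$, so it is not a general-purpose routine.

```python
import math

# Claimed population proportions and observed sample counts from the example
proportions = {"O": 0.44, "A": 0.41, "B": 0.10, "AB": 0.05}
observed = {"O": 38, "A": 38, "B": 20, "AB": 4}

n = sum(observed.values())  # sample size, 100
expected = {k: n * p for k, p in proportions.items()}

chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1  # categories minus one

# Right-tail probability of a chi-square variable with exactly 3 degrees
# of freedom (assumption: this closed form is valid for df = 3 only)
p_value = (math.erfc(math.sqrt(chi2 / 2))
           + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2))

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "retain H0")
```

The printed p-value should agree with the table lookup to the precision the table allows.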

### Example - Zodiac sign

• 256 visual artists were surveyed to find out their zodiac signs.

• The results were: Aries (29), Taurus (24), Gemini (22), Cancer (19), Leo (21), Virgo (18), Libra (19), Scorpio (20), Sagittarius (23), Capricorn (18), Aquarius (20), Pisces (23).

• Test the hypothesis that zodiac signs are evenly distributed across visual artists.

• Categories = [‘Aries’, ‘Taurus’, ‘Gemini’, ‘Cancer’, ‘Leo’, ‘Virgo’, ‘Libra’, ‘Scorpio’, ‘Sagittarius’, ‘Capricorn’, ‘Aquarius’, ‘Pisces’]
• Observed = $O_i = [29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23]$
• Under the null hypothesis every sign is equally likely, so each expected frequency is $256 / 12 = 21.33$
• Expected = $E_i = [21.33, 21.33, 21.33, 21.33, 21.33, 21.33, 21.33, 21.33, 21.33, 21.33, 21.33, 21.33]$
• $\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = 5.094$
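• $df = k - 1 = 12 - 1 = 11$, where $k$ is the number of categories; $\chi^2 = 5.094$
• Right-tail chi-square table: https://www.statisticshowto.com/tables/chi-squared-table-right-tail/
• $p \approx 0.92 > 0.05$ (the table places it between .90 and .95)

• Fail to reject the null hypothesis.
• The sample gives no evidence that zodiac signs are unevenly distributed across visual artists. Failing to reject the null does not prove that the distribution is even.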
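
The zodiac example can be checked the same way. The sketch below computes the uniform expected frequencies, the statistic, and a right-tail p-value. The helper `chi2_sf` is a hypothetical name; it assumes an integer number of degrees of freedom and builds the tail probability from the standard recurrence for the regularized upper incomplete gamma function, using only the `math` module.

```python
import math


def chi2_sf(x, df):
    """Right-tail probability P(X >= x) for chi-square with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1.0
    return q


observed = [29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23]
total = sum(observed)                              # 256 artists
expected = [total / len(observed)] * len(observed)  # uniform under H0

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(chi2, df)

print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.2f}")
```

For routine work a statistics library is the safer choice; SciPy, for example, provides `scipy.stats.chisquare` for goodness-of-fit tests.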

## Two-variable $\chi^2$ - Test for independence

| Students | Major 1 | Major 2 | Major 3 | Total |
|---|---|---|---|---|
| Male | 39 | 30 | 51 | 120 |
| Female | 21 | 40 | 19 | 80 |
| Total | 60 | 70 | 70 | 200 |

• Hypotheses
• $H_0: \text{Major and gender are independent, i.e. the gender distribution is the same in every major}$
• $H_A: \text{Major and gender are not independent}$
• The table above holds the observed frequencies.
• Expected Frequencies
• Total students = 200
• 120 are Male and 80 are Female
• 120/200 are Male and 80/200 are Female
• 0.6 are Male and 0.4 are Female
• If major and gender are independent, every major should show this same proportion.
• Major 1 has 60 students, so they should split in the proportion 0.6:0.4
• 36 Male and 24 Female
• Major 2 has 70 students, so they should split in the proportion 0.6:0.4
• 42 Male and 28 Female
• Major 3 has 70 students, so they should split in the proportion 0.6:0.4
• 42 Male and 28 Female
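
| Students | Major 1 | Major 2 | Major 3 | Total |
|---|---|---|---|---|
| Male $O_i$ | 39 | 30 | 51 | 120 |
| Male $E_i$ | $60 \times \frac{120}{200} = 36$ | $70 \times \frac{120}{200} = 42$ | $70 \times \frac{120}{200} = 42$ | |
| Female $O_i$ | 21 | 40 | 19 | 80 |
| Female $E_i$ | $60 \times \frac{80}{200} = 24$ | $70 \times \frac{80}{200} = 28$ | $70 \times \frac{80}{200} = 28$ | |
| Total | 60 | 70 | 70 | 200 |
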
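
The proportional reasoning for the expected frequencies amounts to multiplying each row total by each column total and dividing by the grand total. A small sketch, assuming the observed counts are stored as a dictionary of rows (the layout and names are illustrative):

```python
# Observed counts per gender across Major 1, Major 2, Major 3
observed = {
    "Male": [39, 30, 51],
    "Female": [21, 40, 19],
}

row_totals = {g: sum(counts) for g, counts in observed.items()}  # 120, 80
col_totals = [sum(col) for col in zip(*observed.values())]       # 60, 70, 70
grand_total = sum(row_totals.values())                           # 200

# Expected frequency for each cell = row total * column total / grand total
expected = {
    g: [row_totals[g] * c / grand_total for c in col_totals]
    for g in observed
}
print(expected)
# {'Male': [36.0, 42.0, 42.0], 'Female': [24.0, 28.0, 28.0]}
```
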
• $\text{Expected Frequency} = \frac{\text{Row Total}* \text{Col Total}}{\text{Grand Total}}$
• $\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$
• $\chi^2 = \frac{(39-36)^2}{36} + \frac{(30-42)^2}{42} + \frac{(51-42)^2}{42} + \frac{(21-24)^2}{24} + \frac{(40-28)^2}{28} + \frac{(19-28)^2}{28}$
• $\chi^2 = 0.25 + 3.43 + 1.93 + 0.38 + 5.14 + 2.89 = 14.02$
• Degrees of freedom
• $df = (c-1)(r-1)$
• $c$ is number of columns and $r$ is number of rows
• $df = (3-1)(2-1) = 2 * 1 = 2$
• Statistical tables: https://naneja.github.io/files/statistics/tables.pdf
• $df = 2; \chi^2 = 14.02$
• $p \approx 0.001 < 0.05$
• Reject the null hypothesis.
• There is a relationship between gender and major.
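
The whole test of independence fits in a few lines. This sketch uses only the standard library, and the variable names are illustrative. For $df = 2$ the right-tail probability has the simple closed form $e^{-\chi^2/2}$, which the last step relies on; that shortcut is an assumption that does not carry over to other degrees of freedom.

```python
import math

# Rows: Male, Female; columns: Major 1, Major 2, Major 3
observed = [
    [39, 30, 51],
    [21, 40, 19],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # expected count
        chi2 += (o - e) ** 2 / e

df = (len(col_totals) - 1) * (len(row_totals) - 1)  # (c - 1)(r - 1)

# Right-tail probability; exp(-x / 2) is exact only when df == 2
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "retain H0")
```

SciPy's `scipy.stats.chi2_contingency` performs the same computation for tables of any size and is the more robust option outside of a worked example.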
