Chi-Square

Published: April 16, 2021

This post explains Chi-square Test.

Chapter 19 Witte

Chi-Square ($\chi^2$) Test for Qualitative (Nominal) Data

The chi-square test applies when data are qualitative and measured on a nominal scale, that is, sorted into unordered categories rather than ranked (ordinal).
The test focuses on any discrepancies between the observed frequencies in each category and the corresponding set of expected frequencies, which are derived from the null hypothesis.
One-variable $\chi^2$ - Goodness of fit
- When data are distributed along a single qualitative variable, the one-variable $\chi^2$ test evaluates these discrepancies as a test for “goodness of fit.”
Two-variable $\chi^2$ - Test for independence
- When data are cross-classified along two qualitative variables, the two-variable $\chi^2$ test evaluates these discrepancies as a “test of independence” or a lack of predictability between the two qualitative variables.

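$ \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} $

Here $O_i$ and $E_i$ are the observed and expected frequencies in category (or table cell) $i$, and $n$ is the number of categories (or cells).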

Your blood belongs to one of four genetically determined types: O, A, B, or AB.
A bulletin issued by a large blood bank claims that these four blood types are distributed according to the following proportions:
- $P_O = 0.44;~P_A = 0.41;~P_B = 0.10;~P_{AB} = 0.05 $
- Values of population proportions must always sum to $1.00$.
Let’s treat this claim as a null hypothesis to be tested with a random sample of 100 students from a large university.

| Frequency | O | A | B | AB | Total |
| --- | --- | --- | --- | --- | --- |
| $O_i$ (Sample) | 38 | 38 | 20 | 4 | 100 |
| $E_i$ (as per claim) | 44 | 41 | 10 | 5 | 100 |

Evaluating Discrepancies
- The crucial question is whether the discrepancies between observed and expected frequencies are small enough to be regarded as a common outcome, given that the null hypothesis is true. If so, the null hypothesis is retained.
- Otherwise, if the discrepancies are large enough to qualify as a rare outcome, the null hypothesis is rejected.
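- $ \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = \frac{(38-44)^2}{44} + \frac{(38-41)^2}{41} + \frac{(20-10)^2}{10} + \frac{(4-5)^2}{5} = 0.82 + 0.22 + 10.00 + 0.20 = 11.24 $
- $df = \text{number of categories} - 1 = 4 - 1 = 3, \chi^2 = 11.24$
- Chi-square table: https://www.statisticshowto.com/tables/chi-squared-table-right-tail/
- $ p \approx 0.01 < 0.05 $
- Reject the null hypothesis
- The distribution of blood types in the student population differs from the proportions claimed by the blood bank.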
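
The goodness-of-fit calculation for the blood-type sample can be sketched in a few lines of plain Python. This is a minimal illustration rather than a definitive implementation: the variable names are chosen here for readability, and the critical value 7.81 (df = 3, 0.05 level) is taken from a standard chi-square table instead of being computed.

```python
# Observed counts from the sample and the proportions claimed by the blood bank.
observed = {"O": 38, "A": 38, "B": 20, "AB": 4}
claimed = {"O": 0.44, "A": 0.41, "B": 0.10, "AB": 0.05}

n = sum(observed.values())                          # sample size
expected = {k: n * p for k, p in claimed.items()}   # E_i = n * P_i

# Chi-square statistic: sum of (O_i - E_i)^2 / E_i over the categories.
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1                              # number of categories minus one

# Assumption: tabled critical value for df = 3 at the 0.05 level.
CRITICAL_05_DF3 = 7.81
reject_null = chi2 > CRITICAL_05_DF3

print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {reject_null}")
```

Most of the statistic comes from type B, where the sample has twice as many students as the claim predicts.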

256 visual artists were surveyed to find out their zodiac signs.
The results were: Aries (29), Taurus (24), Gemini (22), Cancer (19), Leo (21), Virgo (18), Libra (19), Scorpio (20), Sagittarius (23), Capricorn (18), Aquarius (20), Pisces (23).
Test the hypothesis that zodiac signs are evenly distributed across visual artists.
- Categories = [‘Aries’, ‘Taurus’, ‘Gemini’, ‘Cancer’, ‘Leo’, ‘Virgo’, ‘Libra’, ‘Scorpio’, ‘Sagittarius’, ‘Capricorn’, ‘Aquarius’, ‘Pisces’]
- Observed = $O_i = [29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23]$
- Expected count per sign under the null hypothesis $= 256 / 12 = 21.33$
- Expected = $ E_i = [21.33, 21.33, 21.33, 21.33, 21.33, 21.33, 21.33, 21.33, 21.33, 21.33, 21.33, 21.33] $
- $ \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = 5.094 $
- $df = 12 - 1 = 11, \chi^2 = 5.094$
- Chi-square table: https://www.statisticshowto.com/tables/chi-squared-table-right-tail/
- $ p \approx 0.92 > 0.05 $ (between .90 and .95)
- Fail to reject the null hypothesis
- There is no evidence that zodiac signs are unevenly distributed across visual artists.
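
Instead of bracketing the p-value from a table, it can be computed directly. The sketch below uses only the Python standard library. The helper name `chi2_sf` is hypothetical (not from the original post) and relies on the finite-series form of the chi-square right-tail probability, which holds for whole-number degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Right-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2 * phi(sqrt(x)) * sum_{r=1}^{(df-1)/2} x^(r - 1/2) / (1*3*...*(2r-1))
    term = math.sqrt(x)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    phi = math.exp(-half) / math.sqrt(2 * math.pi)  # standard normal density at sqrt(x)
    return math.erfc(math.sqrt(half)) + 2 * phi * total

observed = [29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23]
expected_each = sum(observed) / len(observed)       # equal share per sign
chi2 = sum((o - expected_each) ** 2 / expected_each for o in observed)
df = len(observed) - 1
p_value = chi2_sf(chi2, df)

print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
```

In practice a statistics library would normally be used for the tail probability; SciPy, for example, provides `scipy.stats.chisquare` for this kind of goodness-of-fit test.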

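The table below cross-classifies 200 students by gender and major.

| Students | Major 1 | Major 2 | Major 3 | Total |
| --- | --- | --- | --- | --- |
| Male | 39 | 30 | 51 | 120 |
| Female | 21 | 40 | 19 | 80 |
| Total | 60 | 70 | 70 | 200 |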

Hypothesis
- $ H_0: \text{Gender and Major are independent, i.e. the gender distribution is the same in every major} $
- $ H_A: \text{Gender and Major are not independent} $
The table above gives the observed frequencies.
Expected Frequencies
- Total students = 200
  - 120 are Male and 80 are Female
  - 120/200 are Male and 80/200 are Female
  - 0.6 are Male and 0.4 are Female
  - In all majors we should have this proportion
    - In Major 1 there are 60 students so these must be in proportion of 0.6:0.4
      - 36 Male and 24 Female
    - In Major 2 there are 70 students so these must be in proportion of 0.6:0.4
      - 42 Male: 28 Female
    - In Major 3 there are 70 students so these must be in proportion of 0.6:0.4
      - 42 Male: 28 Female
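
The proportional reasoning above can be sketched in Python: take the overall gender split and apply it to each major's total. The variable names are illustrative only, and the counts are the observed frequencies from the table.

```python
# Observed counts (rows: gender, columns: Major 1, Major 2, Major 3).
observed = {
    "Male": [39, 30, 51],
    "Female": [21, 40, 19],
}

row_totals = {g: sum(counts) for g, counts in observed.items()}  # students per gender
col_totals = [sum(col) for col in zip(*observed.values())]       # students per major
grand_total = sum(row_totals.values())

# Under independence every major has the overall gender split,
# so E = (row total / grand total) * column total.
expected = {
    g: [row_totals[g] / grand_total * c for c in col_totals]
    for g in observed
}

for g, row in expected.items():
    print(g, [round(e, 2) for e in row])
```

Multiplying the row proportion by the column total is the same as computing row total times column total divided by the grand total.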

| Students | Major 1 | Major 2 | Major 3 | Total |
| --- | --- | --- | --- | --- |
| Male $O_i$ | 39 | 30 | 51 | 120 |
| Male $E_i$ | $60 \times \frac{120}{200} = 36$ | $70 \times \frac{120}{200} = 42$ | $70 \times \frac{120}{200} = 42$ | 120 |
| Female $O_i$ | 21 | 40 | 19 | 80 |
| Female $E_i$ | $60 \times \frac{80}{200} = 24$ | $70 \times \frac{80}{200} = 28$ | $70 \times \frac{80}{200} = 28$ | 80 |
| Total | 60 | 70 | 70 | 200 |

$ \text{Expected Frequency} = \frac{\text{Row Total} \times \text{Col Total}}{\text{Grand Total}} $
$ \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} $
$ \chi^2 = \frac{(39-36)^2}{36} + \frac{(30-42)^2}{42} + \frac{(51-42)^2}{42} + \frac{(21-24)^2}{24} + \frac{(40-28)^2}{28} + \frac{(19-28)^2}{28}$
$ \chi^2 = 0.25 + 3.43 + 1.93 + 0.38 + 5.14 + 2.89 = 14.02$
Degrees of Freedom
- $df = (c-1)(r-1)$
  - $c$ is the number of columns and $r$ is the number of rows
- $df = (3-1)(2-1) = 2 \times 1 = 2$
- Chi-square table: https://naneja.github.io/files/statistics/tables.pdf
- $df = 2; \chi^2 = 14.02$
- $p \approx 0.0009 < 0.05$
- Reject the null hypothesis
  - There is a relationship between Gender and Major
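
The full test of independence for this table can be reproduced with a short script. It is a sketch under one stated assumption: the p-value line uses the closed form $e^{-\chi^2/2}$, which holds for the chi-square right tail only when $df = 2$. A table of a different size would need a chi-square table or a statistics library.

```python
import math

# Observed counts (rows: Male, Female; columns: Major 1, Major 2, Major 3).
observed = [[39, 30, 51], [21, 40, 19]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand_total = sum(row_totals)

# Sum (O - E)^2 / E over every cell, with E = row total * column total / grand total.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        chi2 += (o - e) ** 2 / e

df = (len(col_totals) - 1) * (len(row_totals) - 1)  # (c - 1)(r - 1)

# Assumption: df == 2, where the right-tail probability is exactly exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
```

For general tables, SciPy's `scipy.stats.chi2_contingency` runs the same kind of test and also returns the expected frequencies.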