Chi-Square

This post explains the chi-square test.

Based on Chapter 19 of Witte.

Chi-Square ($\chi^2$) Test for Qualitative (Nominal) Data

  • Use the chi-square test when data are qualitative and measured on a nominal scale (unordered categories, not ordinal).
  • The chi-square test focuses on discrepancies between the observed frequencies and the corresponding expected frequencies, which are derived from the null hypothesis.
  • One-variable $\chi^2$ - Goodness of fit
    • When data are distributed along a single qualitative variable, the one-variable $\chi^2$ test evaluates these discrepancies as a test for “goodness of fit.”
  • Two-variable $\chi^2$ - Test for independence
    • When data are cross-classified along two qualitative variables, the two-variable $\chi^2$ test evaluates these discrepancies as a “test of independence”, that is, a test for a lack of predictability between the two qualitative variables.

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
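
The statistic can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `chi_square_statistic` and the toy counts are hypothetical and chosen only to show the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical toy data (not from the post): 60 observations in 3 categories,
# with equal expected counts under the null hypothesis.
toy_observed = [25, 15, 20]
toy_expected = [20, 20, 20]

toy_chi2 = chi_square_statistic(toy_observed, toy_expected)
print(toy_chi2)  # 25/20 + 25/20 + 0 = 2.5
```

Larger discrepancies between observed and expected counts give a larger statistic; identical counts give exactly 0.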

One-variable $\chi^2$ - Goodness of fit

Example - Blood Types

  • Your blood belongs to one of four genetically determined types: O, A, B, or AB.
  • A bulletin issued by a large blood bank claims that these four blood types are distributed according to the following proportions:
    • $P_O = 0.44$; $P_A = 0.41$; $P_B = 0.10$; $P_{AB} = 0.05$
    • Population proportions must always sum to 1.00.
  • Let’s treat this claim as a null hypothesis to be tested with a random sample of 100 students from a large university.
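
| Frequency | O | A | B | AB | Total |
| --- | --- | --- | --- | --- | --- |
| $O_i$ (sample) | 38 | 38 | 20 | 4 | 100 |
| $E_i$ (as per claim) | 44 | 41 | 10 | 5 | 100 |
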
  • Evaluating Discrepancies
    • The crucial question is whether the discrepancies between observed and expected frequencies are small enough to be regarded as a common outcome, given that the null hypothesis is true. If so, the null hypothesis is retained.
    • Otherwise, if the discrepancies are large enough to qualify as a rare outcome, the null hypothesis is rejected.
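  • $\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = 11.24$
  • $df = 3$ (number of categories minus 1), $\chi^2 = 11.24$
  • https://www.statisticshowto.com/tables/chi-squared-table-right-tail/
  • $p \approx 0.01 < 0.05$

  • Reject the null hypothesis.
  • The distribution of blood types in the student population differs from the claimed proportions.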
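
The blood-type calculation above can be reproduced with a short Python sketch. It assumes a significance level of 0.05 and uses the critical value 7.815 for df = 3 read from a standard right-tail chi-square table; the variable names are illustrative only.

```python
# Claimed population proportions (null hypothesis) and observed sample counts.
proportions = {"O": 0.44, "A": 0.41, "B": 0.10, "AB": 0.05}
observed = {"O": 38, "A": 38, "B": 20, "AB": 4}

n = sum(observed.values())  # sample size: 100 students

# Expected frequency for each type = claimed proportion * sample size.
expected = {blood_type: p * n for blood_type, p in proportions.items()}

chi2 = sum(
    (observed[t] - expected[t]) ** 2 / expected[t] for t in observed
)
df = len(observed) - 1  # 4 categories - 1 = 3

# Assumption: alpha = 0.05; 7.815 is the table value for df = 3.
critical_value = 7.815
reject_null = chi2 > critical_value

print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {reject_null}")
# chi2 = 11.24, df = 3, reject H0: True
```

Most of the statistic comes from type B, where 20 students were observed against 10 expected, contributing 10 of the 11.24.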

Example - Zodiac sign

  • 256 visual artists were surveyed to find out their zodiac signs.

  • The results were: Aries (29), Taurus (24), Gemini (22), Cancer (19), Leo (21), Virgo (18), Libra (19), Scorpio (20), Sagittarius (23), Capricorn (18), Aquarius (20), Pisces (23).

  • Test the hypothesis that zodiac signs are evenly distributed across visual artists.

    • Categories = [‘Aries’, ‘Taurus’, ‘Gemini’, ‘Cancer’, ‘Leo’, ‘Virgo’, ‘Libra’, ‘Scorpio’, ‘Sagittarius’, ‘Capricorn’, ‘Aquarius’, ‘Pisces’]
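    • Observed: $O_i = [29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23]$
    • Mean observed frequency: $\bar{X}_{obs} = 256 / 12 \approx 21.33$
    • Expected: $E_i = [21.33, 21.33, 21.33, 21.33, 21.33, 21.33, 21.33, 21.33, 21.33, 21.33, 21.33, 21.33]$
    • $\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = 5.094$
    • $df = 11$ (12 categories minus 1), $\chi^2 = 5.094$
    • https://www.statisticshowto.com/tables/chi-squared-table-right-tail/
    • $p \approx 0.92 > 0.05$ (between 0.90 and 0.95)

    • Fail to reject the null hypothesis.
    • There is no evidence that zodiac signs are unevenly distributed across visual artists.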
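
The zodiac example can be checked the same way. The sketch below assumes a significance level of 0.05 and the table critical value 19.675 for df = 11; variable names are illustrative.

```python
signs = ["Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo", "Libra",
         "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces"]
observed = [29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23]

n = sum(observed)  # 256 artists
k = len(observed)  # 12 signs

# Under the null hypothesis every sign is equally likely,
# so each expected count is n / k (about 21.33).
expected_count = n / k

chi2 = sum((o - expected_count) ** 2 / expected_count for o in observed)
df = k - 1  # 11

# Assumption: alpha = 0.05; 19.675 is the table value for df = 11.
critical_value = 19.675
reject_null = chi2 > critical_value

print(f"chi2 = {chi2:.3f}, df = {df}, reject H0: {reject_null}")
# chi2 = 5.094, df = 11, reject H0: False
```

The statistic is far below the critical value, so the null hypothesis is retained. If SciPy is installed, `scipy.stats.chisquare(observed)` should give the same statistic together with a p-value, since it assumes equal expected frequencies by default.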

Two-variable $\chi^2$ - Test for independence


| Students | Major 1 | Major 2 | Major 3 | Total |
| --- | --- | --- | --- | --- |
| Male | 39 | 30 | 51 | 120 |
| Female | 21 | 40 | 19 | 80 |
| Total | 60 | 70 | 70 | 200 |

  • Hypothesis
    • $H_0$: Major and Gender are independent, i.e. the gender distribution is the same in every major (no relationship).
    • $H_A$: Major and Gender are not independent.
  • The table above shows the observed frequencies.
  • Expected Frequencies
    • Total students = 200
      • 120 are Male and 80 are Female
      • 120/200 are Male and 80/200 are Female
      • 0.6 are Male and 0.4 are Female
      • Under the null hypothesis, every major should have this proportion
        • In Major 1 there are 60 students, so they should split in the proportion 0.6:0.4
          • 36 Male and 24 Female
        • In Major 2 there are 70 students, so they should split in the proportion 0.6:0.4
          • 42 Male and 28 Female
        • In Major 3 there are 70 students, so they should split in the proportion 0.6:0.4
          • 42 Male and 28 Female
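
| Students | Major 1 | Major 2 | Major 3 | Total |
| --- | --- | --- | --- | --- |
| Male $O_i$ | 39 | 30 | 51 | 120 |
| Male $E_i$ | $\frac{60 \times 120}{200} = 36$ | $\frac{70 \times 120}{200} = 42$ | $\frac{70 \times 120}{200} = 42$ | |
| Female $O_i$ | 21 | 40 | 19 | 80 |
| Female $E_i$ | $\frac{60 \times 80}{200} = 24$ | $\frac{70 \times 80}{200} = 28$ | $\frac{70 \times 80}{200} = 28$ | |
| Total | 60 | 70 | 70 | 200 |
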
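
The expected counts in the table above can be derived directly from the row and column totals (row total times column total, divided by the grand total). A minimal sketch in plain Python, with illustrative variable names:

```python
# Observed counts: rows are Male, Female; columns are Major 1, 2, 3.
observed = [
    [39, 30, 51],  # Male
    [21, 40, 19],  # Female
]

row_totals = [sum(row) for row in observed]        # [120, 80]
col_totals = [sum(col) for col in zip(*observed)]  # [60, 70, 70]
grand_total = sum(row_totals)                      # 200

# Expected count for each cell under independence:
# (row total * column total) / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for label, row in zip(["Male", "Female"], expected):
    print(label, row)
# Male [36.0, 42.0, 42.0]
# Female [24.0, 28.0, 28.0]
```
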
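  • $\text{Expected Frequency} = \frac{\text{Row Total} \times \text{Col Total}}{\text{Grand Total}}$
  • $\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$
  • $\chi^2 = \frac{(39-36)^2}{36} + \frac{(30-42)^2}{42} + \frac{(51-42)^2}{42} + \frac{(21-24)^2}{24} + \frac{(40-28)^2}{28} + \frac{(19-28)^2}{28}$
  • $\chi^2 = 0.25 + 3.43 + 1.93 + 0.38 + 5.14 + 2.89 = 14.02$
  • Degrees of Freedom
    • $df = (c-1)(r-1)$
      • $c$ is the number of columns and $r$ is the number of rows
    • $df = (3-1)(2-1) = 2 \times 1 = 2$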
  • https://naneja.github.io/files/statistics/tables.pdf
    • $df = 2$; $\chi^2 = 14.02$
    • $p \approx 0.001 < 0.05$
    • Reject the null hypothesis
      • There is a relationship between Gender and Major
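
The whole test of independence can be sketched end to end in plain Python. This sketch assumes a significance level of 0.05. The p-value line relies on the fact that, for df = 2 only, the right-tail probability of the chi-square distribution has the closed form exp(-χ²/2); for other degrees of freedom a table or a statistics library would be needed. Variable names are illustrative.

```python
import math

# Observed counts: rows are Male, Female; columns are Major 1, 2, 3.
observed = [
    [39, 30, 51],
    [21, 40, 19],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts under independence.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Sum (O - E)^2 / E over every cell of the table.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(col_totals) - 1) * (len(row_totals) - 1)  # (3 - 1) * (2 - 1) = 2

# Closed form valid for df = 2 only: P(X > chi2) = exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2)

alpha = 0.05  # assumed significance level
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The computed p-value is about 0.0009, consistent with the table lookup of roughly 0.001. If SciPy is installed, `scipy.stats.chi2_contingency(observed)` should report the same statistic, degrees of freedom, and expected counts for this table.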