Chi-square
This post explains the chi-square test.
Chi-Square Statistic
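$ \tilde{\chi}^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} $
where $O_i$ is the observed count in category $i$, $E_i$ is the expected count in category $i$, and $n$ is the number of categories.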
- There are two types of chi-square tests. Both use the chi-square statistic and distribution, but for different purposes:
- A chi-square goodness of fit test determines whether sample data match a hypothesized population distribution.
- A chi-square test for independence compares two variables in a contingency table to see if they are related. In a more general sense, it tests whether the distributions of categorical variables differ from one another.
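- A very small chi-square test statistic means that the observed data fit the expected data extremely well. In a goodness of fit test, the sample is consistent with the hypothesized distribution; in a test for independence, there is no evidence of a relationship between the variables.
- A very large chi-square test statistic means that the observed data do not fit the expected data well. In a test for independence, this is evidence that the variables are related.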
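As a rough illustration of the test for independence described above, the sketch below runs `scipy.stats.chi2_contingency` on a small contingency table. The counts and the group labels are hypothetical, invented purely for illustration; they are not data from this post.

```python
from scipy import stats

# Hypothetical 2x3 contingency table (made-up counts, not real data).
# Rows: two groups of respondents; columns: three answer categories.
table = [
    [30, 20, 10],  # group A
    [20, 20, 20],  # group B
]

# chi2_contingency derives the expected counts under independence and
# returns the statistic, p-value, degrees of freedom, and expected table.
statistic, p_value, dof, expected = stats.chi2_contingency(table)

print(f'statistic = {statistic:.3f}, dof = {dof}, p_value = {p_value:.3f}')
print(expected)
```

Here the degrees of freedom are (rows - 1) * (columns - 1) = 2. With these made-up counts the p-value is about 0.07, so at a 5% significance level the null hypothesis of independence would not be rejected.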
Example
256 visual artists were surveyed to find out their zodiac sign. The results were: Aries (29), Taurus (24), Gemini (22), Cancer (19), Leo (21), Virgo (18), Libra (19), Scorpio (20), Sagittarius (23), Capricorn (18), Aquarius (20), Pisces (23).
Test the hypothesis that zodiac signs are evenly distributed across visual artists.
```python
import numpy as np
from scipy import stats

categories = ['Aries', 'Taurus', 'Gemini', 'Cancer', 'Leo', 'Virgo',
              'Libra', 'Scorpio', 'Sagittarius', 'Capricorn', 'Aquarius', 'Pisces']
observed = [29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23]

# Under the null hypothesis every sign is equally likely, so each
# expected count is the mean of the observed counts.
mean = np.mean(observed)  # 21.33
expected = [mean] * len(categories)  # repeat

# Per-category contribution to the statistic: (O - E)^2 / E
component = []
for obs, exp in zip(observed, expected):
    c = (obs - exp) ** 2 / exp
    component.append(c)

chi_square_statistic = sum(component)
print(f'chi_square_statistic = {chi_square_statistic:.3f}')  # 5.094

df = len(observed) - 1
print(df)  # 11

# df = 11 and statistic = 5.094
# https://www.statisticshowto.com/tables/chi-squared-table-right-tail/
# The table puts the p-value between .90 and .95.
p_value = 1 - stats.chi2.cdf(chi_square_statistic, df)
print(f'p_value = {p_value:.3f}')  # 0.927
```
```python
from scipy import stats

# just pass the observed data
chi_square_statistic, p_value = stats.chisquare(observed)
print(f'chi_square_statistic={chi_square_statistic:.3f}, p_value={p_value:.3f}')
# chi_square_statistic=5.094, p_value=0.927
```
p-value = 0.927
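- Fail to reject the null hypothesis, since the p-value is far larger than the usual significance levels of 0.01 to 0.05 (1% to 5%). The data give no evidence that zodiac signs are unevenly distributed among visual artists.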
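An equivalent way to reach the same decision is to compare the statistic with a critical value instead of reading the p-value. The sketch below assumes a significance level of 0.05 (the `alpha` variable is an assumption for illustration, not something fixed by the example) and reuses the statistic and degrees of freedom from the zodiac example.

```python
from scipy import stats

chi_square_statistic = 5.094  # statistic from the zodiac example above
df = 11                       # 12 categories - 1
alpha = 0.05                  # assumed significance level

# Critical value: the point with probability alpha in the right tail.
critical_value = stats.chi2.ppf(1 - alpha, df)

# Survival function gives the right-tail p-value directly (same as 1 - cdf).
p_value = stats.chi2.sf(chi_square_statistic, df)

# Reject the null only if the statistic exceeds the critical value.
reject_null = chi_square_statistic > critical_value

print(f'critical_value = {critical_value:.3f}')
print(f'p_value = {p_value:.3f}')
print(f'reject_null = {reject_null}')
```

The statistic is well below the critical value (about 19.68 for 11 degrees of freedom at the 5% level), so both approaches agree: the null hypothesis is not rejected.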