# Chi-square

This post explains the chi-square test.

# Chi-Square Statistic

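$\tilde{\chi}^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$

Here $O_i$ is the observed count in category $i$, $E_i$ is the count expected under the null hypothesis, and $n$ is the number of categories.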

• There are two types of chi-square tests. Both use the chi-square statistic and distribution, but for different purposes:
• A chi-square goodness of fit test determines whether sample data matches a population.
• A chi-square test for independence compares two variables in a contingency table to see whether they are related. More generally, it tests whether the distributions of categorical variables differ from one another.
• A very small chi-square test statistic means that the observed data fit the expected data extremely well. In a goodness of fit test, this is consistent with the hypothesized distribution. In a test for independence, the expected counts assume the variables are unrelated, so a small statistic gives no evidence of a relationship.
• A very large chi-square test statistic means that the observed data do not fit the expected data well. In a goodness of fit test, this is evidence against the hypothesized distribution. In a test for independence, it is evidence that the variables are related.
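
• The test for independence described above can be sketched with `scipy.stats.chi2_contingency`. The contingency table below is hypothetical (the counts are invented purely for illustration), so treat this as a minimal sketch rather than a worked result from real data.

```python
import numpy as np
from scipy import stats

# Hypothetical counts, invented for illustration only:
# rows = two groups, columns = three response categories.
table = np.array([
    [30, 20, 10],
    [20, 20, 20],
])

# chi2_contingency derives the expected counts from the row and column totals,
# i.e. the counts we would expect if the two variables were independent.
chi2, p_value, dof, expected = stats.chi2_contingency(table)

print(f'chi2 = {chi2:.3f}, dof = {dof}, p_value = {p_value:.3f}')
print(expected)
```

For this made-up table the statistic is about 5.33 on 2 degrees of freedom, with a p-value of about 0.069, so at the 5% level there would not be enough evidence to call the two variables related.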

# Example

• 256 visual artists were surveyed to find out their zodiac sign. The results were: Aries (29), Taurus (24), Gemini (22), Cancer (19), Leo (21), Virgo (18), Libra (19), Scorpio (20), Sagittarius (23), Capricorn (18), Aquarius (20), Pisces (23).

• Test the hypothesis that zodiac signs are evenly distributed across visual artists.
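
• Stated formally, with $p_i$ denoting the share of visual artists born under sign $i$ (notation introduced here for illustration), the hypotheses and the expected count per sign are:

$H_0: p_1 = p_2 = \dots = p_{12} = \frac{1}{12}$

$H_1: p_i \neq \frac{1}{12} \text{ for at least one } i$

$E_i = \frac{256}{12} \approx 21.33 \text{ for every } i$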

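• Compute the statistic and the p-value step by step:

```python
import numpy as np
from scipy import stats

categories = ['Aries', 'Taurus', 'Gemini', 'Cancer', 'Leo', 'Virgo', 'Libra', 'Scorpio', 'Sagittarius', 'Capricorn', 'Aquarius', 'Pisces']
observed = [29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23]

# Under the null hypothesis every sign is equally likely, so each expected
# count equals the mean of the observed counts (256 / 12).
mean = np.mean(observed)
expected = [mean for _ in range(len(observed))]

# Per-category contribution (O_i - E_i)^2 / E_i
component = [(obs - exp) ** 2 / exp for obs, exp in zip(observed, expected)]

chi_square_statistic = sum(component)
print(f'chi_square_statistic = {chi_square_statistic:.3f}')  # 5.094

# Degrees of freedom: number of categories minus one
df = len(observed) - 1
print(df)  # 11

# With df = 11 and statistic = 5.094, the right-tail table at
# https://www.statisticshowto.com/tables/chi-squared-table-right-tail/
# puts the p-value between .90 and .95.
p_value = 1 - stats.chi2.cdf(chi_square_statistic, df)
print(f'p_value = {p_value:.3f}')  # 0.927
```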

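• The same result in a single call to `stats.chisquare`, which assumes equal expected frequencies by default:

```python
chi_square_statistic, p_value = stats.chisquare(observed)
print(f'chi_square_statistic={chi_square_statistic:.3f}, p_value={p_value:.3f}')  # chi_square_statistic=5.094, p_value=0.927
```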
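
• `stats.chisquare` also accepts the expected counts explicitly through its `f_exp` parameter. The sketch below only checks that passing the uniform expected counts by hand matches the default behaviour. The variable names `default_result` and `explicit_result` are chosen here for illustration.

```python
import numpy as np
from scipy import stats

observed = [29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23]

# Uniform expected counts: the total spread evenly over the 12 signs.
expected = np.full(len(observed), np.sum(observed) / len(observed))

default_result = stats.chisquare(observed)                   # f_exp omitted
explicit_result = stats.chisquare(observed, f_exp=expected)  # f_exp given

print(f'default:  {default_result.statistic:.3f}, {default_result.pvalue:.3f}')
print(f'explicit: {explicit_result.statistic:.3f}, {explicit_result.pvalue:.3f}')
```

Both calls should report the same statistic and p-value. A non-uniform `f_exp` would be the way to test against a population whose categories are not equally likely, provided the expected counts sum to the same total as the observed counts.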

• p-value = 0.927

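• Fail to reject the null hypothesis: the p-value is far larger than the usual significance levels of 0.01 to 0.05 (1% to 5%), so the data give no evidence that zodiac signs are unevenly distributed among visual artists.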
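
• The same decision can be reached by comparing the statistic with a critical value instead of reading the p-value. This is a minimal sketch that assumes a significance level of 0.05. The name `alpha` and the choice of 0.05 are assumptions made here, not part of the survey.

```python
from scipy import stats

observed = [29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23]
alpha = 0.05  # assumed significance level

statistic, p_value = stats.chisquare(observed)
df = len(observed) - 1

# Right-tail critical value: reject the null hypothesis only if the
# statistic exceeds this threshold.
critical_value = stats.chi2.ppf(1 - alpha, df)
reject_null = statistic > critical_value

print(f'statistic = {statistic:.3f}, critical value = {critical_value:.3f}')
print(f'reject null: {reject_null}')
```

With 11 degrees of freedom the 5% critical value is about 19.675, far above the statistic of 5.094, so the null hypothesis is not rejected. The critical-value rule and the p-value rule always agree for the same `alpha`.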
