# Chi-square

This post explains the chi-square test.

# Chi-Square Statistic

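$\tilde{\chi}^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$

Here $O_i$ is the observed count in category $i$, $E_i$ is the count expected under the null hypothesis, and $n$ is the number of categories.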
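
As a quick worked instance of the formula, assume a hypothetical coin (made-up numbers, not part of the example in this post) flipped 100 times, landing heads 58 times and tails 42 times. If the coin were fair, the expected counts would be 50 and 50, so:

$\tilde{\chi}^2 = \frac{(58 - 50)^2}{50} + \frac{(42 - 50)^2}{50} = 1.28 + 1.28 = 2.56$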

• There are two types of chi-square tests. Both use the chi-square statistic and distribution, but for different purposes:
• A chi-square goodness of fit test determines whether sample data are consistent with a hypothesized population distribution.
• A chi-square test for independence compares two variables in a contingency table to see if they are related. In a more general sense, it tests whether the distributions of categorical variables differ from one another.
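• A very small chi-square test statistic means that the observed data fit the expected data extremely well. In a goodness of fit test, the sample matches the hypothesized distribution. In a test for independence, where the expected counts are computed assuming no relationship, it means the data show no evidence of a relationship.
• A very large chi-square test statistic means that the observed data do not fit the expected data well. In a test for independence, that is evidence that the variables are related.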
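
To make the test for independence concrete, here is a minimal sketch in plain Python. The 2x2 table and its labels are hypothetical (made-up counts, not data from this post), and the closed-form p-value used here holds only for one degree of freedom.

```python
import math

# Hypothetical 2x2 contingency table (made-up counts, not from the post):
# rows = group A / group B, columns = prefers tea / prefers coffee.
table = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum (O - E)^2 / E over every cell of the table.
chi_square_statistic = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(table, expected)
    for obs, exp in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1), which is 1 for a 2x2 table.
df = (len(table) - 1) * (len(table[0]) - 1)

# For df = 1 only, the right-tail probability is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square_statistic / 2))

print(f'chi_square_statistic = {chi_square_statistic:.3f}, df = {df}, p_value = {p_value:.3f}')
```

With these made-up counts every expected cell is 25, the statistic is 4.0, and the p-value falls just under 0.05, which would count as evidence against independence at the 5% level. For larger tables, SciPy's `stats.chi2_contingency` performs the same calculation. Note that it applies Yates' continuity correction to 2x2 tables by default, so its result matches this sketch only with `correction=False`.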

# Example

• 256 visual artists were surveyed to find out their zodiac sign. The results were: Aries (29), Taurus (24), Gemini (22), Cancer (19), Leo (21), Virgo (18), Libra (19), Scorpio (20), Sagittarius (23), Capricorn (18), Aquarius (20), Pisces (23).

• Test the hypothesis that zodiac signs are evenly distributed across visual artists.

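• Computing the statistic and the p-value by hand:

    import numpy as np
    from scipy import stats

    categories = ['Aries', 'Taurus', 'Gemini', 'Cancer', 'Leo', 'Virgo', 'Libra', 'Scorpio', 'Sagittarius', 'Capricorn', 'Aquarius', 'Pisces']
    observed = [29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23]

    # Under the null hypothesis every sign is equally likely,
    # so each expected count is the mean of the observed counts.
    mean = np.mean(observed)  # 21.33
    expected = [mean] * len(categories)

    # Per-category contribution (O - E)^2 / E to the statistic.
    component = []
    for obs, exp in zip(observed, expected):
        c = (obs - exp) ** 2 / exp
        component.append(c)

    chi_square_statistic = sum(component)
    print(f'chi_square_statistic = {chi_square_statistic:.3f}')  # 5.094

    # Degrees of freedom: number of categories minus one.
    df = len(observed) - 1
    print(df)  # 11

    # With df = 11 and statistic = 5.094, a right-tail table
    # puts the p-value between .90 and .95:
    # https://www.statisticshowto.com/tables/chi-squared-table-right-tail/

    # Right-tail probability from the chi-square distribution.
    p_value = 1 - stats.chi2.cdf(chi_square_statistic, df)
    print(f'p_value = {p_value:.3f}')  # 0.927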

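• The same test with SciPy:

    from scipy import stats

    observed = [29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23]

    # Just pass the observed data; when no expected frequencies are given,
    # stats.chisquare assumes all categories are equally likely.
    chi_square_statistic, p_value = stats.chisquare(observed)

    print(f'chi_square_statistic={chi_square_statistic:.3f}, p_value={p_value:.3f}')
    # chi_square_statistic=5.094, p_value=0.927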

• p-value = 0.927

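• The p-value is far above the usual significance levels of 0.01 to 0.05 (1% to 5%), so we fail to reject the null hypothesis: the data are consistent with zodiac signs being evenly distributed across visual artists.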
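
The same decision can be reached by comparing the statistic to a critical value, which is what a chi-square table lookup does. The sketch below assumes a significance level of 0.05 (my choice for illustration) and reuses the statistic and degrees of freedom computed in the example.

```python
from scipy import stats

alpha = 0.05                  # significance level (assumed for illustration)
df = 11                       # 12 zodiac categories - 1
chi_square_statistic = 5.094  # value computed in the example

# Critical value: the point with probability alpha in the right tail.
critical_value = stats.chi2.ppf(1 - alpha, df)

# Reject the null hypothesis only if the statistic exceeds the critical value.
reject_null = chi_square_statistic > critical_value

print(f'critical_value = {critical_value:.3f}, reject_null = {reject_null}')
```

Because 5.094 is well below the critical value of about 19.675, the null hypothesis is not rejected, in agreement with the p-value approach.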
