t-Tests
This post covers t-Tests.
t-Distribution
- Z-test works when we know the population parameters $\mu$ and $\sigma$
- Use samples to assess
  - how different a sample mean is from a population mean
  - how different two sample means are from each other
- Two samples can be
- Independent
- Dependent
- Estimate the population standard deviation using the sample standard deviation with Bessel’s correction (see the sketch after this list)
- Bessel’s correction is the use of $n − 1$ instead of $n$ in the formula for the sample variance and sample standard deviation, where $n$ is the number of observations in a sample.
- This method corrects the bias in the estimation of the population variance.
- It also partially corrects the bias in the estimation of the population standard deviation.
- However, the correction often increases the mean squared error in these estimations.
- This technique is named after Friedrich Bessel.
- To find out how typical or atypical (unusual) a sample mean is, find its location on the distribution of sample means, i.e. the sampling distribution
  - which we can determine when we know the population parameters $\mu, \sigma$
  - $\text{std error} = \frac{\sigma}{\sqrt{n}}$
- $z = \frac{sample~mean - \mu}{std~error} = \frac{mean~difference}{std~error}$
- Std for Samples = $S = \sqrt{\frac{\Sigma(X_i - \bar{X})^2}{n-1}}$
- The standard error depends on the sample; we cannot use $\sigma$ when we only have a sample
- Thus, we get a new distribution that is more prone to error - the t-Distribution
  - more spread out and thicker in the tails than a normal distribution
  - larger sample sizes give a skinnier sampling distribution
- What happens as $n$ increases?
  - The t-Distribution approaches the Normal Distribution
  - The t-Distribution gets skinnier tails
  - $S \rightarrow \sigma$
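A minimal sketch of Bessel's correction; the toy sample here is the same one used in a later example:

```python
import numpy as np

data = [5, 19, 11, 23, 12, 7, 3, 21]  # toy sample, reused later in this post

# dividing by n (biased for a sample) vs n-1 (Bessel's correction)
print(np.std(data))          # 7.105 - population formula, divides by n
print(np.std(data, ddof=1))  # 7.596 - sample formula, divides by n-1
```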
Degrees of Freedom - Sample Standard Deviation
- We can pick a sample of size $n$ from population using $n$ degrees of freedom
- Now to compute Standard Deviation, we need sample mean
- $\bar{X} = \frac{X_1+X_2+X_3+…+X_n}{n}$
- $X_1+X_2+X_3+…+X_n = n \cdot \bar{X}$
- $n-1$ Degrees of Freedom
- We may vary only $n-1$ values freely while keeping the sum of all values equal to $n\bar{X}$
- $n-1$ is the effective sample size since only $n-1$ values are independent if we know the mean.
- $S = \sqrt{\frac{\Sigma(X_i - \bar{X})^2}{n-1}}$
- As the degrees of freedom increase, the t-distribution better approximates the normal distribution
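A quick check of this convergence: the two-tailed 5% critical values of the t-distribution shrink toward the normal $z^* = 1.96$ as the degrees of freedom grow.

```python
from scipy import stats

# two-tailed alpha = 0.05 critical values for increasing degrees of freedom
for df in [5, 10, 30, 100, 1000]:
    print(df, round(stats.t.ppf(0.975, df), 3))
# 5 2.571, 10 2.228, 30 2.042, 100 1.984, 1000 1.962

print(round(stats.norm.ppf(0.975), 3))  # 1.96 - the normal limit
```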
t-Table
Questions
What’s the t-critical value for a one-tailed alpha level of 0.05 with 12 degrees of freedom?
Ans: 1.782
```python
from scipy import stats

p = 0.05
df = 12

# 1 - p for a right-tailed test
value = round(stats.t.ppf(1 - p, df), 3)
print(value)  # 1.782
```
What are the t-critical values for a 2-tailed test with $\alpha = 0.05$ and sample size 30?
Ans: $\pm 2.045$
```python
p = 0.025  # alpha/2 in each tail
sample_size = 30
df = sample_size - 1

# p for the left tail
value = round(stats.t.ppf(p, df), 3)
print(value)  # -2.045

# 1 - p for the right tail
value = round(stats.t.ppf(1 - p, df), 3)
print(value)  # 2.045
```
Between what limits does the right-tail area of the t-statistic fall when the sample size is 24 and the t-statistic is 2.45?
Ans: between 0.01 and 0.02
```python
value = 2.45
sample_size = 24
df = sample_size - 1

# right-tail area beyond the t-statistic
p = round(1 - stats.t.cdf(value, df), 3)
print(p)  # 0.011
```
t-Statistic
$t = \frac{\bar{X}-\mu_0}{\frac{S}{\sqrt{n}}}$
- The larger the value of $\bar{X}$, the stronger the evidence that $\mu > \mu_0$
- The smaller the value of $\bar{X}$, the stronger the evidence that $\mu < \mu_0$
- The further the value of $\bar{X}$ is from $\mu_0$ in either direction, the stronger the evidence that $\mu \ne \mu_0$
One Sample t-Test
$t = \frac{\bar{X}-\mu_0}{\frac{S}{\sqrt{n}}}$
\[H_0: \mu = \mu_0\]

\[\begin{align*} H_A &: \mu < \mu_0 \\ &: \mu > \mu_0 \\ &: \mu \ne \mu_0 \end{align*}\]

$\alpha$ Levels (the column headers of the t-table)
- What will increase the t-Statistic?
  - A large difference between $\bar{X}$ and $\mu_0$
  - A larger $n$ (which shrinks the standard error)
  - A smaller $S$ (a larger $S$ means a larger standard error, which shrinks the t-statistic)
- A larger t-Statistic => lower probability of obtaining that t-Statistic under the null => stronger evidence that $\bar{X} - \mu_0$ reflects a real difference (see the sketch below)
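A quick numerical check of these relationships, using made-up values:

```python
import numpy as np

def t_stat(xbar, mu0, S, n):
    return (xbar - mu0) / (S / np.sqrt(n))

# illustrative (made-up) values
print(t_stat(6.5, 6.0, 1.0, 25))   # 2.5  - baseline
print(t_stat(7.0, 6.0, 1.0, 25))   # 5.0  - larger difference -> larger t
print(t_stat(6.5, 6.0, 1.0, 100))  # 5.0  - larger n -> larger t
print(t_stat(6.5, 6.0, 2.0, 25))   # 1.25 - larger S -> smaller t
```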
Example - Finches Beak Width
- Average known Beak Width = 6.07 mm
- $H_0: \mu = 6.07$
- $H_A: \mu \ne 6.07$
- Sample Size = 500
- Degrees of Freedom = 499
- Compute sample mean and std dev from the sample dataset
- $\bar{X} = 6.470$
- $S = 0.396$
- t-Statistic
- $t = \frac{6.47 - 6.07}{0.396/\sqrt{500}} = \frac{0.40}{0.0179} = 22.346$
- Reject Null or Fail to reject Null
- Reject null since t-value is very large
- probability of getting this t-value is very very small
- probability of getting a sample with beak width 6.47 from a population with mean 6.07 is very, very small
- p-value
  - the probability of getting a t-statistic at least this extreme if the null is true
  - here it is effectively zero, so we reject the null (see the sketch below)
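A minimal sketch reproducing this test from the summary statistics above (the raw finch dataset is not included here); note that the hand calculation above rounds the standard error to 0.0179, so its t differs slightly:

```python
import numpy as np
from scipy import stats

mu0 = 6.07    # known population beak width (mm)
xbar = 6.470  # sample mean
S = 0.396     # sample standard deviation
n = 500

t = (xbar - mu0) / (S / np.sqrt(n))
print(f't = {t:.2f}')  # 22.59 (22.346 above, due to rounding the std error)

# two-tailed p-value with df = 499: effectively zero
p = 2 * (1 - stats.t.cdf(abs(t), n - 1))
print(f'p = {p:.3g}')  # 0
```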
P-Value
Compute t-statistic
- $t = \frac{\bar{X}-\mu_0}{\frac{S}{\sqrt{n}}}$
One-tailed Test
- the p-value is the probability
  - above the t-Statistic if it’s positive, or
  - below the t-Statistic if it’s negative
Two-tailed Test
- the p-value is the sum of the two tail probabilities
  - above $|t|$ and
  - below $-|t|$
Reject the Null when the p-value is less than the $\alpha$ level
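A minimal sketch of both cases with scipy, using made-up values for $t$ and the degrees of freedom:

```python
from scipy import stats

t, df = 2.0, 15  # made-up illustrative values

# one-tailed: the area beyond t in its own direction
p_right = 1 - stats.t.cdf(t, df)  # t positive: area above
p_left = stats.t.cdf(-t, df)      # t negative: area below
print(round(p_right, 3))  # 0.032

# two-tailed: both tails; the t-distribution is symmetric, so double one tail
p_two = 2 * (1 - stats.t.cdf(abs(t), df))
print(round(p_two, 3))  # 0.064
```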
Example
Sample = [5, 19, 11, 23, 12, 7, 3, 21]
Is this sample mean significantly different from 10 at an alpha level of 0.05?
Different => two-tailed t-test
t = 0.977
```python
import numpy as np
from scipy import stats

def sample_std(data):
    # sample standard deviation with Bessel's correction
    xbar = np.mean(data)
    sq_dev = [(d - xbar)**2 for d in data]
    df = len(data) - 1
    return np.sqrt(sum(sq_dev) / df)

data = [5, 19, 11, 23, 12, 7, 3, 21]
xbar = np.mean(data)
print(xbar)  # 12.625

n = len(data)
df = n - 1  # 7
S = sample_std(data)
print(S)  # 7.6

t = (xbar - 10) / (S / np.sqrt(n))
print(f't={t:.3f}')  # 0.977
```
- Since this is a two-tailed test,
  - $p = P(t < -0.977) + P(t > 0.977)$
- From the table
  - df = 7 and t = 0.977
  - Right p = 0.18 [between 0.15 and 0.20]
  - Left p = 0.18 as well, since the distribution is symmetric
  - $p = 0.36 ~(0.30 < p < 0.40)$
```python
# two-tailed p-value: double the one-tail area
p = round(1 - stats.t.cdf(t, df), 3)
print(2 * p)  # 0.36
```
The result is not statistically significant since $p = 0.36 > \alpha = 0.05$, so we fail to reject the Null
- Thus, we retain $H_0: \mu = 10$
Example
Mean Rent = 1830 for all apartments
Company A wants to know if the rent they are charging is significantly different at $\alpha = 0.05$
- Sample: $n=25,~ \bar{X}=1700,~ S=200$
$H_0: \mu = 1830$ and $H_A: \mu \ne 1830$
What are the t-critical values?
t-Critical = $\pm 2.064$
```python
alpha = 0.05
sample_size = 25
df = sample_size - 1

# two-tailed test, so alpha/2 in each tail
t_critical = stats.t.ppf(alpha / 2, df)
print(f't_critical = {t_critical:.3f} and {-t_critical:.3f}')  # -2.064 and 2.064
```
What is the t-statistic value?
$t = -3.25$
$S = \sqrt{\frac{\Sigma(X_i - \bar{X})^2}{n-1}} = 200$
$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$
```python
mu = 1830
sample_size = 25
xbar = 1700
S = 200

t = (xbar - mu) / (S / np.sqrt(sample_size))
print(f't={t:.3f}')  # -3.250
```
t is in the critical region, so we reject the null in favor of $H_A: \mu \ne 1830$
The rental company charges significantly less than the population mean of 1830
What is the Confidence Interval for the population for Company A?
95% Confidence Interval = (1617.44, 1782.56)
- $\bar{X} \pm t_{\text{critical}} \cdot \text{std error}$
Margin of Error = 82.56
```python
std_error = S / np.sqrt(sample_size)

CI95_lb = xbar - abs(t_critical) * std_error
CI95_ub = xbar + abs(t_critical) * std_error
print(f'95% CI = {CI95_lb:.2f}, {CI95_ub:.2f}')  # 1617.44, 1782.56

margin_of_error = abs(t_critical) * std_error
print(f'margin of error = {margin_of_error:.2f}')  # 82.56
```
If $n = 100$, then t_critical = $\pm 1.984$ and the margin of error = 39.68
Increasing the sample size reduces the margin of error
```python
alpha = 0.05
sample_size = 100
df = sample_size - 1

# two-tailed test, so alpha/2 in each tail
t_critical = stats.t.ppf(alpha / 2, df)
print(f't_critical = {t_critical:.3f} and {-t_critical:.3f}')  # -1.984 and 1.984

mu = 1830
xbar = 1700
S = 200

t = (xbar - mu) / (S / np.sqrt(sample_size))
print(f't={t:.3f}')  # -6.500

std_error = S / np.sqrt(sample_size)
CI95_lb = xbar - abs(t_critical) * std_error
CI95_ub = xbar + abs(t_critical) * std_error
print(f'95% CI = {CI95_lb:.2f}, {CI95_ub:.2f}')  # 1660.32, 1739.68

margin_of_error = abs(t_critical) * std_error
print(f'margin of error = {margin_of_error:.2f}')  # 39.68
```
Cohen’s d
Standardised mean difference that measures the distance between means in standardised units
$\text{Cohen's } d = \frac{\bar{X}-\mu}{S}$
```python
mu = 1830
xbar = 1700
S = 200

d = (xbar - mu) / S
print(f'd={d:.3f}')  # -0.650
```
Dependent Samples
Same subject takes the test twice
Within-subject designs
- each subject is assigned two conditions in random order, e.g.
  - being in the control condition and then getting the treatment, or
  - getting two kinds of treatment
Every subject is given a Pre-Test and a Post-Test
- Growth over time (Longitudinal Study)
  - each subject is measured at different points in time
| $x_i$ | $y_i$ | $D_i = x_i - y_i$ |
|-------|-------|-------------------|
| $x_1$ | $y_1$ | $D_1 = x_1 - y_1$ |
| $x_2$ | $y_2$ | $D_2 = x_2 - y_2$ |
| $x_3$ | $y_3$ | $D_3 = x_3 - y_3$ |
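A minimal sketch of building the difference scores, with made-up paired values:

```python
import numpy as np

# made-up paired scores for three subjects
x = np.array([10, 12, 9])  # first condition (e.g., pre-test)
y = np.array([8, 11, 9])   # second condition (e.g., post-test)

D = x - y                  # one difference score per subject
print(D, D.mean())         # [2 1 0] 1.0
```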
Example - Keyboards
Errors in two designs of keyboards (QWERTY and Alphabetical)
Mean Errors on QWERTY Keyboard = 5.08 and on Alphabetical Keyboard = 7.80
```python
import numpy as np
import pandas as pd

# https://naneja.github.io/datasets
file = './data/Keyboards.csv'
df = pd.read_csv(file)

xbar_q = df.QWERTYerrors.mean()
xbar_a = df.Alphabeticalerrors.mean()
print(xbar_q, xbar_a)  # 5.08 7.8
```
Are these differences significant?
$n = 25$
$H_0: \mu_Q = \mu_A$ and $H_A: \mu_Q \ne \mu_A$
- Equivalently, $H_0: \mu_Q - \mu_A = 0$
What is the Point Estimate for $\mu_Q - \mu_A$?
-2.72
```python
point_estimate = xbar_q - xbar_a
print(f'point_estimate={point_estimate:.3f}')  # -2.720
```
What is $S$ (the standard deviation of the difference scores)?
3.69
```python
# difference score for each subject
df['d'] = df.QWERTYerrors - df.Alphabeticalerrors

# sample standard deviation of d (ddof=1 applies Bessel's correction)
S = df['d'].std(ddof=1)
print(f'S={S:.2f}')  # 3.69

# equivalent manual computation
m = df.d.mean()
S = np.sqrt(((df.d - m)**2).sum() / (df.shape[0] - 1))
print(f'S={S:.2f}')  # 3.69
```
What is the t-statistic when $S = 3.69$?
t = -3.69
```python
S = 3.69
n = df.shape[0]  # 25
t = point_estimate / (S / np.sqrt(n))
print(f't={t:.2f}')  # -3.69
```
What are the t-critical values for $\alpha=0.05$?
$\pm 2.064$
```python
from scipy import stats

alpha = 0.05
t_critical = stats.t.ppf(alpha / 2, df.shape[0] - 1)
print(f't_critical = pm {abs(t_critical):.3f}')  # 2.064
```
Reject the Null or Fail to reject the Null?
- Reject the Null
Significantly fewer errors on the QWERTY keyboard; since this is a controlled within-subject design, we may claim a causal effect of keyboard design
95% Confidence Interval
-4.24, -1.20
```python
std_error = S / np.sqrt(df.shape[0])

CI95_lb = point_estimate - abs(t_critical) * std_error
CI95_ub = point_estimate + abs(t_critical) * std_error
print(f'95% CI = {CI95_lb:.2f}, {CI95_ub:.2f}')  # -4.24, -1.20

margin_of_error = abs(t_critical) * std_error
print(f'margin of error = {margin_of_error:.2f}')  # 1.52
```
- On average, users will make between about 1 and 4 fewer errors on the QWERTY keyboard than on the alphabetical keyboard (see the cross-check below)
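As a cross-check, scipy's paired t-test on the same two columns (assuming the Keyboards.csv dataframe loaded above) should reproduce this t-statistic:

```python
from scipy import stats

# paired (dependent-samples) t-test on the two error columns
t, p = stats.ttest_rel(df.QWERTYerrors, df.Alphabeticalerrors)
print(f't = {t:.2f}, p = {p:.4f}')  # t ~ -3.69, p well below 0.05
```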
Advantages and Disadvantages - Dependent Samples
- Within-Subject design
- Two Conditions
- Longitudinal
- Pre-Test, Post-Test
- Advantages
- Controls for individual differences
- Use Fewer Subjects
- Cost-Effective
- Less Time-consuming
- Less Expensive
- Disadvantages
- Carry-over Effects
- Second measurement can be affected by first treatment
- Order may influence results
Independent Samples
- Between-Subject Designs
- Experimental
- Observational
$t = \frac{\bar{X_1}-\bar{X_2}}{\text{standard error}}$
Reject $H_0$ if $p<\alpha$
Fail to Reject $H_0$ if $p > \alpha$
Standard Deviation for the difference of two independent samples $= \sqrt{S_1^2 + S_2^2}$
Standard Error $= \frac{S}{\sqrt{n}} = \frac{\sqrt{S_1^2 + S_2^2}}{\sqrt{n}} = \sqrt{\frac{S_1^2 + S_2^2}{n}} = \sqrt{\frac{S_1^2}{n} + \frac{S_2^2}{n}} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$ (the last step generalizes to unequal sample sizes)
Degrees of Freedom $ df= (n_1-1) + (n_2-1) = n_1 + n_2 -2$
- $ t = \frac{(\bar{X_1}-\bar{X_2})}{SE}$
Example - Food Prices
$H_0 : \mu_1 = \mu_2$
$H_A : \mu_1 \ne \mu_2$
Sample Averages
- 8.94 and 11.14
Size of Each Sample
- 18 and 14
Sample Standard Deviations
- 2.65 and 2.18
Standard Error
- 0.85
t-Statistic
- $t = -2.58$ (so $|t| = 2.58$)
$t^*$ Critical value for two-tailed test at $\alpha=0.05$
- degrees of freedom = $n_1 + n_2 - 2$
- $\pm 2.042$
Reject the Null since $|t| > t^*$
Prices are significantly different between the two areas
```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.read_csv('./data/FoodPrices.csv')
data1 = list(df.AverageMealPriceArea1.dropna().values)
data2 = list(df.AverageMealPriceArea2.dropna().values)

n1 = len(data1)  # 18
n2 = len(data2)  # 14
dof = n1 + n2 - 2  # renamed to avoid shadowing the DataFrame df
print(f'n1 = {n1} and n2 = {n2}')  # 18 14
print(f'dof = {dof}')  # 30

xbar1 = np.mean(data1)
xbar2 = np.mean(data2)
print(f'mean1 = {xbar1:.2f} and mean2 = {xbar2:.2f}')  # 8.94 11.14

# delta degrees of freedom = 1 for a sample
std1 = np.std(data1, ddof=1)
std2 = np.std(data2, ddof=1)
print(f'std1 = {std1:.2f} and std2 = {std2:.2f}')  # 2.65 2.18

SE = np.sqrt(std1**2 / n1 + std2**2 / n2)
print(f'std error = {SE:.2f}')  # 0.85

t = abs((xbar1 - xbar2) / SE)  # two-tailed, so direction doesn't matter
print(f't = {t:.2f}')  # 2.58

alpha = 0.05 / 2  # two-tailed
t_critical = stats.t.ppf(1 - alpha, dof)
print(f't_critical = pm {t_critical:.3f}')  # 2.042

if t > t_critical:
    print('t is greater than t-critical: Reject Null')  # this branch runs
else:
    print('t is less than t-critical: fail to reject null')
```
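Scipy's built-in independent t-test can serve as a cross-check, though its defaults differ slightly from the hand calculation above: `stats.ttest_ind` pools the two variances by default, while `equal_var=False` (Welch's test) uses the same unpooled standard error as above but its own Welch-Satterthwaite degrees of freedom.

```python
# pooled-variance (Student's) independent t-test
t_pooled, p_pooled = stats.ttest_ind(data1, data2)
print(f't = {t_pooled:.2f}, p = {p_pooled:.4f}')  # t ~ -2.51 with pooled SE

# Welch's t-test: unpooled SE, so t matches the hand calculation (~ -2.58)
t_welch, p_welch = stats.ttest_ind(data1, data2, equal_var=False)
print(f't = {t_welch:.2f}, p = {p_welch:.4f}')
```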