t-Tests


This post covers t-tests: the t-distribution and degrees of freedom, one-sample t-tests, p-values and confidence intervals, Cohen's d, and dependent- and independent-samples t-tests.

t-Distribution

  • A z-test works when we know the population parameters $\mu$ and $\sigma$
  • t-tests use samples to assess
    • How different a sample mean is from a population mean
    • How different two sample means are from each other
      • Two samples can be
        • Independent
        • Dependent
  • Estimate Population Standard Deviation using sample standard deviation with Bessel’s correction
    • Bessel’s correction is the use of $n − 1$ instead of $n$ in the formula for the sample variance and sample standard deviation, where $n$ is the number of observations in a sample.
    • This method corrects the bias in the estimation of the population variance.
    • It also partially corrects the bias in the estimation of the population standard deviation.
    • However, the correction often increases the mean squared error in these estimations.
    • This technique is named after Friedrich Bessel.
  • To find out how typical or atypical (unusual) a sample mean is, find its location on the distribution of sample means, i.e. the sampling distribution
    • we can do this directly when we know the population parameters $\mu, \sigma$
    • $std~error = \frac{\sigma}{\sqrt{n}}$
    • $z = \frac{sample~mean - \mu}{std~error} = \frac{mean~difference}{std~error}$
    • Sample standard deviation: $S = \sqrt{\frac{\Sigma(X_i - \bar{X})^2}{n-1}}$
  • The standard error must now be estimated from the sample ($S$ in place of $\sigma$), since we cannot use $\sigma$ when we only have a sample
  • Thus we get a new distribution that reflects this extra uncertainty - the t-Distribution
    • more spread out and thicker in the tails than a normal distribution
      • small samples estimate $\sigma$ less reliably; larger sample sizes give a skinnier sampling distribution
  • What happens as $n$ increases? (see the sketch below)
    • The t-Distribution approaches the Normal Distribution
    • The t-Distribution gets skinnier tails
    • $S \rightarrow \sigma$
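  • A quick way to see this convergence with scipy (a minimal sketch; the degrees-of-freedom values are arbitrary):

      from scipy import stats
      
      # two-tailed alpha = 0.05: the 97.5th-percentile critical value shrinks as df grows
      for df in [1, 5, 30, 100, 1000]:
          print(df, round(stats.t.ppf(0.975, df), 3))
      # 1 12.706, 5 2.571, 30 2.042, 100 1.984, 1000 1.962
      
      # the normal critical value is the limit
      print('normal', round(stats.norm.ppf(0.975), 3)) # normal 1.96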

Degrees of Freedom - Sample Standard Deviation

  • When we pick a sample of size $n$ from a population, all $n$ values are free to vary, i.e. $n$ degrees of freedom
  • To compute the sample standard deviation, we first need the sample mean
  • $\bar{X} = \frac{X_1+X_2+X_3+\dots+X_n}{n}$
  • $X_1+X_2+X_3+\dots+X_n = n \cdot \bar{X}$
    • $n-1$ Degrees of Freedom
    • Once the mean is fixed, we may vary only $n-1$ values freely; the last value is forced so that the sum stays $n\bar{X}$ (see the sketch below)
    • $n-1$ is the effective sample size, since only $n-1$ values are independent once we know the mean
    • $S = \sqrt{\frac{\Sigma(X_i - \bar{X})^2}{n-1}}$
  • As the degrees of freedom increase, the t-distribution better approximates the normal distribution
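  • A minimal sketch of this idea with made-up numbers: once the mean is known, the last value is fully determined by the other $n-1$ values.

      import numpy as np
      
      data = [5, 19, 11, 23, 12]   # hypothetical sample, n = 5
      xbar = np.mean(data)         # 14.0
      
      # with the mean fixed, any n-1 values determine the remaining one
      known = data[:-1]            # the first 4 values vary freely
      last = len(data) * xbar - sum(known)
      print(last)                  # 12.0, which is exactly data[-1]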

t-Table

  • Questions

      1. What’s the t-critical value for a one-tailed alpha level of 0.05 with 12 degrees of freedom?

        • Ans: 1.782

        • from scipy import stats
          
          p = 0.05
          df = 12
          
          # right-tailed test: the critical value has area 1-p below it
          value = round(stats.t.ppf(1-p, df), 3)
          print(value) # 1.782
          
      2. What are the t-critical values for a 2-tailed test with $\alpha = 0.05$ and sample size 30?

        • Ans: $\pm 2.045$

        • from scipy import stats
          
          alpha = 0.05
          p = alpha / 2 # two-tailed, so split alpha across both tails
          sample_size = 30
          df = sample_size - 1
          
          # left-tail critical value
          value = round(stats.t.ppf(p, df), 3)
          print(value) # -2.045
          
          # right-tail critical value
          value = round(stats.t.ppf(1-p, df), 3)
          print(value) # 2.045
          
      3. Between which two values does the right-tail probability of the t-statistic fall when the sample size is 24 and the t-statistic is 2.45?

        • Between .01 and .02

        • from scipy import stats
          
          value = 2.45
          sample_size = 24
          df = sample_size - 1
          
          # right-tail area above the t-statistic
          p = round(1 - stats.t.cdf(value, df), 3)
          print(p) # 0.011
          

t-Statistic

$t = \frac{\bar{X}-\mu_0}{\frac{S}{\sqrt{n}}}$

  • The larger the value of $\bar{X}$, the stronger the evidence that $\mu > \mu_0$
  • The smaller the value of $\bar{X}$, the stronger the evidence that $\mu < \mu_0$
  • The further the value of $\bar{X}$ from $\mu_0$ in either direction, the stronger the evidence that $\mu \ne \mu_0$

One Sample t-Test

  • $t = \frac{\bar{X}-\mu_0}{\frac{S}{\sqrt{n}}}$

    \[H_0: \mu = \mu_0 \\\begin{align*} H_A &: \mu < \mu_0 \\ &: \mu > \mu_0 \\ &: \mu \ne \mu_0 \end{align*}\]
  • $\alpha$ levels (the column headings of the t-table)

  • What will increase the t-Statistic (see the sketch after this list)
    • Larger difference between $\bar{X}$ and $\mu_0$
    • Larger $n$
    • Smaller $S$
    • Smaller Standard Error
  • Larger t-Statistic
    • => Lower probability of obtaining such a t-Statistic under the Null
    • => Larger $\bar{X} - \mu_0$ relative to the Standard Error
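  • A minimal sketch of these effects, varying one factor at a time with made-up numbers:

      import numpy as np
      
      def t_stat(xbar, mu0, S, n):
          return (xbar - mu0) / (S / np.sqrt(n))
      
      print(round(t_stat(12, 10, 4, 25), 2))  # 2.5  (baseline)
      print(round(t_stat(14, 10, 4, 25), 2))  # 5.0  (larger difference -> larger t)
      print(round(t_stat(12, 10, 4, 100), 2)) # 5.0  (larger n -> larger t)
      print(round(t_stat(12, 10, 8, 25), 2))  # 1.25 (larger S -> smaller t)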

Example - Finches Beak Width

  • Average known Beak Width = 6.07 mm
  • $H_0: \mu = 6.07$
  • $H_A: \mu \ne 6.07$
    • Sample Size = 500
    • Degrees of Freedom = 499
  • Compute sample mean and std dev from the sample dataset
    • $\bar{X} = 6.470$
    • $S = 0.396$
  • t-Statistic
    • $t = \frac{6.47 - 6.07}{0.396/\sqrt{500}} = \frac{0.40}{0.0177} \approx 22.6$
  • Reject Null or Fail to reject Null
    • Reject null since t-value is very large
      • probability of getting this t-value is very very small
      • probability of getting a sample with beak width 6.47 from a population with mean 6.07 is very, very small
    • p-value
      • probability of obtaining a t-statistic at least this extreme when the Null is true (see the sketch below)
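  • The same computation in code, using the summary statistics above (the unrounded standard error gives $t \approx 22.6$):

      import numpy as np
      from scipy import stats
      
      mu0, xbar, S, n = 6.07, 6.470, 0.396, 500
      
      t = (xbar - mu0) / (S / np.sqrt(n))
      print(f't = {t:.2f}') # t = 22.59
      
      # two-tailed p-value: effectively zero
      p = 2 * stats.t.sf(abs(t), n - 1)
      print(f'p = {p:.3f}') # p = 0.000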

P-Value

  • Compute t-statistic

    • $t = \frac{\bar{X}-\mu_0}{\frac{S}{\sqrt{n}}}$
  • One-tailed Test

    • p-value is the probability
      • above the t-Statistic if it’s positive, or
      • below the t-Statistic if it’s negative
  • Two-tailed Test

    • p-value is the sum of the probabilities
      • above the positive t-Statistic and
      • below the negative t-Statistic (see the sketch below)
  • Reject the Null when the p-value is less than the $\alpha$ level
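  • In code with scipy (a minimal sketch; t and df here are placeholder values):

      from scipy import stats
      
      t, df = 2.1, 15 # placeholder values
      
      # one-tailed: area above a positive t (use stats.t.cdf for the area below a negative t)
      p_one = stats.t.sf(t, df) # sf(t) = 1 - cdf(t)
      print(round(p_one, 3)) # 0.026
      
      # two-tailed: both tails, since the t-distribution is symmetric
      p_two = 2 * stats.t.sf(abs(t), df)
      print(round(p_two, 3)) # 0.053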

  • Example

    • Sample = [5, 19, 11, 23, 12, 7, 3, 21]

    • Is this sample mean significantly different from 10 at an alpha level of 0.05?

      • Different => two-tailed t-test

      • t = 0.977

      • import numpy as np
        from scipy import stats
        
        def sample_std(data):
            # sample standard deviation with Bessel's correction
            xbar = np.mean(data)
            sq_dev = [(d - xbar)**2 for d in data]
            df = len(data) - 1
            return np.sqrt(sum(sq_dev) / df)
        
        data = [5, 19, 11, 23, 12, 7, 3, 21]
        xbar = np.mean(data)
        print(xbar) # 12.625
        
        n = len(data)
        df = n - 1 # 7
        
        S = sample_std(data)
        print(round(S, 1)) # 7.6
        
        t = (xbar - 10)/(S/np.sqrt(n))
        print(f't={t:.3f}') # 0.977
        
        • Since this is a two-tailed test,
          • $p = P(t<-0.977) + P(t>0.977)$
        • From the table
          • df = 7 and t = 0.977
          • Left p = 0.18 [between 0.15 and 0.20]
          • Similarly, Right p = 0.18 [between 0.15 and 0.20]
            • since symmetrical
        • $p = 0.36~(0.30 < p < 0.40)$
      • https://www.socscistatistics.com/pvalues/tdistribution.aspx

      • # two-tailed: double the one-tail area
        p = round(1 - stats.t.cdf(t, df), 3)
        print(2*p) # 0.36
        
      • $p$ is not statistically significant since $p=0.36 > \alpha = 0.05$, so we fail to reject the Null

      • Thus, we retain $H_0: \mu = 10$
  • Example

    • Mean Rent = 1830 for all apartments

    • Company A wants to know if the rent they are charging is significantly different at $\alpha = 0.05$

      • Sample: $n=25,~ \bar{X}=1700,~ S=200$
    • $H_0: \mu = 1830$ and $H_A: \mu \ne 1830$

      • What are t-critical values

        • t-Critical = $\pm 2.064$

            alpha = 0.05
                      
            sample_size = 25
            df = sample_size-1
                      
            # Two tailed test, so alpha/2
            t_critical = stats.t.ppf(alpha/2, df) 
                      
            print(f't_critical = {t_critical:.3f} and {-t_critical:.3f}') # -2.064 and 2.064
          
      • What is the t-statistic value

        • $t = -3.25$

          • $S = \sqrt{\frac{\Sigma(X_i - \bar{X})^2}{n-1}} = 200$

          • $t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$

              mu = 1830
              sample_size = 25
              xbar = 1700
              S = 200
                            
              t = (xbar - mu)/(S/np.sqrt(sample_size))
              print(f't={t:.3f}') # -3.250
            
      • t is in critical region so reject the null in favor of $H_A: \mu \ne 1830$

      • The rental company charges significantly less than the population mean of 1830

      • What is the Confidence Interval for the population for Company A?

        • 95% Confidence Interval = (1617.44, 1782.56)

          • $\bar{X} \pm t_{critical} \times std~error$
        • Margin of Error = 82.56

            std_error = S/np.sqrt(sample_size)
                      
            CI95_lb = xbar - abs(t_critical) * std_error
            CI95_ub = xbar + abs(t_critical) * std_error
                      
            print(f'95% CI = {CI95_lb:.2f}, {CI95_ub:.2f}') # 1617.44, 1782.56
                      
            margin_of_error = abs(t_critical) * std_error
            print(f'margin of error = {margin_of_error:.2f}') # 82.56
          
      • If n = 100 then t_critical = ±1.984 and Margin of Error = 39.68

        • Increasing the sample size reduces the margin of error

            alpha = 0.05
                    
            sample_size = 100
            df = sample_size-1
                    
            # Two tailed test, so alpha/2
            t_critical = stats.t.ppf(alpha/2, df) 
                    
            print(f't_critical = {t_critical:.3f} and {-t_critical:.3f}') 
            # -1.984 and 1.984
                    
            mu = 1830
            xbar = 1700
            S = 200
                    
            t = (xbar - mu)/(S/np.sqrt(sample_size))
            print(f't={t:.3f}') # -6.500
                    
            std_error = S/np.sqrt(sample_size)
                    
            CI95_lb = xbar - abs(t_critical) * std_error
            CI95_ub = xbar + abs(t_critical) * std_error
                    
            print(f'95% CI = {CI95_lb:.2f}, {CI95_ub:.2f}') # 1660.32, 1739.68
                    
            margin_of_error = abs(t_critical) * std_error
            print(f'margin of error = {margin_of_error:.2f}') # 39.68
          

Cohen’s d

  • Standardised mean difference that measures the distance between means in standardised units

  • $\text{Cohen's}~d = \frac{\bar{X}-\mu}{S}$

  • mu = 1830
    xbar = 1700
    S = 200
    
    d = (xbar - mu)/S
    print(f'd={d:.3f}') # d=-0.650
    

Dependent Samples

  • Same subject takes the test twice

  • Within-subject designs

    • each subject is assigned two conditions in random order
      • e.g. a control condition and a treatment condition

      • or two kinds of treatment

    • Every subject is given a Pre-Test and a Post-Test

    • Growth over time (Longitudinal Study)
      • each subject is measured at different points in time
    • For each subject, with paired scores $x_i$ and $y_i$, compute the difference $D_i = x_i - y_i$
      • $D_1 = x_1 - y_1$, $D_2 = x_2 - y_2$, $D_3 = x_3 - y_3$, $\dots$

Example - Keyboards

  • Errors in two designs of keyboards (QWERTY and Alphabetical)

  • Mean errors on the QWERTY keyboard = 5.08 and on the Alphabetical keyboard = 7.80

    • import numpy as np
      import pandas as pd
      
      # https://naneja.github.io/datasets
      file = './data/Keyboards.csv'
      df = pd.read_csv(file)
      
      xbar_q = df.QWERTYerrors.mean()
      xbar_a = df.Alphabeticalerrors.mean()
      print(xbar_q, xbar_a) # 5.08 7.8
      
  • Are these differences significant?

    • $n = 25$

    • $H_0: \mu_Q = \mu_A ~and~ H_A: \mu_Q \ne \mu_A$

      • Also can say $\mu_Q - \mu_A = 0$
    • What is Point Estimate for $\mu_Q - \mu_A$

      • -2.72

      • point_estimate = xbar_q - xbar_a
        print(f'point_estimate={point_estimate:.3f}') # -2.720
        
    • What is S

      • 3.69

      • # paired differences per subject
        df['d'] = df.QWERTYerrors - df.Alphabeticalerrors
        
        # sample std of d, with Bessel's correction (ddof=1)
        S = df['d'].std(ddof=1)
        print(f'S={S:.2f}') # S=3.69
        
        # equivalently, by hand
        m = df.d.mean()
        S = np.sqrt(((df.d - m)**2).sum() / (df.shape[0] - 1))
        print(f'S={S:.2f}') # S=3.69
        
    • What is t-Statistic when S = 3.69

      • t = -3.69

      • S = 3.69
        t = point_estimate / (S/np.sqrt(df.shape[0]))
        print(f't={t:.2f}')
        
    • What are t-Critical Values for $\alpha=0.05$

      • $\pm 2.064$

      • from scipy import stats
        alpha = 0.05
        # two-tailed test, so alpha/2 in each tail
        t_critical = stats.t.ppf(alpha/2, df.shape[0]-1) # -2.064
        print(f't_critical = pm {abs(t_critical):.3f}') # pm 2.064
        
    • Reject the Null or Fail to reject Null

      • Reject the Null
    • Significantly fewer errors; since this is a controlled within-subject experiment, we may attribute the effect to the keyboard design

    • 95% Confidence Interval

      • -4.24, -1.20

      • std_error = S/np.sqrt(df.shape[0])
              
        CI95_lb = point_estimate - abs(t_critical) * std_error
        CI95_ub = point_estimate + abs(t_critical) * std_error
        print(f'95% CI = {CI95_lb:.2f}, {CI95_ub:.2f}') # -4.24, -1.20
              
        margin_of_error = abs(t_critical) * S/np.sqrt(df.shape[0])
        print(f'margin of error = {margin_of_error:.2f}') # 1.52
        
      • Users will make between about 1 and 4 fewer errors on the QWERTY keyboard than on the Alphabetical keyboard (scipy's paired test below agrees)
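  • For comparison, scipy's stats.ttest_rel runs this paired test directly; a sketch assuming the same Keyboards.csv columns as above:

      import pandas as pd
      from scipy import stats
      
      df = pd.read_csv('./data/Keyboards.csv')
      
      # paired (dependent-samples) t-test on the two error columns
      t, p = stats.ttest_rel(df.QWERTYerrors, df.Alphabeticalerrors)
      print(f't = {t:.2f}, p = {p:.4f}') # t = -3.69, p well below alpha = 0.05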

Advantages and Disadvantages - Dependent Samples

    • Within-Subject design
      • Two Conditions
      • Longitudinal
      • Pre-Test, Post-Test
    • Advantages
      • Controls for individual differences
      • Use Fewer Subjects
      • Cost-Effective
      • Less Time-consuming
      • Less Expensive
    • Disadvantages
      • Carry-over Effects
      • Second measurement can be affected by first treatment
      • Order may influence results

Independent Samples

  • Between-Subject Designs
    • Experimental
    • Observational
    \[H_0: \mu_1 - \mu_2 = 0 \\\begin{align*} H_A &: \mu_1 - \mu_2 > 0 \\ &: \mu_1 - \mu_2 < 0 \\ &: \mu_1 - \mu_2 \ne 0 \end{align*}\]
  • $ t = \frac{\bar{X_1}-\bar{X_2}}{standard~error} $

  • Reject $H_0$ if $p<\alpha$

  • Fail to Reject $H_0$ if $p > \alpha$

  • Standard deviation of the difference = $\sqrt{S_1^2 + S_2^2}$ (variances of independent samples add)

  • Standard Error $= \frac{S}{\sqrt{n}} = \frac{\sqrt{S_1^2 + S_2^2}}{\sqrt{n}} = \sqrt{\frac{S_1^2 + S_2^2}{n}} = \sqrt{\frac{S_1^2}{n} + \frac{S_2^2}{n}} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$

  • Degrees of Freedom $ df= (n_1-1) + (n_2-1) = n_1 + n_2 -2$

  • $ t = \frac{(\bar{X_1}-\bar{X_2})}{SE}$

Example - Food Prices

$H_0 : \mu_1 = \mu_2$

$H_A : \mu_1 \ne \mu_2$

  • Sample Averages

    • 8.94 and 11.14
  • Size of Each Sample

    • 18 and 14
  • Sample Standard Deviations

    • 2.65 and 2.18
  • Standard Error

    • 0.85
  • t-Statistic

    • $|t| = 2.58$ (the raw t is negative since $\bar{X}_1 < \bar{X}_2$)
  • $t^*$ Critical value for two-tailed test at $\alpha=0.05$

    • degrees of freedom = $n_1 + n_2 - 2$
    • $\pm 2.042$
  • Reject the Null since $|t| > t^*$

  • Prices are significantly different between the two areas

  • import numpy as np
    import pandas as pd
    from scipy import stats
    
    df = pd.read_csv('./data/FoodPrices.csv')
    
    data1 = list(df.AverageMealPriceArea1.dropna().values)
    data2 = list(df.AverageMealPriceArea2.dropna().values)
    
    n1 = len(data1) # 18
    n2 = len(data2) # 14
    dof = n1 + n2 - 2 # 30
    
    print(f'n1 = {n1} and n2 = {n2}') # 18 14
    print(f"dof = {dof}") # 30
    
    xbar1 = np.mean(data1) # 8.94
    xbar2 = np.mean(data2) # 11.14
    print(f'mean1 = {xbar1:.2f} and mean2 = {xbar2:.2f}') # 8.94 11.14
    
    # delta degrees of freedom = 1 for a sample (Bessel's correction)
    std1 = np.std(data1, ddof=1) # 2.65
    std2 = np.std(data2, ddof=1) # 2.18
    print(f'std1 = {std1:.2f} and std2 = {std2:.2f}') # 2.65 2.18
    
    SE = np.sqrt(std1**2/n1 + std2**2/n2)
    print(f'std error = {SE:.2f}') # 0.85
    
    t = abs((xbar1 - xbar2)/SE) # sign doesn't matter for a two-tailed test
    print(f't = {t:.2f}') # 2.58
    
    alpha = 0.05
    
    # two-tailed test, so alpha/2 in each tail; matches the table value 2.042
    t_critical = stats.t.ppf(1 - alpha/2, dof)
    print(f't_critical = pm {t_critical:.3f}') # t_critical = pm 2.042
    
    if t > t_critical:
        print("t is greater than t-critical") # true
        print("Reject Null")
    else:
        print("t is less than t-critical")
        print("Fail to reject Null")