# Maths and Statistics

This post covers certain formulas useful for Deep Learning.

# Expected Value

• $\mathop{\mathbb{E}}(X) = \mu_X = \sum x.p$, where $x$ are the values of the random variable $X$ and $p$ their probabilities
• $\mathop{\mathbb{E}}(a) = a$, where $a$ is a non-random variable/constant
• Expected value of the product of two independent random variables is $\mathop{\mathbb{E}}(X.Y) = \mathop{\mathbb{E}}(X).\mathop{\mathbb{E}}(Y) = \mu_X.\mu_Y$
• Expected value of a scaled variable is $\mathop{\mathbb{E}}(a.X) = a.\mathop{\mathbb{E}}(X)$
• Expected value of the product of correlated variables is $\mathop{\mathbb{E}}(X.Y) = \mathop{\mathbb{E}}(X).\mathop{\mathbb{E}}(Y) + Cov(X,Y) = \mu_X.\mu_Y + Cov(X,Y)$
• Variables are correlated if the value of one of them, to some degree, determines or influences the other
• Covariance measures how much the variables are correlated
• Independent variables have zero covariance
• Expected value of the sum of variables (independent or not) is $\mathop{\mathbb{E}}(X+Y) = \mathop{\mathbb{E}}(X)+\mathop{\mathbb{E}}(Y) = \mu_X + \mu_Y$
• Linearity of Expectation, whether the variables are independent or not: $\mathop{\mathbb{E}}(a.X+b.Y+c) = a.\mathop{\mathbb{E}}(X) + b.\mathop{\mathbb{E}}(Y) + c = a.\mu_X + b.\mu_Y + c$
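
These identities are easy to sanity-check numerically; below is a minimal sketch with NumPy, where sample means stand in for expectations (so the product identity for independent variables holds only approximately, up to sampling error):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two independent random variables.
X = rng.normal(loc=2.0, scale=1.0, size=n)
Y = rng.normal(loc=-1.0, scale=3.0, size=n)

# E(X.Y) ~= E(X).E(Y) for independent variables.
print(np.mean(X * Y), np.mean(X) * np.mean(Y))

# Linearity of expectation: E(a.X + b.Y + c) = a.E(X) + b.E(Y) + c,
# which holds exactly for sample means (up to floating-point error).
a, b, c = 3.0, -2.0, 5.0
lhs = np.mean(a * X + b * Y + c)
rhs = a * np.mean(X) + b * np.mean(Y) + c
print(lhs, rhs)
```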

# Covariance

• Covariance of random variables $X$ and $Y$ is $Cov(X, Y) = \sigma(X,Y) = \sigma_{X,Y}$
• Zero if the variables are independent
• +ve if, as one increases, the other increases
• -ve if, as one increases, the other decreases
• $\sigma(X,Y) = \mathop{\mathbb{E}}[(X-\mathop{\mathbb{E}}(X))(Y-\mathop{\mathbb{E}}(Y))] = \mathop{\mathbb{E}}[(X-\mu_X)(Y-\mu_Y)]$
• Measures the joint variation of two random variables around their expected values. It gives the direction of the relationship but doesn’t indicate its strength.
• $\sigma(X,Y) = \frac{\sum(X-\bar{X})(Y-\bar{Y})}{n}$
• Relation to the expected value of the product
• $\mathop{\mathbb{E}}(X.Y) = \mu_X.\mu_Y + \sigma(X,Y)$
• $\sigma(X,Y) = \mathop{\mathbb{E}}(X.Y) - \mu_X.\mu_Y$; if the variables are independent then $\mathop{\mathbb{E}}(X.Y) = \mu_X.\mu_Y$, thus $\sigma(X,Y) = 0$
• The converse is not true: $\sigma(X,Y) = 0$ doesn’t mean that the variables are independent
• If $X$ is uniformly distributed in $[-1, 1]$, then $\mathop{\mathbb{E}}(X)=0$ and also $\mathop{\mathbb{E}}(X^3)=0$
• $\sigma(X,X^2) = \mathop{\mathbb{E}}(X.X^2) - \mathop{\mathbb{E}}(X).\mathop{\mathbb{E}}(X^2)$
• $\sigma(X,X^2) = \mathop{\mathbb{E}}(X^3)-\mathop{\mathbb{E}}(X).\mathop{\mathbb{E}}(X^2) = 0 - 0.\mu_{X^2} = 0$
• Covariance is zero but the variables $X$ and $X^2$ are dependent
• Covariance is commutative
• $\sigma(X,Y)=\sigma(Y,X)$
• Covariance is invariant to the displacement of one or both variables
• $\sigma(X+h, Y+k)=\sigma(X, Y)$
• Covariance is scaled by the scales of $X$ and $Y$
• $\sigma(a.X, b.Y)= a.b.\sigma(X, Y)$
• Combining the displacement and scaling properties (affine transformation)
• $\sigma(a.X+h, b.Y+k)= a.b.\sigma(X, Y)$
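
A quick numerical check of the properties above, including the uncorrelated-but-dependent example with $X$ uniform in $[-1, 1]$ and $Y = X^2$ (a sketch using NumPy; sample covariances approximate the population values):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

X = rng.normal(size=n)
Y = 0.5 * X + rng.normal(size=n)   # correlated with X

def cov(u, v):
    """Population-style covariance: mean of (U - mu_U).(V - mu_V)."""
    return np.mean((u - u.mean()) * (v - v.mean()))

# Commutativity: Cov(X, Y) = Cov(Y, X)
print(cov(X, Y), cov(Y, X))

# Displacement invariance: Cov(X + h, Y + k) = Cov(X, Y)
h, k = 7.0, -3.0
print(cov(X + h, Y + k), cov(X, Y))

# Scaling: Cov(a.X, b.Y) = a.b.Cov(X, Y)
a, b = 2.0, -4.0
print(cov(a * X, b * Y), a * b * cov(X, Y))

# Uncorrelated but dependent: X uniform on [-1, 1], Y = X^2
U = rng.uniform(-1.0, 1.0, size=n)
print(cov(U, U ** 2))   # close to zero even though U determines U^2
```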

# Variance

• $Var(X) = \sigma^2(X) = \sum(x^2.p) - \mu_X^2$
• Special case of Covariance when both variables are the same
• $Var(X) = Cov(X,X) = \sigma(X,X) = \sigma^2(X) = \sigma_X^2$
• Variance measures how much the values of a random variable are spread out, i.e. how much they differ from one another
• $Var(X) = \mathop{\mathbb{E}}[(X-\mathop{\mathbb{E}}(X))^2] = \mathop{\mathbb{E}}[(X-\mu_X)^2]$
• Variance from Expected Values of $X$ and $X^2$
• $\sigma_X^2 = \mathop{\mathbb{E}}(X^2)-\mathop{\mathbb{E}}^2(X)$
• $\sigma_X^2 = \mathop{\mathbb{E}}(X^2)-\mu_X^2$
• Variance of a non-random variable is zero
• $\sigma^2(a)=0$
• Variance is invariant to the displacement
• $\sigma^2(X+h) = \sigma^2(X)$
• If variable is scaled by constant, the variance gets scaled by square of the constant
• $\sigma^2(a.X) = a^2 . \sigma^2(X)$
• Variance of Sum and Difference of two correlated random variables
• $\sigma^2(X+Y) = \sigma^2(X) + \sigma^2(Y) + 2.\sigma(X,Y)$
• $\sigma^2(X-Y) = \sigma^2(X) + \sigma^2(Y) - 2.\sigma(X,Y)$
• If the variables are independent, then $\sigma(X,Y)=0$ and
• $\sigma^2(X+Y) = \sigma^2(X) + \sigma^2(Y)$
• $\sigma^2(X-Y) = \sigma^2(X) + \sigma^2(Y)$
• Variance of the product of two correlated random variables
• $\sigma^2(X.Y) = \sigma(X^2, Y^2) + \mathop{\mathbb{E}}(X^2).\mathop{\mathbb{E}}(Y^2) - [\sigma(X,Y) + \mathop{\mathbb{E}}(X).\mathop{\mathbb{E}}(Y)]^2$
• $\sigma^2(X.Y) = \sigma(X^2, Y^2) + \mu_{X^2}.\mu_{Y^2} - [\sigma(X,Y) + \mu_X.\mu_Y]^2$
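
The variance identities can be verified the same way (a NumPy sketch; `var` and `cov` here are population-style sample estimates, so the algebraic identities hold exactly up to floating-point error):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

X = rng.normal(loc=1.0, scale=2.0, size=n)
Y = 0.3 * X + rng.normal(size=n)       # correlated with X

var = lambda u: np.mean((u - u.mean()) ** 2)
cov = lambda u, v: np.mean((u - u.mean()) * (v - v.mean()))

# Var(X) = E(X^2) - E(X)^2
print(var(X), np.mean(X ** 2) - np.mean(X) ** 2)

# Displacement invariance and scaling: Var(X + h) = Var(X), Var(a.X) = a^2.Var(X)
print(var(X + 10.0), var(X))
print(var(3.0 * X), 9.0 * var(X))

# Sum of correlated variables: Var(X + Y) = Var(X) + Var(Y) + 2.Cov(X, Y)
print(var(X + Y), var(X) + var(Y) + 2.0 * cov(X, Y))
```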

# Mean Square Error (MSE) of an Estimator

$\begin{eqnarray*} MSE_\theta &=& \mathop{\mathbb{E}}(\hat\theta-\theta)^2 \\ &=& \mathop{\mathbb{E}}(\hat\theta^2 + \theta^2 - 2\theta\hat\theta) \\ &=& \mathop{\mathbb{E}}(\hat\theta^2) + \theta^2 - 2\theta\mathop{\mathbb{E}}(\hat\theta) \\ &=& \sigma^2(\hat\theta) + \mathop{\mathbb{E}}^2(\hat\theta) + \theta^2 - 2\theta\mathop{\mathbb{E}}(\hat\theta) \quad [\text{since } \sigma_X^2 = \mathop{\mathbb{E}}(X^2)-\mathop{\mathbb{E}}^2(X)] \\ &=& \sigma^2(\hat\theta) + [\mathop{\mathbb{E}}(\hat\theta)-\theta]^2 \\ &=& Var(\hat\theta) + (Bias~of~\hat\theta)^2 \end{eqnarray*}$
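
The decomposition can be illustrated by simulating a biased estimator, e.g. the $1/n$ (MLE) estimator of the variance of a normal distribution; the parameter values and sample sizes below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Estimate theta = sigma^2 of N(0, sigma^2) with the biased
# estimator (1/n) * sum (x_i - xbar)^2.
sigma2 = 4.0        # true parameter theta
n = 10              # sample size per experiment
trials = 200_000    # number of repeated experiments

samples = rng.normal(scale=np.sqrt(sigma2), size=(trials, n))
theta_hat = samples.var(axis=1)          # ddof=0: biased MLE of the variance

mse = np.mean((theta_hat - sigma2) ** 2)
variance = np.var(theta_hat)
bias_sq = (np.mean(theta_hat) - sigma2) ** 2

# The two sides of MSE = Var + Bias^2 agree (the identity is exact
# for sample moments, up to floating-point error).
print(mse, variance + bias_sq)
```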

References:

• https://www.odelama.com/data-analysis/Commonly-Used-Math-Formulas/

• http://people.missouristate.edu/songfengzheng/Teaching/MTH541/Lecture%20notes/evaluation.pdf
