Maths and Statistics
This post covers certain formulas useful for Deep Learning.
Expected Value
- $\mathop{\mathbb{E}}(X) == \mu_X == \sum x.p(x)$, where the sum runs over the values $x$ of the discrete random variable $X$ and $p(x)$ is the probability of each value
- $\mathop{\mathbb{E}}(a) == a$, where $a$ is a constant (non-random)
- Expected value of the product of two independent random variables is $\mathop{\mathbb{E}}(X.Y) == \mathop{\mathbb{E}}(X).\mathop{\mathbb{E}}(Y) == \mu_X.\mu_Y$
- Expected value of a scaled variable is $\mathop{\mathbb{E}}(a.X) = a.\mathop{\mathbb{E}}(X)$
- Expected value of the product of two correlated variables is $\mathop{\mathbb{E}}(X.Y) == \mathop{\mathbb{E}}(X).\mathop{\mathbb{E}}(Y) + Cov(X,Y) == \mu_X.\mu_Y + Cov(X,Y)$
- Variables are correlated if they tend to vary together, i.e. the value of one of them is, to some degree, linearly related to the value of the other
- Covariance measures how much these variables vary together
- Independent variables will have zero covariance
- Expected value of the sum of variables (independent or not) is $\mathop{\mathbb{E}}(X+Y) == \mathop{\mathbb{E}}(X)+\mathop{\mathbb{E}}(Y) == \mu_X + \mu_Y$
- Linearity of expectation holds whether or not the variables are independent: $\mathop{\mathbb{E}}(a.X+b.Y+c) == a.\mathop{\mathbb{E}}(X) + b.\mathop{\mathbb{E}}(Y) + c == a. \mu_X + b. \mu_Y + c$
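The expectation rules above can be checked exactly on small discrete distributions. This is a minimal sketch, assuming made-up joint probability mass functions stored as `{(x, y): probability}` dicts; the helper `expect` is a hypothetical name, not a library function. Python's `fractions.Fraction` keeps the arithmetic exact, so the identities can be compared with `==` instead of a tolerance.

```python
from fractions import Fraction as F


def expect(joint, g):
    """E[g(X, Y)] for a discrete joint pmf given as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


# Hypothetical joint pmf of two correlated variables (made-up numbers).
correlated = {(0, 1): F(1, 4), (1, 1): F(1, 4), (1, 3): F(1, 4), (2, 3): F(1, 4)}

mu_x = expect(correlated, lambda x, y: x)      # E(X)
mu_y = expect(correlated, lambda x, y: y)      # E(Y)
e_xy = expect(correlated, lambda x, y: x * y)  # E(X.Y)
cov = expect(correlated, lambda x, y: (x - mu_x) * (y - mu_y))  # Cov(X, Y)

# Correlated variables: E(X.Y) = mu_X.mu_Y + Cov(X, Y)
assert e_xy == mu_x * mu_y + cov

# Linearity holds even though X and Y are correlated
a, b, c = 3, -2, 5
assert expect(correlated, lambda x, y: a * x + b * y + c) == a * mu_x + b * mu_y + c

# Independent variables: the joint pmf is the product of the marginals,
# so E(X.Y) = E(X).E(Y)
px = {0: F(1, 2), 1: F(1, 2)}
py = {1: F(1, 3), 4: F(2, 3)}
independent = {(x, y): px[x] * py[y] for x in px for y in py}
assert expect(independent, lambda x, y: x * y) == (
    expect(independent, lambda x, y: x) * expect(independent, lambda x, y: y)
)

print(mu_x, mu_y, e_xy, cov)  # prints: 1 2 5/2 1/2
```

Using exact fractions rather than random sampling means a failing assertion would point to a wrong formula, not to sampling noise.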
Covariance
- Covariance of random variables $X$ and $Y$ is $Cov(X, Y) == \sigma (X,Y) == \sigma_{X,Y}$
- Zero if the variables are independent
- Positive if one tends to increase when the other increases
- Negative if one tends to decrease when the other increases
- $\sigma(X,Y) = \mathop{\mathbb{E}}[(X-\mathop{\mathbb{E}}(X)).(Y-\mathop{\mathbb{E}}(Y))] = \mathop{\mathbb{E}}[(X-\mu_X).(Y-\mu_Y)]$
- Measures how two random variables vary jointly around their expected values. Its sign gives the direction of the relationship, but its magnitude depends on the units of $X$ and $Y$, so it does not indicate the strength of the relationship
- For $n$ paired observations $(x_i, y_i)$ with means $\bar{x}$ and $\bar{y}$: $\sigma(X,Y) = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$; dividing by $n-1$ instead of $n$ gives the unbiased sample covariance
- If the variables are independent, their covariance is zero
- In general, $\mathop{\mathbb{E}}(X.Y) = \mu_X.\mu_Y + \sigma(X,Y)$
- So $\sigma(X,Y) = \mathop{\mathbb{E}}(X.Y) - \mu_X.\mu_Y$; if the variables are independent then $\mathop{\mathbb{E}}(X.Y) = \mu_X.\mu_Y$, thus $\sigma(X,Y) = 0$
- The converse is not true: $\sigma(X,Y) = 0$ does not mean that the variables are independent
- For example, if $X$ is uniformly distributed on $[-1, 1]$, then $\mathop{\mathbb{E}}(X)=0$ and, by symmetry, $\mathop{\mathbb{E}}(X^3)=0$
- $\sigma(X,Y=X^2) = \mathop{\mathbb{E}}(X.X^2) - \mathop{\mathbb{E}}(X).\mathop{\mathbb{E}}(X^2)$
- $\sigma(X,Y=X^2) = \mathop{\mathbb{E}}(X^3)-\mathop{\mathbb{E}}(X).\mathop{\mathbb{E}}(X^2) = 0 - 0.\mu_{X^2} = 0$
- The covariance is zero, yet $X$ and $X^2$ are dependent: knowing $X$ fully determines $X^2$
- Covariance is symmetric (the order of the arguments does not matter)
- $\sigma(X,Y)=\sigma(Y,X)$
- Covariance is invariant to shifting (adding a constant to) one or both variables
- $\sigma(X+h, Y+k)=\sigma(X, Y)$
- Scaling $X$ and $Y$ by constants scales the covariance by the product of those constants
- $\sigma(a.X, b.Y)= a.b.\sigma(X, Y)$
- Shift and scale combined (affine transformation of both variables)
- $\sigma(a.X+h, b.Y+k)= a.b.\sigma(X, Y)$
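The covariance formula and its properties can be checked on made-up paired data. In this sketch the helper `cov` is hypothetical; its `ddof` parameter (named after the similar parameter in NumPy, but implemented here from scratch) switches between dividing by $n$ and by $n-1$. For the zero-covariance counterexample it swaps the continuous uniform $X$ on $[-1, 1]$ for three equally likely values $-1, 0, 1$, which has the same symmetry and is easier to check exactly.

```python
from fractions import Fraction as F


def cov(xs, ys, ddof=0):
    """Covariance of n paired observations.

    ddof=0 divides by n, as in the formula above; ddof=1 divides by n - 1,
    which gives the unbiased sample estimate.
    """
    n = len(xs)
    x_bar = F(sum(xs), n)
    y_bar = F(sum(ys), n)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (n - ddof)


# Made-up paired data
xs = [1, 2, 4, 7]
ys = [2, 3, 3, 8]
print(cov(xs, ys), cov(xs, ys, ddof=1))  # prints: 5 20/3

# Zero covariance without independence: equally likely values -1, 0, 1
# (a discrete stand-in for the uniform X on [-1, 1]) and Y = X^2.
t = [-1, 0, 1]
assert cov(t, [v ** 2 for v in t]) == 0

# Properties: symmetry, shift invariance, scaling, affine transformation
a, b, h, k = 2, -3, 10, -4
assert cov(xs, ys) == cov(ys, xs)
assert cov([x + h for x in xs], [y + k for y in ys]) == cov(xs, ys)
assert cov([a * x for x in xs], [b * y for y in ys]) == a * b * cov(xs, ys)
assert cov([a * x + h for x in xs], [b * y + k for y in ys]) == a * b * cov(xs, ys)
```

Treating a list of equally likely values as the whole population (`ddof=0`) gives the population covariance, which is why the counterexample uses the default.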
Variance
- $Var(X) = \sigma^2(X) = \sum x^2.p(x) - \mu_X^2$, for a discrete random variable $X$
- Variance is the special case of covariance where both variables are the same
- $Var(X) = Cov(X,X) == \sigma(X,X) == \sigma^2(X)==\sigma_X^2$
- Variance measures how spread out the values of a random variable are around its mean
- $Var(X) = \mathop{\mathbb{E}}[(X-\mathop{\mathbb{E}}(X))^2] = \mathop{\mathbb{E}}[(X-\mu_X)^2]$
- Variance from Expected Values of $X$ and $X^2$
- $\sigma_X^2 = \mathop{\mathbb{E}}(X^2)-\mathop{\mathbb{E}}^2(X)$
- $\sigma_X^2 = \mathop{\mathbb{E}}(X^2)-\mu_X^2$
- Variance of a non-random variable is zero
- $\sigma^2(a)=0$
- Variance is invariant to shifting by a constant
- $\sigma^2(X+h) = \sigma^2(X)$
- If a variable is scaled by a constant, its variance is scaled by the square of that constant
- $\sigma^2(a.X) = a^2 . \sigma^2(X)$
- Variance of the sum and difference of two correlated random variables
- $\sigma^2(X+Y) = \sigma^2(X) + \sigma^2(Y) + 2.\sigma(X,Y)$
- $\sigma^2(X-Y) = \sigma^2(X) + \sigma^2(Y) - 2.\sigma(X,Y)$
- If the variables are independent, then $\sigma(X,Y)=0$ and
- $\sigma^2(X+Y) = \sigma^2(X) + \sigma^2(Y)$
- $\sigma^2(X-Y) = \sigma^2(X) + \sigma^2(Y)$
- Variance of the product of two correlated random variables
- $\sigma^2(X.Y)= \sigma(X^2, Y^2) + \mathop{\mathbb{E}}(X^2).\mathop{\mathbb{E}}(Y^2) - [\sigma(X,Y) + \mathop{\mathbb{E}}(X).\mathop{\mathbb{E}}(Y)]^2$
- $\sigma^2(X.Y)= \sigma(X^2, Y^2) + \mu_{X^2}.\mu_{Y^2} - [\sigma(X,Y) + \mu_X.\mu_Y]^2$
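The variance identities above can be checked exactly on a small discrete example. This sketch uses a hypothetical joint pmf of two correlated variables and hypothetical helpers `E`, `var` and `cov`; `fractions.Fraction` makes every equality exact, including the formula for the variance of a product.

```python
from fractions import Fraction as F

# Hypothetical joint pmf {(x, y): probability} of two correlated variables.
joint = {(0, 1): F(1, 4), (1, 1): F(1, 4), (1, 3): F(1, 4), (2, 3): F(1, 4)}


def E(g):
    """Expected value of g(X, Y) under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


def var(g):
    """Variance via the shortcut E(g^2) - E^2(g)."""
    return E(lambda x, y: g(x, y) ** 2) - E(g) ** 2


def cov(g, h):
    """Covariance via E(g.h) - E(g).E(h)."""
    return E(lambda x, y: g(x, y) * h(x, y)) - E(g) * E(h)


X = lambda x, y: x
Y = lambda x, y: y

# Shortcut agrees with the definition E[(X - mu_X)^2]
mu_x = E(X)
assert var(X) == E(lambda x, y: (x - mu_x) ** 2)

# Variance of a constant, shift invariance, scaling by a^2
assert var(lambda x, y: 5) == 0
assert var(lambda x, y: x + 7) == var(X)
assert var(lambda x, y: 3 * x) == 9 * var(X)

# Sum and difference of correlated variables
assert var(lambda x, y: x + y) == var(X) + var(Y) + 2 * cov(X, Y)
assert var(lambda x, y: x - y) == var(X) + var(Y) - 2 * cov(X, Y)

# Product of correlated variables
X2 = lambda x, y: x ** 2
Y2 = lambda x, y: y ** 2
rhs = cov(X2, Y2) + E(X2) * E(Y2) - (cov(X, Y) + E(X) * E(Y)) ** 2
assert var(lambda x, y: x * y) == rhs

print(var(X), var(Y), cov(X, Y), rhs)  # prints: 1/2 1 1/2 21/4
```

Because $Cov(X, Y) \neq 0$ here, dropping the $2.\sigma(X,Y)$ term would make the sum and difference checks fail, which is the point of the correlated case.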
Mean Square Error (MSE) of an Estimator
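The decomposition derived below, $MSE = Var(\hat\theta) + (Bias~of~\hat\theta)^2$, can be checked exactly for a concrete estimator. This sketch assumes a hypothetical discrete population and sample size $n = 3$, and uses the variance estimator that divides by $n$ (its expectation is $\frac{n-1}{n}\sigma^2$, so it is biased). It enumerates every possible i.i.d. sample together with its probability, so no random sampling is involved.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical population: X takes values 0, 1, 3 with these probabilities.
pmf = {0: F(1, 2), 1: F(1, 4), 3: F(1, 4)}
n = 3  # sample size (assumed for illustration)

mu = sum(x * p for x, p in pmf.items())
theta = sum((x - mu) ** 2 * p for x, p in pmf.items())  # true variance


def estimator(sample):
    """Biased variance estimator: divides by n, not n - 1."""
    m = F(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / len(sample)


# Enumerate every possible i.i.d. sample of size n with its probability.
e_hat = F(0)     # E(theta_hat)
e_hat_sq = F(0)  # E(theta_hat^2)
mse = F(0)       # E[(theta_hat - theta)^2]
for sample in product(pmf, repeat=n):
    prob = F(1)
    for x in sample:
        prob *= pmf[x]
    est = estimator(sample)
    e_hat += prob * est
    e_hat_sq += prob * est ** 2
    mse += prob * (est - theta) ** 2

var_hat = e_hat_sq - e_hat ** 2
bias = e_hat - theta
assert mse == var_hat + bias ** 2
print(f"MSE={mse}, Var={var_hat}, Bias^2={bias ** 2}")
```

Swapping the divisor in `estimator` to `len(sample) - 1` would make `bias` zero, leaving the MSE equal to the variance alone.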
\[\begin{eqnarray*}
MSE_\theta &=& \mathop{\mathbb{E}}[(\hat\theta-\theta)^2] \\
&=& \mathop{\mathbb{E}}(\hat\theta^2 + \theta^2 - 2\theta\hat\theta) \\
&=& \mathop{\mathbb{E}}(\hat\theta^2) + \theta^2 - 2\theta\mathop{\mathbb{E}}(\hat\theta) \\
&=& \sigma^2(\hat\theta) + \mathop{\mathbb{E}}^2(\hat\theta) + \theta^2 - 2\theta\mathop{\mathbb{E}}(\hat\theta) \quad [\text{since}~\sigma_X^2 = \mathop{\mathbb{E}}(X^2)-\mathop{\mathbb{E}}^2(X)] \\
&=& \sigma^2(\hat\theta) + [\mathop{\mathbb{E}}(\hat\theta)-\theta]^2 \\
&=& Var(\hat\theta) + (Bias~of~\hat\theta)^2
\end{eqnarray*}\]
References:
- https://www.odelama.com/data-analysis/Commonly-Used-Math-Formulas/
- http://people.missouristate.edu/songfengzheng/Teaching/MTH541/Lecture%20notes/evaluation.pdf