Random variables (RVs) are used to represent the numeric value associated with the outcome of a random process. [Strictly speaking, an RV is a function that maps each outcome in the sample space to a real number.]
A discrete RV can only take values from a countable set:
\[X = \begin{cases} 10 & \text{if} \; H \\ -5 & \text{if} \; T \end{cases}\]
\[X \in \{ 0, 1, 2, \cdots \}\]
A continuous RV takes values from an uncountable set, so it has infinitely many possible values. E.g. the amount of rainfall on a particular day could be any positive real number:
\[X \in \mathbb R_{+}\]
A probability distribution is a function that gives the probability associated with each outcome in the experiment.
A frequency distribution shows the observed frequency of specific outcomes over multiple trials of an experiment.
E.g. let \(X\) be the RV for the outcome of rolling a six-sided die. Suppose we run the experiment 1000 times, and construct a frequency table showing how many times each outcome occurred:
library(ggplot2)      # for ggplot()
library(gridExtra)    # for grid.arrange()
# simulate 1000 rolls of a fair six-sided die
diceroll = data.frame(X = sample(x = 1:6, size = 1000, replace = TRUE))
# frequency distribution: count of each outcome
plot1 = ggplot(data = diceroll, aes(x = as.factor(X))) +
  geom_bar(width = 0.3) +
  ggtitle('frequency distribution of X') +
  xlab('X') + ylab('frequency')
# relative frequency distribution: count divided by the total number of rolls
plot2 = ggplot(data = diceroll, aes(x = as.factor(X))) +
  geom_bar(width = 0.3, aes(y = after_stat(count / sum(count)))) +
  ggtitle('relative frequency distribution of X') +
  xlab('X') + ylab('relative frequency')
grid.arrange(plot1, plot2, ncol = 2)
If you ran the above experiment infinitely many times, you would observe the relative frequency of each outcome converge to \(\frac 16\). Under the frequentist definition of probability you would define the probability of each outcome as \(\frac 16\).
You can construct a frequency distribution for a continuous RV by discretizing the sample space into intervals (“bins”) and measuring the frequency of observations in each interval.
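For example, a minimal sketch that bins simulated continuous data (here hypothetical rainfall amounts drawn from an exponential distribution, purely for illustration):
rainfall = data.frame(X = rexp(n = 1000, rate = 0.5))   # hypothetical rainfall amounts
ggplot(data = rainfall, aes(x = X)) +
  geom_histogram(binwidth = 1) +   # discretize into bins of width 1
  ggtitle('frequency distribution of a continuous RV') +
  xlab('rainfall') + ylab('frequency')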
A probability distribution function gives the specific functional form of a probability distribution. There are two types:
For discrete RVs, a probability mass function (pmf) gives the probability of each specific value the RV can take. The probabilities of all values sum to 1.
E.g. the theoretical pmf for rolling a six-sided die:
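\[P(X = x) = \frac 16, \quad x \in \{1, 2, 3, 4, 5, 6\}\]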
For continuous RVs, a probability density function (pdf) gives the relative likelihood of each point in the sample space. The total area under a pdf is 1. E.g. if instead of a six-sided die you had a random number generator producing real numbers between 1 and 6, its pdf would be:
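\[f(x) = \begin{cases} \frac 15 & \text{if} \; 1 \leq x \leq 6 \\ 0 & \text{otherwise} \end{cases}\]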
For a continuous RV, the probability of it equalling any particular value is zero, since there are uncountably many possible values. Thus pdfs are used to compute the probability of an RV falling within a range of values, rather than taking a specific value.
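E.g. the probability that \(X\) falls between \(a\) and \(b\) is the area under the pdf over that interval:
\[P(a \leq X \leq b) = \int_a^b f(x) \, dx\]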
The expected value of a random variable is the weighted average of all possible outcomes, using probabilities as weights. Also known as the mean:
\[\text{E}[X] = \sum_i P_i X_i = \mu\]
where \(\text{E}[\cdot]\) is the expectation operator and \(\mu\) denotes the mean of \(X\).
E.g. if you roll a six-sided die, the expected value of \(X\) is:
\[\text{E}[X] = \sum_i P_i X_i = \frac 16 \cdot 1 + \frac 16 \cdot 2 + \frac 16 \cdot 3 + \frac 16 \cdot 4 + \frac 16 \cdot 5 + \frac 16 \cdot 6 = 3.5\]
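A quick simulation sketch of this (reusing sample() as above): the mean of many simulated rolls should be close to 3.5.
rolls = sample(x = 1:6, size = 100000, replace = TRUE)
mean(rolls)   # approximately 3.5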
Expectation is linear:
\[\text{E}[aX] = a \text{E}[X]\]
e.g. if you multiplied values on the die by two, the expected value would also multiply by two:
\[\text{E}[2X] = 2 \text{E}[X] = 2 \cdot 3.5 = 7\]
Expectation is additive:
\[\text{E}[X + Y] = \text{E}[X] + \text{E}[Y]\]
e.g. if you roll two dice, and let \(X\) be the RV for one and \(Y\) for the other, then
\[\text{E}[X + Y] = 3.5 + 3.5 = 7\]
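A short simulation sketch of both properties, with two independent simulated dice:
X = sample(1:6, size = 100000, replace = TRUE)
Y = sample(1:6, size = 100000, replace = TRUE)
mean(2 * X)    # approximately 7, illustrating E[2X] = 2 E[X]
mean(X + Y)    # approximately 7, illustrating E[X + Y] = E[X] + E[Y]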
In general, expectation is not multiplicative, i.e. the expected value of a product is not the product of expected values:
\[\text{E}[XY] \neq \text{E}[X] \; \text{E}[Y]\]
An important exception is when \(X\) and \(Y\) are independent, in which case \(\text{E}[XY] = \text{E}[X] \; \text{E}[Y]\).
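E.g. for two independent dice:
\[\text{E}[XY] = \text{E}[X] \; \text{E}[Y] = 3.5 \cdot 3.5 = 12.25\]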
The concept of expectation can be generalized to include higher powers.
A moment is the expectation of a power of a random variable. The \(n\)-th moment of an RV is defined as:
\[n \text{-th moment of } X = \text{E}[X^n] = \sum_i P_i X_i^n\]
E.g. for a six-sided die roll:
\[\text{E}[X] = \sum_i P_i X_i = 3.5 = \mu\]
\[\text{E}[X^2] = \sum_i P_i X_i^2 = \frac 16 \cdot 1^2 + \frac 16 \cdot 2^2 + \frac 16 \cdot 3^2 + \frac 16 \cdot 4^2 + \frac 16 \cdot 5^2 + \frac 16 \cdot 6^2 = \frac{91}{6}\]
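These values can be checked directly in R:
sum((1:6) * (1/6))     # first moment: 3.5
sum((1:6)^2 * (1/6))   # second moment: 91/6, approximately 15.17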
A central moment is the expectation of a power of a random variable about its mean. The \(n\)-th central moment of an RV is defined as:
\[n \text{-th central moment of } X = \text{E}[(X-\mu)^n]\]
The first central moment of \(X\) is:
\[\text{E}[X - \mu] = \text{E}[X] - \text{E}[X] = 0\]
The second central moment of \(X\) is:
\[\text{E}[(X-\mu)^2] \label{secondCentralMoment}\]
This is also known as the variance of \(X\).
Expanding \(\eqref{secondCentralMoment}\) and using the fact that \(\mu = \text{E}[X]\),
\[ \begin{align} \text{E}[(X-\mu)^2] &= \text{E}[(X - \text{E}[X])^2] \nonumber\\ &= \text{E}[X^2 - 2X\text{E}[X] + \text{E}[X]^2] \nonumber\\ &= \text{E}[X^2] - 2\text{E}[X]\text{E}[X] + \text{E}[X]^2 \nonumber\\ &= \text{E}[X^2] - 2\text{E}[X]^2 + \text{E}[X]^2 \nonumber\\ &= \text{E}[X^2] - \text{E}[X]^2 \label{variance} \end{align} \]
Equations \(\eqref{secondCentralMoment}\) and \(\eqref{variance}\) are both common expressions for the variance of a random variable.
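E.g. for the six-sided die roll, using the moments computed above:
\[\text{Var}[X] = \text{E}[X^2] - \text{E}[X]^2 = \frac{91}{6} - (3.5)^2 = \frac{35}{12} \approx 2.92\]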
Moments give useful information about the properties of a random variable’s probability distribution (next).
A summary statistic is a single value that summarizes some property of a distribution. Below are some common summary statistics used for describing the distribution of random variables.
A measure of central tendency. Defined as the weighted average of all possible values of the RV, using probabilities as weights.
Calculated by taking the first moment (expected value) of the RV:
\[\mu = \text{E}[X] = \sum_i P_i X_i\]
A measure of the spread of the distribution. Denoted \(\sigma^2\) or \(\text{Var}[\cdot]\).
Calculated by taking the second central moment of the RV:
\[\sigma^2 = \text{E}[(X-\mu)^2] = \text{E}[X^2] - \text{E}[X]^2\]
Unlike expectation, variance is not linear:
\[\text{Var}[aX] = a^2 \text{Var}[X]\]
Shifting the distribution left or right leaves the variance unchanged, since variance is a measure of spread:
\[\text{Var}[X+a] = \text{Var}[X]\]
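A short simulation sketch of both properties (var() computes the sample variance, a close approximation to the population variance here):
X = sample(1:6, size = 100000, replace = TRUE)
var(X)        # approximately 35/12, i.e. about 2.92
var(2 * X)    # approximately 4 * (35/12), i.e. about 11.67, since Var[aX] = a^2 Var[X]
var(X + 10)   # approximately unchanged, since Var[X + a] = Var[X]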
Square root of the variance.
\[\sigma = \sqrt{\text{E}[(X-\mu)^2]}\]
A measure of the extent to which a distribution is skewed to one side.
Defined as the third standardized moment of the RV:
\[\gamma_1 = \text{E}\bigg[ \bigg( \frac{X - \mu}{\sigma} \bigg)^3 \bigg] = \frac{\text{E}[(X-\mu)^3]}{\sigma^3}\]
A measure of the “fatness” of the tails of the distribution.
Defined as the fourth standardized moment:
\[\gamma_2 = \text{E} \bigg[ \bigg( \frac{X - \mu}{\sigma} \bigg)^4 \bigg] = \frac{\text{E}[(X-\mu)^4]}{\sigma^4}\]
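A minimal sketch computing both standardized moments from a simulated sample, applying the definitions directly (sd() gives the sample standard deviation, a close approximation here):
X = sample(1:6, size = 100000, replace = TRUE)
z = (X - mean(X)) / sd(X)   # standardized values
mean(z^3)                   # skewness: approximately 0, since the die distribution is symmetric
mean(z^4)                   # kurtosis: approximately 1.73 for a fair six-sided die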