Random Variables

1 Random Variables

Random variables (RVs) are used to represent the numeric value associated with the outcome(s) of a random process. [Strictly speaking an RV is a function that maps each outcome in the sample space to a real number].

A discrete RV can only take values from a countable set. E.g. a bet that pays 10 if a coin lands heads and loses 5 if it lands tails:

\[X = \begin{cases} 10 & \text{if} \; H \\ -5 & \text{if} \; T \end{cases}\]

The set may also be countably infinite:

\[X \in \{ 0, 1, 2, \cdots \}\]

A continuous RV can take an infinite number of possible values from an uncountable set. E.g. measuring the amount of rainfall on a particular day. The RV could be any positive real number:

\[X \in \mathbb R_{+}\]

2 Probability Distributions

A probability distribution is a function that gives the probability associated with each outcome in the experiment.

2.1 Frequency distribution

A frequency distribution shows the observed frequency of specific outcomes over multiple trials of an experiment.

E.g. let \(X\) be the RV for the outcome of rolling a six-sided die. Suppose we run the experiment 1000 times, and construct a frequency table showing how many times each outcome occurred:

library(ggplot2)     # for ggplot()
library(gridExtra)   # for grid.arrange()

# simulate 1000 rolls of a fair six-sided die
diceroll = data.frame(X = sample(x = 1:6, size = 1000, replace = TRUE))

# absolute frequency of each outcome
plot1 = ggplot(data = diceroll, aes(x = as.factor(X))) + 
    geom_bar(width = 0.3) +
    ggtitle('frequency distribution of X') + 
    xlab('X') + ylab('frequency') 

# relative frequency (proportion) of each outcome
plot2 = ggplot(data = diceroll, aes(x = as.factor(X))) + 
    geom_bar(width = 0.3, aes(y = (..count..)/sum(..count..))) +
    ggtitle('relative frequency distribution of X') + 
    xlab('X') + ylab('relative frequency') 

grid.arrange(plot1, plot2, ncol = 2)

If you ran the above experiment infinitely many times, you would see the relative frequency of each outcome converge to \(\frac 16\). Under the frequentist definition of probability, this limiting relative frequency, \(\frac 16\), is the probability of each outcome.

You can construct a frequency distribution for a continuous RV by discretizing the sample space into intervals (“bins”) and measuring the frequency of observations in each interval.
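For instance, a quick sketch using simulated data (there is no actual rainfall dataset here, so rexp() stands in for a positive-valued continuous RV):

library(ggplot2)

# simulate 1000 draws of a positive-valued continuous RV (stand-in for daily rainfall)
rainfall = data.frame(X = rexp(n = 1000, rate = 0.5))

# geom_histogram() bins the observations and counts how many fall in each bin
ggplot(data = rainfall, aes(x = X)) +
    geom_histogram(bins = 30) +
    ggtitle('frequency distribution of a continuous RV') +
    xlab('X') + ylab('frequency')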

2.2 Probability distribution function

The specific functional form of a probability distribution. There are two types:

For discrete RVs, the probability mass function (pmf) gives the probability of each value the RV can take. The probabilities sum to 1.

E.g. the theoretical pmf for rolling a six-sided die assigns probability \(\frac 16\) to each of the six faces.
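A minimal sketch of how this pmf could be plotted with ggplot2 (the data frame and labels are illustrative):

library(ggplot2)

# theoretical pmf of a fair die: P(X = x) = 1/6 for x = 1, ..., 6
die_pmf = data.frame(x = 1:6, p = rep(1/6, 6))

ggplot(data = die_pmf, aes(x = as.factor(x), y = p)) +
    geom_col(width = 0.3) +
    ggtitle('pmf of X') +
    xlab('X') + ylab('P(X = x)')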

For continuous RVs, the probability density function (pdf) gives the relative likelihood associated with each point in the sample space. The total area under a pdf is 1. E.g. if instead of a six-sided die you had a random number generator producing real numbers uniformly between 1 and 6, its pdf would be flat at \(\frac 15\) over the interval \([1, 6]\).
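A sketch of that pdf, using dunif() for the uniform density on \([1, 6]\):

library(ggplot2)

# density of a continuous uniform RV on [1, 6]: f(x) = 1/5 on the interval, 0 elsewhere
grid_x = data.frame(x = seq(from = 1, to = 6, by = 0.01))
grid_x$density = dunif(grid_x$x, min = 1, max = 6)

ggplot(data = grid_x, aes(x = x, y = density)) +
    geom_line() +
    ggtitle('pdf of X') +
    xlab('X') + ylab('density')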

For a continuous RV, the probability that it takes any single exact value is zero, since there are uncountably many possible values. Thus pdfs are used to find the probability of an RV falling within a range of values, rather than taking one specific value.
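E.g. for the uniform RV above, the probability of falling between 2 and 4 is the area under the pdf over that interval, \((4 - 2) \cdot \frac 15 = 0.4\). A one-line check using the built-in uniform CDF:

# P(2 <= X <= 4) for X ~ Uniform(1, 6): difference of the CDF at the endpoints
punif(4, min = 1, max = 6) - punif(2, min = 1, max = 6)   # 0.4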

3 Expected Value

The expected value of a random variable is the weighted average of all possible outcomes, using probabilities as weights. Also known as the mean:

\[\text{E}[X] = \sum_i P_i X_i = \mu\]

where \(\text{E}[\cdot]\) is the expectation operator and \(\mu\) denotes the mean of \(X\).

E.g. if you roll a six-sided die the expected value of \(X\) is:

\[\text{E}[X] = \sum_i P_i X_i = \frac 16 \cdot 1 + \frac 16 \cdot 2 + \frac 16 \cdot 3 + \frac 16 \cdot 4 + \frac 16 \cdot 5 + \frac 16 \cdot 6 = 3.5\]
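The same arithmetic as a quick check in base R:

# expected value of a fair die: probability-weighted average of the faces
sum((1:6) * (1/6))   # 3.5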

3.1 Linearity

Expectation is linear:

\[\text{E}[aX] = a \text{E}[X]\]

e.g. if you multiplied values on the die by two, the expected value would also multiply by two:

\[\text{E}[2X] = 2 \text{E}[X] = 2 \cdot 3.5 = 7\]

3.2 Additivity

Expectation is additive:

\[\text{E}[X + Y] = \text{E}[X] + \text{E}[Y]\]

e.g. if you rolled two dice, and let \(X\) be the RV for one and \(Y\) the other, then

\[\text{E}[X + Y] = 3.5 + 3.5 = 7\]
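Both the linearity and additivity results can be checked by simulation; below is a minimal sketch reusing sample() as in the frequency-distribution example (the sample size and seed are arbitrary):

set.seed(42)   # make the simulation reproducible

x = sample(1:6, size = 100000, replace = TRUE)   # rolls of the first die
y = sample(1:6, size = 100000, replace = TRUE)   # rolls of the second die

mean(2 * x)    # close to 7, illustrating E[2X] = 2 E[X]
mean(x + y)    # close to 7, illustrating E[X + Y] = E[X] + E[Y]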

3.3 Non-multiplicativity

In general, the expected value of a product of two RVs is not the product of their expected values, i.e.

\[\text{E}[XY] \neq \text{E}[X] \; \text{E}[Y]\]

An important exception is when \(X\) and \(Y\) are independent, in which case \(\text{E}[XY] = \text{E}[X] \; \text{E}[Y]\).
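E.g. for two fair dice, computed directly from the pmf: if \(X\) and \(Y\) are independent rolls then \(\text{E}[XY] = \text{E}[X] \, \text{E}[Y] = 12.25\), but if \(Y = X\) (perfectly dependent) then \(\text{E}[XY] \approx 15.17\):

p = rep(1/6, 6)   # pmf of a fair die
x = 1:6

sum(outer(x, x) * outer(p, p))   # E[XY] when X and Y are independent rolls: 12.25
sum((x * x) * p)                 # E[XY] when Y = X (fully dependent): 15.1667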

4 Moments

The concept of expectation can be generalized to include higher powers.

A moment is an expectation of a power of a random variable. The \(n\)-th moment of an RV is defined as:

\[n \text{-th moment of } X = \text{E}[X^n] = \sum_i P_i X_i^n\]

E.g. for a six-sided die roll:

\[\text{E}[X] = \sum_i P_i X_i = 3.5 = \mu\]

\[\text{E}[X^2] = \sum_i P_i X_i^2 = \frac 16 \cdot 1^2 + \frac 16 \cdot 2^2 + \frac 16 \cdot 3^2 + \frac 16 \cdot 4^2 + \frac 16 \cdot 5^2 + \frac 16 \cdot 6^2 = \frac{91}{6}\]
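As a decimal, \(\frac{91}{6} \approx 15.17\); a one-line check:

# second moment of a fair die roll: 91/6
sum((1:6)^2 / 6)   # 15.1667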

A central moment is an expectation of a power of a random variable about its mean. The \(n\)-th central moment of an RV is defined as:

\[n \text{-th central moment of } X = \text{E}[(X-\mu)^n]\]

The first central moment of \(X\) is:

\[\text{E}[X - \mu] = \text{E}[X] - \text{E}[X] = 0\]

The second central moment of \(X\) is:

\[\text{E}[(X-\mu)^2] \label{secondCentralMoment}\]

This is also known as the variance of \(X\).

Expanding \(\eqref{secondCentralMoment}\) and using the fact that \(\mu = \text{E}[X]\),

\[
\begin{align}
\text{E}[(X-\mu)^2] &= \text{E}[(X - \text{E}[X])^2] \nonumber \\
&= \text{E}[X^2 - 2X\,\text{E}[X] + \text{E}[X]^2] \nonumber \\
&= \text{E}[X^2] - 2\,\text{E}[X]\,\text{E}[X] + \text{E}[X]^2 \nonumber \\
&= \text{E}[X^2] - 2\,\text{E}[X]^2 + \text{E}[X]^2 \nonumber \\
&= \text{E}[X^2] - \text{E}[X]^2 \label{variance}
\end{align}
\]

Equations \(\eqref{secondCentralMoment}\) and \(\eqref{variance}\) are both common expressions for the variance of a random variable.
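For the die roll, both expressions give \(\frac{91}{6} - 3.5^2 = \frac{35}{12} \approx 2.92\). A quick check of the two forms:

p  = rep(1/6, 6)   # pmf of a fair die
x  = 1:6
mu = sum(x * p)    # mean, 3.5

sum((x - mu)^2 * p)    # E[(X - mu)^2]   = 2.9167
sum(x^2 * p) - mu^2    # E[X^2] - E[X]^2 = 2.9167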

Moments give useful information about the properties of a random variable’s probability distribution, as the next section shows.

5 Summary Statistics for Distributions

A summary statistic is a single value that summarizes some property of a distribution. Below are some common summary statistics used for describing the distribution of random variables.

5.1 Mean

A measure of central tendency. Defined as the weighted average of all possible values of the RV, using probabilities as weights.

Calculated by taking the first moment (expected value) of the RV:

\[\mu = \text{E}[X] = \sum_i P_i X_i\]

5.2 Variance

A measure of the spread of the distribution. Denoted \(\sigma^2\) or \(\text{Var}[\cdot]\).

Calculated by taking the second central moment of the RV:

\[\sigma^2 = \text{E}[(X-\mu)^2] = \text{E}[X^2] - \text{E}[X]^2\]

Unlike expectation, variance is not linear:

\[\text{Var}[aX] = a^2 \text{Var}[X]\]

Shifting the distribution left or right leaves the variance unchanged, since variance is a measure of spread:

\[\text{Var}[X+a] = \text{Var}[X]\]
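Both properties can be verified for the die roll directly from its pmf (a small sketch; the variable names are illustrative):

p  = rep(1/6, 6)   # pmf of a fair die
x  = 1:6
mu = sum(x * p)    # mean, 3.5

sum((x - mu)^2 * p)                 # Var[X]      = 2.9167
sum((2*x - 2*mu)^2 * p)             # Var[2X]     = 4 * Var[X] = 11.6667
sum(((x + 10) - (mu + 10))^2 * p)   # Var[X + 10] = Var[X]     = 2.9167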

5.3 Standard Deviation

Square root of the variance.

\[\sigma = \sqrt{\text{E}[(X-\mu)^2]}\]

5.4 Skewness

A measure of the extent to which a distribution is skewed to one side.

Defined as the third standardized moment of the RV:

\[\gamma_1 = \text{E}\bigg[ \bigg( \frac{X - \mu}{\sigma} \bigg)^3 \bigg] = \frac{\text{E}[(X-\mu)^3]}{\sigma^3}\]

5.5 Kurtosis

A measure of the “fatness” of the tails of the distribution.

Defined as the fourth standardized moment:

\[\gamma_2 = \text{E} \bigg[ \bigg( \frac{X - \mu}{\sigma} \bigg)^4 \bigg] = \frac{\text{E}[(X-\mu)^4]}{\sigma^4}\]
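For the fair die, both standardized moments can be computed directly from the pmf. The distribution is symmetric, so the skewness is 0, and the kurtosis comes out to roughly 1.73 (lighter-tailed than a normal distribution, whose kurtosis is 3):

p     = rep(1/6, 6)   # pmf of a fair die
x     = 1:6
mu    = sum(x * p)                    # mean, 3.5
sigma = sqrt(sum((x - mu)^2 * p))     # standard deviation, 1.708

sum(((x - mu) / sigma)^3 * p)   # skewness: 0 up to floating-point error (symmetric)
sum(((x - mu) / sigma)^4 * p)   # kurtosis: ~1.73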