Probability Spaces, Axioms

1 Probability Spaces

A random process is any process whose outcome is subject in some way to chance (randomness).

Probability spaces are used to model random processes. They have three components:

- the sample space \(\Omega\): the set of all possible outcomes of the process
- the event space \(\mathcal F\): the set of events, where an event is a subset of outcomes in \(\Omega\)
- the probability measure \(\text{P}\): a function that assigns a probability to each event in \(\mathcal F\)

The sample space of a random process can be discrete or continuous. Discrete sample spaces have a countable, well-defined set of outcomes (tossing a coin, rolling a die, etc.). Continuous sample spaces have an uncountably infinite set of possible outcomes (the temperature of a substance, the height of a person, etc.).

In R you can simulate a discrete random process using the sample() function. You must specify the sample space, the sample size, and whether or not you want to sample with replacement.

E.g. to simulate tossing a coin twice:

sample(x = c('H','T'), size = 2, replace = TRUE)
## [1] "T" "H"

2 Some Probability Axioms

Probability spaces come with a set of postulates that describe how probabilities relate to each other. These are called the Kolmogorov axioms. The rules below comprise some of these axioms and their immediate consequences.

2.1 The complement rule

One axiom states that the probability of the entire sample space \(\Omega\) is 1, i.e. some outcome must occur. If \(A\) is an event, and if \(\text{P}(A)\) denotes the probability that \(A\) happens, then the probability of \(A\) not happening is:

\[\text{P}(A^c) = 1 - \text{P}(A)\]

where \(A^c\) is the complement of event \(A\).
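As a quick numerical check (a die-rolling setup assumed for illustration), the estimated probabilities of an event and its complement sum to 1:

rolls = sample(x = 1:6, size = 100000, replace = TRUE)
p_six = mean(rolls == 6)       # estimate of P(A), where A = "roll a six"
p_not_six = mean(rolls != 6)   # estimate of P(A^c)
p_six + p_not_six              # every roll is either a six or not a six
## [1] 1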

2.2 Conditional probability

For two events, \(A\) and \(B\), the probability that \(A\) occurs given that \(B\) has already occurred is:

\[\text{P}(A | B) = \frac{\text{P}(A \cap B)}{\text{P}(B)}\]

The RHS of this equation restricts the sample space to only those outcomes where \(B\) occurs, and measures the fraction of them in which \(A\) also occurs.

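To make this concrete, here is a simulation sketch (two fair dice, an assumed example) that estimates \(\text{P}(A | B)\) both from the formula and by literally restricting the sample to outcomes where \(B\) occurs:

d1 = sample(1:6, size = 100000, replace = TRUE)   # first die
d2 = sample(1:6, size = 100000, replace = TRUE)   # second die
A = (d1 + d2) >= 10   # event A: the dice sum to at least 10
B = d1 == 6           # event B: the first die shows a six
mean(A & B) / mean(B) # P(A|B) via the formula; should be near 3/6 = 0.5
mean(A[B])            # the same quantity, computed by restricting to outcomes where B occurs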

2.3 The multiplication rule

Rearranging the law of conditional probability gives the multiplication rule:

\[\text{P}(A \cap B) = \text{P}(A | B) \; \text{P}(B)\]

i.e. the probability that two events both happen equals the probability that one of them happens, multiplied by the probability that the other happens given that the first one has happened.
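For example (a standard card-drawing illustration, assumed here): the probability of drawing two aces from a shuffled deck without replacement is \(\frac{4}{52} \times \frac{3}{51} \approx 0.0045\), i.e. the probability the first card is an ace times the probability the second is an ace given the first was. A simulation sketch:

deck = rep(c('A', 'other'), times = c(4, 48))      # 4 aces among 52 cards
two_aces = replicate(100000, {
  hand = sample(deck, size = 2, replace = FALSE)   # deal two cards without replacement
  all(hand == 'A')
})
mean(two_aces)   # should be near (4/52) * (3/51) ≈ 0.0045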

2.4 Independence

Two events are independent if the occurrence of one has no effect on the probability that the other occurs.

If \(A\) and \(B\) are independent:

\[\text{P}(A | B) = \text{P}(A)\]

i.e. the probability that \(A\) occurs is invariant to whether or not \(B\) has occurred.

This gives the multiplication rule for independent events:

\[\text{P}(A \cap B) = \text{P}(A) \; \text{P}(B)\]

i.e. if two events are independent, the probability they both happen is simply the product of their individual probabilities.
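A simulation sketch (two independent tosses of a fair coin, assumed for illustration) confirms both forms:

n = 100000
coin1 = sample(c('H','T'), size = n, replace = TRUE)
coin2 = sample(c('H','T'), size = n, replace = TRUE)
A = coin1 == 'H'   # event A: first toss is heads
B = coin2 == 'H'   # event B: second toss is heads
mean(A & B)        # P(A and B); should be near 0.5 * 0.5 = 0.25
mean(A[B])         # P(A|B); should be near P(A) = 0.5, since the tosses are independent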

2.5 Mutual exclusivity

Two events are mutually exclusive or disjoint if they cannot both happen.

\[\text{P}(A \cap B) = 0\]

2.6 The addition rule

This leads to the addition rule: for two mutually exclusive events \(A\) and \(B\), the probability that at least one will happen is:

\[\text{P}(A \cup B) = \text{P}(A) + \text{P}(B)\]

More generally, if \(A\) and \(B\) are not mutually exclusive, i.e. \(\text{P}(A \cap B) \neq 0\), then the probability that at least one will happen is:

\[\text{P}(A \cup B) = \text{P}(A) + \text{P}(B) - \text{P}(A \cap B)\]
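E.g. for a single die roll (an assumed example), let \(A\) be "roll an even number" and \(B\) be "roll at least a five". A six belongs to both events, so the overlap must be subtracted: \(\text{P}(A \cup B) = \frac{3}{6} + \frac{2}{6} - \frac{1}{6} = \frac{4}{6}\). A simulation sketch:

roll = sample(1:6, size = 100000, replace = TRUE)
A = roll %% 2 == 0   # even: {2, 4, 6}
B = roll >= 5        # at least five: {5, 6}
mean(A | B)          # P(A or B); should be near 4/6 ≈ 0.667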

2.7 The law of total probability

Given two events in the event space, \(A\) and \(B\), the probability that \(A\) occurs can be written:

\[\text{P}(A) = \text{P}(A \cap B) + \text{P}(A \cap B^c)\]

i.e. the probability that \(A\) occurs equals the probability that \(A\) and \(B\) both occur, plus the probability that \(A\) occurs and \(B\) doesn’t occur.

Using the definition of conditional probability, this can also be written:

\[\text{P}(A) = \text{P}(A | B) \; \text{P}(B) + \text{P}(A | B^c) \; \text{P}(B^c)\]

You can generalize this for event spaces with multiple events:

\[\text{P}(A) = \sum_i \text{P}(A \cap B_i) = \sum_i \text{P}(A | B_i) \; \text{P}(B_i)\]

where \(B_1, B_2, \ldots, B_n\) are \(n\) events that partition the sample space, i.e. they are mutually exclusive and together cover every outcome.
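As a worked sketch (an assumed setup, not from the text): suppose you pick one of two coins at random, a fair coin (\(B_1\), with \(\text{P}(\text{heads} | B_1) = 0.5\)) or a biased coin (\(B_2\), with \(\text{P}(\text{heads} | B_2) = 0.8\)), and toss it. The law of total probability gives \(\text{P}(\text{heads}) = 0.5 \times 0.5 + 0.8 \times 0.5 = 0.65\):

n = 100000
which_coin = sample(c('fair', 'biased'), size = n, replace = TRUE)   # the partition: B1, B2
p_heads = ifelse(which_coin == 'fair', 0.5, 0.8)                     # P(A | B_i) for each trial
heads = runif(n) < p_heads                                           # toss the selected coin
mean(heads)   # should be near 0.5*0.5 + 0.8*0.5 = 0.65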

3 The Frequentist Interpretation of Probability

There is more than one way the concept of probability can be interpreted. The most common is the frequentist interpretation.

These are the main tenets of frequentist probability: if a random process is repeated \(n\) times, and an event \(X\) occurs in \(k\) of those repetitions, then the relative frequency of \(X\) approximates its probability:

\[\text{P}(X) \approx \frac{k}{n}\]

In the limit, as the number of repetitions grows without bound, the relative frequency defines the probability:

\[\text{P}(X) = \lim_{n \rightarrow \infty} \frac{k}{n}\]

E.g. the reason we “know” that the probability of a coin toss giving heads is one half—if indeed we really know this at all—is because when the process is observed many times, we find empirically that the proportion of heads tends to converge roughly to one half.

In R:

cointoss = sample(x = c('H','T'), size = 100000, replace = TRUE)
prop.table(table(cointoss))
## cointoss
##       H       T 
## 0.50087 0.49913
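You can also watch the convergence directly (a sketch building on the cointoss vector simulated above): the running proportion of heads settles toward one half as the number of tosses grows.

running_prop = cumsum(cointoss == 'H') / seq_along(cointoss)   # proportion of heads after each toss
plot(running_prop, type = 'l', log = 'x',
     xlab = 'number of tosses (log scale)', ylab = 'proportion of heads')
abline(h = 0.5, lty = 2)   # the limiting relative frequency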

The frequentist approach doesn't conflict with the axioms of probability. It simply provides a way to interpret probabilities and to apply the axioms to real-world processes.