[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 2 3 4 5 6 7
[2,] 3 4 5 6 7 8
[3,] 4 5 6 7 8 9
[4,] 5 6 7 8 9 10
[5,] 6 7 8 9 10 11
[6,] 7 8 9 10 11 12
NYU Applied Statistics for Social Science Research
\[ \DeclareMathOperator{\E}{\mathbb{E}} \DeclareMathOperator{\P}{\mathbb{P}} \DeclareMathOperator{\V}{\mathbb{V}} \DeclareMathOperator{\L}{\mathscr{L}} \DeclareMathOperator{\I}{\text{I}} \]

https://tinyurl.com/two-truths-and

⚠️ What follows is an oversimplified opinion.
Summary of the book The Theory That Would Not Die

We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at any given moment knew all of the forces that animate nature and the mutual positions of the beings that compose it, if this intellect were vast enough to submit the data to analysis, could condense into a single formula the movement of the greatest bodies of the universe and that of the lightest atom; for such an intellect nothing could be uncertain, and the future just like the past would be present before its eyes.

Marquis Pierre Simon de Laplace (1749–1827)
“Uncertainty is a function of our ignorance, not a property of the world.”


Left: Pierre Simon Laplace in the style of Wassily Kandinsky, by OpenAI DALL·E
Timonen, J., Mannerström, H., Vehtari, A., & Lähdesmäki, H. (2021). lgpr: An interpretable non-parametric method for inferring covariate effects from longitudinal data. Bioinformatics, 37(13), 1860–1867. https://doi.org/10.1093/bioinformatics/btab021
Random variable X for the number of Heads in two flips

Use the sample() function to simulate rolls of a die and the replicate() function to repeat the rolling process many times.
purrr::map(1:n, \(x) expression) is an alternative to replicate().
[1] 4 3
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 6 4 1 6 6 2 5 1
[2,] 6 5 5 3 2 1 1 3
[1] 12 9 6 9 8 3
[1] 0.9159
roll_sums < 11 creates an indicator variable, and mean() averages it, giving the proportion of simulated rolls whose sum is below 11.
List of 20
$ : int [1:2] 6 2
$ : int [1:2] 4 4
$ : int [1:2] 5 2
$ : int [1:2] 6 2
$ : int [1:2] 5 2
$ : int [1:2] 5 1
$ : int [1:2] 3 6
$ : int [1:2] 3 4
$ : int [1:2] 4 4
$ : int [1:2] 3 2
$ : int [1:2] 2 2
$ : int [1:2] 5 6
$ : int [1:2] 5 2
$ : int [1:2] 4 2
$ : int [1:2] 6 1
$ : int [1:2] 4 1
$ : int [1:2] 5 4
$ : int [1:2] 1 2
$ : int [1:2] 4 2
$ : int [1:2] 3 2
List of 20
$ : int 6
$ : int [1:2] 4 1
$ : int [1:3] 1 2 3
$ : int [1:4] 1 4 5 2
$ : int [1:5] 6 2 3 4 4
$ : int [1:6] 5 3 4 5 4 1
$ : int [1:7] 1 5 4 5 5 4 3
$ : int [1:8] 1 6 5 6 4 4 3 4
$ : int [1:9] 3 3 6 6 5 1 4 1 6
$ : int [1:10] 1 6 5 5 3 4 5 1 3 5
$ : int [1:11] 4 4 2 4 6 2 1 2 2 3 ...
$ : int [1:12] 2 3 2 6 4 2 6 3 3 4 ...
$ : int [1:13] 2 3 2 2 2 5 1 3 6 2 ...
$ : int [1:14] 3 2 6 4 2 1 3 2 6 6 ...
$ : int [1:15] 3 4 3 5 1 6 1 2 5 4 ...
$ : int [1:16] 6 2 2 6 2 6 3 5 2 6 ...
$ : int [1:17] 3 5 3 6 5 4 2 1 5 2 ...
$ : int [1:18] 5 1 3 6 6 6 6 3 4 6 ...
$ : int [1:19] 5 5 6 1 6 5 1 2 4 3 ...
$ : int [1:20] 4 6 4 3 4 3 4 3 5 1 ...
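The printed output above is consistent with simulations along the following lines. This is a minimal sketch, not the original slide code: the seed, the number of replications (`n_sims`), and every variable name other than `roll_sums` are assumptions, and base R's `lapply()` stands in for `purrr::map()` so that the block has no package dependencies.

```r
set.seed(123)  # assumed seed, used only for reproducibility

# One experiment: roll two fair six-sided dice
sample(1:6, size = 2, replace = TRUE)

# Repeat the experiment many times; replicate() returns a 2 x n_sims matrix
n_sims <- 1e4  # assumed number of replications
rolls <- replicate(n_sims, sample(1:6, size = 2, replace = TRUE))

# Sum of the two dice in each replication
roll_sums <- colSums(rolls)

# roll_sums < 11 is a logical indicator; its mean estimates P(sum < 11)
p_hat <- mean(roll_sums < 11)
p_exact <- 1 - 3 / 36  # only (5,6), (6,5), (6,6) give a sum of 11 or 12
c(estimate = p_hat, exact = p_exact)

# List-based version: element i holds i rolls of the die.
# purrr::map(1:20, \(i) sample(1:6, i, replace = TRUE)) would be equivalent.
roll_list <- lapply(1:20, \(i) sample(1:6, size = i, replace = TRUE))
str(roll_list[1:3])
```

The exact value, 33/36, is about 0.9167, so a simulated estimate such as 0.9159 is within ordinary Monte Carlo error.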
| Sex | No | Yes | Total |
|---|---|---|---|
| Male | 0.620 | 0.167 | 0.787 |
| Female | 0.057 | 0.156 | 0.213 |
| Total | 0.677 | 0.323 | 1.000 |
“Untergang der Titanic”, as conceived by Willy Stöwer, 1912
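The joint probabilities in the table above can be turned into marginal and conditional probabilities with a few lines of R. This is a sketch under assumptions: the object names (`joint`, `p_sex`, `p_outcome`) are illustrative, and the reading of the No/Yes columns as a survival outcome is suggested by the Titanic image but not stated in the table itself.

```r
# Joint probabilities copied from the table (rows: sex, columns: No / Yes)
joint <- matrix(
  c(0.620, 0.167,
    0.057, 0.156),
  nrow = 2, byrow = TRUE,
  dimnames = list(sex = c("Male", "Female"), outcome = c("No", "Yes"))
)

# Marginal probabilities are row and column sums of the joint distribution
p_sex <- rowSums(joint)      # P(Male), P(Female)
p_outcome <- colSums(joint)  # P(No), P(Yes)

# Conditional probability: P(outcome | sex) = P(outcome and sex) / P(sex).
# Dividing a matrix by a vector of row totals rescales each row to sum to 1.
p_outcome_given_sex <- joint / p_sex
round(p_outcome_given_sex, 3)

# Independence check: compare P(Yes | Female) with the marginal P(Yes)
p_outcome_given_sex["Female", "Yes"]
p_outcome["Yes"]
```

Because P(Yes | Female) = 0.156 / 0.213 differs from the marginal P(Yes) = 0.323, the two variables are not independent in this table.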
Let \(A_1, \ldots, A_n\) be a partition of \(\Omega\), so that the \(A_i\) are pairwise disjoint, \(\P(A_i) > 0\), and \(\cup_{i=1}^{n} A_i = \Omega\). Then, for any event \(B\): \[ \P(B) = \sum_{i=1}^{n} \P(B \cap A_i) = \sum_{i=1}^{n} \P(B \mid A_i) \P(A_i) \]
Image from Blitzstein and Hwang (2019), Page 55
\[ \P(A \mid B) = \frac{\P(B \cap A)}{\P(B)} = \frac{\P(B \mid A) \P(A)}{\sum_{i=1}^{n} \P(B \mid A_i) \P(A_i)} \]
We typically think of \(A\) as some unknown we wish to learn (e.g., the status of a disease) and \(B\) as the data we observe (e.g., the result of a diagnostic test).
We call \(\P(A)\) the prior probability of \(A\) (e.g., how prevalent the disease is in the population).
We call \(\P(A \mid B)\) the posterior probability of the unknown \(A\) given data \(B\).
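As an illustration of the rule above, the diagnostic-test example can be worked through numerically. All three input numbers below (prevalence, sensitivity, specificity) are hypothetical values chosen for the sketch; none of them comes from the text.

```r
# Hypothetical inputs (not from the text)
prevalence  <- 0.01  # P(A): prior probability of disease
sensitivity <- 0.95  # P(B | A): positive test given disease
specificity <- 0.90  # P(not B | not A): negative test given no disease

# Law of total probability over the partition {A, not A}
p_positive <- sensitivity * prevalence +
  (1 - specificity) * (1 - prevalence)

# Bayes' rule: posterior probability of disease given a positive test
posterior <- sensitivity * prevalence / p_positive
round(c(prior = prevalence, p_positive = p_positive, posterior = posterior), 4)
```

Under these assumed numbers the posterior is roughly 0.09: a positive test raises the probability of disease well above the 0.01 prior, but most positives still come from the much larger disease-free group.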
The idea is that we can approximate the ratio of the area of an inscribed circle, \(A_c\), to the area of the square, \(A_s\), by uniformly “throwing darts” at a square with side \(2r\) and computing the fraction of darts that land inside the circle.
\[ \begin{align} A_{c}& = \pi r^2 \\ A_{s}& = (2r)^2 = 4r^2 \\ \frac{A_{c}}{A_{s}}& = \frac{\pi r^2}{4r^2} = \frac{\pi}{4} \implies \pi = \frac{4A_{c}}{A_{s}} \end{align} \]
To estimate \(\pi\), we perform the following simulation:
\[ \begin{align} X& \sim \text{Uniform}(-1, 1) \\ Y& \sim \text{Uniform}(-1, 1) \\ \pi& \approx \frac{4 \sum_{i=1}^{N} \I(x_i^2 + y_i^2 < 1)}{N} \end{align} \]
The numerator is a sum over an indicator function \(\I\), which evaluates to \(1\) if the inequality holds and \(0\) otherwise.
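The simulation above can be sketched in a few lines of R with \(r = 1\). The seed and the number of darts `n` are assumptions, as are the variable names; the standard-error line is an addition that treats each dart as an independent Bernoulli trial.

```r
set.seed(1)  # assumed seed, used only for reproducibility
n <- 1e5     # assumed number of darts (N in the formula)

# Draw dart coordinates uniformly on the square [-1, 1] x [-1, 1]
x <- runif(n, min = -1, max = 1)
y <- runif(n, min = -1, max = 1)

# Indicator: TRUE when the dart lands inside the unit circle
inside <- x^2 + y^2 < 1

# Four times the fraction of darts inside the circle estimates pi
pi_hat <- 4 * mean(inside)

# Monte Carlo standard error of the estimate
se <- 4 * sqrt(mean(inside) * (1 - mean(inside)) / n)
c(estimate = pi_hat, std_error = se, truth = pi)
```

The standard error shrinks at the rate \(1/\sqrt{N}\), so each additional digit of accuracy costs about 100 times more darts.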
Image from Fokko Smits, Martijn Dirksen, and Ivo Schoots: RECIST 1.1 - and more
Greek letters will be used for latent parameters, and English letters will be used for observables.
For a more complete workflow, see Bayesian Workflow by Gelman et al. (2020)
library(ggplot2)

# Plot a discrete distribution as points on top of thin vertical segments
dot_plot <- function(x, y) {
  p <- ggplot(data.frame(x, y), aes(x, y))
  p + geom_point(size = 0.5) +
    geom_segment(aes(x = x, y = 0, xend = x, yend = y),
                 linewidth = 0.2) +
    xlab(expression(theta)) +
    ylab(expression(f(theta)))
}

# Candidate response rates and their prior probabilities (which sum to 1)
theta <- c(0.10, 0.30, 0.50, 0.70, 0.90)
prior <- c(0.05, 0.45, 0.30, 0.15, 0.05)
dot_plot(theta, prior) +
  ggtitle("Prior probability of response")


Compare with the data model:

| theta | prior | lik | lik_x_prior | post |
|---|---|---|---|---|
| 0.1 | 0.05 | 0.01 | 0.00 | 0.00 |
| 0.3 | 0.45 | 0.13 | 0.06 | 0.29 |
| 0.5 | 0.30 | 0.31 | 0.09 | 0.46 |
| 0.7 | 0.15 | 0.31 | 0.05 | 0.23 |
| 0.9 | 0.05 | 0.07 | 0.00 | 0.02 |
| Total | 1.00 | 0.83 | 0.20 | 1.00 |
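The table above can be reproduced with a short grid calculation. This is a sketch under one important assumption: the `lik` column is consistent with a binomial likelihood for 3 responders among 5 patients, but those counts (`y` and `n` below) are inferred from the numbers and are not stated in the text.

```r
theta <- c(0.10, 0.30, 0.50, 0.70, 0.90)
prior <- c(0.05, 0.45, 0.30, 0.15, 0.05)

# Assumed data: y = 3 responders among n = 5 patients (inferred, not stated)
y <- 3
n <- 5

# Likelihood of the data at each candidate value of theta
lik <- dbinom(y, size = n, prob = theta)

# Bayes' rule on a grid: multiply, then normalise so the posterior sums to 1
lik_x_prior <- lik * prior
post <- lik_x_prior / sum(lik_x_prior)

tab <- data.frame(theta, prior, lik, lik_x_prior, post)
round(tab, 2)
```

The normalising constant `sum(lik_x_prior)` is the 0.20 in the Total row; the likelihood column sums to 0.83 rather than 1 because a likelihood is not a probability distribution over \(\theta\).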
To compute event probabilities, we integrate (or sum) the posterior over the relevant region of the parameter space: \[ \P(\theta \geq 0.5 \mid y) = \int_{0.5}^{1} f(\theta \mid y) \, d\theta \]
In this case, we only have discrete quantities, so we sum:
| theta | prior | lik | lik_x_prior | post |
|---|---|---|---|---|
| 0.1 | 0.05 | 0.01 | 0.00 | 0.00 |
| 0.3 | 0.45 | 0.13 | 0.06 | 0.29 |
| 0.5 | 0.30 | 0.31 | 0.09 | 0.46 |
| 0.7 | 0.15 | 0.31 | 0.05 | 0.23 |
| 0.9 | 0.05 | 0.07 | 0.00 | 0.02 |
| Total | 1.00 | 0.83 | 0.20 | 1.00 |
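With a discrete grid, the integral becomes a sum over the rows of the `post` column where \(\theta \geq 0.5\). A minimal sketch using the rounded values from the table (the name `p_at_least_half` is illustrative):

```r
theta <- c(0.1, 0.3, 0.5, 0.7, 0.9)
post  <- c(0.00, 0.29, 0.46, 0.23, 0.02)  # rounded values from the table

# Sum the posterior mass over the grid points with theta >= 0.5
p_at_least_half <- sum(post[theta >= 0.5])
p_at_least_half
```

With the rounded table values this is 0.46 + 0.23 + 0.02 = 0.71.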
| theta | prior | lik | lik_x_prior | post |
|---|---|---|---|---|
| 0.1 | 0.2 | 0.01 | 0.00 | 0.01 |
| 0.3 | 0.2 | 0.13 | 0.03 | 0.16 |
| 0.5 | 0.2 | 0.31 | 0.06 | 0.37 |
| 0.7 | 0.2 | 0.31 | 0.06 | 0.37 |
| 0.9 | 0.2 | 0.07 | 0.01 | 0.09 |
| Total | 1.0 | 0.83 | 0.17 | 1.00 |
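To see how much the prior matters, the two posteriors can be computed side by side. This sketch again assumes binomial data with 3 responders among 5 patients, which matches the `lik` column but is not stated in the text; `grid_posterior()` is a hypothetical helper written for this example.

```r
theta <- c(0.10, 0.30, 0.50, 0.70, 0.90)
lik <- dbinom(3, size = 5, prob = theta)  # assumed data: 3 of 5 respond

# Hypothetical helper: normalised product of prior and likelihood on a grid
grid_posterior <- function(prior, lik) {
  stopifnot(abs(sum(prior) - 1) < 1e-8)  # the prior must sum to 1
  prior * lik / sum(prior * lik)
}

post_informative <- grid_posterior(c(0.05, 0.45, 0.30, 0.15, 0.05), lik)
post_uniform <- grid_posterior(rep(0.2, 5), lik)

round(data.frame(theta, post_informative, post_uniform), 2)

# Posterior probability that theta is at least 0.5 under each prior
c(informative = sum(post_informative[theta >= 0.5]),
  uniform = sum(post_uniform[theta >= 0.5]))
```

Under the uniform prior the posterior is just the likelihood rescaled to sum to 1, so it puts more mass on \(\theta \geq 0.5\) than the posterior from the informative prior, which favoured lower response rates.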
Gelman, A. et al. (2020). Bayesian Workflow. ArXiv:2011.01808 [Stat]. http://arxiv.org/abs/2011.01808