NYU Applied Statistics for Social Science Research
\[ \DeclareMathOperator{\E}{\mathbb{E}} \DeclareMathOperator{\P}{\mathbb{P}} \DeclareMathOperator{\V}{\mathbb{V}} \DeclareMathOperator{\L}{\mathcal{L}} \DeclareMathOperator{\I}{\text{I}} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} \]
In R, the logit and inverse-logit functions are available as logit <- qlogis and invlogit <- plogis.
We consider \(K\) regression inputs and independent priors on all \(K + 1\) unknowns: \(\alpha\) and \(\beta_1, \beta_2, \ldots, \beta_K\).
The Bernoulli likelihood is \(f(y \mid p) = p^y (1 - p)^{1-y}\) with \(y \in \{0, 1\}\), and each \(p_i = \text{logit}^{-1}(\alpha + X_i\beta)\), so the posterior is \[ f\left(\alpha,\beta \mid y,X\right) \propto f_{\alpha}\left(\alpha\right) \cdot \prod_{k=1}^K f_{\beta}\left(\beta_k\right) \cdot \\ \prod_{i=1}^N \left(\text{logit}^{-1}(\alpha + X_i\beta) \right)^{y_i} \left(1 - \text{logit}^{-1}(\alpha + X_i\beta)\right)^{1-y_i} \]
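As a check on the algebra, here is a minimal sketch of this unnormalized log posterior in R; the standard normal priors on \(\alpha\) and each \(\beta_k\) are illustrative assumptions, not from the text:
# Unnormalized log posterior for Bayesian logistic regression.
# Uses the identity y*log(p) + (1-y)*log(1-p) = y*eta - log(1 + exp(eta)).
log_posterior <- function(alpha, beta, y, X) {
  eta <- alpha + X %*% beta                    # linear predictor
  log_lik <- sum(y * eta - log1p(exp(eta)))    # Bernoulli-logit log likelihood
  log_prior <- dnorm(alpha, 0, 1, log = TRUE) +
    sum(dnorm(beta, 0, 1, log = TRUE))         # independent priors (assumed N(0, 1))
  log_lik + log_prior
}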
In Stan, the likelihood term can be written on the log scale as y ~ bernoulli_logit_glm(x, alpha, beta) or target += bernoulli_logit_glm_lupmf(y | x, alpha, beta).
# Simulate data from the model; assumes the tidyverse is loaded
set.seed(123)
logit <- qlogis; invlogit <- plogis
n <- 100                   # number of observations
a <- 1.2                   # true intercept
b <- 0.4                   # true slope
x <- runif(n, -15, 10)     # predictor
eta <- a + x * b           # linear predictor (log odds)
Pr <- invlogit(eta)        # probability of success
y <- rbinom(n, 1, Pr)      # Bernoulli outcomes
sim <- tibble(y, x, Pr)
p <- ggplot(aes(x, y), data = sim)
p <- p + geom_point(size = 0.5) +
geom_line(aes(x, Pr), linewidth = 0.2) +
geom_vline(xintercept = 0, color = "red", linewidth = 0.2,
linetype = "dashed", alpha = 1/3) +
geom_hline(yintercept = invlogit(a), color = "red", linewidth = 0.2,
linetype = "dashed", alpha = 1/3) +
geom_hline(yintercept = 0.50, linewidth = 0.2, linetype = "dashed", alpha = 1/3) +
ggtitle(TeX("$y_i \\sim Bernoulli(logit^{-1}(1.2 + 0.4x_i))$")) +
annotate("text", x = -5.5, y = invlogit(a) - 0.02,
label = TeX("Intercept = $logit^{-1}(1.2)$ \\approx 0.77")) +
annotate("text", x = -8, y = 0.53,
label = TeX("$slope_{.5} = \\frac{0.4}{4} = 0.10$")) +
ylab(TeX("$logit^{-1}(1.2 + 0.4x)$")); print(p)
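The summary below presumably came from a call along these lines; the priors are left at rstanarm defaults, and iter = 1000 (2000 posterior draws over 4 chains) is an assumption inferred from the output:
library(rstanarm)
m1 <- stan_glm(y ~ x,
               family = binomial(link = "logit"),
               data = sim,
               iter = 1000, refresh = 0, seed = 123)
summary(m1)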
Model Info:
function: stan_glm
family: binomial [logit]
formula: y ~ x
algorithm: sampling
sample: 2000 (posterior sample size)
priors: see help('prior_summary')
observations: 100
predictors: 2
Estimates:
mean sd 10% 50% 90%
(Intercept) 1.5 0.5 0.9 1.5 2.1
x 0.5 0.1 0.4 0.5 0.7
Fit Diagnostics:
mean sd 10% 50% 90%
mean_PPD 0.5 0.0 0.5 0.5 0.6
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.0 1.0 1132
x 0.0 1.0 1106
mean_PPD 0.0 1.0 1584
log-posterior 0.0 1.0 817
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
# Draw one curve from the prior predictive distribution: sample (a, b)
# from the priors, then return Pr(y = 1) across the observed x values
prior_pred_logit <- function(x) {
  a <- rnorm(1, mean = 1.2, sd = 0.5)
  b <- rnorm(1, mean = 0.4, sd = 0.1)
  invlogit(a + b * x)
}
prior_pred <- replicate(50, prior_pred_logit(x)) |>
as.data.frame()
df_long <- prior_pred |>
mutate(x = x) |>
pivot_longer(cols = -x, names_to = "line", values_to = "y")
p <- ggplot(aes(x, y), data = df_long)
p + geom_line(aes(group = line), linewidth = 0.2, alpha = 1/5) +
geom_line(aes(y = Pr), data = sim, linewidth = 0.5, color = 'red') +
ylab(TeX("$logit^{-1}(\\alpha + \\beta x)$")) +
ggtitle(TeX("Simulating from prior $logit^{-1}(\\alpha + \\beta x_i))$"),
subtitle = TeX("$\\alpha \\sim Normal(1.2, 0.5)$ and $\\beta \\sim Normal(0.4, 0.1)$"))
We will use the Pima Indians diabetes data from the pdp package:
| pregnant | glucose | pressure | triceps | insulin | mass | pedigree | age | diabetes |
|---|---|---|---|---|---|---|---|---|
| 6 | 148 | 72 | 35 | NA | 33.6 | 0.627 | 50 | pos |
| 1 | 85 | 66 | 29 | NA | 26.6 | 0.351 | 31 | neg |
| 8 | 183 | 64 | NA | NA | 23.3 | 0.672 | 32 | pos |
| 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | neg |
| 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | pos |
| 5 | 116 | 74 | NA | NA | 25.6 | 0.201 | 30 | neg |
In a serious analysis the missing values should be imputed, for example with the mice package and brms, which works nicely with mice.
p1 <- ggplot(pima, aes(x = mass)) +
geom_density(aes(group = diabetes, fill = diabetes, color = diabetes), alpha = 1/5) +
xlab("BMI")
p2 <- pima |>
drop_na() |>
mutate(bmi_cut = cut(mass, breaks = seq(15, 70, by = 5))) |>
group_by(bmi_cut) |>
summarize(p = mean(diabetes == "pos"),
n = n(),
se = sqrt(p * (1 - p) / n),
lower = p - se,
upper = p + se) |>
ggplot(aes(x = bmi_cut, y = p)) +
geom_point() + geom_linerange(aes(ymin = lower, ymax = upper), linewidth = 0.2) +
xlab("BMI range") + ylab("Proportion") +
ggtitle("Proportion of diabetics by BMI +/- 1 SE")
library(pdp)
d <- pima |>
as_tibble() |>
select(diabetes, age, pedigree, mass, glucose) |>
drop_na() |> # in an important analysis, you should not do this; instead, impute
mutate(diab = if_else(diabetes == "pos", 1, 0),
age = (age - mean(age)) / 10,
pedigree = (pedigree - mean(pedigree)),
bmi = ((mass - mean(mass)) / 10),
glucose = ((glucose - mean(glucose)) / sd(glucose)))
head(d)
# A tibble: 6 × 7
diabetes age pedigree mass glucose diab bmi
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 pos 1.67 0.154 33.6 0.852 1 0.115
2 neg -0.231 -0.122 26.6 -1.21 0 -0.585
3 pos -0.131 0.199 23.3 2.00 1 -0.915
4 neg -1.23 -0.306 28.1 -1.08 0 -0.435
5 pos -0.0312 1.81 43.1 0.492 1 1.06
6 neg -0.331 -0.272 25.6 -0.194 0 -0.685
\[ \begin{eqnarray} y_i &\sim& \text{Bernoulli}(p_i) \\ \eta_i &=& \alpha + \beta \left( \frac{x_{i} - \bar x}{10} \right) \\ p_i &=& \frac{e^{\eta_i}}{1 + e^{\eta_i}} \\ \alpha &\sim& \text{Normal}(\mu_\alpha ,\ \sigma_\alpha) \\ \beta &\sim& \text{Normal}(\mu_\beta , \ \sigma_\beta) \end{eqnarray} \]
Since invlogit(c(-1.5, 1.5)) returns [0.18, 0.82], a \(\text{Normal}(0,\ 0.5)\) prior on \(\alpha\) keeps the intercept within roughly \(\pm 1.5\) (three standard deviations), i.e. baseline probabilities between about 0.18 and 0.82.
\[ \begin{eqnarray} y_i &\sim& \text{Bernoulli}(p_i) \\ \eta_i &=& \alpha + \beta \left( \frac{x_{i} - \bar x}{10} \right) \\ p_i &=& \frac{e^{\eta_i}}{1 + e^{\eta_i}} \\ \alpha &\sim& \text{Normal}(0 ,\ 0.5) \\ \beta &\sim& \text{Normal}(0 , \ 1) \end{eqnarray} \]
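A sketch of the prior-predictive run summarized below: prior_PD = TRUE tells rstanarm to sample from the priors alone, ignoring the likelihood (the exact call is an assumption):
m1_prior <- stan_glm(diab ~ bmi,
                     prior_intercept = normal(0, 0.5),
                     prior = normal(0, 1),
                     prior_PD = TRUE,   # sample from the prior only
                     family = binomial(link = "logit"),
                     data = d,
                     iter = 1000, refresh = 0, seed = 123)
summary(m1_prior)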
Model Info:
function: stan_glm
family: binomial [logit]
formula: diab ~ bmi
algorithm: sampling
sample: 2000 (posterior sample size)
priors: see help('prior_summary')
observations: 752
predictors: 2
Estimates:
mean sd 10% 50% 90%
(Intercept) 0.0 0.5 -0.6 0.0 0.6
bmi 0.0 1.0 -1.3 0.0 1.3
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.0 1.0 1215
bmi 0.0 1.0 1108
log-posterior 0.0 1.0 932
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
\[ \begin{eqnarray} y_i &\sim& \text{Bernoulli}(p_i) \\ \eta_i &=& \alpha + \beta_1 \left( \frac{x_{i} - \bar x}{10} \right) \\ p_i &=& \frac{e^{\eta_i}}{1 + e^{\eta_i}} \\ \alpha &\sim& \text{Normal}(0 ,\ 0.5) \\ \beta &\sim& \text{Normal}(0.7 , \ 0.2) \end{eqnarray} \]
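The same prior-predictive check with the informative slope prior, presumably refit via update() (again an assumed call):
m2_prior <- update(m1_prior, prior = normal(0.7, 0.2))  # prior_PD = TRUE is retained
summary(m2_prior)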
Model Info:
function: stan_glm
family: binomial [logit]
formula: diab ~ bmi
algorithm: sampling
sample: 2000 (posterior sample size)
priors: see help('prior_summary')
observations: 752
predictors: 2
Estimates:
mean sd 10% 50% 90%
(Intercept) 0.0 0.5 -0.6 0.0 0.6
bmi 0.7 0.2 0.4 0.7 1.0
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.0 1.0 1215
bmi 0.0 1.0 1108
log-posterior 0.0 1.0 932
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
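The actual posterior fit, referenced below as m2, was presumably obtained by turning the likelihood back on:
m2 <- stan_glm(diab ~ bmi,
               prior_intercept = normal(0, 0.5),
               prior = normal(0.7, 0.2),
               family = binomial(link = "logit"),
               data = d,
               iter = 1000, refresh = 0, seed = 123)
summary(m2)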
Model Info:
function: stan_glm
family: binomial [logit]
formula: diab ~ bmi
algorithm: sampling
sample: 2000 (posterior sample size)
priors: see help('prior_summary')
observations: 752
predictors: 2
Estimates:
mean sd 10% 50% 90%
(Intercept) -0.6 0.1 -0.7 -0.6 -0.5
bmi 0.9 0.1 0.8 0.9 1.1
Fit Diagnostics:
mean sd 10% 50% 90%
mean_PPD 0.4 0.0 0.3 0.4 0.4
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.0 1.0 1315
bmi 0.0 1.0 1299
mean_PPD 0.0 1.0 1651
log-posterior 0.0 1.0 780
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
Predictions at a given BMI can be made with posterior_epred or posterior_predict, or by constructing the posterior distribution directly:
bmi_scaled <- (40 - mean(d$mass)) / 10  # BMI of 40, on the centered/scaled predictor scale
yepred_m2 <- posterior_epred(m2, newdata = data.frame(bmi = bmi_scaled))
quantile(yepred_m2, probs = c(0.05, 0.50, 0.95)) |> round(2)
5% 50% 95%
0.47 0.51 0.56
d_m2 <- as_tibble(m2) |>  # one row per posterior draw of (Intercept) and bmi
mutate(log_odds = `(Intercept)` + bmi * bmi_scaled,
prob = invlogit(log_odds),
ypred = rbinom(2e3, size = 1, prob = prob))  # one Bernoulli draw per posterior draw
d_m2[1:3, ]
# A tibble: 3 × 5
`(Intercept)` bmi log_odds prob ypred
<dbl> <dbl> <dbl> <dbl> <int>
1 -0.676 0.897 0.000969 0.500 1
2 -0.746 0.936 -0.0403 0.490 0
3 -0.682 0.957 0.0402 0.510 0
The manually constructed probabilities match the posterior_epred quantiles:
quantile(d_m2$prob, probs = c(0.05, 0.50, 0.95)) |> round(2)
5% 50% 95%
0.47 0.51 0.56
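A plausible call for the multi-predictor model summarized below, referenced later as m3 (the prior scales are assumptions):
m3 <- stan_glm(diab ~ bmi + pedigree + age + glucose,
               prior_intercept = normal(0, 0.5),
               prior = normal(0, 1),
               family = binomial(link = "logit"),
               data = d,
               iter = 1000, refresh = 0, seed = 123)
summary(m3)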
Model Info:
function: stan_glm
family: binomial [logit]
formula: diab ~ bmi + pedigree + age + glucose
algorithm: sampling
sample: 2000 (posterior sample size)
priors: see help('prior_summary')
observations: 752
predictors: 5
Estimates:
mean sd 10% 50% 90%
(Intercept) -0.8 0.1 -0.9 -0.8 -0.7
bmi 0.8 0.1 0.7 0.8 1.0
pedigree 0.8 0.3 0.5 0.8 1.2
age 0.3 0.1 0.2 0.3 0.4
glucose 1.1 0.1 0.9 1.1 1.2
Fit Diagnostics:
mean sd 10% 50% 90%
mean_PPD 0.4 0.0 0.3 0.4 0.4
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.0 1.0 2663
bmi 0.0 1.0 2623
pedigree 0.0 1.0 2128
age 0.0 1.0 2551
glucose 0.0 1.0 2861
mean_PPD 0.0 1.0 2566
log-posterior 0.1 1.0 747
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
install.packages("shinystan")
and launch_shinystan(m3)
library(splines)
# Eight coefficient priors, in model-matrix order: bmi, pedigree,
# the four spline basis columns for age, glucose, and glucose:pedigree
priors <- normal(location = c(0.7, rep(0, 7)),
scale = c(0.2, rep(1, 7)))
m4 <- stan_glm(diab ~ bmi + pedigree +
bs(age, df = 4) +
glucose + glucose:pedigree,
prior_intercept = normal(0, 0.5),
prior = priors,
family = binomial(link = "logit"),
data = d,
refresh = 0,
seed = 123,
iter = 1000)
summary(m4)
Model Info:
function: stan_glm
family: binomial [logit]
formula: diab ~ bmi + pedigree + bs(age, df = 4) + glucose + glucose:pedigree
algorithm: sampling
sample: 2000 (posterior sample size)
priors: see help('prior_summary')
observations: 752
predictors: 9
Estimates:
mean sd 10% 50% 90%
(Intercept) -1.6 0.3 -2.0 -1.6 -1.2
bmi 0.8 0.1 0.6 0.8 0.9
pedigree 0.9 0.3 0.6 0.9 1.3
bs(age, df = 4)1 0.3 0.4 -0.2 0.3 0.8
bs(age, df = 4)2 2.6 0.6 1.9 2.6 3.3
bs(age, df = 4)3 0.7 0.7 -0.2 0.7 1.6
bs(age, df = 4)4 -0.6 0.8 -1.6 -0.6 0.4
glucose 1.1 0.1 1.0 1.1 1.3
pedigree:glucose -0.6 0.3 -0.9 -0.6 -0.3
Fit Diagnostics:
mean sd 10% 50% 90%
mean_PPD 0.4 0.0 0.3 0.4 0.4
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.0 1.0 1643
bmi 0.0 1.0 2228
pedigree 0.0 1.0 2015
bs(age, df = 4)1 0.0 1.0 1758
bs(age, df = 4)2 0.0 1.0 1999
bs(age, df = 4)3 0.0 1.0 1853
bs(age, df = 4)4 0.0 1.0 2086
glucose 0.0 1.0 1749
pedigree:glucose 0.0 1.0 1633
mean_PPD 0.0 1.0 1912
log-posterior 0.1 1.0 960
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
Models can be compared on expected out-of-sample predictive performance (with loo and loo_compare in R).
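A minimal sketch, assuming m2 and m3 are the diabetes fits from above:
loo_m2 <- loo(m2)            # PSIS-LOO estimate of expected log predictive density
loo_m3 <- loo(m3)
loo_compare(loo_m2, loo_m3)  # models ranked by elpd; higher is better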
Turning to hierarchical models, the joint posterior of the parameters \(\theta\) and hyperparameters \(\tau\) factors as \[ \begin{split} f(\theta, \tau \mid y) & \propto \underbrace{f(y \mid \theta, \tau)}_\text{Likelihood} \; \underbrace{ f(\theta, \tau)}_\text{Joint Prior} \\ & = f(y \mid \theta, \tau) \; f(\theta \mid \tau) \; f(\tau) \\ & = f(y \mid \theta) \; f(\theta \mid \tau) \; f(\tau) \end{split} \] where the last step uses the fact that \(y\) depends on \(\tau\) only through \(\theta\).
We will use the sleepstudy data from the lme4 package. With normal likelihoods, the partially pooled estimate of group \(j\)'s mean is approximately a precision-weighted average of the group mean \(\overline y_j\) and the overall mean \(\overline y_{\tau}\): \[ \theta_j^{\text{pooled}} \approx \frac{\frac{n_j}{\sigma_{y}^2} \overline y_j + \frac{1}{\sigma_{\tau}^2} \overline y_{\tau}} {\frac{n_j}{\sigma_{y}^2} + \frac{1}{\sigma_{\tau}^2}} \]
\[ \begin{eqnarray} y_{ij} & \sim & \text{Normal}(\mu, \ \sigma^2) \\ \mu & \sim & \text{Normal}(300, 10^2) \\ \sigma & \sim & \text{Exponential}(0.02) \end{eqnarray} \]
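A sketch of the complete-pooling fit summarized below, assuming d holds the sleepstudy data; 10000 posterior draws suggest iter = 5000 over 4 chains, which is an assumption:
m1 <- stan_glm(Reaction ~ 1,
               prior_intercept = normal(300, 10),
               prior_aux = exponential(0.02),
               family = gaussian,
               data = d,
               iter = 5000, refresh = 0, seed = 123)
summary(m1)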
Model Info:
function: stan_glm
family: gaussian [identity]
formula: Reaction ~ 1
algorithm: sampling
sample: 10000 (posterior sample size)
priors: see help('prior_summary')
observations: 125
predictors: 1
Estimates:
mean sd 10% 50% 90%
(Intercept) 294.5 4.2 289.2 294.5 299.9
sigma 52.9 3.4 48.7 52.7 57.3
Fit Diagnostics:
mean sd 10% 50% 90%
mean_PPD 294.4 6.3 286.4 294.4 302.5
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.1 1.0 6347
sigma 0.0 1.0 6890
mean_PPD 0.1 1.0 8264
log-posterior 0.0 1.0 4152
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
You could fit a separate stan_glm per subject, or you can do it all at once with subject indicators: \[ \begin{eqnarray} y_{ij} & \sim & \text{Normal}(\mu_j, \ \sigma^2) \\ \mu_j & \sim & \text{Normal}(300, s_j^2) \\ \sigma & \sim & \text{Exponential}(0.02) \end{eqnarray} \]
# Dropping the global intercept (~ Subject - 1) gives each subject its own mean
m2 <- stan_glm(Reaction ~ Subject - 1,
prior = normal(300, 10),
prior_aux = exponential(0.02),
family = gaussian,
data = d,
iter = 1000,
refresh = 0,
seed = 123)
summary(m2)
Model Info:
function: stan_glm
family: gaussian [identity]
formula: Reaction ~ Subject - 1
algorithm: sampling
sample: 2000 (posterior sample size)
priors: see help('prior_summary')
observations: 125
predictors: 18
Estimates:
mean sd 10% 50% 90%
Subject308 309.7 9.4 297.5 309.9 321.7
Subject309 279.0 9.0 267.7 278.8 290.3
Subject310 279.6 8.8 268.6 279.6 290.6
Subject330 301.8 8.2 290.9 302.0 312.5
Subject331 302.0 8.4 291.5 301.8 312.6
Subject332 294.1 8.9 282.6 294.4 305.4
Subject333 300.9 9.2 289.2 300.9 312.8
Subject334 299.2 8.4 288.9 299.1 310.0
Subject335 286.3 8.6 275.3 286.2 297.6
Subject337 314.5 9.0 303.1 314.5 326.0
Subject349 296.0 9.0 284.4 295.8 307.6
Subject350 308.9 8.6 298.1 308.8 319.9
Subject351 296.9 9.2 284.7 297.1 308.8
Subject352 308.7 9.3 296.7 308.5 320.6
Subject369 298.5 8.7 287.6 298.5 309.6
Subject370 292.0 8.7 280.9 292.0 303.5
Subject371 298.5 7.8 288.7 298.7 308.6
Subject372 305.2 7.9 295.1 305.3 315.5
sigma 47.0 3.4 42.7 46.8 51.4
Fit Diagnostics:
mean sd 10% 50% 90%
mean_PPD 298.1 4.6 292.3 298.1 304.1
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
Subject308 0.2 1.0 2606
Subject309 0.2 1.0 2503
Subject310 0.2 1.0 2222
Subject330 0.1 1.0 3110
Subject331 0.1 1.0 3117
Subject332 0.2 1.0 3470
Subject333 0.2 1.0 2827
Subject334 0.2 1.0 2636
Subject335 0.2 1.0 2273
Subject337 0.2 1.0 2461
Subject349 0.2 1.0 2402
Subject350 0.2 1.0 3113
Subject351 0.2 1.0 2253
Subject352 0.2 1.0 2841
Subject369 0.2 1.0 2801
Subject370 0.2 1.0 2983
Subject371 0.2 1.0 2548
Subject372 0.1 1.0 2766
sigma 0.1 1.0 2400
mean_PPD 0.1 1.0 2112
log-posterior 0.1 1.0 878
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
\[ \begin{eqnarray} y_{ij} &\sim& \text{Normal}(\mu_j, \ \sigma_y^2) \\ \mu_{j} &\sim& \text{Normal}(\mu, \ \sigma_{\mu}^2) \\ \mu &\sim& \text{Normal}(300, 50^2) \\ \sigma_y &\sim& \text{Exponential}(0.02) \\ \sigma_{\mu} &\sim& \text{Exponential}(1) \end{eqnarray} \]
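A sketch of the partial-pooling fit summarized below, referenced later as m3; 3000 draws suggest iter = 1500 over 4 chains, and note that in rstanarm the prior on \(\sigma_\mu\) is set through prior_covariance rather than directly (call details are assumptions):
m3 <- stan_glmer(Reaction ~ (1 | Subject),
                 prior_intercept = normal(300, 50),
                 prior_aux = exponential(0.02),
                 family = gaussian,
                 data = d,
                 iter = 1500, refresh = 0, seed = 123)
summary(m3)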
Model Info:
function: stan_glmer
family: gaussian [identity]
formula: Reaction ~ (1 | Subject)
algorithm: sampling
sample: 3000 (posterior sample size)
priors: see help('prior_summary')
observations: 125
groups: Subject (18)
Estimates:
mean sd 10% 50% 90%
(Intercept) 294.2 9.8 281.8 294.5 306.5
b[(Intercept) Subject:308] 49.3 17.5 27.3 49.0 71.5
b[(Intercept) Subject:309] -71.7 16.5 -92.5 -71.5 -50.8
b[(Intercept) Subject:310] -62.7 15.1 -82.3 -62.5 -43.9
b[(Intercept) Subject:330] 10.0 15.0 -8.8 9.8 29.0
b[(Intercept) Subject:331] 13.3 16.4 -7.2 13.4 33.9
b[(Intercept) Subject:332] -19.2 16.4 -39.7 -19.1 2.1
b[(Intercept) Subject:333] 9.8 18.0 -13.0 9.8 32.4
b[(Intercept) Subject:334] 2.7 14.8 -16.2 2.6 21.6
b[(Intercept) Subject:335] -40.4 15.4 -60.1 -40.2 -21.1
b[(Intercept) Subject:337] 57.4 16.1 37.2 57.2 77.4
b[(Intercept) Subject:349] -10.8 16.3 -31.4 -10.8 9.9
b[(Intercept) Subject:350] 37.1 15.5 17.3 37.1 56.8
b[(Intercept) Subject:351] -10.9 18.2 -34.6 -11.0 12.7
b[(Intercept) Subject:352] 43.1 17.3 21.1 42.8 65.0
b[(Intercept) Subject:369] 0.5 16.3 -20.3 0.3 21.2
b[(Intercept) Subject:370] -28.2 16.3 -48.9 -28.3 -6.9
b[(Intercept) Subject:371] 0.7 14.3 -17.1 0.6 19.3
b[(Intercept) Subject:372] 21.3 14.1 3.8 21.3 39.0
sigma 37.5 2.5 34.3 37.4 40.8
Sigma[Subject:(Intercept),(Intercept)] 1685.8 706.9 968.5 1538.3 2541.1
Fit Diagnostics:
mean sd 10% 50% 90%
mean_PPD 293.0 4.9 286.8 292.9 299.4
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.5 1.0 383
b[(Intercept) Subject:308] 0.5 1.0 1285
b[(Intercept) Subject:309] 0.5 1.0 1171
b[(Intercept) Subject:310] 0.5 1.0 987
b[(Intercept) Subject:330] 0.5 1.0 821
b[(Intercept) Subject:331] 0.6 1.0 889
b[(Intercept) Subject:332] 0.5 1.0 1103
b[(Intercept) Subject:333] 0.5 1.0 1267
b[(Intercept) Subject:334] 0.5 1.0 797
b[(Intercept) Subject:335] 0.5 1.0 1092
b[(Intercept) Subject:337] 0.5 1.0 965
b[(Intercept) Subject:349] 0.5 1.0 1196
b[(Intercept) Subject:350] 0.5 1.0 885
b[(Intercept) Subject:351] 0.4 1.0 1648
b[(Intercept) Subject:352] 0.5 1.0 1246
b[(Intercept) Subject:369] 0.5 1.0 928
b[(Intercept) Subject:370] 0.5 1.0 1020
b[(Intercept) Subject:371] 0.5 1.0 819
b[(Intercept) Subject:372] 0.5 1.0 754
sigma 0.0 1.0 2630
Sigma[Subject:(Intercept),(Intercept)] 23.6 1.0 895
mean_PPD 0.1 1.0 3211
log-posterior 0.2 1.0 617
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
# Recover each subject's mean, mu_j = (Intercept) + b_j, with tidybayes
muj <- m3 |>
spread_draws(`(Intercept)`, b[ ,Subject]) |>
mutate(mu_j = `(Intercept)` + b) |>
select(Subject, mu_j) |>
mean_qi(.width = 0.90)
head(muj)
# A tibble: 6 × 7
Subject mu_j .lower .upper .width .point .interval
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 Subject:308 344. 318. 369. 0.9 mean qi
2 Subject:309 223. 200. 245. 0.9 mean qi
3 Subject:310 231. 211. 252. 0.9 mean qi
4 Subject:330 304. 285. 324. 0.9 mean qi
5 Subject:331 308. 284. 330. 0.9 mean qi
6 Subject:332 275. 252. 298. 0.9 mean qi
m4 <- stan_glmer(Reaction ~ Days + (Days | Subject),
prior_intercept = normal(300, 50),
prior = normal(0, 2),
prior_aux = exponential(0.02),
prior_covariance = decov(regularization = 1,
concentration = 1, shape = 1, scale = 1),
family = gaussian, data = d, iter = 1500,
cores = 4, seed = 123, refresh = 0)
summary(m4)
Model Info:
function: stan_glmer
family: gaussian [identity]
formula: Reaction ~ Days + (Days | Subject)
algorithm: sampling
sample: 3000 (posterior sample size)
priors: see help('prior_summary')
observations: 125
groups: Subject (18)
Estimates:
mean sd 10% 50% 90%
(Intercept) 260.7 9.8 248.7 260.3 272.7
Days 4.4 1.7 2.2 4.5 6.7
b[(Intercept) Subject:308] 49.2 23.8 19.6 47.4 80.6
b[Days Subject:308] 3.1 5.0 -3.2 3.3 9.2
b[(Intercept) Subject:309] -44.2 15.8 -64.2 -43.5 -24.5
b[Days Subject:309] -4.3 3.2 -8.3 -4.3 -0.2
b[(Intercept) Subject:310] -45.8 15.6 -66.0 -45.0 -25.8
b[Days Subject:310] -1.0 3.1 -5.0 -1.1 3.0
b[(Intercept) Subject:330] 24.7 14.4 6.7 24.7 43.6
b[Days Subject:330] -0.6 2.7 -4.1 -0.6 2.9
b[(Intercept) Subject:331] 17.7 14.2 -0.5 17.4 35.9
b[Days Subject:331] 4.5 3.1 0.6 4.4 8.5
b[(Intercept) Subject:332] 2.5 14.4 -16.1 2.8 20.5
b[Days Subject:332] -1.7 3.1 -5.7 -1.8 2.2
b[(Intercept) Subject:333] 11.4 15.8 -8.4 11.5 31.2
b[Days Subject:333] 3.7 3.4 -0.7 3.6 8.1
b[(Intercept) Subject:334] -13.4 15.1 -32.3 -13.1 5.5
b[Days Subject:334] 6.7 2.9 3.1 6.6 10.4
b[(Intercept) Subject:335] -5.8 15.0 -24.8 -5.6 13.2
b[Days Subject:335] -5.6 2.8 -9.2 -5.6 -2.0
b[(Intercept) Subject:337] 29.0 15.1 9.6 29.2 48.1
b[Days Subject:337] 14.3 3.4 10.1 14.3 18.7
b[(Intercept) Subject:349] -28.8 21.1 -55.6 -28.1 -2.6
b[Days Subject:349] 5.7 3.8 0.8 5.5 10.7
b[(Intercept) Subject:350] -17.2 19.9 -43.0 -16.6 7.7
b[Days Subject:350] 12.9 3.4 8.8 12.9 17.2
b[(Intercept) Subject:351] 7.4 18.7 -15.7 7.4 30.6
b[Days Subject:351] -1.7 4.1 -7.0 -1.8 3.4
b[(Intercept) Subject:352] 26.9 17.9 4.3 26.8 49.5
b[Days Subject:352] 7.1 3.7 2.4 7.1 11.8
b[(Intercept) Subject:369] -2.8 14.7 -21.6 -2.1 15.6
b[Days Subject:369] 5.2 3.0 1.2 5.1 9.0
b[(Intercept) Subject:370] -36.6 17.2 -58.5 -36.3 -14.9
b[Days Subject:370] 9.5 3.9 4.5 9.5 14.4
b[(Intercept) Subject:371] -4.7 14.5 -23.4 -4.7 13.2
b[Days Subject:371] 4.3 2.7 0.9 4.3 7.9
b[(Intercept) Subject:372] 7.3 14.9 -11.6 7.9 25.6
b[Days Subject:372] 6.6 2.8 3.0 6.5 10.1
sigma 21.9 1.6 19.9 21.8 24.1
Sigma[Subject:(Intercept),(Intercept)] 987.0 518.4 462.4 873.6 1651.3
Sigma[Subject:Days,(Intercept)] -21.9 94.4 -139.2 -12.1 82.0
Sigma[Subject:Days,Days] 70.1 37.1 33.1 62.2 116.3
Fit Diagnostics:
mean sd 10% 50% 90%
mean_PPD 293.3 2.8 289.8 293.3 296.8
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.4 1.0 569
Days 0.1 1.0 961
b[(Intercept) Subject:308] 0.6 1.0 1552
b[Days Subject:308] 0.1 1.0 1472
b[(Intercept) Subject:309] 0.5 1.0 1134
b[Days Subject:309] 0.1 1.0 2150
b[(Intercept) Subject:310] 0.5 1.0 887
b[Days Subject:310] 0.1 1.0 1677
b[(Intercept) Subject:330] 0.4 1.0 1642
b[Days Subject:330] 0.1 1.0 2355
b[(Intercept) Subject:331] 0.4 1.0 1413
b[Days Subject:331] 0.1 1.0 2351
b[(Intercept) Subject:332] 0.4 1.0 1535
b[Days Subject:332] 0.1 1.0 2530
b[(Intercept) Subject:333] 0.4 1.0 1640
b[Days Subject:333] 0.1 1.0 2551
b[(Intercept) Subject:334] 0.5 1.0 921
b[Days Subject:334] 0.1 1.0 1513
b[(Intercept) Subject:335] 0.4 1.0 1559
b[Days Subject:335] 0.1 1.0 2214
b[(Intercept) Subject:337] 0.5 1.0 1073
b[Days Subject:337] 0.1 1.0 1528
b[(Intercept) Subject:349] 0.7 1.0 1040
b[Days Subject:349] 0.1 1.0 1273
b[(Intercept) Subject:350] 0.8 1.0 636
b[Days Subject:350] 0.1 1.0 1125
b[(Intercept) Subject:351] 0.4 1.0 2400
b[Days Subject:351] 0.1 1.0 2377
b[(Intercept) Subject:352] 0.4 1.0 1719
b[Days Subject:352] 0.1 1.0 2001
b[(Intercept) Subject:369] 0.5 1.0 1019
b[Days Subject:369] 0.1 1.0 1930
b[(Intercept) Subject:370] 0.7 1.0 683
b[Days Subject:370] 0.1 1.0 1002
b[(Intercept) Subject:371] 0.5 1.0 1022
b[Days Subject:371] 0.1 1.0 1760
b[(Intercept) Subject:372] 0.4 1.0 1214
b[Days Subject:372] 0.1 1.0 1789
sigma 0.1 1.0 943
Sigma[Subject:(Intercept),(Intercept)] 18.0 1.0 829
Sigma[Subject:Days,(Intercept)] 4.3 1.0 484
Sigma[Subject:Days,Days] 1.2 1.0 891
mean_PPD 0.0 1.0 3200
log-posterior 0.4 1.0 359
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
new_subj <- data.frame(Days = 0:9,
Subject = as.factor(rep(400, 10)))
ypred_subj <- posterior_predict(m4, newdata = new_subj)  # draws for an unseen subject use new group-level effects
new_subj |>
add_predicted_draws(m4) |>
ggplot(aes(x = Days)) +
stat_lineribbon(aes(y = .prediction),
.width = c(.9, .8, .5),
alpha = 0.25) +
ylab("Reaction time") +
ggtitle("Prediction for an unobserved subject") +
scale_fill_brewer(palette = "Greys")