NYU Applied Statistics for Social Science Research
\[ \DeclareMathOperator{\E}{\mathbb{E}} \DeclareMathOperator{\P}{\mathbb{P}} \DeclareMathOperator{\V}{\mathbb{V}} \DeclareMathOperator{\L}{\mathcal{L}} \DeclareMathOperator{\I}{\text{I}} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} \]
In R, the logit and inverse-logit functions are built in: logit <- qlogis and invlogit <- plogis.
To derive the posterior distribution for logistic regression, we consider \(K\) regression inputs and independent priors on all \(K + 1\) unknowns: \(\alpha\) and \(\beta_1, \beta_2, \ldots, \beta_K\).
The Bernoulli likelihood is \(f(y \mid p) = p^y (1 - p)^{1-y}\) with \(y \in \{0, 1\}\), and each \(p_i = \text{logit}^{-1}(\alpha + X_i\beta)\), so the posterior is \[ f\left(\alpha,\beta \mid y,X\right) \propto f_{\alpha}\left(\alpha\right) \cdot \prod_{k=1}^K f_{\beta}\left(\beta_k\right) \cdot \\ \prod_{i=1}^N \left(\text{logit}^{-1}(\alpha + X_i\beta) \right)^{y_i} \left(1 - \text{logit}^{-1}(\alpha + X_i\beta)\right)^{1-y_i} \]
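As a sanity check on this expression, here is a minimal R sketch of the unnormalized log posterior; the function name log_post and the standard-normal prior scales are placeholders, not anything fixed by the model above.

```r
# unnormalized log posterior for logistic regression (hypothetical helper)
log_post <- function(alpha, beta, y, X,
                     prior_sd_alpha = 1, prior_sd_beta = 1) {
  eta <- as.vector(alpha + X %*% beta)   # linear predictor
  # Bernoulli log likelihood with p = invlogit(eta)
  ll <- sum(dbinom(y, size = 1, prob = plogis(eta), log = TRUE))
  # independent Normal log priors on alpha and each beta_k
  lp <- dnorm(alpha, 0, prior_sd_alpha, log = TRUE) +
    sum(dnorm(beta, 0, prior_sd_beta, log = TRUE))
  ll + lp
}
```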
In Stan, the likelihood term can be written with the linear predictor on the logit scale as y ~ bernoulli_logit_glm(x, alpha, beta), or equivalently (up to a constant) as target += bernoulli_logit_glm_lupmf(y | x, alpha, beta).
library(tidyverse)   # tibble(), ggplot2, pivot_longer()
library(latex2exp)   # TeX() for math in plot labels
set.seed(123)
logit <- qlogis; invlogit <- plogis
n <- 100
a <- 1.2
b <- 0.4
x <- runif(n, -15, 10)
eta <- a + x * b
Pr <- invlogit(eta)
y <- rbinom(n, 1, Pr)
sim <- tibble(y, x, Pr)
p <- ggplot(aes(x, y), data = sim)
p <- p + geom_point(size = 0.5) +
geom_line(aes(x, Pr), linewidth = 0.2) +
geom_vline(xintercept = 0, color = "red", linewidth = 0.2,
linetype = "dashed", alpha = 1/3) +
geom_hline(yintercept = invlogit(a), color = "red", linewidth = 0.2,
linetype = "dashed", alpha = 1/3) +
geom_hline(yintercept = 0.50, linewidth = 0.2, linetype = "dashed", alpha = 1/3) +
ggtitle(TeX("$y_i \\sim Bernoulli(logit^{-1}(1.2 + 0.4x_i))$")) +
annotate("text", x = -5.5, y = invlogit(a) - 0.02,
label = TeX("Intercept = $logit^{-1}(1.2) \\approx 0.77$")) +
annotate("text", x = -8, y = 0.53,
label = TeX("$slope_{.5} = \\frac{0.4}{4} = 0.10$")) +
ylab(TeX("$logit^{-1}(1.2 + 0.4x)$")); print(p)
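The summary below is consistent with an rstanarm fit along the following lines; the model name m1 and the exact arguments (default priors, 4 chains of 1,000 iterations giving 2,000 post-warmup draws) are my assumptions.

```r
library(rstanarm)
# plausible call behind the summary shown below (defaults assumed)
m1 <- stan_glm(y ~ x,
               family = binomial(link = "logit"),
               data = sim,
               iter = 1000,   # 4 chains x 500 post-warmup = 2,000 draws
               refresh = 0, seed = 123)
summary(m1)
```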
Model Info:
function: stan_glm
family: binomial [logit]
formula: y ~ x
algorithm: sampling
sample: 2000 (posterior sample size)
priors: see help('prior_summary')
observations: 100
predictors: 2
Estimates:
mean sd 10% 50% 90%
(Intercept) 1.5 0.5 0.9 1.5 2.1
x 0.5 0.1 0.4 0.5 0.7
Fit Diagnostics:
mean sd 10% 50% 90%
mean_PPD 0.5 0.0 0.5 0.5 0.6
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.0 1.0 1132
x 0.0 1.0 1106
mean_PPD 0.0 1.0 1584
log-posterior 0.0 1.0 817
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
prior_pred_logit <- function(x) {
  # one draw of (alpha, beta) from the priors, pushed through the inverse logit
  a <- rnorm(1, mean = 1.2, sd = 0.5)
  b <- rnorm(1, mean = 0.4, sd = 0.1)
  Pr <- invlogit(a + b * x)
  return(Pr)
}
prior_pred <- replicate(50, prior_pred_logit(x)) |>
as.data.frame()
df_long <- prior_pred |>
mutate(x = x) |>
pivot_longer(cols = -x, names_to = "line", values_to = "y")
p <- ggplot(aes(x, y), data = df_long)
p + geom_line(aes(group = line), linewidth = 0.2, alpha = 1/5) +
geom_line(aes(y = Pr), data = sim, linewidth = 0.5, color = 'red') +
ylab(TeX("$logit^{-1}(\\alpha + \\beta x)$")) +
ggtitle(TeX("Simulating from prior $logit^{-1}(\\alpha + \\beta x_i)$"),
subtitle = TeX("$\\alpha \\sim Normal(1.2, 0.5)$ and $\\beta \\sim Normal(0.4, 0.1)$"))
As a running example we use the Pima Indians Diabetes data from the pdp package:
| pregnant | glucose | pressure | triceps | insulin | mass | pedigree | age | diabetes |
|---|---|---|---|---|---|---|---|---|
| 6 | 148 | 72 | 35 | NA | 33.6 | 0.627 | 50 | pos |
| 1 | 85 | 66 | 29 | NA | 26.6 | 0.351 | 31 | neg |
| 8 | 183 | 64 | NA | NA | 23.3 | 0.672 | 32 | pos |
| 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | neg |
| 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | pos |
| 5 | 116 | 74 | NA | NA | 25.6 | 0.201 | 30 | neg |
Several columns contain missing values. For a principled analysis these should be imputed, for example with the mice package and brms, which works nicely with mice; a sketch follows.
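Here is a minimal sketch of that workflow, assuming the pima data from pdp is loaded; the variable selection and formula are illustrative only.

```r
library(dplyr)
library(mice)
library(brms)
# recode the outcome to 0/1 up front, since bernoulli() expects 0/1
pima2 <- pima |>
  mutate(diab = as.integer(diabetes == "pos")) |>
  select(-diabetes)
imp <- mice(pima2, m = 5, seed = 123, printFlag = FALSE)  # 5 imputed datasets
# brm_multiple() fits the model to each imputed dataset and pools the draws
fit <- brm_multiple(diab ~ mass + glucose,
                    family = bernoulli(),
                    data = imp, refresh = 0)
```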
p1 <- ggplot(pima, aes(x = mass)) +
geom_density(aes(group = diabetes, fill = diabetes, color = diabetes), alpha = 1/5) +
xlab("BMI")
p2 <- pima |>
drop_na() |>
mutate(bmi_cut = cut(mass, breaks = seq(15, 70, by = 5))) |>
group_by(bmi_cut) |>
summarize(p = mean(diabetes == "pos"),
n = n(),
se = sqrt(p * (1 - p) / n),
lower = p - se,
upper = p + se) |>
ggplot(aes(x = bmi_cut, y = p)) +
geom_point() + geom_linerange(aes(ymin = lower, ymax = upper), linewidth = 0.2) +
xlab("BMI range") + ylab("Proportion") +
ggtitle("Proportion of diabetics by BMI +/- 1 SE")
library(pdp)
d <- pima |>
as_tibble() |>
select(diabetes, age, pedigree, mass, glucose) |>
drop_na() |> # in an important analysis, you should not do this; instead, impute
mutate(diab = if_else(diabetes == "pos", 1, 0),
age = (age - mean(age)) / 10,
pedigree = (pedigree - mean(pedigree)),
bmi = ((mass - mean(mass)) / 10),
glucose = ((glucose - mean(glucose)) / sd(glucose)))
head(d)
# A tibble: 6 × 7
diabetes age pedigree mass glucose diab bmi
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 pos 1.67 0.154 33.6 0.852 1 0.115
2 neg -0.231 -0.122 26.6 -1.21 0 -0.585
3 pos -0.131 0.199 23.3 2.00 1 -0.915
4 neg -1.23 -0.306 28.1 -1.08 0 -0.435
5 pos -0.0312 1.81 43.1 0.492 1 1.06
6 neg -0.331 -0.272 25.6 -0.194 0 -0.685
\[ \begin{eqnarray} y_i &\sim& \text{Bernoulli}(p_i) \\ \eta_i &=& \alpha + \beta \left( \frac{x_{i} - \bar x}{10} \right) \\ p_i &=& \frac{e^{\eta_i}}{1 + e^{\eta_i}} \\ \alpha &\sim& \text{Normal}(\mu_\alpha ,\ \sigma_\alpha) \\ \beta &\sim& \text{Normal}(\mu_\beta , \ \sigma_\beta) \end{eqnarray} \]
invlogit(c(-1.5, 1.5)) = [0.18, 0.82]
\[ \begin{eqnarray} y_i &\sim& \text{Bernoulli}(p_i) \\ \eta_i &=& \alpha + \beta \left( \frac{x_{i} - \bar x}{10} \right) \\ p_i &=& \frac{e^{\eta_i}}{1 + e^{\eta_i}} \\ \alpha &\sim& \text{Normal}(0 ,\ 0.5) \\ \beta &\sim& \text{Normal}(0 , \ 1) \end{eqnarray} \]
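The prior-predictive summary below is consistent with a prior-only fit; here is a sketch of such a call, using rstanarm's prior_PD flag (the model name m0 and sampler settings are my assumptions):

```r
# sample from the priors only: the likelihood is ignored when prior_PD = TRUE
m0 <- stan_glm(diab ~ bmi,
               family = binomial(link = "logit"),
               prior_intercept = normal(0, 0.5),
               prior = normal(0, 1),
               prior_PD = TRUE,
               data = d, iter = 1000, refresh = 0, seed = 123)
summary(m0)
```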
Model Info:
function: stan_glm
family: binomial [logit]
formula: diab ~ bmi
algorithm: sampling
sample: 2000 (posterior sample size)
priors: see help('prior_summary')
observations: 752
predictors: 2
Estimates:
mean sd 10% 50% 90%
(Intercept) 0.0 0.5 -0.6 0.0 0.6
bmi 0.0 1.0 -1.3 0.0 1.3
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.0 1.0 1215
bmi 0.0 1.0 1108
log-posterior 0.0 1.0 932
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
\[ \begin{eqnarray} y_i &\sim& \text{Bernoulli}(p_i) \\ \eta_i &=& \alpha + \beta \left( \frac{x_{i} - \bar x}{10} \right) \\ p_i &=& \frac{e^{\eta_i}}{1 + e^{\eta_i}} \\ \alpha &\sim& \text{Normal}(0 ,\ 0.5) \\ \beta &\sim& \text{Normal}(0.7 , \ 0.2) \end{eqnarray} \]
Model Info:
function: stan_glm
family: binomial [logit]
formula: diab ~ bmi
algorithm: sampling
sample: 2000 (posterior sample size)
priors: see help('prior_summary')
observations: 752
predictors: 2
Estimates:
mean sd 10% 50% 90%
(Intercept) 0.0 0.5 -0.6 0.0 0.6
bmi 0.7 0.2 0.4 0.7 1.0
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.0 1.0 1215
bmi 0.0 1.0 1108
log-posterior 0.0 1.0 932
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
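The posterior fit summarized below plausibly comes from a call like this, reusing the informative priors above (the exact sampler settings are my assumption):

```r
# posterior fit with the informative prior on the BMI coefficient
m2 <- stan_glm(diab ~ bmi,
               family = binomial(link = "logit"),
               prior_intercept = normal(0, 0.5),
               prior = normal(0.7, 0.2),
               data = d, iter = 1000, refresh = 0, seed = 123)
summary(m2)
```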
Model Info:
function: stan_glm
family: binomial [logit]
formula: diab ~ bmi
algorithm: sampling
sample: 2000 (posterior sample size)
priors: see help('prior_summary')
observations: 752
predictors: 2
Estimates:
mean sd 10% 50% 90%
(Intercept) -0.6 0.1 -0.7 -0.6 -0.5
bmi 0.9 0.1 0.8 0.9 1.1
Fit Diagnostics:
mean sd 10% 50% 90%
mean_PPD 0.4 0.0 0.3 0.4 0.4
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.0 1.0 1315
bmi 0.0 1.0 1299
mean_PPD 0.0 1.0 1651
log-posterior 0.0 1.0 780
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
Predictions, for example at BMI = 40, can be obtained with posterior_epred (or posterior_predict for outcome draws), or by constructing the posterior distribution directly:
bmi_scaled <- (40 - mean(d$mass)) / 10
yepred_m2 <- posterior_epred(m2, newdata = data.frame(bmi = bmi_scaled))
quantile(yepred_m2, probs = c(0.05, 0.50, 0.95)) |> round(2)
5% 50% 95%
0.47 0.51 0.56
d_m2 <- as_tibble(m2) |>
mutate(log_odds = `(Intercept)` + bmi * bmi_scaled,
prob = invlogit(log_odds),
ypred = rbinom(2e3, size = 1, prob = prob))
d_m2[1:3, ]
# A tibble: 3 × 5
`(Intercept)` bmi log_odds prob ypred
<dbl> <dbl> <dbl> <dbl> <int>
1 -0.676 0.897 0.000969 0.500 1
2 -0.746 0.936 -0.0403 0.490 0
3 -0.682 0.957 0.0402 0.510 0
quantile(d_m2$prob, probs = c(0.05, 0.50, 0.95)) |> round(2)
5% 50% 95%
0.47 0.51 0.56
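The multi-predictor summary below is consistent with a call like the following; the weakly informative priors are my assumption.

```r
# logistic regression with four (centered/scaled) predictors
m3 <- stan_glm(diab ~ bmi + pedigree + age + glucose,
               family = binomial(link = "logit"),
               prior_intercept = normal(0, 0.5),
               prior = normal(0, 1),
               data = d, iter = 1000, refresh = 0, seed = 123)
summary(m3)
```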
Model Info:
function: stan_glm
family: binomial [logit]
formula: diab ~ bmi + pedigree + age + glucose
algorithm: sampling
sample: 2000 (posterior sample size)
priors: see help('prior_summary')
observations: 752
predictors: 5
Estimates:
mean sd 10% 50% 90%
(Intercept) -0.8 0.1 -0.9 -0.8 -0.7
bmi 0.8 0.1 0.7 0.8 1.0
pedigree 0.8 0.3 0.5 0.8 1.2
age 0.3 0.1 0.2 0.3 0.4
glucose 1.1 0.1 0.9 1.1 1.2
Fit Diagnostics:
mean sd 10% 50% 90%
mean_PPD 0.4 0.0 0.3 0.4 0.4
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.0 1.0 2663
bmi 0.0 1.0 2623
pedigree 0.0 1.0 2128
age 0.0 1.0 2551
glucose 0.0 1.0 2861
mean_PPD 0.0 1.0 2566
log-posterior 0.1 1.0 747
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
For interactive exploration of the chains and diagnostics, install.packages("shinystan") and run launch_shinystan(m3).
library(splines)
priors <- normal(location = c(0.7, rep(0, 7)),
scale = c(0.2, rep(1, 7)))
m4 <- stan_glm(diab ~ bmi + pedigree +
bs(age, df = 4) +
glucose + glucose:pedigree,
prior_intercept = normal(0, 0.5),
prior = priors,
family = binomial(link = "logit"),
data = d,
refresh = 0,
seed = 123,
iter = 1000)
summary(m4)
Model Info:
function: stan_glm
family: binomial [logit]
formula: diab ~ bmi + pedigree + bs(age, df = 4) + glucose + glucose:pedigree
algorithm: sampling
sample: 2000 (posterior sample size)
priors: see help('prior_summary')
observations: 752
predictors: 9
Estimates:
mean sd 10% 50% 90%
(Intercept) -1.6 0.3 -2.0 -1.6 -1.2
bmi 0.8 0.1 0.6 0.8 0.9
pedigree 0.9 0.3 0.6 0.9 1.3
bs(age, df = 4)1 0.3 0.4 -0.2 0.3 0.8
bs(age, df = 4)2 2.6 0.6 1.9 2.6 3.3
bs(age, df = 4)3 0.7 0.7 -0.2 0.7 1.6
bs(age, df = 4)4 -0.6 0.8 -1.6 -0.6 0.4
glucose 1.1 0.1 1.0 1.1 1.3
pedigree:glucose -0.6 0.3 -0.9 -0.6 -0.3
Fit Diagnostics:
mean sd 10% 50% 90%
mean_PPD 0.4 0.0 0.3 0.4 0.4
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.0 1.0 1643
bmi 0.0 1.0 2228
pedigree 0.0 1.0 2015
bs(age, df = 4)1 0.0 1.0 1758
bs(age, df = 4)2 0.0 1.0 1999
bs(age, df = 4)3 0.0 1.0 1853
bs(age, df = 4)4 0.0 1.0 2086
glucose 0.0 1.0 1749
pedigree:glucose 0.0 1.0 1633
mean_PPD 0.0 1.0 1912
log-posterior 0.1 1.0 960
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
Models can be compared on estimated out-of-sample predictive performance with loo and loo_compare in R.
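For instance, a PSIS-LOO comparison of the two diabetes models above might look like this (m3 and m4 as fit earlier):

```r
loo_m3 <- loo(m3)            # approximate leave-one-out for each model
loo_m4 <- loo(m4)
loo_compare(loo_m3, loo_m4)  # higher elpd = better expected predictive fit
```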
For hierarchical models we use the sleepstudy data from the lme4 package. The partial-pooling estimate for group \(j\) is a precision-weighted compromise between the group mean \(\overline y_j\) and the overall mean \(\overline y_{\tau}\): \[ \theta_j \approx \frac{\frac{n_j}{\sigma_{y}^2} \overline y_j + \frac{1}{\sigma_{\tau}^2} \overline y_{\tau}} {\frac{n_j}{\sigma_{y}^2} + \frac{1}{\sigma_{\tau}^2}} \]
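A direct R translation of this shrinkage formula (the function name and example numbers are illustrative):

```r
# precision-weighted (partial-pooling) estimate for one group
theta_hat <- function(ybar_j, n_j, sigma_y, ybar_all, sigma_tau) {
  w_j   <- n_j / sigma_y^2   # precision of the group mean
  w_tau <- 1 / sigma_tau^2   # precision of the population mean
  (w_j * ybar_j + w_tau * ybar_all) / (w_j + w_tau)
}
theta_hat(ybar_j = 350, n_j = 5, sigma_y = 50, ybar_all = 300, sigma_tau = 30)
```

As a baseline, we first fit a complete-pooling model that ignores Subject altogether: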
\[ \begin{eqnarray} y_{ij} & \sim & \text{Normal}(\mu, \ \sigma^2) \\ \mu & \sim & \text{Normal}(300, 10^2) \\ \sigma & \sim & \text{Exponential}(0.02) \end{eqnarray} \]
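A plausible call behind the summary below; the data frame d holding the sleepstudy observations and the sampler settings are my assumptions (4 chains of 5,000 iterations give the 10,000 post-warmup draws reported):

```r
# complete pooling: one mean for all observations
m1 <- stan_glm(Reaction ~ 1,
               prior_intercept = normal(300, 10),
               prior_aux = exponential(0.02),
               family = gaussian,
               data = d, iter = 5000, refresh = 0, seed = 123)
summary(m1)
```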
Model Info:
function: stan_glm
family: gaussian [identity]
formula: Reaction ~ 1
algorithm: sampling
sample: 10000 (posterior sample size)
priors: see help('prior_summary')
observations: 125
predictors: 1
Estimates:
mean sd 10% 50% 90%
(Intercept) 294.5 4.2 289.2 294.5 299.9
sigma 52.9 3.4 48.7 52.7 57.3
Fit Diagnostics:
mean sd 10% 50% 90%
mean_PPD 294.4 6.3 286.4 294.4 302.5
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.1 1.0 6347
sigma 0.0 1.0 6890
mean_PPD 0.1 1.0 8264
log-posterior 0.0 1.0 4152
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
You could fit a separate model to each subject with stan_glm, or you can do it all at once with the no-pooling model: \[ \begin{eqnarray} y_{ij} & \sim & \text{Normal}(\mu_j, \ \sigma^2) \\ \mu_j & \sim & \text{Normal}(300, s_j^2) \\ \sigma & \sim & \text{Exponential}(0.02) \end{eqnarray} \]
m2 <- stan_glm(Reaction ~ Subject - 1,
prior = normal(300, 10, autoscale = TRUE),
prior_aux = exponential(0.02),
family = gaussian,
data = d,
iter = 1000,
refresh = 0,
seed = 123)
summary(m2)
Model Info:
function: stan_glm
family: gaussian [identity]
formula: Reaction ~ Subject - 1
algorithm: sampling
sample: 2000 (posterior sample size)
priors: see help('prior_summary')
observations: 125
predictors: 18
Estimates:
mean sd 10% 50% 90%
Subject308 353.4 17.1 331.0 353.8 375.1
Subject309 213.0 14.6 194.4 212.8 231.8
Subject310 223.9 12.9 207.0 224.0 240.2
Subject330 305.2 12.7 288.6 305.1 321.6
Subject331 310.0 15.3 290.1 310.2 329.3
Subject332 272.0 15.2 252.6 271.8 291.7
Subject333 306.1 16.9 284.3 306.0 327.7
Subject334 296.9 12.2 281.1 297.1 312.5
Subject335 248.6 13.2 231.4 248.8 265.1
Subject337 359.4 14.3 341.2 359.4 378.3
Subject349 282.0 15.6 262.2 282.0 301.1
Subject350 336.6 14.5 317.6 336.7 354.8
Subject351 280.5 18.3 256.7 280.8 303.6
Subject352 346.1 16.5 325.2 345.9 367.1
Subject369 294.7 14.3 276.4 294.6 313.6
Subject370 262.1 14.9 243.7 262.2 281.4
Subject371 295.4 11.7 280.3 295.2 309.9
Subject372 317.6 11.9 302.7 317.6 333.0
sigma 37.4 2.6 34.4 37.2 40.7
Fit Diagnostics:
mean sd 10% 50% 90%
mean_PPD 293.2 4.8 287.1 293.2 299.2
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
Subject308 0.3 1.0 2742
Subject309 0.3 1.0 3019
Subject310 0.3 1.0 2411
Subject330 0.3 1.0 2263
Subject331 0.3 1.0 2563
Subject332 0.3 1.0 2924
Subject333 0.3 1.0 2560
Subject334 0.2 1.0 2928
Subject335 0.2 1.0 3386
Subject337 0.3 1.0 2424
Subject349 0.3 1.0 2629
Subject350 0.3 1.0 2490
Subject351 0.4 1.0 2272
Subject352 0.3 1.0 2909
Subject369 0.3 1.0 2915
Subject370 0.3 1.0 2594
Subject371 0.2 1.0 2399
Subject372 0.2 1.0 2760
sigma 0.1 1.0 1113
mean_PPD 0.1 1.0 2215
log-posterior 0.1 1.0 719
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
\[ \begin{eqnarray} y_{ij} &\sim& \text{Normal}(\mu_j, \ \sigma_y^2) \\ \mu_{j} &\sim& \text{Normal}(\mu, \ \sigma_{\mu}^2) \\ \mu &\sim& \text{Normal}(300, 50^2) \\ \sigma_y &\sim& \text{Exponential}(0.02) \\ \sigma_{\mu} &\sim& \text{Exponential}(1) \end{eqnarray} \]
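The partial-pooling summary below is consistent with a stan_glmer call like this; note that rstanarm parameterizes the group-level standard deviation through decov rather than the Exponential(1) prior written above, so this sketch is an approximation:

```r
m3 <- stan_glmer(Reaction ~ (1 | Subject),
                 prior_intercept = normal(300, 50),
                 prior_aux = exponential(0.02),
                 family = gaussian,
                 data = d, iter = 1500,   # 4 chains -> 3,000 post-warmup draws
                 refresh = 0, seed = 123)
summary(m3)
```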
Model Info:
function: stan_glmer
family: gaussian [identity]
formula: Reaction ~ (1 | Subject)
algorithm: sampling
sample: 3000 (posterior sample size)
priors: see help('prior_summary')
observations: 125
groups: Subject (18)
Estimates:
mean sd 10% 50% 90%
(Intercept) 294.2 9.8 281.8 294.5 306.5
b[(Intercept) Subject:308] 49.3 17.5 27.3 49.0 71.5
b[(Intercept) Subject:309] -71.7 16.5 -92.5 -71.5 -50.8
b[(Intercept) Subject:310] -62.7 15.1 -82.3 -62.5 -43.9
b[(Intercept) Subject:330] 10.0 15.0 -8.8 9.8 29.0
b[(Intercept) Subject:331] 13.3 16.4 -7.2 13.4 33.9
b[(Intercept) Subject:332] -19.2 16.4 -39.7 -19.1 2.1
b[(Intercept) Subject:333] 9.8 18.0 -13.0 9.8 32.4
b[(Intercept) Subject:334] 2.7 14.8 -16.2 2.6 21.6
b[(Intercept) Subject:335] -40.4 15.4 -60.1 -40.2 -21.1
b[(Intercept) Subject:337] 57.4 16.1 37.2 57.2 77.4
b[(Intercept) Subject:349] -10.8 16.3 -31.4 -10.8 9.9
b[(Intercept) Subject:350] 37.1 15.5 17.3 37.1 56.8
b[(Intercept) Subject:351] -10.9 18.2 -34.6 -11.0 12.7
b[(Intercept) Subject:352] 43.1 17.3 21.1 42.8 65.0
b[(Intercept) Subject:369] 0.5 16.3 -20.3 0.3 21.2
b[(Intercept) Subject:370] -28.2 16.3 -48.9 -28.3 -6.9
b[(Intercept) Subject:371] 0.7 14.3 -17.1 0.6 19.3
b[(Intercept) Subject:372] 21.3 14.1 3.8 21.3 39.0
sigma 37.5 2.5 34.3 37.4 40.8
Sigma[Subject:(Intercept),(Intercept)] 1685.8 706.9 968.5 1538.3 2541.1
Fit Diagnostics:
mean sd 10% 50% 90%
mean_PPD 293.0 4.9 286.8 292.9 299.4
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.5 1.0 383
b[(Intercept) Subject:308] 0.5 1.0 1285
b[(Intercept) Subject:309] 0.5 1.0 1171
b[(Intercept) Subject:310] 0.5 1.0 987
b[(Intercept) Subject:330] 0.5 1.0 821
b[(Intercept) Subject:331] 0.6 1.0 889
b[(Intercept) Subject:332] 0.5 1.0 1103
b[(Intercept) Subject:333] 0.5 1.0 1267
b[(Intercept) Subject:334] 0.5 1.0 797
b[(Intercept) Subject:335] 0.5 1.0 1092
b[(Intercept) Subject:337] 0.5 1.0 965
b[(Intercept) Subject:349] 0.5 1.0 1196
b[(Intercept) Subject:350] 0.5 1.0 885
b[(Intercept) Subject:351] 0.4 1.0 1648
b[(Intercept) Subject:352] 0.5 1.0 1246
b[(Intercept) Subject:369] 0.5 1.0 928
b[(Intercept) Subject:370] 0.5 1.0 1020
b[(Intercept) Subject:371] 0.5 1.0 819
b[(Intercept) Subject:372] 0.5 1.0 754
sigma 0.0 1.0 2630
Sigma[Subject:(Intercept),(Intercept)] 23.6 1.0 895
mean_PPD 0.1 1.0 3211
log-posterior 0.2 1.0 617
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
library(tidybayes)   # spread_draws(), mean_qi()
muj <- m3 |>
spread_draws(`(Intercept)`, b[ ,Subject]) |>
mutate(mu_j = `(Intercept)` + b) |>
select(Subject, mu_j) |>
mean_qi(.width = 0.90)
head(muj)
# A tibble: 6 × 7
Subject mu_j .lower .upper .width .point .interval
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 Subject:308 344. 318. 369. 0.9 mean qi
2 Subject:309 223. 200. 245. 0.9 mean qi
3 Subject:310 231. 211. 252. 0.9 mean qi
4 Subject:330 304. 285. 324. 0.9 mean qi
5 Subject:331 308. 284. 330. 0.9 mean qi
6 Subject:332 275. 252. 298. 0.9 mean qi
m4 <- stan_glmer(Reaction ~ Days + (Days | Subject),
prior_intercept = normal(300, 50),
prior = normal(0, 2, autoscale = TRUE),
prior_aux = exponential(0.02),
prior_covariance = decov(regularization = 1, concentration = 1,
shape = 1, scale = 1),
family = gaussian, data = d, iter = 1500,
cores = 4, seed = 123, refresh = 0)
summary(m4)
Model Info:
function: stan_glmer
family: gaussian [identity]
formula: Reaction ~ Days + (Days | Subject)
algorithm: sampling
sample: 3000 (posterior sample size)
priors: see help('prior_summary')
observations: 125
groups: Subject (18)
Estimates:
mean sd 10% 50% 90%
(Intercept) 256.3 7.7 246.6 256.2 265.9
Days 8.9 1.7 6.7 8.9 11.1
b[(Intercept) Subject:308] 42.1 22.7 15.0 39.7 72.8
b[Days Subject:308] 0.9 4.4 -4.9 1.2 6.2
b[(Intercept) Subject:309] -43.1 14.0 -61.2 -43.0 -25.6
b[Days Subject:309] -7.9 3.1 -11.8 -7.9 -3.8
b[(Intercept) Subject:310] -42.8 13.5 -60.4 -42.3 -26.0
b[Days Subject:310] -5.0 2.9 -8.7 -5.0 -1.3
b[(Intercept) Subject:330] 23.8 14.4 5.7 23.3 42.0
b[Days Subject:330] -4.1 2.8 -7.8 -4.0 -0.6
b[(Intercept) Subject:331] 19.2 12.9 3.0 19.0 35.8
b[Days Subject:331] 0.7 2.9 -3.1 0.7 4.4
b[(Intercept) Subject:332] 2.8 12.6 -12.8 2.3 19.2
b[Days Subject:332] -5.1 2.9 -8.9 -5.0 -1.4
b[(Intercept) Subject:333] 11.7 14.9 -7.0 11.2 31.0
b[Days Subject:333] 0.3 3.1 -3.7 0.4 4.2
b[(Intercept) Subject:334] -9.7 13.2 -26.9 -9.3 7.0
b[Days Subject:334] 2.4 2.7 -1.0 2.4 5.8
b[(Intercept) Subject:335] -7.2 14.4 -25.5 -7.5 11.5
b[Days Subject:335] -8.9 2.8 -12.5 -8.9 -5.2
b[(Intercept) Subject:337] 32.9 13.6 15.6 32.9 50.4
b[Days Subject:337] 9.9 3.0 6.0 9.9 13.7
b[(Intercept) Subject:349] -25.4 18.1 -49.2 -24.0 -3.1
b[Days Subject:349] 1.5 3.4 -2.9 1.4 6.0
b[(Intercept) Subject:350] -10.7 16.7 -32.6 -10.1 10.6
b[Days Subject:350] 8.1 3.1 4.3 8.0 12.3
b[(Intercept) Subject:351] 3.2 16.9 -17.9 2.7 24.5
b[Days Subject:351] -4.1 3.7 -8.8 -4.0 0.5
b[(Intercept) Subject:352] 26.9 16.4 6.8 26.4 47.7
b[Days Subject:352] 3.4 3.2 -0.7 3.5 7.4
b[(Intercept) Subject:369] -0.4 13.1 -17.0 -0.4 15.6
b[Days Subject:369] 1.1 2.8 -2.4 1.1 4.7
b[(Intercept) Subject:370] -31.5 14.5 -50.2 -31.0 -13.4
b[Days Subject:370] 5.0 3.5 0.4 5.0 9.5
b[(Intercept) Subject:371] -1.9 12.8 -18.0 -1.8 14.3
b[Days Subject:371] 0.1 2.6 -3.1 0.1 3.4
b[(Intercept) Subject:372] 10.2 12.6 -5.7 10.3 26.5
b[Days Subject:372] 2.4 2.6 -0.9 2.4 5.7
sigma 22.1 1.8 20.0 22.0 24.4
Sigma[Subject:(Intercept),(Intercept)] 845.2 460.0 381.3 751.0 1416.9
Sigma[Subject:Days,(Intercept)] 9.2 64.9 -68.9 14.1 77.8
Sigma[Subject:Days,Days] 44.8 22.7 22.0 39.6 73.6
Fit Diagnostics:
mean sd 10% 50% 90%
mean_PPD 293.3 2.8 289.8 293.2 296.9
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.2 1.0 1300
Days 0.0 1.0 1246
b[(Intercept) Subject:308] 0.6 1.0 1616
b[Days Subject:308] 0.1 1.0 1862
b[(Intercept) Subject:309] 0.3 1.0 2340
b[Days Subject:309] 0.1 1.0 2460
b[(Intercept) Subject:310] 0.3 1.0 1957
b[Days Subject:310] 0.1 1.0 2338
b[(Intercept) Subject:330] 0.4 1.0 1533
b[Days Subject:330] 0.1 1.0 1666
b[(Intercept) Subject:331] 0.3 1.0 2157
b[Days Subject:331] 0.1 1.0 2760
b[(Intercept) Subject:332] 0.3 1.0 1876
b[Days Subject:332] 0.1 1.0 2272
b[(Intercept) Subject:333] 0.3 1.0 2132
b[Days Subject:333] 0.1 1.0 2255
b[(Intercept) Subject:334] 0.3 1.0 1935
b[Days Subject:334] 0.1 1.0 1922
b[(Intercept) Subject:335] 0.3 1.0 1693
b[Days Subject:335] 0.1 1.0 1648
b[(Intercept) Subject:337] 0.3 1.0 2140
b[Days Subject:337] 0.1 1.0 2307
b[(Intercept) Subject:349] 0.4 1.0 1636
b[Days Subject:349] 0.1 1.0 1769
b[(Intercept) Subject:350] 0.4 1.0 1649
b[Days Subject:350] 0.1 1.0 1450
b[(Intercept) Subject:351] 0.4 1.0 2235
b[Days Subject:351] 0.1 1.0 2194
b[(Intercept) Subject:352] 0.3 1.0 2721
b[Days Subject:352] 0.1 1.0 2473
b[(Intercept) Subject:369] 0.3 1.0 2130
b[Days Subject:369] 0.1 1.0 2044
b[(Intercept) Subject:370] 0.4 1.0 1427
b[Days Subject:370] 0.1 1.0 1486
b[(Intercept) Subject:371] 0.3 1.0 1885
b[Days Subject:371] 0.1 1.0 2096
b[(Intercept) Subject:372] 0.3 1.0 2141
b[Days Subject:372] 0.1 1.0 1802
sigma 0.0 1.0 1532
Sigma[Subject:(Intercept),(Intercept)] 13.9 1.0 1099
Sigma[Subject:Days,(Intercept)] 2.1 1.0 931
Sigma[Subject:Days,Days] 0.6 1.0 1364
mean_PPD 0.1 1.0 2854
log-posterior 0.3 1.0 518
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
# a subject id not present in the training data
new_subj <- data.frame(Days = 0:9,
Subject = as.factor(rep(400, 10)))
# for a new factor level, posterior_predict draws that subject's effects
# from the fitted population distribution
ypred_subj <- posterior_predict(m4, newdata = new_subj)
new_subj |>
add_predicted_draws(m4) |>
ggplot(aes(x = Days)) +
stat_lineribbon(aes(y = .prediction),
.width = c(.9, .8, .5),   # stat_lineribbon() takes .width, not width
alpha = 0.25) +
ylab("Reaction time") +
ggtitle("Prediction for an unobserved subject") +
scale_fill_brewer(palette = "Greys")