Hierarchical models

Jay Brophy MD PhD

Departments of Medicine, Epidemiology and Biostatistics, McGill University

2024-11-26

Conflicts of interest

No conflicts of interest
To the best of my knowledge I’m equally disliked, or at best deemed irrelevant, by all drug and device companies

AI generated image

Objectives

Review of probability concepts

Review of hierarchical modeling concepts

Realization of their practical applications

Hierarchical models (models with memory)

Often data is not iid (e.g. repeat or clustered measurements)
- nested data (patients within hospitals, within regions, with countries)
- panel or time-series cross-sectional data

Hierarchical (mixed of multilevel) models account for the dependency structure with potential pooling of information across clusters

Benefits of the multilevel approach

(1) Provide robust and conservative estimates across levels
(2) Estimate variation on all levels of the model
(3) Predict values of new groups not originally present in the data

Simple averaging can be dangerous leading to false confidence

Multilevel models allow us to preserve all uncertainties & should be default models

Increasing use of hierarchical models

Hierarchical model

Hierarchical model: terms

The standard Bayesian random effects meta-analysis model follows the same hierarchical approach where level 1 is
\[ y_j = \theta_j + \epsilon \:where\>error\> term\> \epsilon_j \, \sim \, normal(0,s_j)\] models the uncertainty of the effect from each individual study, \(\theta_j\). We assume that the estimate \(y_j\) from the j-th study is unbiased and normally distributed with standard error \(s_j\). Then level 2 is

\[ \theta_j = \mu + \mu_j \: where\>error\> term\> \mu_j \, \sim \, normal(0,\tau)\] where \(\mu\) is the average treatment effect in the (hypothetical) superpopulation of similar studies, \(\theta_j\) is the effect in the j-th individual study and the variance \(\tau^2\) is referred to as the heterogeneity

Hierarchical model: comparisons

Stein’s paradox (1997)

Which average (model) to use?

Want to predict each player’s season-long average

3 possible predictive models

(a) complete pooling, which assumes each player is the same, \(\boldsymbol y\), corresponds to zero population variance
(b) no pooling, which assumes the players are unrelated, \(y_i\), corresponds to infinite population variance
(c) partial pooling (hierarchical), where each player is assumed to have a different chance of success, but the data for all of the observed players informs the estimates for each individual player

IOW, how much pooling depends on the variance in the population, more variance -> less pooling
Stein’s paradox is that these new Stein/James (partial) estimators, \((z_i = \boldsymbol y + c(y_i - \boldsymbol y))\), predict better than individual averages, \(y_i\), or the global average, \(\boldsymbol y\) where c is the shrinkage factor

Stein’s paradox (1977)

If more than 3 means, estimating each by its own mean is not the most efficient

Essence of the Stein method is that of “borrowing” of information or “shrinkage” will produce “better” estimates, meaning lowered mean squared errors

Baseball data for early season average (hits (successes) / 45 chances (AB)) for 18 players

library(rstanarm)
data(bball1970)
bball <- bball1970
# graph on next slide
bball_graph <- bball %>%
  mutate(final_avg = (Hits + RemainingHits) / (AB + RemainingAB), early_avg = Hits / AB,
         stein_est = .265 + .212*(early_avg-.265)) %>%
  dplyr::select(Player, final_avg, early_avg, stein_est) %>%
  pivot_longer(!Player, names_to = "average", values_to = "count") %>%
  ggplot(aes(x=Player, y=count, fill=factor(average)))+
  ylab("batting average") +
  geom_col(position='dodge') + 
  geom_hline(yintercept = .265, linewidth=2, color = "orange") + # early average
#  geom_hline(yintercept = .269, linewidth=2, color = "black") + final average
  coord_flip()

       Player AB Hits RemainingAB RemainingHits
1    Clemente 45   18         367           127
2    Robinson 45   17         426           127
3      Howard 45   16         521           144
4   Johnstone 45   15         275            61
5       Berry 45   14         418           114
6     Spencer 45   14         466           126
7   Kessinger 45   13         586           155
8    Alvarado 45   12         138            29
9       Santo 45   11         510           137
10    Swaboda 45   11         200            46
11 Petrocelli 45   10         538           142
12  Rodriguez 45   10         186            42
13      Scott 45   10         435           132
14      Unser 45   10         277            73
15   Williams 45   10         591           195
16 Campaneris 45    9         558           159
17     Munson 45    8         408           129
18      Alvis 45    7          70            14

Which average is better?

Key points
1. Stein estimators (blue bar) closer to global early mean than individual early averages (red bar) (“shrinkage”)
2. Shrinkage is greatest for points farthest from the global average
3. Stein estimators (blue bar) closer to individual final avg (green bar) 14 vs 4 times than for the individual early avg (red bar) (i.e beats separate model)
4. Stein estimators (blue bar) closer to individual final avg (green bar) 13 vs 5 times than for the global early avg (orange vertical line) (i.e beats pooled model)

Bayesian approach

Stein estimator is not Bayesian as it is not dependent on priors, but it is a precursor to Bayesian multilevel models (shrinkage, borrowing information)
Full Bayesian analysis (with weakly informative priors) is possible and follows using rstanarm package

SEED <- 101
fit_partialpool <- stan_glmer(cbind(Hits, AB - Hits) ~ (1 | Player), data = bball,   #<<
    family = binomial("logit"), seed = SEED, refresh=0)
# shift each player's estimate by intercept (and then drop intercept)
shift_draws <- function(draws) {sweep(draws[, -1], MARGIN = 1, STATS = draws[, 1], FUN = "+")}
summary_stats <- function(posterior) {x <- invlogit(posterior)  # log-odds -> probabilities
  t(apply(x, 2, quantile, probs = c(0.05, 0.5, 0.95))) }
alphas <- shift_draws(as.matrix(fit_partialpool));partialpool <- summary_stats(alphas)
partialpool <- partialpool[-nrow(partialpool),];rownames(partialpool) <- as.character(bball$Player)
batting_avg <- function(x) print(format(round(x, digits = 3), nsmall = 3), quote = FALSE)

James–Stein estimators also have direct implications on the machine learning era of statistical innovation - Efron 2023

Radon example

919 obs. of 4 variables from houses in 85 counties in Minnesota
- log_radon - Radon measurement from the house (log scale)
- floor - 0 = basement, 1 = first floor
- county - County name (factor)
- log_uranium - Uranium level in the county (log scale)

# Load required libraries
pacman::p_load(tidyverse, cmdstanr, ggplot2, ggthemes, rstanarm, brms)
xlabels_90 <- theme(axis.text.x = element_text(angle = 90, hjust = 1))
mn_radon <- read_csv("data/mn_radon.csv", show_col_types = FALSE) # Read data

Radon Exploratory Data Analysis

Radon models - no pooling plot

Gives completely independent estimates for each group without any Bayesian shrinkage, useful only if suspect that the groups (counties) are very different and nothing to be gained from borrowing information across groups.

Radon models - pooling model

Each county has its own intercept (alpha[j]) with common floor slope (beta) The error term is normally distributed with standard deviation sigma.

# Example Stan code:
data {
  int<lower=1> N;  // observations
  int<lower=1> J;  // counties
  array[N] int<lower=1, upper=J> county;
  vector[N] x;     // floor
  vector[N] y;     // radon
}
parameters {
  vector[J] alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  // Model specifications
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma ~ normal(0, 10);

  // Likelihood
  y ~ normal(alpha[county] + beta * x, sigma);  
}
generated quantities {
  array[N] real log_lik;     // Log-likelihood for each observation
  array[N] real y_rep;       // Posterior predictive checks
  for (i in 1:N) {
    log_lik[i] = normal_lpdf(y[i] | alpha[county[i]] + beta * x[i], sigma);
    y_rep[i] = normal_rng(alpha[county[i]] + beta * x[i], sigma);  // Generate predictive samples
  }
}

Radon models - complete vs. partial pooling

Easier to use front end packages like rstanarm and brms than stan directly. These packages are not approprite for no pooling because they’re designed to do Bayesian shrinkage.

library(rstanarm)
library(loo)
mn_radon <- read_csv("data/mn_radon.csv", show_col_types = FALSE)

# ====== Complete Pooling ====== 
model_cp_rstanarm <- stan_glm(log_radon ~ floor, data = mn_radon, family = gaussian(), refresh=0)

# ====== Partial Pooling ====== 
model_pp_rstanarm <- stan_glmer(log_radon ~ floor + (1|county_id), data = mn_radon, family = gaussian(), refresh=0)

save(model_pp_rstanarm, file="output/model_pp_rstanarm.RData")
rstan_comp <- loo_compare(loo(model_cp_rstanarm), loo(model_pp_rstanarm))
save(rstan_comp, file="output/rstan_comp.RData")

library(brms)
library(loo)
# ====== Complete Pooling ====== 
model_cp_brms <- brm(log_radon ~ floor, data = mn_radon, family = gaussian(), chains = 2, iter = 2000, refresh=0)

# ====== Partial Pooling ====== 
model_pp_brms <- brm(log_radon ~ floor + (1|county_id), data = mn_radon, family = gaussian(), chains = 2, iter = 2000, refresh=0)

brms_comp <- loo_compare(loo(model_cp_brms), loo(model_pp_brms))
save(brms_comp, file="output/brms_comp.RData")

Radon models - complete vs. partial pooling plots

1. More shrinkage of point estimate toward global mean with i) “trials” farther away from global mean ii) smaller sample size
2. More shrinkage of credible intervals with smaller studies

Radon models - complete vs. partial pooling

Pooled model, log_radon ~ floor, ignores any grouping structure (e.g., counties) in the data and assumes that all observations are independent and identically distributed & has a straightforward interpretation floor coefficient = average effect across all observations, without accounting for any variation between different counties.

Treats floor effect as uniform across all counties, assumes variance homogeneity in dependent variable.

Provides overall average effect but can lead to biased or misleading conclusions if significant group-level variability exists, underestimates the standard errors of the estimated effects due to ignoring the intra-group correlation.

Hierarchical,AKA mixed-effects, model, log_radon ~ floor + (1|county_id),intercept varies by county around the overall average radon level

If 2 models are very different, -> high between-group variation, & more complex hierarchical model is likely better.

Radon models - which is better?

load("rstan_comp.RData")
rstan_comp

                  elpd_diff se_diff
model_pp_rstanarm   0.0       0.0  
model_cp_rstanarm -53.2      10.6

load("brms_comp.RData")
brms_comp

              elpd_diff se_diff
model_pp_brms   0.0       0.0  
model_cp_brms -52.8      10.6

Explanation of elpd_diff

\(\Delta\) in expected log pointwise predictive density (epld) between two statistical models measures model performance, specifically how well a model predicts new data

Metric is commonly used when comparing models using tools like the loo package in R, which implements leave-one-out cross-validation (LOO) for Bayesian models.

Higher values suggest better predictive performance.

elpd_diff = -53.2 (52.8) suggests that the first model (hierarchical) likely to be a better predictor of new data than the comparison completely pooled model with se = 10.6 and therefore z \(\approx\) 5

Interpretation: There is strong evidence to suggest a hierarchical model is preferred over the completely pooled model.

Radon models

What about other possible confounders e.g. log_uranium

# The subset of the radon data we need is included in rstanarm
library("rstanarm"); data(radon)
# Now adjust the pooled  model for the county-level uranium predictor
modelA <- stan_lmer(
 log_radon ~ floor + log_uranium + floor:log_uranium + (1 + floor | county),
  data = radon, cores = 4, iter = 2000, chains = 4
)
saveRDS(modelA, file = "output/modelA.rds")
# Recall the hierarchical model without the county-level uranium predictor
modelB <- update(modelA, formula = log_radon ~ floor + (1 + floor | county))
saveRDS(modelB, file = "output/modelB.rds")

# loading outcome to save repeating MCMC calculations
modelA <- readRDS("output/modelA.rds");modelB <- readRDS("output/modelB.rds")
# Compare models needs to separate code block from preceeding (why?)
looA <- loo(modelA); looB <- loo(modelB)
print(loo_compare(looA, looB))

   elpd_diff se_diff

modelA 0.0 0.0
modelB -9.8 5.1

elpd_diff = -9.8 suggests that the including the variable log_uranium in the hierarchical likely leads to better predictor compared to the baseline hierarchical model without it, but se = 5.1 suggesting some uncertainty (z = - 1.92)

Interpretation: Although there is some uncertainty does suggest hierarchical model including county uranium levels (model A) is preferred.

Radon output table

head(summary(modelA))

                                    mean         mcse         sd        10%
(Intercept)                   1.49961323 0.0005561482 0.03486938  1.4554032
floor                        -0.65714227 0.0010488821 0.07912519 -0.7557767
log_uranium                   0.78698453 0.0014784930 0.09189782  0.6686785
floor:log_uranium            -0.43203819 0.0027913042 0.20460791 -0.6922567
b[(Intercept) county:AITKIN] -0.01533251 0.0015191119 0.12560828 -0.1696316
b[floor county:AITKIN]        0.01564660 0.0028778328 0.21529226 -0.2296711
                                      50%        90% n_eff      Rhat
(Intercept)                   1.499581802  1.5436340  3931 1.0004394
floor                        -0.659742443 -0.5542719  5691 0.9992758
log_uranium                   0.789064497  0.9040145  3863 0.9997451
floor:log_uranium            -0.431594916 -0.1716113  5373 0.9998638
b[(Intercept) county:AITKIN] -0.011744665  0.1349220  6837 0.9995214
b[floor county:AITKIN]        0.005217733  0.2764645  5597 0.9998745

The effect of floor is therefore \(exp^{-0.66}\) = 0.51 decrease in radon measurements from basement to first floor

Multinational RCT

PLATO NEJM 2009 multinational RCT of antiplatelets, ticagrelor vs. clopidogrel > 5,000 citations

Results Primary end point — composite of death from vascular causes, myocardial infarction, or stroke — occurred in 9.8% of ticagrelor subjects vs. 11.7% of clopidogrel subjects (HR, 0.84; 95% CI 0.77 to 0.92; P<0.001)
Conculsion Ticagrelor significantly reduced the primary outcome
Data - 862 centers, 43 countries, 6 geographic regions (iid?)

library(epiR)
dat <- read.csv("data/Plato_FDA_470.csv", header = TRUE)
kable(dat)

Region	tic_N	tic_O	clo_N	clo_O
US	707	84	706	67
Eastern Europe	3820	299	3825	394
Western Europe	2725	240	2704	281
Asia	819	90	812	114
Latin America	621	91	615	104
Other	641	60	628	54

Reproduce NEJM results

NEJM -> time to event analysis. With only aggregate data, limited to cumulative RR (epiR::epi.2by2)

nt <- sum(dat$tic_N); nc <- sum(dat$clo_N); et <- sum(dat$tic_O); ec <- sum(dat$clo_O)
total <- matrix(c(et,nt-et,ec,nc-ec), nrow = 2, byrow = TRUE, dimnames = list(c("Ticagrelor", "Clopidogrel"), c("Died", "Alive")))
epi.2by2(dat = as.table(total), method = "cohort.count", conf.level = 0.95, units = 100, outcome = "as.columns")

             Outcome +    Outcome -      Total                 Inc risk *
Exposed +          864         8469       9333        9.26 (8.68 to 9.86)
Exposed -         1014         8276       9290     10.91 (10.29 to 11.57)
Total             1878        16745      18623      10.08 (9.66 to 10.53)

Point estimates and 95% CIs:
-------------------------------------------------------------------
Inc risk ratio                                 0.85 (0.78, 0.92)
Inc odds ratio                                 0.83 (0.76, 0.92)
Attrib risk in the exposed *                   -1.66 (-2.52, -0.79)
Attrib fraction in the exposed (%)            -17.90 (-28.50, -8.18)
Attrib risk in the population *                -0.83 (-1.60, -0.06)
Attrib fraction in the population (%)         -8.24 (-12.61, -4.03)
-------------------------------------------------------------------
Uncorrected chi2 test that OR = 1: chi2(1) = 14.106 Pr>chi2 = <0.001
Fisher exact test that OR = 1: Pr>chi2 = <0.001
 Wald confidence limits
 CI: confidence interval
 * Outcomes per 100 population units

Results concordant with reported (HR, 0.84; 95% CI 0.77 to 0.92; P<0.001)

Data clustering

Original treated observations as iid, ignores possible clustering within specific geographic regions -> error underestimation -> over confidence in results precision
Reasonable to ignore these differences?
Possible solution - hierarchical model

US data

# to get individual non-pooled results for each region
dat2 <- dat[dat$Region=="US",]
mat <- matrix(c(dat2$tic_O, dat2$tic_N - dat2$tic_O, dat2$clo_O, dat2$clo_N - dat2$clo_O), nrow = 2, byrow = TRUE,
              dimnames = list(c("Ticagrelor", "Clopidogrel"), c("Died", "Alive")))
tt <- epi.2by2(dat = as.table(mat), method = "cohort.count", conf.level = 0.95, units = 100,  outcome = "as.columns")
tt$massoc.summary[1,1:4]

             var      est     lower    upper
1 Inc risk ratio 1.251958 0.9241111 1.696115

Eastern Europe data

# to get individual non-pooled results for each region
dat2 <- dat[dat$Region=="Eastern Europe",]
mat <- matrix(c(dat2$tic_O, dat2$tic_N - dat2$tic_O, dat2$clo_O, dat2$clo_N - dat2$clo_O), nrow = 2, byrow = TRUE,
              dimnames = list(c("Ticagrelor", "Clopidogrel"), c("Died", "Alive")))
tt <- epi.2by2(dat = as.table(mat), method = "cohort.count", conf.level = 0.95, units = 100,  outcome = "as.columns")
tt$massoc.summary[1,1:4]

             var       est     lower     upper
1 Inc risk ratio 0.7598766 0.6583074 0.8771167

Different models

Separate RCTs, or in this case from separate regions, can be modeled as follows \(y_j\) = log(RR)
\[y_j \sim N(\theta_j, \sigma^2_j) \\ \theta_j \sim N(\mu, \tau)\] where \(y_j = log(RR_j)\) and \(\sigma^2_j\) is assumed to be known with certainty (large sample sizes). Meta-analyses are typically concerned with the overall mean, \(\mu\).

Recall, in general, three ways to estimate \(\theta_j\);
1. No-pooling: there is a separate model for each study and \(\theta_j=y_j\). i.e. hierarchical model in which the between study variation \(\tau = \infty\).
2. Complete-pooling: patients in each study are random samples from a common distribution so \(\theta_j = \mu\). i.e. hierarchical model in with \(\tau = 0\).
3. Partial-pooling: a compromise between 1. and 2. In this case \(\tau\) is estimated from the data and \(\theta_j\) is closer to \(\mu\) when \(\tau\) is small relative to \(\sigma^2_j\) and closer to \(y_j\) when the reverse is true.

Meta-analysis

Random effects meta-analysis is a form of hierarchical modeling (regression analysis) where both within and between group variations are considered

A relevant statistic is the relative risk ratio, or \(p_{1j} / p_{0j}\), where \(p_{1j} = d_{1j}/n_{1j}\)and \(p_{0j} = d_{0j}/n_{0j}\)
Log relative risk ratio, \(y_j = log(p_{1j}) - log(p_{0j})\) preferred, since \(\approx \:N(\mu, \sigma)\)
Variance of each \(y_j\) calculated by treating \(p_{1j}\) and \(p_{0j}\) as sample proportions and using the delta method, so \(y_j\) variance is

\[\sigma^2_j \approx \frac{1 - p_{1j}}{n_{1j}p_{1j}} + \frac{1 - p_{0j}}{n_{0j}p_{0j}}\]

# hand calculation of rr and se
dat$p1 <- dat$tic_O/dat$tic_N; dat$p0 <- dat$clo_O/dat$clo_N; dat$rr <- dat$p1/dat$p0; dat$lrr <- log(dat$rr)
dat$lse <- sqrt((1 - dat$p1)/(dat$p1 * dat$tic_N) + (1 - dat$p0)/(dat$p0 * dat$clo_N)); dat$lower <- exp(dat$lrr - qnorm(.975) * dat$lse); dat$upper <- exp(dat$lrr + qnorm(.975) * dat$lse)
# easier would be
# dat <- metafor::escalc(measure="RR", ai=tic_O, n1i = tic_N, ci=clo_O,  n2i =clo_N, data=dat)

Rather than doing this by hand, escalc from metafor package also works well

Meta-analysis - Tabular data

Based on hand calculations from previous slide

dat

          Region tic_N tic_O clo_N clo_O         p1         p0        rr
1             US   707    84   706    67 0.11881188 0.09490085 1.2519580
2 Eastern Europe  3820   299  3825   394 0.07827225 0.10300654 0.7598766
3 Western Europe  2725   240  2704   281 0.08807339 0.10392012 0.8475105
4           Asia   819    90   812   114 0.10989011 0.14039409 0.7827260
5  Latin America   621    91   615   104 0.14653784 0.16910569 0.8665459
6          Other   641    60   628    54 0.09360374 0.08598726 1.0885769
          lrr        lse     lower     upper
1  0.22470875 0.15491700 0.9241111 1.6961153
2 -0.27459929 0.07320748 0.6583074 0.8771167
3 -0.16545202 0.08359779 0.7194267 0.9983979
4 -0.24497252 0.13202467 0.6042688 1.0138866
5 -0.14324021 0.13178799 0.6692886 1.1219402
6  0.08487123 0.17897700 0.7665026 1.5459826

where \(lrr = log (RR_i)\)

e.g. for US lrr = log((84/707)/(67/706)) = 0.2247
US RR = exp(.2247) = 1.25

Meta-analysis tabular results

Quantitative results from packages metafor::rma or meta::metabin

library(metafor)
me.fe <- rma(dat$lrr, sei=dat$lse, method = "FE") # exponentiation c(exp(me.fe$b), exp(me.fe$ci.lb), exp(me.fe$ci.ub))
me.re <- rma(dat$lrr, sei=dat$lse, method = "REML") # exponentiation c(exp(me.re$b), exp(me.re$ci.lb), exp(me.re$ci.ub))
plot_Plato <- meta::metabin(dat$tic_O, dat$tic_N, dat$clo_O, dat$clo_N, sm="RR", method ="I", studlab=dat$Region, prediction=TRUE, comb.random =TRUE)
print(summary(plot_Plato,prediction=TRUE), digits=2)

                 RR       95%-CI %W(common) %W(random)
US             1.25 [0.92; 1.70]        8.1       12.8
Eastern Europe 0.76 [0.66; 0.88]       36.1       23.8
Western Europe 0.85 [0.72; 1.00]       27.6       22.2
Asia           0.78 [0.60; 1.01]       11.1       15.3
Latin America  0.87 [0.67; 1.12]       11.1       15.3
Other          1.09 [0.77; 1.55]        6.0       10.6

Number of studies: k = 6
Number of observations: o = 18623 (o.e = 9333, o.c = 9290)
Number of events: e = 1878

                       RR       95%-CI     z p-value
Common effect model  0.85 [0.78; 0.92] -3.75  0.0002
Random effects model 0.88 [0.77; 1.02] -1.73  0.0842
Prediction interval       [0.59; 1.33]              

Quantifying heterogeneity:
 tau^2 = 0.0162 [0.0000; 0.2158]; tau = 0.1272 [0.0000; 0.4645]
 I^2 = 54.2% [0.0%; 81.6%]; H = 1.48 [1.00; 2.33]

Test of heterogeneity:
     Q d.f. p-value
 10.91    5  0.0531

Details on meta-analytical method:
- Inverse variance method
- Restricted maximum-likelihood estimator for tau^2
- Q-Profile method for confidence interval of tau^2 and tau
- Prediction interval based on t-distribution (df = 4)

Meta-analysis - Forest plot

possible plotting functions metafor::forest or meta::forest

meta::forest(plot_Plato)

Do we understand these summary statistics and their difference?

Summary statistics

Fixed effect meta-analysis assumes a common treatment effect in each study and variation in observed study estimates is due only to chance, within study variation. The summary result provides the best estimate of an assumed common treatment effect

Random effects meta-analysis assumes the true treatment effect differs from study to study and provides an estimate of the average treatment effect considers both within and between study variation). The summary result gives the average from the distribution of treatment effects across studies

Prediction interval aids the interpretation of random effects meta-analysis by providing a predicted range for the true treatment effect in an individual (next) study

Bayesian meta-analysis

Maximum likelihood approach treats parameters as fixed
Ignores uncertainty in \(\tau\) and confidence intervals for \(\mu\) are too narrow
Bayesian model using brms wrapper package addresses this concern

library(tidybayes);library(brms)
# non-informative default prior
brm_out <- brm(lrr | se(lse) ~ 1 + (1|Region), #<<       
               data = dat, iter = 5000, warmup = 2000, cores = 4, refresh=0)   # #prior = set_prior("normal(0,10)", class = "sd"), summary(brm_out)
post <- brm_out %>%
  spread_draws(b_Intercept, r_Region[Region,]) %>%
  median_qi(condition_mean = b_Intercept + r_Region, .width = c(.95)) %>%
  mutate(Region=factor(Region, levels=c("Other","Latin.America","Asia", "Western.Europe", "Eastern.Europe","US" )))
cbind(post[,1], exp(post[,c(2:4)]))

cbind(post[,1], exp(post[,c(2:4)]))

          Region condition_mean    .lower    .upper
1           Asia      0.8296125 0.6616653 1.0057436
2 Eastern.Europe      0.7917701 0.6834757 0.9006256
3  Latin.America      0.8690452 0.7093868 1.0744040
4          Other      0.9444320 0.7624756 1.3088410
5             US      1.0297661 0.8056876 1.4245776
6 Western.Europe      0.8560940 0.7337112 0.9877441

Bayesian meta-analysis

The individual effects are shrunken towards the global mean (dotted line) and the variance is also improved by this borrowing of information and is better shown in the figure below.

p.dat <- data.frame(lower = post$.lower, rr = post$condition_mean, upper = post$.upper)
p.dat <- exp(p.dat)
p.dat$Region <- factor(c("Asia","Eastern Europe", "Latin America", "Other", "US","Western Europe"))
p.dat <- rbind(p.dat, dat[, c("lower", "rr", "upper", "Region")])
p.dat$lab <- rep(c("Theta", "Y"), each = 6)
p.dat$id <- c(4,2,5,6,1,3,1,2,3,4,5,6)
ggplot(p.dat, aes(x = factor(Region,levels=c("Other", "Latin America", "Asia", "Western Europe", "Eastern Europe", "US")), y = rr, ymin = lower, ymax = upper, col = lab)) +  
    geom_pointrange(aes(col = lab), position = position_dodge(width = 0.50)) +
    coord_flip() + geom_hline(aes(yintercept = 0.84), lty = 2) +  xlab("") + 
    ylab("")  + theme(legend.position="bottom") + geom_hline(aes(yintercept = 1), lty = 1) +
    scale_colour_discrete(name="", 
      labels = c("Theta" = bquote("Random effect:"~exp(theta[J])~" "),
       "Y"= bquote("Relative risk:"~exp(Y[J])))) +
  theme_bw()

Bayesian meta-analysis - Predictions

Easy in a Bayesian framework since study effects are assumed to be exchangeable
Simulate the posterior of \(\tilde{\theta_j}\) by drawing \(\tilde{\theta}_j \sim N(\mu, \tau)\) using \(\mu\) and \(\tau\) from the posterior

library(bayesmeta)     
ma01 <- bayesmeta(y = exp(dat[,"lrr"]), sigma = dat[,"lse"], labels = dat[,"Region"], mu.prior.mean = 0, mu.prior.sd = 4, tau.prior = function(t){dhalfnormal(t,scale=0.5)})
ma01$theta; ma01$summary # summary

                 US Eastern Europe Western Europe      Asia Latin America
y         1.2519580     0.75987655     0.84751053 0.7827260    0.86654589
sigma     0.1549170     0.07320748     0.08359779 0.1320247    0.13178799
mode      1.0003205     0.80668934     0.86023523 0.8502698    0.86891932
median    1.0344567     0.79999914     0.85972569 0.8398095    0.87445741
mean      1.0480330     0.79747707     0.85937876 0.8346058    0.87624220
sd        0.1530569     0.06993446     0.07175383 0.1015066    0.09956671
95% lower 0.7853382     0.65875763     0.71652872 0.6235269    0.67742674
95% upper 1.3458635     0.93052361     1.00151553 1.0321938    1.07999519
              Other
y         1.0885769
sigma     0.1789770
mode      0.8952275
median    0.9490039
mean      0.9666059
sd        0.1346969
95% lower 0.7308986
95% upper 1.2510222

                tau         mu     theta
mode      0.1151484 0.87873048 0.8669743
median    0.1411086 0.88993912 0.8843447
mean      0.1610604 0.89668778 0.8966878
sd        0.1128046 0.09628676 0.2190497
95% lower 0.0000000 0.71619581 0.4560304
95% upper 0.3708797 1.10094337 1.3694340

Bayesian meta-analysis - Predictions

Predictions can also be displayed graphically

forestplot(ma01, xlog=TRUE) #adds vertical line at x=1

Compare to MLE - random effects 0.88 (0.77 - 1.02), prediction interval 0.88 (0.59 - 1.33)

Bayesian meta-analysis - Predictions

Predictions can also be displayed graphically using ggridges package

Bayesian meta-analysis

In summary, the standard Bayesian random effects meta-analysis model follows a hierarchical approach \[ \beta_j = \mu + \mu_j \: where\>error\> term\> \mu_j \, \sim \, normal(0,\tau)\] where \(\mu\) is the average treatment effect in the (hypothetical) superpopulation of similar studies, \(\beta_j\) is the effect in the j-th individual study and the variance \(\tau^2\) is referred to as the heterogeneity and
\[ b_j = \beta_j + \epsilon \:where\>error\> term\> \epsilon_j \, \sim \, normal(0,s_j)\] models the uncertainty of the estimates from each individual study. We assume that the estimate \(b_j\) from the j-th study is unbiased and normally distributed with standard error \(s_j\).

Bayesian meta-analysis

While the observed treatment effect of a single unbiased clinical trial estimates the underlying “true” treatment effect, it is incorrect to view this as an immutable property of the treatment

Many reasons exist to expect additional variation among the underlying effects from differences between study populations, application of the treatment and measurement protocols among other factors if the study was repeated.

Leads to a paradox whereby the precision for a single trial (with less data) is better than for a meta-analysis of several trials (with more data) due to the consideration of the between study heterogeneity, \(\tau\), in the meta-analysis

Bayesian meta-analysis of a single trial!

To address this paradox Gelman has proposed using the distribution of treatment effects and heterogeneity among the 1,636 meta-analyses from the Cochrane Database of Systematic Reviews (CDSR) as prior information.

Using simulated data and cross validation, they show that their Bayesian “meta-analyses of single studies” perform much better than naively considering the single trial, which equates to setting \(\tau \> = 0\).

The CDSR prior on the heterogeneity results in better quantification of the uncertainty, reducing the mean squared error both for estimating the study-level and population-level effects by about 60% on average, the equivalent to more than doubling the sample size!

Hierarchical modeling can aid in causal inference

Causal inference can be formulated statistically as a missing-data problem & hierarchical modeling aids in causal inference by

1. Accounting for data collection: In any data analysis, it is appropriate to account for any individual or group characteristics that are predictive of treatment assignment and inclusion in the dataset
2. Adjusting for unmeasured covariates: In structured data, multilevel modeling can yield more efficient estimates than classical no-pooling estimates
3. Modeling variation in treatment effects: Can model not only the expected treatment effect as a function of pre-treatment covariates 𝑥, but can also model the unexplained variance in the treatment effect

References

Session info

sessionInfo()

R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS 15.1.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] ggridges_0.5.6       epiR_2.0.75          survival_3.7-0      
 [4] ggthemes_5.1.0       loo_2.7.0            baggr_0.7.8         
 [7] brms_2.21.0          modelsummary_2.1.1   cmdstanr_0.8.0      
[10] tidybayes_3.0.6      bayesmeta_3.4        mvtnorm_1.2-5       
[13] metafor_4.6-0        numDeriv_2016.8-1.1  metadat_1.2-0       
[16] Matrix_1.7-0         forestplot_3.1.3     abind_1.4-8         
[19] checkmate_2.3.2      patchwork_1.2.0      ggdist_3.3.2        
[22] janitor_2.2.0        distributional_0.5.0 lubridate_1.9.3     
[25] forcats_1.0.0        stringr_1.5.1        dplyr_1.1.4         
[28] purrr_1.0.2          readr_2.1.5          tidyr_1.3.1         
[31] tibble_3.2.1         ggplot2_3.5.1        tidyverse_2.0.0     
[34] haven_2.5.4          rstanarm_2.32.1      Rcpp_1.0.13         
[37] xaringanthemer_0.4.2 broom_1.0.6          here_1.0.1          
[40] knitr_1.48          

loaded via a namespace (and not attached):
  [1] svUnit_1.0.6            shinythemes_1.2.0       splines_4.4.0          
  [4] later_1.3.2             ggplotify_0.1.2         BiasedUrn_2.0.12       
  [7] xts_0.14.0              lifecycle_1.0.4         sf_1.0-16              
 [10] rprojroot_2.0.4         StanHeaders_2.32.9      vroom_1.6.5            
 [13] processx_3.8.4          lattice_0.22-6          MASS_7.3-61            
 [16] insight_0.20.1          crosstalk_1.2.1         backports_1.5.0        
 [19] magrittr_2.0.3          rmarkdown_2.27          yaml_2.3.10            
 [22] httpuv_1.6.15           zip_2.3.1               askpass_1.2.0          
 [25] pkgbuild_1.4.4          DBI_1.2.3               minqa_1.2.7            
 [28] multcomp_1.4-26         yulab.utils_0.1.8       TH.data_1.1-2          
 [31] tensorA_0.36.2.1        sandwich_3.1-0          gdtools_0.3.7          
 [34] ggrepel_0.9.5           inline_0.3.19           crul_1.4.2             
 [37] testthat_3.2.1.1        units_0.8-5             bridgesampling_1.1-2   
 [40] codetools_0.2-20        xml2_1.3.6              DT_0.33                
 [43] tidyselect_1.2.1        httpcode_0.3.0          farver_2.1.2           
 [46] bayesplot_1.11.1        lme4_1.1-35.4           matrixStats_1.4.1      
 [49] stats4_4.4.0            base64enc_0.1-3         mathjaxr_1.6-0         
 [52] showtext_0.9-7          jsonlite_1.8.9          e1071_1.7-14           
 [55] emmeans_1.10.2          systemfonts_1.1.0       tools_4.4.0            
 [58] ragg_1.3.2              glue_1.8.0              gridExtra_2.3          
 [61] xfun_0.46               withr_3.0.1             fastmap_1.2.0          
 [64] boot_1.3-30             fansi_1.0.6             openssl_2.2.0          
 [67] shinyjs_2.1.0           callr_3.7.6             digest_0.6.37          
 [70] timechange_0.3.0        R6_2.5.1                mime_0.12              
 [73] gridGraphics_0.5-1      estimability_1.5.1      textshaping_0.4.0      
 [76] colorspace_2.1-1        gtools_3.9.5            markdown_1.13          
 [79] threejs_0.3.3           utf8_1.2.4              generics_0.1.3         
 [82] fontLiberation_0.1.0    data.table_1.15.4       class_7.3-22           
 [85] htmlwidgets_1.6.4       pkgconfig_2.0.3         dygraphs_1.1.1.6       
 [88] gtable_0.3.5            brio_1.1.5              htmltools_0.5.8.1      
 [91] fontBitstreamVera_0.1.1 sysfonts_0.8.9          scales_1.3.0           
 [94] posterior_1.6.0         snakecase_0.11.1        rstudioapi_0.16.0      
 [97] uuid_1.2-1              tzdb_0.4.0              reshape2_1.4.4         
[100] coda_0.19-4.1           nlme_3.1-165            curl_5.2.1             
[103] nloptr_2.0.3            proxy_0.4-27            showtextdb_3.0         
[106] zoo_1.8-12              flextable_0.9.6         KernSmooth_2.23-24     
[109] parallel_4.4.0          miniUI_0.1.1.1          pillar_1.9.0           
[112] vctrs_0.6.5             shinystan_2.6.0         promises_1.3.0         
[115] arrayhelpers_1.1-0      xtable_1.8-4            evaluate_1.0.0         
[118] meta_7.0-0              cli_3.6.3               compiler_4.4.0         
[121] rlang_1.1.4             crayon_1.5.3            rstantools_2.4.0       
[124] labeling_0.4.3          classInt_0.4-10         ps_1.7.6               
[127] plyr_1.8.9              fs_1.6.4                pander_0.6.5           
[130] stringi_1.8.4           rstan_2.32.6            QuickJSR_1.2.2         
[133] tables_0.9.25           munsell_0.5.1           CompQuadForm_1.4.3     
[136] colourpicker_1.3.0      fontquiver_0.2.1        pacman_0.5.1           
[139] Brobdingnag_1.2-9       V8_4.4.2                hms_1.1.3              
[142] bit64_4.0.5             gfonts_0.2.0            shiny_1.8.1.1          
[145] igraph_2.0.3            RcppParallel_5.1.7      bit_4.0.5              
[148] officer_0.6.6