loo_subsample slower than loo when many lookups are needed in complex models #1782

rubenarslan · 2025-05-27T08:45:28Z

loo_subsample is meant to be a faster alternative to loo and is most likely to be used for complex models.

Unfortunately, it seems the implementation in brms (unlike loo_subsample in the loo package itself) incurs overhead when looking up parameters that, in my simulations, costs more than the subsampling saves.

When I profile the function (documented in the forum), most of the time is spent in p, called by predictor_re. According to @jgabry, you could get rid of the r_eff call but not log_lik. So, based on my limited understanding, I see two ways to solve this: either p would need to become faster (probably not possible), or the extraction of observations and draws would need to happen once, vectorised.

My reprex:

library(brms)

options(mc.cores = 4, brms.backend="cmdstanr")
max_s_id <- max(inhaler$subject, na.rm = TRUE)

enlarged_inhaler <- purrr::map_dfr(
  .x = 0:9, # Creates 10 versions (0 = original, 1 = first new copy, ..., 9 = last new copy)
  .f = ~ dplyr::mutate(inhaler, subject = subject + (.x * max_s_id))
)

fit1 <- brm(rating ~ treat + period + carry,
            data = enlarged_inhaler)

fit2 <- brm(rating ~ treat + (1|period) + carry + (1|subject),
            data = enlarged_inhaler)


options(mc.cores = 1, brms.backend="cmdstanr")

system.time({
  fit1l <- add_criterion(fit1, criterion = "loo", overwrite = TRUE, ndraws = 500)
  fit2l <- add_criterion(fit2, criterion = "loo", overwrite = TRUE, ndraws = 500)
})

system.time({
  fit1ls <- add_criterion(fit1, criterion = "loo_subsample", observations = 100, ndraws = 500, overwrite = TRUE)
  # Reuse the observations subsampled for fit1 so both models are evaluated on the same subset
  fit2ls <- add_criterion(fit2, criterion = "loo_subsample", observations = fit1ls$criteria$loo_subsample, ndraws = 500, overwrite = TRUE)
})

profvis::profvis({
  fit2ls <- add_criterion(fit2, criterion = "loo_subsample", observations = fit1ls$criteria$loo_subsample, ndraws = 500, overwrite = TRUE)
})
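
As a toy illustration of the "extract once, vectorised" idea above, here is a minimal base R sketch. It is not brms's internal code: `r_draws`, `group_id`, `obs`, `lookup_loop`, and `lookup_vec` are hypothetical names, and the data are simulated. It compares looking up group-level draws one observation at a time with a single matrix-indexing call and checks that both give the same draws-by-observations matrix.

```r
set.seed(1)
n_draws  <- 500   # posterior draws (matches ndraws in the reprex)
n_groups <- 200   # hypothetical number of grouping levels
n_obs    <- 2000  # hypothetical number of observations

# Simulated draws of one group-level effect: draws x groups
r_draws <- matrix(rnorm(n_draws * n_groups), nrow = n_draws)
# Group membership of every observation
group_id <- sample.int(n_groups, n_obs, replace = TRUE)
# A subsample of 100 observations, standing in for what loo_subsample selects
obs <- sort(sample.int(n_obs, 100))

# Per-observation lookup: one subsetting call per observation
lookup_loop <- function(obs) {
  out <- matrix(NA_real_, nrow = n_draws, ncol = length(obs))
  for (j in seq_along(obs)) {
    out[, j] <- r_draws[, group_id[obs[j]]]
  }
  out
}

# Vectorised lookup: one subsetting call for all observations
lookup_vec <- function(obs) {
  r_draws[, group_id[obs], drop = FALSE]
}

res_loop <- lookup_loop(obs)
res_vec  <- lookup_vec(obs)

# Both approaches should return the same draws x observations matrix
identical(res_loop, res_vec)
```

The loop's cost scales with the number of separate lookups, while the vectorised version pays the indexing overhead once. Whether brms can restructure its parameter lookups along these lines is a question for the maintainers.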

jgabry · 2025-05-27T16:00:24Z

> Apparently, according to @jgabry you could get rid of the r_eff call but not log_lik.

Right, the r_eff calculation is nice to have but often not essential. Many of the packages that call the loo package (e.g. rstanarm, brms, cmdstanr) have always computed r_eff. But @avehtari pointed out that it's usually not necessary, and we should probably not compute it by default unless the user requests it.

paul-buerkner added the feature label May 27, 2025

paul-buerkner added this to the brms 2.23.0 milestone May 27, 2025
