prior_summary() |> dplyr::select() alters the content in the prior column #1761

Open
ASKurz opened this issue Apr 2, 2025 · 5 comments

ASKurz commented Apr 2, 2025

It looks like there's a bug when you try to subset the columns from prior_summary() with dplyr::select(). Some of the information in the prior column changes. To give a sense, here's a simplified version of fit2 from the inhaler documentation.

# Load
library(tidyverse)
library(brms)

# Fit
fit2.1 <- brm(rating ~ treat + (1 | subject),
              data = inhaler, 
              family = cumulative(),
              prior = set_prior("normal(0, 5)"),
              cores = 4, seed = 1)

# Before `select()`
prior_summary(fit2.1)
                prior     class      coef   group resp dpar nlpar lb ub       source
          normal(0,5)         b                                                 user
          normal(0,5)         b     treat                               (vectorized)
 student_t(3, 0, 2.5) Intercept                                              default
 student_t(3, 0, 2.5) Intercept         1                               (vectorized)
 student_t(3, 0, 2.5) Intercept         2                               (vectorized)
 student_t(3, 0, 2.5) Intercept         3                               (vectorized)
 student_t(3, 0, 2.5)        sd                                    0         default
 student_t(3, 0, 2.5)        sd           subject                  0    (vectorized)
 student_t(3, 0, 2.5)        sd Intercept subject                  0    (vectorized)

To my eye, this all looks correct. Now see what happens when we use select().

prior_summary(fit2.1) |> 
  select(prior, class, coef, group) 
                prior     class      coef   group    source
          normal(0,5)         b                   (unknown)
               (flat)         b     treat         (unknown)
 student_t(3, 0, 2.5) Intercept                   (unknown)
               (flat) Intercept         1         (unknown)
               (flat) Intercept         2         (unknown)
               (flat) Intercept         3         (unknown)
 student_t(3, 0, 2.5)        sd                   (unknown)
               (flat)        sd           subject (unknown)
               (flat)        sd Intercept subject (unknown)

Several rows have now changed to (flat) in the prior column. The issue appears the same if you instead use the get_prior(fit2.1) or fit2.1$prior methods. It also persists if you use the base bracket notation in place of select() (prior_summary(fit2.1)[, 1:4]).

Though my primary interest is in the prior column, I find the behavior of the source column surprising, too. When I tried to drop that column with select(), it still appeared in the print output, but with every value shown as (unknown).

@paul-buerkner
Owner

paul-buerkner commented Apr 2, 2025 via email

@lunafazio
Contributor

> when changing the class to a regular data.frame the special printing is no longer applied. that is what you are seeing

That's not exactly true. dplyr does keep the brmsprior class, which is why (flat) is printed instead of the actual content of those entries, which are just empty strings.

There are a few other weird things that can happen when you manipulate a brmsprior frame in ways that alter its structure. I suppose this could be avoided with some sort of validation that either warns the user when they break the structure or proactively unclasses the object, as draws_df objects do when they lose their required metadata columns.

@paul-buerkner
Owner

Yeah, that is a good point. Unclassing brmsprior objects when certain columns are changed or dropped could be sensible.

@ASKurz
Author

ASKurz commented Apr 3, 2025

@paul-buerkner, I'm not sure if this is implied by your last comment, but from my standpoint it would be really great if code like this

prior_summary(fit2.1) |> 
  data.frame()
                 prior     class      coef   group resp dpar nlpar lb ub  source
1          normal(0,5)         b                                            user
2                              b     treat                               default
3 student_t(3, 0, 2.5) Intercept                                         default
4                      Intercept         1                               default
5                      Intercept         2                               default
6                      Intercept         3                               default
7 student_t(3, 0, 2.5)        sd                                    0    default
8                             sd           subject                       default
9                             sd Intercept subject                       default

carried the filled-in cells in the prior column down to the blank cells below, similar to what happens in the print() method.

prior_summary(fit2.1) |> 
  print()
                prior     class      coef   group resp dpar nlpar lb ub       source
          normal(0,5)         b                                                 user
          normal(0,5)         b     treat                               (vectorized)
 student_t(3, 0, 2.5) Intercept                                              default
 student_t(3, 0, 2.5) Intercept         1                               (vectorized)
 student_t(3, 0, 2.5) Intercept         2                               (vectorized)
 student_t(3, 0, 2.5) Intercept         3                               (vectorized)
 student_t(3, 0, 2.5)        sd                                    0         default
 student_t(3, 0, 2.5)        sd           subject                  0    (vectorized)
 student_t(3, 0, 2.5)        sd Intercept subject                  0    (vectorized)

Same request about what happens with the lb (and presumably the ub) column. It'd be great if the values carried down in the data frame, as they do with the print() method.
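
In the meantime, the carry-down can be approximated by hand on the plain data frame. The following is only a rough sketch, not brms behavior: fill_down() is a hypothetical helper, the data frame is typed in by hand to mimic a few rows of the data.frame() output above so that it runs without fitting a model, and a simple top-to-bottom fill is an assumption that happens to match this example but may not match the print() method's logic for more complex models.

```r
# Hand-typed stand-in for a few rows of data.frame(prior_summary(fit2.1)).
# Blank cells are empty strings, as in the output above.
priors <- data.frame(
  prior = c("normal(0,5)", "", "student_t(3, 0, 2.5)", "",
            "student_t(3, 0, 2.5)", ""),
  class = c("b", "b", "Intercept", "Intercept", "sd", "sd"),
  coef  = c("", "treat", "", "1", "", ""),
  group = c("", "", "", "", "", "subject"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of brms): replace each empty string with the
# most recent non-empty value above it. Leading empty strings stay empty.
fill_down <- function(x) {
  filled <- x != ""
  idx <- cumsum(filled)            # index of the last non-empty value so far
  out <- x
  out[idx > 0] <- x[filled][idx[idx > 0]]
  out
}

priors$prior <- fill_down(priors$prior)
priors
```

The same helper could be applied to a character version of the lb and ub columns, under the same assumption that a plain downward fill is what is wanted.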

@paul-buerkner
Owner

I see. Could make sense. I will have to think about what the best behavior is in that regard.
