Update package site:

trannttoan · trannttoan · commit 918aec7a8798 · 2025-10-03T14:39:10.000-04:00
- Reorganize README/landing page
- Move full installation instruction to README
- Add R Markdown for MRP method description
diff --git a/README.md b/README.md
@@ -16,47 +16,49 @@
 
 ## Getting Started
 
+You can use **shinymrp** in two flexible ways:
 
-You can use **shinymrp** in two flexible ways, both available through a single easy installation:
-
-1. Shiny App
+### Shiny App
 
 The graphical user interface (GUI), built with the Shiny framework, is designed for newcomers and those looking for an interactive, code-free analysis experience.
 
-2. Object-Oriented Programming Interface
+Launch the app locally in R with:
 
-Leverage the full flexibility of the exported R6 classes for a programmatic workflow, ideal for advanced users and those integrating MRP into larger R projects.
+```r
+shinymrp::run_app()
+```
 
-### Installation 
+#### Try the Demo
 
-To get started, install the latest development version from [GitHub](https://github.com/mrp-interface/shinymrp):
+Explore the Shiny app without installation via our [online demo](https://mrpinterface.shinyapps.io/shinymrp/).
 
-```R
-# If you don't have 'remotes', install it first:
-install.packages('remotes')
-remotes::install_github('mrp-interface/shinymrp')
-```
-### Launch the Shiny App
+Need a walkthrough? Watch our step-by-step [video tutorial](https://youtu.be/CUcRYn92fmU?si=EhcAbuwuG2XM-0N0).
 
-New to **shinymrp**? We recommend starting with the Shiny app:
+### Object-Oriented Programming Interface
 
-```R
-shinymrp::run_app()
+Leverage the full flexibility of the exported R6 classes for a programmatic workflow, ideal for advanced users and those integrating MRP into larger R projects.
+
+Import **shinymrp** in scripts or R Markdown documents just like any other R package:
+
+```r
+library(shinymrp)
 ```
 
-### Import programmatic components
+### Installation
 
-For those experienced with R and object-oriented programming, use the package in scripts or R Markdown documents:
+To get started, install the latest development version of **shinymrp** from [GitHub](https://github.com/mrp-interface/shinymrp) using `remotes`:
 
-```R
-library(shinymrp)
+```r
+# If 'remotes' is not installed:
+install.packages("remotes") 
+remotes::install_github("mrp-interface/shinymrp")
 ```
 
-## Try the Demo
+Installing the package does not automatically install all prerequisites. Specifically, **shinymrp** uses [CmdStanR](https://mc-stan.org/cmdstanr/) as the bridge to run [Stan](https://mc-stan.org/), a state-of-the-art platform for Bayesian modeling. Stan requires a modern C++ toolchain (compiler and GNU Make build utility).
 
-Explore the **shinymrp** features instantly, no installation required, via our [online demo](https://mrpinterface.shinyapps.io/shinymrp/). 
+- For setting up your toolchain, see [Stan’s documentation](https://mc-stan.org/docs/cmdstan-guide/installation.html#cpp-toolchain).
+- Once ready, follow the [CmdStanR installation instructions](https://mc-stan.org/cmdstanr/articles/cmdstanr.html#installing-cmdstan) to install CmdStanR and CmdStan.
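+
+As a rough sketch of the two steps above, the commands below follow the linked CmdStanR guide rather than anything specific to **shinymrp**. The repository URL and function calls are assumptions based on that guide, so check it for the current recommendation:
+
+```r
+# Install CmdStanR from the Stan R-universe repository (CmdStanR is not on CRAN)
+install.packages("cmdstanr", repos = c("https://stan-dev.r-universe.dev", getOption("repos")))
+
+# Verify that the C++ toolchain is set up, then download and build CmdStan
+cmdstanr::check_cmdstan_toolchain()
+cmdstanr::install_cmdstan()
+```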
 
-Need a walkthrough? Watch our step-by-step [video tutorial](https://youtu.be/CUcRYn92fmU?si=EhcAbuwuG2XM-0N0).
 
 ## Learn More
 
diff --git a/_pkgdown.yml b/_pkgdown.yml
@@ -4,6 +4,7 @@ url: https://mrp-interface.github.io/shinymrp/
 template:
   bootstrap: 5
   bootswatch: cosmo
+  math-rendering: mathjax
 
 navbar:
   logo:
@@ -47,8 +48,9 @@ articles:
       - getting-started
   - title: "More details"
     desc: >
-      A deeper dive into the programmatic interface and data preprocessing in shinymrp.
+      A deeper dive into the programmatic interface, the methodology, and data preprocessing in shinymrp.
     contents:
       - workflow
       - data-prep
+      - method
       - example
diff --git a/vignettes/getting-started.Rmd b/vignettes/getting-started.Rmd
@@ -18,22 +18,7 @@ vignette: >
 
 ![](./figures/workflow.png)
 
-If you prefer a graphical and interactive experience, you can launch the Shiny app with `shinymrp::run_app()`, which includes a built-in user guide. Users interested in using the programmatic interface can follow examples in the vignettes, starting with the [Key steps](https://mrp-interface.github.io/shinymrp/articles/getting-started#key-steps) section below. Regardless of the interface you choose, please follow the instructions below to install the package prerequisites.
-
-## Installation
-
-To get started, install the latest development version of **shinymrp** from [GitHub](https://github.com/mrp-interface/shinymrp):
-
-```{r, eval = FALSE}
-# If 'remotes' is not installed:
-install.packages("remotes") 
-remotes::install_github("mrp-interface/shinymrp")
-```
-The package installation does not automatically install all prerequisites. Specifically, **shinymrp** uses [CmdStanR](https://mc-stan.org/cmdstanr/) as the bridge to run [Stan](https://mc-stan.org/), a state-of-the-art platform for Bayesian modeling. Stan requires a modern C++ toolchain (compiler and GNU Make build utility). 
-
-- For setting up your toolchain, see [Stan’s documentation](https://mc-stan.org/docs/cmdstan-guide/installation.html#cpp-toolchain).
-- Once ready, follow the [CmdStanR installation instructions](https://mc-stan.org/cmdstanr/articles/cmdstanr.html#installing-cmdstan) to install CmdStanR and CmdStan.
-
+If you prefer a graphical and interactive experience, you can launch the Shiny app with `shinymrp::run_app()`, which includes a built-in user guide. Users interested in using the programmatic interface can follow examples in the vignettes, starting with the [Key steps](https://mrp-interface.github.io/shinymrp/articles/getting-started#key-steps) section below.
 
 ## Key steps
 
diff --git a/vignettes/method.Rmd b/vignettes/method.Rmd
@@ -0,0 +1,114 @@
+---
+title: "MRP methodological guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{MRP methodological guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+MRP has two key steps: (1) fit a multilevel model for the response with the adjustment variables based on the input data; and (2) poststratify using the population distribution of the adjustment variables, yielding prevalence estimates in the target population and subgroups.
+
+## MRP for cross-sectional data
+
+By cross-sectional data, we mean a dataset with measures collected at a single time point, so neither the model nor the poststratification adjustment accounts for temporal variation. We use a binary outcome as an example. Let $y_i (=0/1)$ be the binary response for individual $i$, with $y_i=1$ indicating a positive response. We employ a logistic regression with varying effects for age, race, and ZIP code, where the ZIP-code-level variation is further explained by ZIP-code-level predictors:
+\[
+\label{mrp-1}
+\textrm{Pr}(y_i = 1) = \textrm{logit}^{-1}(
+\beta_1+\beta_2{\rm male}_i +
+\alpha_{\rm a[i]}^{\rm age}
++ \alpha_{\rm r[i]}^{\rm race}
++ \alpha_{\rm s[i]}^{\rm ZIP}
+),
+\]
+where ${\rm male}_i$ is an indicator for men; $\alpha_{\rm a}^{\rm age}$ is the effect of age group ${\rm a}$ on the log-odds of a positive response, with ${\rm a}[i]$ denoting the age group of individual $i$; $\alpha_{\rm r}^{\rm race}$ is the race effect; and $\alpha_{\rm s}^{\rm ZIP}$ is the ZIP-code-level effect. In the Bayesian framework, we assign hierarchical priors to the varying intercepts by default:
+\begin{align}
+\label{prior}
+\nonumber &\alpha^{\rm age} \sim \mbox{normal}(0,\sigma^{\rm age} ), \,\,\, \sigma^{\rm age}\sim \mbox{normal}_+ (0,2.5)\\
+&\alpha^{\rm race} \sim \mbox{normal}(0,\sigma^{\rm race} ), \,\,\, \sigma^{\rm race}\sim \mbox{normal}_+ (0,2.5).
+\end{align}
+Here $\mbox{normal}_+ (0,2.5)$ denotes a half-normal distribution, that is, a normal distribution with mean $0$ and standard deviation $2.5$ restricted to positive values. Because we have ZIP-code-level predictors $\vec{Z}^{\rm ZIP}_{s}$, we add a second-level model in which $\alpha_{\rm s}^{\rm ZIP}$ is the outcome of a linear regression on those predictors:
+\begin{align}
+\label{prior-zip}
+\alpha_{\rm s}^{\rm ZIP} =\vec{\alpha}\vec{Z}^{\rm ZIP}_{s} +  e_s, \,\,\, e_s\sim \mbox{normal}(0,\sigma^{\rm ZIP} ),\,\,\, \sigma^{\rm ZIP}\sim \mbox{normal}_+ (0,2.5),
+\end{align}
+where $e_s$ is a ZIP-code-level random error.
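+
+To make the prior structure concrete, the following sketch draws one set of age and ZIP-code effects from the priors above. It is illustrative only: the number of groups, the covariate values, and the coefficients are hypothetical, and this is not the Stan code that **shinymrp** runs.
+
+```r
+set.seed(1)
+
+# Half-normal draw: absolute value of a normal(0, sd) variate
+r_half_normal <- function(n, sd) abs(rnorm(n, mean = 0, sd = sd))
+
+# Age effects: alpha_age ~ normal(0, sigma_age), sigma_age ~ normal+(0, 2.5)
+sigma_age <- r_half_normal(1, sd = 2.5)
+alpha_age <- rnorm(5, mean = 0, sd = sigma_age)       # 5 hypothetical age groups
+
+# ZIP effects: linear predictor from ZIP-level covariates plus random error e_s
+Z <- cbind(income = c(-1, 0, 1), urban = c(1, 0, 1))  # 3 hypothetical ZIP codes
+coefs <- c(0.5, -0.2)                                 # hypothetical coefficients
+sigma_zip <- r_half_normal(1, sd = 2.5)
+alpha_zip <- as.vector(Z %*% coefs) + rnorm(3, mean = 0, sd = sigma_zip)
+```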
+
+The interface allows users to specify alternative priors, including structured priors for high-order interaction terms developed by [Si et al. (2020)](https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2020002/article/00003-eng.pdf?st=iF1_Fbrh).
+
+Because the outcome model assumes that the people in the same poststratification cell share the same response probability, we can replace the microdata with cellwise aggregates and employ a binomial model for the sum of the responses in cell $j$ as $y^*_j \sim \textrm{binomial}(n_j, \theta_j)$, where $n_j$ is the sample cell size and $\theta_j=\textrm{logit}^{-1}(
+\beta_1+\beta_2{\rm male}_j +
+\alpha_{\rm a[j]}^{\rm age}
++ \alpha_{\rm r[j]}^{\rm race}
++ \alpha_{\rm s[j]}^{\rm ZIP}
+)
+$ using the cellwise effects of all factors. The interface thus allows users to upload microdata or cellwise aggregates as the input data. 
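+
+As a minimal sketch of this aggregation step, the base R code below collapses toy microdata into cellwise counts. The data values and column names are made up for illustration and are not the input format required by **shinymrp**.
+
+```r
+# Toy microdata: one row per individual
+micro <- data.frame(
+  male = c(1, 1, 0, 0, 1, 0),
+  age  = c("18-34", "18-34", "35-64", "35-64", "18-34", "35-64"),
+  y    = c(1, 0, 1, 1, 0, 0)
+)
+
+# Each individual contributes 1 to the sample cell size n_j
+micro$n <- 1
+
+# Sum within cells: n = sample cell size, y = number of positive responses
+cells <- aggregate(cbind(n, y) ~ male + age, data = micro, FUN = sum)
+names(cells)[names(cells) == "y"] <- "y_star"
+cells
+```
+
+Each row of `cells` then corresponds to one binomial observation $y^*_j \sim \textrm{binomial}(n_j, \theta_j)$.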
+
+To generate overall population or subgroup estimates, we combine model predictions within the poststratification cells (the cells of the contingency table of sex, age, race, and ZIP code), weighted by the population cell frequencies $N_j$, which are derived from the linked ACS data in our application. Additionally, users may choose to upload custom poststratification data for specific target populations (e.g., a country other than the U.S.). If we write the expected outcome in cell $j$ as $\hat{\theta}_j$, the population average from MRP is then:
+$$
+\hat{\theta}^{\rm pop} = \frac{\sum_j N_j \hat{\theta}_j}{\sum_j N_j}.
+$$
+The MRP estimator for county $c$ aggregates over the covered cells $j$ in that county:
+$$
+\hat{\theta}_c^{\rm pop} = \frac{\sum_{j \in \textrm{county } c} N_j \hat{\theta}_j}{\sum_{j \in \textrm{county } c} N_j}.
+$$
+We implement Bayesian inference for the estimates, where the variance estimates and 95\% credible intervals are computed based on the posterior samples.
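+
+The poststratification step can be sketched in a few lines of base R. The table below is a toy example with made-up population counts and cell estimates. In a full Bayesian analysis, the same weighted averages would be computed for every posterior draw rather than for a single point estimate.
+
+```r
+# Toy poststratification table (all numbers are made up for illustration)
+cells <- data.frame(
+  county    = c("A", "A", "B", "B"),
+  N         = c(1000, 3000, 2000, 4000),  # population cell counts N_j
+  theta_hat = c(0.10, 0.20, 0.30, 0.40)   # model-based cell estimates
+)
+
+# Population average: sum_j N_j * theta_j / sum_j N_j
+theta_pop <- with(cells, sum(N * theta_hat) / sum(N))
+
+# County-level estimates: the same weighted average within each county
+theta_county <- sapply(
+  split(cells, cells$county),
+  function(d) sum(d$N * d$theta_hat) / sum(d$N)
+)
+```
+
+With these toy numbers, `theta_pop` is 0.29 and the county estimates are 0.175 for county A and about 0.367 for county B.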
+
+When the outcome is continuous, we instead specify linear regression models and estimate the residual variance, to which we also assign a prior distribution.
+
+## MRP for time-varying data with measurement error
+
+As an example of time-varying data, we model weekly PCR testing results. We use a Bayesian framework to account for the PCR testing sensitivity and specificity. Here, MRP proceeds in two steps: (1) fit a multilevel model to the testing data for incidence incorporating time and covariates, and (2) poststratify using the population distribution of the adjustment variables: sex, age, race, and ZIP codes, where we assume the population distribution is the same during the study period. Hence, the poststratification cell is defined by the cross-tabulation of sex, age, race, ZIP code, and indicators of time in weeks based on the test result dates.
+
+We denote the PCR test result for individual $i$ as $y_i$, where $y_i=1$ indicates a positive result and $y_i=0$ indicates a negative result. As in the cross-sectional case, we assume that people in the same poststratification cell have the same infection rate, so we can directly model cellwise summaries. We obtain aggregated counts as the number of tests $n_j$ and the number of positive cases $y^*_j$ in cell $j$. Let $p_j=\textrm{Pr}(y_{j[i]}=1)$ be the probability that person $i$ in cell $j$ tests positive. We account for the PCR testing sensitivity and specificity, where the positivity $p_j$ is a function of the test sensitivity $\delta$, specificity $\gamma$, and the true incidence $\pi_j$ for people in cell $j$:
+\begin{align}
+\label{positivity}
+p_j=(1-\gamma)(1-\pi_j )+\delta \pi_j.
+\end{align}
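+
+As a quick numerical check of this relationship, the sketch below evaluates the positivity for illustrative values of sensitivity and specificity. The function name and the numbers are hypothetical and are not estimates for any real assay.
+
+```r
+# Expected test positivity given true incidence, sensitivity, and specificity:
+# false positives among the uninfected plus true positives among the infected
+positivity <- function(incidence, sensitivity, specificity) {
+  (1 - specificity) * (1 - incidence) + sensitivity * incidence
+}
+
+# Illustrative values only
+p <- positivity(incidence = 0.05, sensitivity = 0.7, specificity = 0.995)
+p
+```
+
+Here $p = 0.005 \times 0.95 + 0.7 \times 0.05 = 0.03975$, which is lower than the true incidence of $0.05$ because imperfect sensitivity misses some infections. When the incidence is zero, the positivity reduces to the false-positive rate $1-\gamma$.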
+
+We fit a binomial model, $y^*_j \sim \textrm{binomial}(n_j, p_j)$, with a multilevel logistic regression for $\pi_j$ whose covariates (sex, age, race, ZIP code, and time in weeks) allow the incidence to vary over time:
+\begin{align}
+\label{pi}
+\textrm{logit}(\pi_j)=\beta_1+\beta_2{\rm male}_j+\alpha_{{\rm a}[j]}^{\rm age}+\alpha_{{\rm r}[j]}^{\rm race}+\alpha_{{\rm s}[j]}^{\rm ZIP}+\alpha_{{\rm t}[j]}^{\rm time},
+\end{align}
+where ${\rm male}_j$ is an indicator for men; ${\rm a}[j]$, ${\rm r}[j]$, and ${\rm s}[j]$ represent age, race, and ZIP levels; and ${\rm t}[j]$ denotes the time in weeks when the test result is collected for cell $j$. We include ZIP-code-level predictors $\vec{Z}^{\rm ZIP}_{s}$ for ZIP code $s$,
+\[
+\alpha_{s}^{\rm ZIP} =\vec{\alpha}\vec{Z}^{\rm ZIP}_{s} +  e_s.
+\]
+We assign the varying intercepts and the error terms $e_s$ the same priors as in the cross-sectional case:
+\begin{align}
+\nonumber &\alpha^{\rm age} \sim \mbox{normal}(0,\sigma^{\rm age} ), \,\,\, \sigma^{\rm age}\sim \mbox{normal}_+ (0,2.5)\\
+&\alpha^{\rm race} \sim \mbox{normal}(0,\sigma^{\rm race} ), \,\,\, \sigma^{\rm race}\sim \mbox{normal}_+ (0,2.5)\\
+&\alpha_{\rm s}^{\rm ZIP} =\vec{\alpha}\vec{Z}^{\rm ZIP}_{s} +  e_s, \,\,\, e_s\sim \mbox{normal}(0,\sigma^{\rm ZIP} ),\,\,\, \sigma^{\rm ZIP}\sim \mbox{normal}_+ (0,2.5).
+\end{align}
+
+For the time-varying effects, we assume $\alpha_{{\rm t}}^{\rm time} \sim \mbox{normal}(0,\sigma^{\rm time} )$, with a weakly informative hyperprior, $\sigma^{\rm time}\sim \mbox{normal}_+ (0,5)$.
+
+As an example, we assign normal priors to the ZIP-code-level and time-varying effects. The interface leverages Stan’s modeling capabilities to allow alternative prior choices and can be extended with advanced modeling.
+
+Using the estimated incidence $\hat{\pi}_j$, we adjust for selection bias by applying the sociodemographic distributions in the community with population cell counts $N_j$ based on the ACS, yielding population-level weekly incidence estimates:
+\[
+\hat{\pi}_{t} = \frac{\sum_{j \in \textrm{week } t} N_j\hat{\pi}_j}{\sum_{j \in \textrm{week } t} N_j},
+\]
+which can be restricted to specific subgroups or regions of interest, since another key property of MRP is that it yields robust estimates for small groups. We obtain Bayesian credible intervals from the posterior samples for inference.
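+
+The sketch below shows how weekly estimates and credible intervals could be computed from posterior draws. The draws are simulated stand-ins, not output from a fitted model, and the cell layout and population counts are hypothetical.
+
+```r
+set.seed(1)
+
+# Toy setup: 2 weeks x 2 demographic cells; population counts repeat each week
+cells <- data.frame(
+  week = c(1, 1, 2, 2),
+  N    = c(1000, 3000, 1000, 3000)
+)
+
+# Stand-in for posterior draws of pi_j: 500 draws (rows) by 4 cells (columns)
+pi_draws <- sapply(
+  c(0.02, 0.04, 0.03, 0.06),
+  function(m) rbeta(500, 100 * m, 100 * (1 - m))
+)
+
+# Poststratify each draw within each week, then summarize across draws
+weekly <- sapply(split(seq_len(nrow(cells)), cells$week), function(idx) {
+  w <- cells$N[idx] / sum(cells$N[idx])               # population weights
+  est <- as.vector(pi_draws[, idx, drop = FALSE] %*% w)
+  c(
+    median = median(est),
+    lower  = unname(quantile(est, 0.025)),
+    upper  = unname(quantile(est, 0.975))
+  )
+})
+weekly
+```
+
+Each column of `weekly` holds the posterior median and 95% credible interval for one week. Poststratifying draw by draw, rather than averaging the draws first, is what carries the model uncertainty through to the interval.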
+
+## Further reading
+
+1. [Y Si, T Tran, J Gabry, M Morris, and A Gelman (2025). Multilevel Regression and Poststratification Interface: Application to Track Community-level COVID-19 Viral Transmission, Population Health Metrics (under review)](http://arxiv.org/abs/2405.05909).
+
+2. [Y Si (2025). On the Use of Auxiliary Variables in Multilevel Regression and Poststratification, Statistical Science, 40(2), 272--288](http://dx.doi.org/10.1214/24-STS932).
+
+3. [Y Si, L Covello, S Wang, T Covello, and A Gelman (2022). Beyond Vaccination Rates: A Synthetic Random Proxy Metric of Total SARS-CoV-2 Immunity Seroprevalence in the Community, Epidemiology, 33(4), 457--464](https://journals.lww.com/epidem/Fulltext/2022/07000/Beyond_Vaccination_Rates__A_Synthetic_Random_Proxy.3.aspx).
+
+4. [L Covello, A Gelman, Y Si, and S Wang (2021). Routine Hospital-Based SARS-CoV-2 Testing Outperforms State-Based Data in Predicting Clinical Burden, Epidemiology, 32(6), 792--799](https://journals.lww.com/epidem/Fulltext/2021/11000/Routine_Hospital_based_SARS_CoV_2_Testing.4.aspx).
+
+5. [Y Si, R Trangucci, J Gabry, and A Gelman (2020). Bayesian Hierarchical Weighting Adjustment and Survey Inference, Survey Methodology, 46(2), 181--214](https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2020002/article/00003-eng.pdf?st=iF1_Fbrh).
+
+