# HybridVariationalInference (HVI)

[Stable documentation](https://EarthyScience.github.io/HybridVariationalInference.jl/stable/)
[Dev documentation](https://EarthyScience.github.io/HybridVariationalInference.jl/dev/)
[Build status](https://github.com/EarthyScience/HybridVariationalInference.jl/actions/workflows/CI.yml?query=branch%3Amain)
[Coverage](https://codecov.io/gh/EarthyScience/HybridVariationalInference.jl)
[Aqua QA](https://github.com/JuliaTesting/Aqua.jl)

Estimating uncertainty in hybrid models,
i.e. models that combine mechanistic and machine-learning parts,
by extending Variational Inference (VI), an approximate Bayesian inversion method.

## Problem

Consider the case of parameter learning, a special case of hybrid models,
where a machine learning model, $g_{\phi_g}$, uses known covariates, $X_{Mi}$, at site $i$
to predict a subset of the parameters, $\theta$, of the process-based model, $f$.

The analyst is interested in both
- the uncertainty of the hybrid model predictions, $\hat{y}$ (predictive posterior), and
- the uncertainty of the process-model parameters, $\theta$, including their correlations
  (posterior).

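Using the notation introduced below (site parameters $\theta_{Mi}$ predicted by the machine learning model, global parameters $\theta_P$, and process-model drivers $X_{Pi}$), the setup can be sketched as

```math
\theta_{Mi} = g_{\phi_g}(X_{Mi}), \qquad
\hat{y}_i = f(\theta_{Mi}, \theta_P, X_{Pi}),
```

and the quantities of interest are the parameter posterior, $p(\theta \mid y)$, and the predictive posterior of $\hat{y}$.
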
For example, consider a soil organic matter process-model that predicts carbon stocks at
different sites. We need to parameterize the unknown carbon use efficiency (CUE) of the soil
microbial community, which differs by site but is hypothesized to correlate with climate variables
and pedogenic factors, such as clay content.
We apply a machine learning model to estimate CUE and fit it end-to-end, together with the other
parameters of the process-model, to observed carbon stocks.
In addition to the predicted CUE, we are interested in its uncertainty and its correlation
with other parameters, i.e. in the entire posterior probability distribution of the model parameters.

To understand the background of HVI, refer to the [documentation](https://EarthyScience.github.io/HybridVariationalInference.jl/dev/).

## Usage

In order to apply HVI, the user constructs a `HybridProblem` object by specifying the following
components (illustrated in the sketch after this list):
- the machine learning model, $g$
- covariates, $X_{Mi}$, for each site, $i$
- the names of parameters that differ across sites, $\theta_M$, and of global parameters
  that are the same across sites, $\theta_P$
  - optionally, sub-blocks in the within-site correlation structure of the parameters
  - optionally, which global parameters should be provided to $g$ as additional covariates,
    to account for correlations between global and site parameters
- the parameter transformations from unconstrained scale to the scale relevant to the process model, $\theta = T(\zeta)$, e.g. `exp` for strictly positive parameters
- the process-model, $f$
- the drivers of the process-model, $X_{Pi}$, at each site, $i$
- the likelihood function of the observations given the model predictions, $p(y \mid \hat{y}, \theta)$

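As an illustration only, the user-supplied pieces might look like the following plain-Julia sketch
for a toy model with one site parameter and one global parameter. All names (`f_proc`, `g_ml`,
`transform`, `neg_logden`) and the linear machine learning model are hypothetical and not part of
the package API; the exact `HybridProblem` constructor arguments are given in the documentation
and in `test/test_HybridProblem.jl`.

```julia
# Illustrative sketch only: names and signatures are assumptions, not the package API.
using ComponentArrays

# Process model f: predicts observations at one site from parameters θ and drivers x_P.
# θ.r0 is the site parameter (predicted by the ML model), θ.K a global parameter.
f_proc(θ, x_P) = θ.r0 .* x_P ./ (θ.K .+ x_P)   # a saturating response, as an example

# Machine learning model g: maps site covariates x_M to the site parameter(s).
# Here a minimal linear model with weights ϕ_g; in practice a small neural network.
g_ml(ϕ_g, x_M) = ϕ_g.W * x_M .+ ϕ_g.b

# Transformation from unconstrained scale ζ to the process-model scale θ:
# strictly positive parameters are estimated on log-scale.
transform(ζ) = exp.(ζ)

# Negative log-density of observations y given predictions ŷ (independent Gaussian errors).
neg_logden(y, ŷ, σ) = sum(abs2, (y .- ŷ) ./ σ) / 2

# Quick check of the toy process model
θ   = ComponentVector(r0 = 0.5, K = 2.0)
x_P = [0.1, 1.0, 5.0]
ŷ   = f_proc(θ, x_P)

# These pieces, together with the covariates, parameter names, and transformations,
# are bundled into a HybridProblem; see test/test_HybridProblem.jl for the actual
# constructor arguments.
```
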
This problem is then passed to a `HybridPosteriorSolver`, which fits an approximation
of the posterior. It returns a NamedTuple with
- `ϕ`: the fitted parameters, a ComponentVector with components for
  - the machine learning model parameters (usually weights), $\phi_g$
  - the means of the global parameters, $\phi_P = \mu_{\zeta_P}$, at the transformed,
    unconstrained scale
  - additional parameters, $\phi_{unc}$, of the approximate posterior, $q(\zeta)$, such as
    coefficients that describe the scaling of the variance with magnitude
    and coefficients that parameterize the Cholesky factor of the correlation matrix
- `θP`: predicted means of the global parameters, $\theta_P$
- `resopt`: the original result object of the optimizer (useful for debugging)

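A minimal sketch of fitting and inspecting the result, assuming a CommonSolve-style
`solve(problem, solver)` entry point and a default `HybridPosteriorSolver()` construction
(both are assumptions; consult the documentation and tests for the actual calls):

```julia
# Sketch only: the entry point and solver construction below are assumptions,
# not the verified package API.
solver = HybridPosteriorSolver()   # hypothetical default construction
res = solve(prob, solver)          # `prob` is the HybridProblem built as described above

res.ϕ       # fitted ϕ_g (ML weights), global means, and uncertainty parameters ϕ_unc
res.θP      # predicted means of the global process-model parameters
res.resopt  # raw optimizer result, useful for convergence checks and debugging
```
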
TODO: describe how to obtain
- the means of the site parameters for each site
- samples of the posterior
- samples of the predictive posterior

## Example

TODO

See `test/test_HybridProblem.jl`.