sdaza/experiment-utils-pd

Experiment utils

Generic functions for experiment analysis and design.

Installation

PyPI

pip install experiment-utils-pd

From GitHub

pip install git+https://github.com/sdaza/experiment-utils-pd.git

How to use it

Experiment Analyzer

Suppose you have a DataFrame df with columns for experiment group, treatment assignment, outcomes, and covariates.

import pandas as pd
from experiment_utils import ExperimentAnalyzer

# Example data
df = pd.DataFrame({
    "experiment_id": [1, 1, 1, 2, 2, 2],
    "user_id": [101, 102, 103, 201, 202, 203],
    "treatment": [0, 1, 0, 1, 0, 1],
    "age": [25, 34, 29, 40, 22, 31],
    "gender": [1, 0, 1, 0, 1, 0],
    "outcome1": [0, 1, 0, 1, 0, 1],
    "outcome2": [5.2, 6.1, 5.8, 7.0, 5.5, 6.8],
})

covariates = ["age", "gender"]

# Initialize analyzer with balance adjustment
analyzer = ExperimentAnalyzer(
    df,
    treatment_col="treatment",
    outcomes=["outcome1", "outcome2"],
    covariates=covariates,
    experiment_identifier=["experiment_id"],
    unit_identifier=["user_id"], # Optional: To retrieve balance weights
    adjustment="balance",  # Options: 'balance', 'IV', or None
    balance_method="ps-logistic",  # Options: 'ps-logistic', 'ps-xgboost', 'entropy'
    target_effect="ATE"  # Options: 'ATT', 'ATE', 'ATC'
)

# Estimate effects
analyzer.get_effects()
print(analyzer.results)

Parameters:

  • adjustment: Choose 'balance' for covariate balancing (using balance_method), 'IV' for instrumental variable adjustment, or None for unadjusted analysis.
  • balance_method: Selects the method for balancing: 'ps-logistic' (logistic regression), 'ps-xgboost' (XGBoost), or 'entropy' (entropy balancing).
  • target_effect: Specifies the estimand: 'ATT', 'ATE', or 'ATC'.
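The three estimands differ in which population the weights try to reproduce. As a rough illustration, the textbook inverse-probability weights for a unit with a known propensity score can be sketched as follows. The helper ipw_weight is hypothetical and not part of experiment_utils; the package's own weighting (especially for 'entropy') may differ.

```python
def ipw_weight(treated: int, propensity: float, target_effect: str = "ATE") -> float:
    """Textbook inverse-probability weight for one unit (illustrative helper only)."""
    if not 0.0 < propensity < 1.0:
        raise ValueError("propensity must be strictly between 0 and 1")
    if target_effect == "ATE":
        # Reweight both groups to resemble the full sample.
        return 1.0 / propensity if treated else 1.0 / (1.0 - propensity)
    if target_effect == "ATT":
        # Treated units keep weight 1; controls are reweighted to resemble the treated.
        return 1.0 if treated else propensity / (1.0 - propensity)
    if target_effect == "ATC":
        # Controls keep weight 1; treated units are reweighted to resemble the controls.
        return (1.0 - propensity) / propensity if treated else 1.0
    raise ValueError(f"unknown target_effect: {target_effect!r}")


# Weight of a treated unit with propensity 0.25 under each estimand
weights = {effect: ipw_weight(1, 0.25, effect) for effect in ("ATE", "ATT", "ATC")}
print(weights)
```

Under 'ATT' the treated group is left as is and only controls are reweighted, which is why the treated unit above gets weight 1.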

Retrieve IPW Weights

To inspect the weights and selected sample after balancing:

# Get the DataFrame with weights and experiment identifiers
weights_df = analyzer.weights
print(weights_df.head())
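One common diagnostic once weights are available is the Kish effective sample size, which shrinks as the weights become more uneven. The sketch below takes a plain list of weights because the column names of analyzer.weights are not documented here; effective_sample_size is a hypothetical helper, not a package function.

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    return total * total / total_sq


# Equal weights keep the full sample size; one dominant weight reduces it
equal = effective_sample_size([1.0, 1.0, 1.0, 1.0])
skewed = effective_sample_size([4.0, 1.0, 1.0])
print(equal, skewed)
```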

Non-inferiority test

Test for non-inferiority after estimating effects:

# Test non-inferiority with a 10% margin
analyzer.test_non_inferiority(relative_margin=0.10)
print(analyzer.results[["outcome1", "non_inferiority_margin", "ci_lower_bound", "is_non_inferior"]])
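To make the output columns easier to read, here is a minimal sketch of one common non-inferiority rule. It assumes that higher outcomes are better, that the margin is relative_margin times the control mean, and that a one-sided normal confidence bound is used. These are assumptions for illustration; the package's exact rule is not stated in this README, and non_inferiority_check is a hypothetical helper.

```python
from statistics import NormalDist


def non_inferiority_check(effect, std_error, control_mean,
                          relative_margin=0.10, alpha=0.05):
    """Declare non-inferiority when the lower confidence bound stays above -margin."""
    # Assumption: the margin is expressed relative to the control mean.
    margin = relative_margin * abs(control_mean)
    # One-sided critical value (about 1.645 for alpha = 0.05).
    z = NormalDist().inv_cdf(1 - alpha)
    ci_lower_bound = effect - z * std_error
    return {
        "non_inferiority_margin": margin,
        "ci_lower_bound": ci_lower_bound,
        "is_non_inferior": ci_lower_bound > -margin,
    }


# Same point estimate, two levels of precision
precise = non_inferiority_check(effect=-0.1, std_error=0.2, control_mean=5.5)
noisy = non_inferiority_check(effect=-0.1, std_error=0.4, control_mean=5.5)
print(precise["is_non_inferior"], noisy["is_non_inferior"])
```

The same slightly negative estimate passes when it is precise and fails when it is noisy, because the decision depends on the confidence bound rather than the point estimate.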

Multiple comparison adjustment

Adjust p-values for multiple outcomes per experiment:

# Bonferroni adjustment
analyzer.adjust_pvalues(method="bonferroni")
print(analyzer.results[["outcome1", "pvalue", "pvalue_adj", "stat_significance_adj"]])

# Or use FDR (Benjamini-Hochberg)
analyzer.adjust_pvalues(method="fdr_bh")
print(analyzer.results[["outcome1", "pvalue", "pvalue_adj", "stat_significance_adj"]])
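For intuition, the two corrections can be written in a few lines of plain Python. This is a sketch of the standard Bonferroni and Benjamini-Hochberg formulas, not the package's code; the function names are hypothetical.

```python
def adjust_bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(p * m, 1.0) for p in pvalues]


def adjust_fdr_bh(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value
    # never exceeds the adjusted value of a larger p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvalues = [0.01, 0.04, 0.03]
bonferroni = adjust_bonferroni(pvalues)
fdr_bh = adjust_fdr_bh(pvalues)
print(bonferroni)
print(fdr_bh)
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate and typically leaves more outcomes significant.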

Power Analysis

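Estimate statistical power by simulation:

from experiment_utils import PowerSim

# Proportion metric, one treatment variant; relative_effect=False makes the effect absolute
p = PowerSim(metric='proportion', relative_effect=False,
             variants=1, nsim=1000, alpha=0.05, alternative='two-tailed')

# Power to detect a 0.03 absolute increase over a 0.33 baseline with sample size 3000
p.get_power(baseline=[0.33], effect=[0.03], sample_size=[3000])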
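The idea behind simulation-based power can be sketched with the standard library alone: draw many synthetic experiments, test each one, and report the share that reject the null. The sketch below assumes that sample_size is per group and uses a pooled two-proportion z-test; neither is confirmed by this README, and simulate_power is a hypothetical helper rather than PowerSim's implementation.

```python
import math
import random


def two_proportion_pvalue(successes_a, successes_b, n):
    """Two-sided pooled z-test for two proportions with n units per group."""
    pooled = (successes_a + successes_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 1.0
    z = abs(successes_b - successes_a) / n / se
    # Two-sided p-value: 2 * (1 - Phi(z)) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


def simulate_power(baseline, effect, sample_size, nsim=200, alpha=0.05, seed=0):
    """Share of simulated experiments in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(nsim):
        control = sum(rng.random() < baseline for _ in range(sample_size))
        treated = sum(rng.random() < baseline + effect for _ in range(sample_size))
        if two_proportion_pvalue(control, treated, sample_size) < alpha:
            rejections += 1
    return rejections / nsim


power = simulate_power(baseline=0.33, effect=0.03, sample_size=3000)
print(f"estimated power: {power:.2f}")
```

A larger nsim gives a more stable estimate at the cost of run time; the default here is kept small so the example runs quickly.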

Utilities

Balanced Random Assignment

Use the balanced_random_assignment utility to assign units to experimental groups with forced balance. Optionally, stratify by covariates so that the groups are also balanced within each stratum.

from experiment_utils.utils import balanced_random_assignment
import pandas as pd

# Example DataFrame
users = pd.DataFrame({
    "user_id": range(100),
    "age_group": ["young", "old"] * 50,
    "gender": ["M", "F"] * 50
})

# Binary assignment (test/control, 50/50) without stratification
users["assignment"] = balanced_random_assignment(users, allocation_ratio=0.5)
print(users)

# Binary assignment with stratification by age_group and gender
users["assignment_stratified"] = balanced_random_assignment(
    users, 
    allocation_ratio=0.5, 
    balance_covariates=["age_group", "gender"]
)
print(users)

# Multiple variants with equal allocation
users["assignment_multi"] = balanced_random_assignment(
    users, 
    variants=["control", "A", "B"]
)
print(users)

# Multiple variants with custom allocation and stratification
users["assignment_custom"] = balanced_random_assignment(
    users,
    variants=["control", "A", "B"],
    allocation_ratio={"control": 0.5, "A": 0.3, "B": 0.2},
    balance_covariates=["age_group"]
)
print(users)
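To show what balance within strata means in practice, here is a minimal sketch of stratified assignment with equal allocation: units are grouped by their covariate values, shuffled within each group, and dealt out to the variants in turn. It is an illustration of the general technique only. The function stratified_balanced_assignment is hypothetical and is not how balanced_random_assignment is necessarily implemented.

```python
import random
from collections import Counter, defaultdict


def stratified_balanced_assignment(rows, strata_keys,
                                   variants=("control", "test"), seed=0):
    """Assign variants so that counts differ by at most one within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for index, row in enumerate(rows):
        strata[tuple(row[key] for key in strata_keys)].append(index)

    assignment = [None] * len(rows)
    for members in strata.values():
        rng.shuffle(members)
        # Random starting variant so leftover units do not always favor the first one.
        offset = rng.randrange(len(variants))
        for position, index in enumerate(members):
            assignment[index] = variants[(position + offset) % len(variants)]
    return assignment


rows = [{"user_id": i, "age_group": "young" if i % 2 == 0 else "old"} for i in range(100)]
assigned = stratified_balanced_assignment(rows, strata_keys=["age_group"])
counts = Counter((row["age_group"], variant) for row, variant in zip(rows, assigned))
print(counts)
```

With 50 users per age group and two variants, each age group ends up split 25/25.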
