Repository: https://github.com/chirindaopensource/identifying_quantifying_financial_bubbles_hyped_log_period_power_law
Owner: 2025 Craig Chirinda (Open Source Projects)
This repository contains an independent, professional-grade Python implementation of the research methodology from the 2025 paper entitled "Identifying and Quantifying Financial Bubbles with the Hyped Log-Periodic Power Law Model" by:
- Zheng Cao
- Xingran Shao
- Yuheng Yan
- Helyette Geman
The project provides a complete, end-to-end computational framework for replicating the paper's findings. It delivers a modular, auditable, and extensible pipeline that executes the entire research workflow: from rigorous data validation and NLP feature engineering to LPPL model fitting, deep learning, and backtesting.
- Introduction
- Theoretical Background
- Features
- Methodology Implemented
- Core Components (Notebook Structure)
- Key Callable: execute_full_study
- Prerequisites
- Installation
- Input Data Structure
- Usage
- Output Structure
- Project Structure
- Customization
- Contributing
- Recommended Extensions
- License
- Citation
- Acknowledgments
This project provides a Python implementation of the methodologies presented in the 2025 paper "Identifying and Quantifying Financial Bubbles with the Hyped Log-Periodic Power Law Model." The core of this repository is the IPython notebook identifying_quantifying_financial_bubbles_hyped_log_period_power_law_draft.ipynb, which contains a comprehensive suite of functions to replicate the paper's findings, from initial data validation to the final generation of all analytical tables and figures.
The paper proposes a novel framework (HLPPL) that fuses three distinct domains—econophysics, natural language processing, and deep learning—to create a superior, real-time indicator of financial asset mispricing. This codebase operationalizes that framework, allowing users to:
- Rigorously validate and manage the entire experimental configuration via a `config.yaml` file.
- Process raw market data and news text through a multi-stage feature engineering pipeline.
- Fit the Log-Periodic Power Law (LPPL) model at scale using a robust, multi-start optimization strategy.
- Construct the novel `BubbleScore` by fusing technical and behavioral signals.
- Train a state-of-the-art Dual-Stream Transformer model to forecast the `BubbleScore`.
- Run a complete, event-driven backtest to evaluate the trading performance of the generated signals.
- Automatically conduct ablation and sensitivity studies to validate the model's robustness.
The implemented methods are grounded in econophysics, behavioral finance, and deep learning.
1. Log-Periodic Power Law (LPPL) Model:
Originating from the physics of critical phenomena, the LPPL model describes the super-exponential growth of an asset price leading up to a crash (a critical point). The implementation fits the 7-parameter model defined in Equation (1):
$$
\ln p(t) = A + B(t_c - t)^m + C(t_c - t)^m \cos(\omega \ln(t_c - t) + \phi)
$$
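The normalized residual from this fit, $\epsilon_{\text{norm}}(t)$, is the technical signal that the HLPPL framework later fuses with the behavioral signals.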
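As an illustration only, the sketch below evaluates Equation (1) and fits it with a simplified multi-start search using NumPy. For each random draw of the nonlinear parameters $(t_c, m, \omega)$ it solves the linear parameters $A$, $B$, $C_1 = C\cos\phi$, $C_2 = -C\sin\phi$ in closed form (a common LPPL reformulation), then keeps the draw with the lowest residual sum of squares. The parameter bounds, the function names, and the residual normalization (dividing by the residual standard deviation) are assumptions; the notebook's constrained optimizer and its exact definition of $\epsilon_{\text{norm}}(t)$ may differ.

```python
import numpy as np


def lppl_log_price(t, A, B, C, tc, m, omega, phi):
    """Equation (1): ln p(t) = A + B(tc - t)^m + C(tc - t)^m cos(omega ln(tc - t) + phi), for t < tc."""
    dt = tc - np.asarray(t, dtype=float)
    return A + B * dt**m + C * dt**m * np.cos(omega * np.log(dt) + phi)


def fit_lppl_multistart(t, log_p, n_starts=300, seed=0):
    """Hypothetical multi-start fit: random (tc, m, omega), linear parameters by least squares."""
    rng = np.random.default_rng(seed)
    t = np.asarray(t, dtype=float)
    log_p = np.asarray(log_p, dtype=float)
    span = t[-1] - t[0]
    best = None
    for _ in range(n_starts):
        # Illustrative bounds (assumptions, not the notebook's configuration).
        tc = t[-1] + rng.uniform(0.01, 0.5) * span
        m = rng.uniform(0.1, 0.9)
        omega = rng.uniform(4.0, 15.0)
        dt = tc - t
        f = dt**m
        phase = omega * np.log(dt)
        # ln p = A + B f + C1 f cos(phase) + C2 f sin(phase), with C1 = C cos(phi), C2 = -C sin(phi)
        X = np.column_stack([np.ones_like(t), f, f * np.cos(phase), f * np.sin(phase)])
        coef, *_ = np.linalg.lstsq(X, log_p, rcond=None)
        sse = float(np.sum((X @ coef - log_p) ** 2))
        if best is None or sse < best["sse"]:
            A, B, C1, C2 = coef
            best = {"A": float(A), "B": float(B), "C": float(np.hypot(C1, C2)),
                    "tc": float(tc), "m": float(m), "omega": float(omega),
                    "phi": float(np.arctan2(-C2, C1)), "sse": sse}
    return best


def normalized_residual(t, log_p, params):
    """Assumed normalization: LPPL residual divided by its standard deviation."""
    fitted = lppl_log_price(t, params["A"], params["B"], params["C"], params["tc"],
                            params["m"], params["omega"], params["phi"])
    resid = np.asarray(log_p, dtype=float) - fitted
    return resid / resid.std()
```

On synthetic LPPL data, `fit_lppl_multistart` should return a critical time beyond the last observation and a residual sum of squares far below that of a constant fit. Refining the best start with a local optimizer such as `scipy.optimize.least_squares` would be a natural next step.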
2. Behavioral Finance Signals (NLP): Two NLP-derived features are constructed to capture market psychology:
- Hype Index ($H_{i,t}$): The share of media attention a stock receives on a given day, measuring intensity (Equation 11).
- Sentiment Score ($S_{i,t}$): The confidence-weighted average sentiment (positive, neutral, negative) of news articles, measuring tone (Equation 9).
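Equations 9 and 11 are not reproduced in this README, so the following stdlib-only sketch is just one plausible reading of the two definitions above: the Hype Index as a ticker's share of the day's article count, and the Sentiment Score as a confidence-weighted mean of FinBERT-style labels mapped to +1, 0, and -1. The function names, the label-to-value mapping, and the zero-article fallbacks are assumptions.

```python
# Assumed numeric mapping for the three sentiment classes.
LABEL_VALUE = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def hype_index(article_counts: dict[str, int]) -> dict[str, float]:
    """Share of the day's news articles that mention each ticker (attention intensity)."""
    total = sum(article_counts.values())
    if total == 0:
        return {ticker: 0.0 for ticker in article_counts}
    return {ticker: count / total for ticker, count in article_counts.items()}


def sentiment_score(predictions: list[tuple[str, float]]) -> float:
    """Confidence-weighted average of (label, confidence) pairs for one ticker-day (tone)."""
    total_confidence = sum(conf for _, conf in predictions)
    if total_confidence == 0:
        return 0.0  # no articles: treat tone as neutral (assumption)
    return sum(LABEL_VALUE[label] * conf for label, conf in predictions) / total_confidence
```

For example, `hype_index({"XYZ": 30, "ABC": 10})` assigns XYZ a share of 0.75, and three articles labeled positive (confidence 0.9), negative (0.6), and neutral (0.5) yield a score of about 0.15.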
3. Hyped LPPL (HLPPL) BubbleScore:
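The paper's core innovation is the fusion of the technical and behavioral signals into a single BubbleScore. The formula is regime-dependent: the Hype Index is added when the LPPL residual is positive and subtracted when it is zero or negative, so hype amplifies the deviation in either direction, while sentiment enters with the same sign in both regimes (Equation 14).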
$$
\text{BubbleScore}_{i}(t) =
\begin{cases}
\epsilon_{\text{norm}}(t) + \alpha_1 H_{i,t} + \alpha_2 S_{i,t}, & \text{if } \epsilon_{\text{norm}}(t) > 0 \\
\epsilon_{\text{norm}}(t) - \alpha_1 H_{i,t} + \alpha_2 S_{i,t}, & \text{if } \epsilon_{\text{norm}}(t) \le 0
\end{cases}
$$
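Equation (14) vectorizes naturally, since the sign of the residual only selects whether hype is added or subtracted. The NumPy sketch below assumes arrays aligned by date for one stock; `alpha_1` and `alpha_2` stand in for the weights set in `config.yaml`, and the function name is hypothetical.

```python
import numpy as np


def bubble_score(eps_norm, hype, sentiment, alpha_1, alpha_2):
    """Equation (14): hype amplifies the LPPL residual in its own direction; sentiment is added as-is."""
    eps_norm = np.asarray(eps_norm, dtype=float)
    # +1 when eps_norm > 0, -1 when eps_norm <= 0 (zero falls in the second branch)
    direction = np.where(eps_norm > 0, 1.0, -1.0)
    return (eps_norm
            + direction * alpha_1 * np.asarray(hype, dtype=float)
            + alpha_2 * np.asarray(sentiment, dtype=float))
```

Using `np.where` rather than `np.sign` keeps the $\epsilon_{\text{norm}}(t) = 0$ case in the second branch, as the equation specifies; `np.sign(0)` would return 0 and silently drop the hype term.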
4. Dual-Stream Transformer:
A deep learning model is trained to forecast the BubbleScore. Its architecture is designed to process stock-specific features and market-wide features in parallel, allowing them to interact via a bi-directional cross-attention mechanism before making a final prediction.
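The bi-directional cross-attention step can be illustrated without PyTorch. The NumPy sketch below is a single-head, untrained toy: each stream takes queries from its own sequence and keys/values from the other stream, applies scaled dot-product attention, and adds the result back through a residual connection; the mean-pooled streams then feed a linear head. The parameter dictionary and its key names, the shared embedding width `d`, and the pooling head are assumptions, and the notebook's `DualStreamTransformer` (multi-head attention, normalization, feed-forward layers, training) will differ.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: rows of `queries` attend over rows of `context`."""
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (len_queries, len_context)
    return weights @ v


def dual_stream_forward(stock, market, params):
    """stock: (T_stock, d), market: (T_market, d) -> scalar forecast."""
    stock_ctx = stock + cross_attention(stock, market, *params["stock_to_market"])
    market_ctx = market + cross_attention(market, stock, *params["market_to_stock"])
    pooled = np.concatenate([stock_ctx.mean(axis=0), market_ctx.mean(axis=0)])  # (2d,)
    return float(pooled @ params["head"])


def init_params(d, seed=0):
    """Random (untrained) weights for the toy model."""
    rng = np.random.default_rng(seed)

    def projections():
        return tuple(rng.normal(0.0, d**-0.5, size=(d, d)) for _ in range(3))

    return {"stock_to_market": projections(), "market_to_stock": projections(),
            "head": rng.normal(0.0, 0.1, size=2 * d)}
```

Because each direction has its own projections, the stock stream can pick out market days that matter for it while the market stream independently attends over the stock's history; the two sequences may also have different lengths.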
The provided IPython notebook (identifying_quantifying_financial_bubbles_hyped_log_period_power_law_draft.ipynb) implements the full research pipeline, including:
- Modular, Multi-Task Architecture: The entire pipeline is broken down into 33 distinct, modular tasks, each with its own orchestrator function for maximum clarity and testability.
- Configuration-Driven Design: All study parameters are managed in an external `config.yaml` file, allowing for easy customization and replication.
- Idempotent & Resumable Pipeline: Computationally expensive steps (e.g., NLP processing, LPPL fitting, model training) create checkpoint files, allowing the pipeline to be resumed efficiently.
- Robust LPPL Fitting: Implements a multi-start, constrained non-linear least squares optimization to robustly fit the 7-parameter LPPL model across thousands of rolling windows.
- State-of-the-Art Deep Learning: Implements a Dual-Stream Transformer in PyTorch with modern training techniques (`AdamW`, `OneCycleLR`, gradient clipping, early stopping) and a custom multi-component loss function.
- Realistic Event-Driven Backtester: Simulates trading performance with daily stop-loss checks and transaction costs.
- Automated Ablation & Sensitivity Analysis: Includes a top-level orchestrator to automatically re-run the entire pipeline under different configurations to test the contribution of each model component.
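To make the backtesting features concrete, here is a deliberately small, stdlib-only sketch of a daily event loop with a threshold signal rule, proportional transaction costs, and a stop-loss. Every rule in it is an assumption for illustration: shorting when the predicted BubbleScore exceeds an upper threshold and going long below a lower one, charging `cost` per unit of position change, measuring the stop from the equity at trade entry, and staying flat after a stop until the signal changes. The function names are hypothetical, and the notebook's backtester and metric definitions may differ.

```python
import math
import statistics


def to_signals(predicted_scores, upper, lower):
    """Hypothetical rule: -1 (short) above `upper`, +1 (long) below `lower`, 0 (flat) otherwise."""
    return [-1 if s > upper else 1 if s < lower else 0 for s in predicted_scores]


def backtest(prices, signals, cost=0.001, stop_loss=0.05):
    """Signal at close t sets the position held from t to t+1; returns the daily equity curve."""
    equity, position = 1.0, 0
    entry_equity = equity   # equity when the current trade was opened
    stopped_signal = None   # signal that was stopped out; ignored until it changes
    curve = [equity]
    for t in range(len(prices) - 1):
        target = signals[t]
        if stopped_signal is not None:
            if target == stopped_signal:
                target = 0
            else:
                stopped_signal = None
        if target != position:
            equity *= 1 - cost * abs(target - position)  # cost on traded notional
            position, entry_equity = target, equity
        equity *= 1 + position * (prices[t + 1] / prices[t] - 1)
        # Daily stop-loss check against the trade's entry equity
        if position != 0 and equity / entry_equity - 1 <= -stop_loss:
            equity *= 1 - cost * abs(position)
            stopped_signal, position = position, 0
        curve.append(equity)
    return curve


def performance(curve, periods_per_year=252):
    """Total return, maximum drawdown, and annualized Sharpe ratio (zero risk-free rate assumed)."""
    rets = [b / a - 1 for a, b in zip(curve, curve[1:])]
    peak, max_dd = curve[0], 0.0
    for value in curve:
        peak = max(peak, value)
        max_dd = max(max_dd, 1 - value / peak)
    sd = statistics.stdev(rets) if len(rets) > 1 else 0.0
    sharpe = statistics.mean(rets) / sd * math.sqrt(periods_per_year) if sd > 0 else 0.0
    return {"total_return": curve[-1] / curve[0] - 1, "max_drawdown": max_dd, "sharpe": sharpe}
```

Using each day's signal only for the next day's return avoids look-ahead bias, and the "stay flat until the signal changes" rule prevents re-entering the same losing trade the morning after a stop.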
The notebook is a direct, sequential implementation of the paper's methodology:
- Validation & Cleansing (Tasks 1-6): Ingests and validates the `config.yaml` file and raw data, cleanses the data, adjusts for corporate actions, and engineers primary features.
- NLP Feature Engineering (Tasks 7-12): Uses BERTopic and FinBERT to process news text and generate the `Sentiment_Score` and `Hype_Index`.
- LPPL Signal Generation (Tasks 13-18): Defines rolling windows, fits the LPPL model, computes normalized residuals, fuses them into the `BubbleScore`, and labels discrete episodes.
- ML Data Preparation (Tasks 19-22): Normalizes features (with leakage protection), constructs fixed-length sequences for the stock and market streams, creates multi-horizon targets, and performs a strict chronological split.
- Deep Learning (Tasks 23-28): Defines the `DualStreamTransformer` architecture and `CompositeLoss`, trains the model with early stopping, persists the final artifact, and evaluates its out-of-sample predictive performance.
- Backtesting (Tasks 29-31): Converts predictions into discrete trading signals, runs the event-driven backtest, and computes a full suite of performance metrics.
- Final Orchestration (Tasks 32-33): Provides top-level functions to run the entire baseline pipeline and the full suite of ablation studies.
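The leakage protection in Tasks 19-22 comes down to ordering: split chronologically first, fit normalization statistics on the training period only, and only then cut windows, so no window or statistic sees future data. The NumPy sketch below illustrates that ordering under assumed split fractions, a single feature matrix, and hypothetical function names; the notebook builds separate stock and market streams.

```python
import numpy as np


def chronological_split(n, train_frac=0.7, val_frac=0.15):
    """Contiguous train/validation/test slices in time order (fractions are assumptions)."""
    n_train, n_val = int(n * train_frac), int(n * val_frac)
    return slice(0, n_train), slice(n_train, n_train + n_val), slice(n_train + n_val, n)


def fit_zscore(train_features):
    """Per-feature mean and std estimated on training rows only."""
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return mean, std


def make_sequences(features, target, lookback, horizons):
    """Windows of `lookback` rows ending at day e-1, with targets at e-1+h for each horizon h."""
    X, Y = [], []
    for end in range(lookback, len(features) - max(horizons) + 1):
        X.append(features[end - lookback:end])
        Y.append([target[end - 1 + h] for h in horizons])
    return np.stack(X), np.asarray(Y, dtype=float)


def prepare(features, target, lookback=20, horizons=(1, 5)):
    train, val, test = chronological_split(len(features))
    mean, std = fit_zscore(features[train])  # no validation/test data used here
    scaled = (features - mean) / std
    # Build windows inside each split so that no window straddles a boundary.
    return {name: make_sequences(scaled[s], target[s], lookback, horizons)
            for name, s in (("train", train), ("val", val), ("test", test))}
```

Each split must be longer than `lookback + max(horizons) - 1` rows; otherwise `np.stack` receives an empty list and raises a `ValueError`.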
The identifying_quantifying_financial_bubbles_hyped_log_period_power_law_draft.ipynb notebook is structured as a logical pipeline with modular orchestrator functions for each of the 33 major tasks. All functions are self-contained, fully documented with type hints and docstrings, and designed for professional-grade execution.
The project is designed around a single, top-level user-facing interface function:
execute_full_study: This master orchestrator function, located in the final section of the notebook, runs the entire automated research pipeline from end-to-end. A single call to this function reproduces the entire computational portion of the project, from data validation to the final report.
- Python 3.9+
- A CUDA-enabled GPU is highly recommended for the deep learning and NLP components.
- Core dependencies: `pandas`, `numpy`, `torch`, `transformers`, `sentence-transformers`, `bertopic`, `umap-learn`, `hdbscan`, `scipy`, `pyyaml`, `matplotlib`, `seaborn`, `tqdm`.
- Clone the repository:
  ```bash
  git clone https://github.com/chirindaopensource/identifying_quantifying_financial_bubbles_hyped_log_period_power_law.git
  cd identifying_quantifying_financial_bubbles_hyped_log_period_power_law
  ```
- Create and activate a virtual environment (recommended):
  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  ```
- Install Python dependencies:
  ```bash
  pip install -r requirements.txt
  ```
The pipeline requires a pandas.DataFrame (df_raw) with a MultiIndex of ('Date', 'TICKER') and the following columns and dtypes:
- `PERMNO`: `int64`
- `SIC_Code`: `int64`
- `Close_Price_Raw`: `float64`
- `Volume_Raw`: `int64`
- `CFACSHR`: `float64`
- `PE_Ratio`: `float64`
- `PB_Ratio`: `float64`
- `VIX_Close`: `float64`
- `News_Articles`: `object` (containing a `list` of `str`)
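Before calling the pipeline, it can help to check a DataFrame against this contract. The sketch below is a hypothetical helper (not part of the notebook) that assumes pandas and reports mismatched index names, missing columns, wrong dtypes, and `News_Articles` cells that are not lists of strings; the notebook's own validation tasks (Tasks 1-6) remain the authoritative checks.

```python
import pandas as pd

# Column contract from the list above.
REQUIRED_COLUMNS = {
    "PERMNO": "int64",
    "SIC_Code": "int64",
    "Close_Price_Raw": "float64",
    "Volume_Raw": "int64",
    "CFACSHR": "float64",
    "PE_Ratio": "float64",
    "PB_Ratio": "float64",
    "VIX_Close": "float64",
    "News_Articles": "object",
}


def check_input_schema(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable problems; an empty list means the frame matches."""
    problems = []
    if list(df.index.names) != ["Date", "TICKER"]:
        problems.append(f"index names {list(df.index.names)} != ['Date', 'TICKER']")
    for col, dtype in REQUIRED_COLUMNS.items():
        if col not in df.columns:
            problems.append(f"missing column {col}")
        elif str(df[col].dtype) != dtype:
            problems.append(f"{col}: dtype {df[col].dtype}, expected {dtype}")
    if "News_Articles" in df.columns:
        # Each cell must be a list whose elements are all strings.
        bad = df["News_Articles"].map(
            lambda v: not (isinstance(v, list) and all(isinstance(s, str) for s in v)))
        if bad.any():
            problems.append(f"News_Articles: {int(bad.sum())} rows are not list[str]")
    return problems
```

A one-row frame with placeholder values, a `(Date, TICKER)` MultiIndex, and a list of headline strings in `News_Articles` should produce an empty problem list.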
All other parameters are controlled by the config.yaml file.
The identifying_quantifying_financial_bubbles_hyped_log_period_power_law_draft.ipynb notebook provides a complete, step-by-step guide. The primary workflow is to execute the final cell of the notebook, which contains the main execution block:
```python
# Final cell of the notebook
import yaml

# This function generates a sample DataFrame for demonstration.
# In a real run, you would load your own data here.
df_raw = create_sample_dataframe()

# Load the master configuration from the YAML file.
with open("config.yaml", "r") as f:
    base_config = yaml.safe_load(f)

# --- Execute the entire study ---
# To run only the baseline model (faster):
final_results = execute_full_study(
    df_raw=df_raw,
    base_config=base_config,
    run_ablation=False
)

# To run the baseline AND all ablation/sensitivity studies (very slow):
# final_results = execute_full_study(
#     df_raw=df_raw,
#     base_config=base_config,
#     run_ablation=True
# )

# The `final_results` dictionary will contain the key outputs.
print(final_results['baseline_performance'])
```

The execute_full_study function creates a `study_results/` directory with the following structure:
```
study_results/
│
├── baseline/
│   ├── data_intermediate/
│   ├── logs/
│   ├── models/
│   └── reports/
│       └── performance_summary.csv
│
└── ablation_studies/
    ├── ablation_no_hype/
    │   ├── data_intermediate/
    │   ├── logs/
    │   ├── models/
    │   └── reports/
    ├── ... (other experiments)
    │
    ├── ablation_comparison_summary.csv
    └── ablation_core_performance.png
```
```
identifying_quantifying_financial_bubbles_hyped_log_period_power_law/
│
├── identifying_quantifying_financial_bubbles_hyped_log_period_power_law_draft.ipynb  # Main implementation notebook
├── config.yaml        # Master configuration file
├── requirements.txt   # Python package dependencies
├── LICENSE            # MIT license file
└── README.md          # This documentation file
```
The pipeline is highly customizable via the config.yaml file. Users can easily modify all study parameters, including LPPL window size, BubbleScore weights, Transformer architecture, and backtesting thresholds, without altering the core Python code.
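For sensitivity runs it can be convenient to derive a variant configuration in code instead of editing the file. The helper below is a hypothetical, stdlib-only sketch; the dotted key names used with it (for example `lppl.window_size` or `bubble_score.alpha_1`) are placeholders, so check `config.yaml` for the actual section and key names.

```python
import copy


def with_overrides(config: dict, overrides: dict) -> dict:
    """Return a deep copy of `config` with dotted-path keys replaced, e.g. {"lppl.window_size": 504}."""
    new_config = copy.deepcopy(config)  # never mutate the loaded base configuration
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = new_config
        for key in parents:
            if key not in node:
                raise KeyError(f"unknown config section {key!r} in {dotted_key!r}")
            node = node[key]
        if leaf not in node:
            raise KeyError(f"unknown config key {dotted_key!r}")
        node[leaf] = value
    return new_config
```

A variant could then be passed as `execute_full_study(df_raw=df_raw, base_config=variant, run_ablation=False)`. Raising `KeyError` on unknown paths is a deliberate choice: a misspelled key fails loudly instead of silently adding a setting the pipeline never reads.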
Contributions are welcome. Please fork the repository, create a feature branch, and submit a pull request with a clear description of your changes. Adherence to PEP 8, type hinting, and comprehensive docstrings is required.
Future extensions could include:
- Alternative Architectures: Replacing the Transformer with other sequence models like LSTMs or state-space models (e.g., Mamba).
- Dynamic Alpha Weights: Making the `alpha_1` and `alpha_2` weights in the `BubbleScore` dynamic, perhaps dependent on market volatility.
- Advanced Backtesting: Integrating a more sophisticated backtesting engine that handles portfolio-level constraints, realistic order execution, and market impact.
- Cross-Asset Analysis: Applying the HLPPL framework to other asset classes like cryptocurrencies, commodities, or fixed income.
This project is licensed under the MIT License.
If you use this code or the methodology in your research, please cite the original paper:
```bibtex
@article{cao2025identifying,
  title   = {Identifying and Quantifying Financial Bubbles with the Hyped Log-Periodic Power Law Model},
  author  = {Cao, Zheng and Shao, Xingran and Yan, Yuheng and Geman, Helyette},
  journal = {arXiv preprint arXiv:2510.10878},
  year    = {2025}
}
```

For the implementation itself, you may cite this repository:
Chirinda, C. (2025). A Professional-Grade Implementation of the "Hyped Log-Periodic Power Law Model" Framework.
GitHub repository: https://github.com/chirindaopensource/identifying_quantifying_financial_bubbles_hyped_log_period_power_law
- Credit to Zheng Cao, Xingran Shao, Yuheng Yan, and Helyette Geman for the foundational research that forms the entire basis for this computational replication.
- This project is built upon the exceptional tools provided by the open-source community. Sincere thanks to the developers of the scientific Python ecosystem, including Pandas, NumPy, Scikit-learn, PyTorch, Hugging Face, SciPy, and Jupyter.
--
This README was generated based on the structure and content of the identifying_quantifying_financial_bubbles_hyped_log_period_power_law_draft.ipynb notebook and follows best practices for research software documentation.