Overthink implements a hierarchical reasoning model for stock market intraday forecasting.
This implementation is inspired by the Hierarchical Reasoning Model (HRM) by Sapient Intelligence, adapted specifically for stock market forecasting tasks.
- Install uv:

  ```shell
  pip install uv
  ```

- Run the basic example script using uv:

  ```shell
  uv run example.py
  ```

- Run the FiLM-conditioned example:

  ```shell
  uv run example_film.py
  ```

This will install dependencies and execute the example scripts in a single step.
```mermaid
flowchart TD
    Start([Input: Historical Time Series<br/>B × lookback_horizon × feature_num]) --> InputProj[Input Projection<br/>Linear + SwiGLU + Dropout]
    InputProj --> InitFiLM{FiLM Enabled?}
    InitFiLM -->|Yes| FiLMMod[Generate FiLM Parameters<br/>γ, β from film_features]
    InitFiLM -->|No| NoFiLM[No FiLM Conditioning]
    FiLMMod --> InitStates[Initialize Reasoning States<br/>High Freq & Low Freq<br/>for each AR step]
    NoFiLM --> InitStates
    InitStates --> LoopStart{Deep Reasoning Loop}
    LoopStart -->|Iterate| ResH[res_h = low_freq + residual]
    ResH --> HFLoop{Temporal Mixing}
    HFLoop -->|Iterate| HFReasoning[High Freq Reasoning<br/>TransStack/TemporalMixStack<br/>+ residual]
    HFReasoning -->|Update high_freq| HFLoop
    HFLoop -->|Complete| LFReasoning[Low Freq Reasoning<br/>TransBlock<br/>Input: low_freq + high_freq]
    LFReasoning -->|Update low_freq| LoopStart
    LoopStart -->|Complete| ApplyFiLM{FiLM Enabled?}
    ApplyFiLM -->|Yes| FiLMApply[Apply FiLM Modulation<br/>lf_state = γ * lf_state + β]
    ApplyFiLM -->|No| NoFiLMApply[No FiLM Modulation]
    FiLMApply --> Forecast[Forecast Head<br/>Autoregressive Prediction<br/>next = last_value + δ]
    NoFiLMApply --> Forecast
    Forecast --> Prediction[Next Step Prediction<br/>B × 1 × feature_num]
    Prediction --> TeacherForce{Teacher Forcing?<br/>Training Mode}
    TeacherForce -->|Yes| UseTruth[Use Ground Truth<br/>with probability]
    TeacherForce -->|No| UsePred[Use Model Prediction]
    UseTruth --> UpdateSeq[Update Sequence<br/>Slide window: remove oldest,<br/>append next input]
    UsePred --> UpdateSeq
    UpdateSeq --> ARLoop{Autoregressive<br/>Loop<br/>forecast_horizon<br/>steps?}
    ARLoop -->|Continue| InitStates
    ARLoop -->|Complete| Output([Output: Forecasted Series<br/>B × forecast_horizon × feature_num])

    %% Note about no_grad for most reasoning iterations
    Note1["Note: Most reasoning iterations<br/>run with torch.no_grad()<br/>except final iteration"]

    style Start fill:#e1f5ff
    style Output fill:#e1ffe1
    style HFReasoning fill:#fff4e1
    style LFReasoning fill:#ffe1f5
    style Forecast fill:#f5e1ff
    style FiLMMod fill:#f0f8ff
    style FiLMApply fill:#f0f8ff
    style TeacherForce fill:#f5f5dc
```
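The autoregressive loop at the bottom of the diagram (predict one step, optionally substitute the ground truth, slide the window) can be sketched as follows. This is a minimal illustration, not the actual API: `model_step` is a stand-in for a full forward pass of the model.

```python
import random

def autoregressive_forecast(window, targets, model_step,
                            teacher_forcing=True, teacher_forcing_ratio=0.5):
    """Slide a lookback window forward one step at a time.

    window:     list of past observations (length = lookback_horizon)
    targets:    ground-truth future values (length = forecast_horizon),
                consulted only when teacher forcing is enabled
    model_step: callable mapping the current window to the next value
                (stand-in for the model's forward pass)
    """
    predictions = []
    for step in range(len(targets)):
        next_pred = model_step(window)   # one forward pass over the window
        predictions.append(next_pred)
        # Teacher forcing: during training, feed the ground truth back in
        # with the configured probability instead of the model's prediction.
        use_truth = teacher_forcing and random.random() < teacher_forcing_ratio
        next_input = targets[step] if use_truth else next_pred
        # Slide the window: remove the oldest entry, append the next input.
        window = window[1:] + [next_input]
    return predictions
```

With teacher forcing disabled, the loop is pure closed-loop prediction; the window length stays fixed at `lookback_horizon` throughout, which is what lets the model run for an arbitrary `forecast_horizon`.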
Instead of building a single large, deep transformer, Overthink uses a shallow stack of transformer layers and runs it through a deep reasoning loop. This strikes a balance between memory bandwidth and computational efficiency. Moreover, because the reasoning states converge toward a stationary point, most iterations of the deep reasoning loop run under `torch.no_grad()`, with gradients flowing only through the final iteration, further improving efficiency. The combination of shallow transformer layers and deep recursive loops tends to overfit; however, preliminary experiments and backtests suggest this may actually be beneficial for intraday stock market forecasting. Our naive interpretation is that most traders, be they human or bot, are usually just pattern-matching on short-term historical performance.
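The structure of that loop can be sketched as below. This is a hand-written simplification under assumed names (`hf_block`, `lf_block`, the step counts), not the actual implementation; the point is only how `torch.no_grad()` wraps all but the final low-frequency iteration.

```python
import torch
import torch.nn as nn

def deep_reasoning(hf_state, lf_state, hf_block, lf_block,
                   low_freq_step=4, high_freq_step=2):
    # All but the last low-frequency iteration run without gradient
    # tracking: the states are converging toward a stationary point,
    # so only the final iteration needs to contribute to backprop.
    with torch.no_grad():
        for _ in range(low_freq_step - 1):
            for _ in range(high_freq_step):
                hf_state = hf_state + hf_block(hf_state + lf_state)
            lf_state = lf_state + lf_block(lf_state + hf_state)
    # Final iteration with gradients enabled.
    for _ in range(high_freq_step):
        hf_state = hf_state + hf_block(hf_state + lf_state)
    lf_state = lf_state + lf_block(lf_state + hf_state)
    return hf_state, lf_state
```

Because the no-grad iterations build no autograd graph, memory cost during training is that of one iteration rather than `low_freq_step × high_freq_step` iterations.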
Some other mechanisms employed to improve stock market forecasting performance include:
- **FiLM Conditioning**: Feature-wise Linear Modulation (FiLM) layers condition the model on additional contextual information, such as market indicators or macroeconomic data. When enabled, FiLM parameters (γ, β) are generated from `film_features` and applied to the low-frequency reasoning state.
- **Intraday Time Phase Encoding**: Specialised positional encodings representing the time phase within the trading session are added to the input features. An early feature-mixing layer modulates the inputs so the model can learn intraday time-dependent patterns effectively.
- **Multi-scale Trend Loss**: Loss terms that capture trends across multiple time scales, adapted to commonly used intraday technical-analysis indicators.
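The FiLM mechanism is small enough to sketch in full. The class below is an illustration of the γ/β modulation described above, not the project's actual module; the layer names, SiLU activation, and tensor shapes are assumptions.

```python
import torch
import torch.nn as nn

class FiLMSketch(nn.Module):
    """Generate per-feature scale (gamma) and shift (beta) from
    conditioning features, then modulate the low-frequency state."""

    def __init__(self, film_feature_num, film_hidden_size, hidden_size):
        super().__init__()
        # Small MLP mapping conditioning features to 2 * hidden_size
        # outputs, split into gamma and beta.
        self.mlp = nn.Sequential(
            nn.Linear(film_feature_num, film_hidden_size),
            nn.SiLU(),
            nn.Linear(film_hidden_size, 2 * hidden_size),
        )

    def forward(self, lf_state, film_features):
        # lf_state: (B, T, hidden_size); film_features: (B, film_feature_num)
        gamma, beta = self.mlp(film_features).chunk(2, dim=-1)
        # Broadcast over the time dimension: lf_state = gamma * lf_state + beta
        return gamma.unsqueeze(1) * lf_state + beta.unsqueeze(1)
```

Because γ and β depend only on the conditioning features, the same market context modulates every time step of the low-frequency state uniformly.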
- `feature_num`: Number of input/output features
- `lookback_horizon`: Number of intended past time steps to consider; the model can handle unbounded sequence lengths with potentially degraded performance
- `forecast_horizon`: Number of intended future time steps to predict; the model can autoregressively predict unbounded sequence lengths with potentially degraded performance
- `batch_size`: Batch size for training
- `decoder_only`: Whether to use a decoder-only architecture (causal attention only)
- `high_freq_step`: Number of high-frequency reasoning steps per low-frequency step
- `low_freq_step`: Number of low-frequency reasoning steps
- `hidden_layer_num`: Number of hidden layers
- `hidden_size`: Hidden dimension size
- `head_num`: Number of attention heads
- `temporal_mechanism`: Whether to use "attention" (TransStack) or the alternative temporal mixing mechanism (TemporalMixStack)
- `use_causal`: Whether to use causal masking in attention
- `use_rope`: Whether to use Rotary Positional Embeddings
- `expansion_factor`: MLP expansion factor (default: 4.0)
- `attn_dropout`: Dropout rate for attention weights
- `mixing_dropout`: Dropout rate for mixing layers
- `input_mixing_dropout`: Dropout rate for the input feature-mixing layer
- `use_film`: Whether to enable FiLM conditioning
- `film_feature_num`: Number of FiLM conditioning features
- `film_hidden_size`: Hidden size for the FiLM MLP
- `film_dropout`: Dropout rate for FiLM layers
- `teacher_forcing`: Whether to use teacher forcing during training
- `teacher_forcing_ratio`: Probability of using the ground truth instead of the model prediction during training
- `forecast_aggregation`: `'mean'`, `'ema'`, or `'last'` for output aggregation
- `forecast_ema_period`: EMA period for smoothing (if using EMA)
- `forecast_residual_scale`: Scaling factor for residual connections
- `learnable_forecast_residual_scale`: Whether to make the residual scale learnable
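Pulled together, a configuration using these parameters might look like the sketch below. The values are illustrative, not recommended defaults, and the actual constructor signature may differ; see `example.py` for real usage.

```python
# Hypothetical configuration sketch using the parameter names above.
# Values are illustrative only; consult example.py for real usage.
config = dict(
    feature_num=8,
    lookback_horizon=64,
    forecast_horizon=16,
    batch_size=32,
    decoder_only=True,
    high_freq_step=2,
    low_freq_step=4,
    hidden_layer_num=2,
    hidden_size=128,
    head_num=4,
    temporal_mechanism="attention",   # or the TemporalMixStack alternative
    use_causal=True,
    use_rope=True,
    expansion_factor=4.0,
    attn_dropout=0.1,
    mixing_dropout=0.1,
    input_mixing_dropout=0.1,
    use_film=False,                   # see example_film.py for FiLM settings
    teacher_forcing=True,
    teacher_forcing_ratio=0.5,
    forecast_aggregation="ema",
    forecast_ema_period=8,
    forecast_residual_scale=1.0,
    learnable_forecast_residual_scale=False,
)
```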
- `example.py`: Basic usage example without FiLM conditioning
- `example_film.py`: Example with FiLM conditioning
- `example.ipynb`: Jupyter notebook example for interactive exploration
This project is licensed under the MIT License - see the LICENSE file for details.