From 5c8f6a232a25222df2ef6f5cdc7d3d45bd8da1bb Mon Sep 17 00:00:00 2001 From: Nikhil Gupta Date: Mon, 16 Jun 2025 16:35:47 -0500 Subject: [PATCH] feat: simplified exo tutorial with history and future vars --- .../01_exogenous_variables_reworked.ipynb | 302 ++++++++++++++++++ 1 file changed, 302 insertions(+) create mode 100644 nbs/docs/tutorials/01_exogenous_variables_reworked.ipynb diff --git a/nbs/docs/tutorials/01_exogenous_variables_reworked.ipynb b/nbs/docs/tutorials/01_exogenous_variables_reworked.ipynb new file mode 100644 index 000000000..e743ddbc9 --- /dev/null +++ b/nbs/docs/tutorials/01_exogenous_variables_reworked.ipynb @@ -0,0 +1,302 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#| hide\n", + "!pip install -Uqq nixtla" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#| hide\n", + "from nixtla.utils import in_colab" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#| hide\n", + "IN_COLAB = in_colab()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#| hide\n", + "if not IN_COLAB:\n", + " from nixtla.utils import colab_badge\n", + " from dotenv import load_dotenv" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## What Are Exogenous Variables?" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Exogenous variables or external factors are crucial in time series forecasting\n", + "as they provide additional information that might influence the prediction.\n", + "These variables could include holiday markers, marketing spending, weather data,\n", + "or any other external data that correlate with the time series data you are\n", + "forecasting.\n", + "\n", + "For example, if you're forecasting ice cream sales, temperature data could serve\n", + "as a useful exogenous variable. On hotter days, ice cream sales may increase." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## How to Use Exogenous Variables" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#| echo: false\n", + "if not IN_COLAB:\n", + " load_dotenv()\n", + " colab_badge('docs/tutorials/01_exogenous_variables_reworked')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To incorporate exogenous variables in TimeGPT, you'll need to pair each point\n", + "in your time series data with the corresponding external data." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 1: Import Packages" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Import the required libraries and initialize the Nixtla client." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "from nixtla import NixtlaClient\n", + "\n", + "nixtla_client = NixtlaClient(\n", + " # defaults to os.environ.get(\"NIXTLA_API_KEY\")\n", + " api_key=\"my_api_key_provided_by_nixtla\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 2: Load Dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this tutorial, we'll predict day-ahead electricity prices. The dataset contains:\n", + "\n", + "- Hourly electricity prices (`y`) from various markets (identified by `unique_id`)\n", + "- Exogenous variables (`Exogenous1` to `day_6`)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df = pd.read_csv(\"https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv\")\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 3: Baseline Forecast without Exogenous Variables" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, let's create a baseline forecast without using any exogenous variables." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "timegpt_fcst_no_ex_vars = nixtla_client.forecast(\n", + " df=df[[\"unique_id\", \"ds\", \"y\"]],\n", + " h=24,\n", + " level=[80, 90]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 4: Forecast Electricity Prices Using Exogenous Variables" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, let's create a forecast using the exogenous variables. To make a forecast\n", + "using exogenous variables, you need to provide historical and future exogenous\n", + "values. 
Below is an example dataset containing future exogenous variables. Note\n", + "that it contains only the future exogenous variable values, not the target\n", + "variable `y`. We need to forecast this target variable using the exogenous\n", + "variables provided." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "future_ex_vars_df = pd.read_csv(\"https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-future-ex-vars.csv\")\n", + "future_ex_vars_df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Ensure you maintain consistent data formatting and columns in both historical\n", + "and future exogenous datasets (e.g., dates, `unique_id`, variable names)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "timegpt_fcst_ex_vars = nixtla_client.forecast(\n", + " df=df,\n", + " X_df=future_ex_vars_df,\n", + " h=24,\n", + " level=[80, 90]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 5: Forecast Visualization" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once you have generated your forecasts, you can visualize the results to compare\n", + "forecasts between the two methods above." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "timegpt_fcst_no_ex_vars.rename(columns={\"TimeGPT\": \"TimeGPT_no_ex_vars\"}, inplace=True)\n", + "timegpt_fcst_ex_vars.rename(columns={\"TimeGPT\": \"TimeGPT_ex_vars\"}, inplace=True)\n", + "\n", + "all_forecasts = (\n", + " timegpt_fcst_no_ex_vars\n", + " .merge(\n", + " timegpt_fcst_ex_vars,\n", + " how='outer',\n", + " on=[\"unique_id\", \"ds\"]\n", + " )\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "nixtla_client.plot(\n", + " df[[\"unique_id\", \"ds\", \"y\"]],\n", + " all_forecasts,\n", + " max_insample_length=1000,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Key Takeaways\n", + "\n", + "- Exogenous variables enrich time series forecasting.\n", + "- Ensure proper alignment of historical and future exogenous data.\n", + "\n", + "## Next Steps\n", + "\n", + "Congratulations! You have mastered the fundamentals of adding exogenous\n", + "variables to your TimeGPT forecasts. Keep refining your approach by:\n", + "\n", + "- Exploring feature engineering to create domain-specific exogenous data.\n", + "- Experimenting with different modeling approaches for external variables.\n", + "- Validating forecast accuracy by comparing with real future data." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "python3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +}