@Erennn7 Erennn7 commented Oct 18, 2025

PR: LSTM Time Series Forecasting in R

This PR introduces a fully documented implementation of a Long Short-Term Memory (LSTM) neural network in R for time series prediction and forecasting.
The implementation leverages the keras and tensorflow packages to build, train, and evaluate LSTM models capable of learning temporal dependencies in sequential data.

Overview

The provided LSTM workflow covers:

  • Data Preprocessing:

    • Normalization of time series data to [0,1] range
    • Sequence creation for supervised learning (input sequences and target values)
  • Model Architecture:

    • Single-layer LSTM network with configurable units and dropout
    • Dense output layer for regression tasks
    • Compilation with Adam optimizer and Mean Squared Error loss
  • Training and Evaluation:

    • Training with configurable epochs and batch size
    • Validation split and monitoring of training history
    • Evaluation on test data with metrics: MSE, RMSE, MAE, and R-squared
  • Prediction Capabilities:

    • Single-step and multi-step ahead predictions
    • Denormalization of predictions for interpretation
    • Visualization of actual vs. predicted series using ggplot2
  • Best Practices:

    • Sequence length selection based on temporal dependencies
    • Dropout regularization to prevent overfitting
    • Early stopping, model checkpointing, and hyperparameter tuning
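The preprocessing steps listed above can be sketched in base R as follows. This is a minimal illustration rather than the code from this PR: the helper names `normalize_minmax`, `denormalize_minmax`, and `create_sequences` are hypothetical, and the univariate input with a `[samples, timesteps, features]` array layout is an assumption.

```r
# Hypothetical helpers; names and signatures are illustrative, not from the PR.

# Scale a numeric vector to [0, 1] and keep the parameters for later inversion.
# Assumes the series is not constant (otherwise the range is zero).
normalize_minmax <- function(x) {
  rng <- range(x)
  list(scaled = (x - rng[1]) / (rng[2] - rng[1]), min = rng[1], max = rng[2])
}

# Map scaled values back to the original units.
denormalize_minmax <- function(scaled, min_val, max_val) {
  scaled * (max_val - min_val) + min_val
}

# Build supervised pairs: each window of `seq_length` values predicts the next value.
create_sequences <- function(x, seq_length) {
  n <- length(x) - seq_length
  stopifnot(n > 0)
  X <- array(0, dim = c(n, seq_length, 1))  # [samples, timesteps, features]
  y <- numeric(n)
  for (i in seq_len(n)) {
    X[i, , 1] <- x[i:(i + seq_length - 1)]
    y[i] <- x[i + seq_length]
  }
  list(X = X, y = y)
}

series <- sin(seq(0, 4 * pi, length.out = 100))
norm <- normalize_minmax(series)
seqs <- create_sequences(norm$scaled, seq_length = 10)
dim(seqs$X)  # 90 10 1
```

In practice the min and max would usually be computed on the training portion only and then reused for the test portion, so that test values do not influence the scaling.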

This LSTM implementation is suitable for:

  • Synthetic or real-world time series (e.g., sine waves, stock prices, sensor data)
  • Forecasting applications requiring learning of sequential patterns
  • Multi-step ahead prediction and analysis of temporal trends
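Multi-step ahead prediction is commonly done recursively: predict one step, append the prediction to the input window, and repeat. The sketch below shows that strategy under stated assumptions. `forecast_recursive` and `predict_fn` are hypothetical names, and `predict_fn` stands in for whatever one-step predictor wraps the trained model; the PR description does not say which multi-step strategy the implementation uses.

```r
# Recursive multi-step forecast (illustrative sketch).
# `predict_fn` is a hypothetical stand-in for a trained model's one-step predictor:
# it takes a numeric window and returns a single next value.
forecast_recursive <- function(last_window, n_steps, predict_fn) {
  window <- last_window
  preds <- numeric(n_steps)
  for (k in seq_len(n_steps)) {
    preds[k] <- predict_fn(window)
    # Slide the window: drop the oldest value, append the new prediction.
    window <- c(window[-1], preds[k])
  }
  preds
}

# Toy predictor for demonstration only: continues a linear trend.
toy_predict <- function(w) {
  n <- length(w)
  2 * w[n] - w[n - 1]
}

forecast_recursive(c(1, 2, 3), n_steps = 3, predict_fn = toy_predict)  # 4 5 6
```

Because each prediction feeds the next one, errors can accumulate over the horizon, which is worth keeping in mind when reading long-range forecasts.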

Complexity

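  • Training Complexity: O(n_samples × seq_length × lstm_units × (lstm_units + n_features)) per epoch, where n_features is the number of input features per time step
  • Inference Complexity: O(n_samples × seq_length × lstm_units × (lstm_units + n_features))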
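As a back-of-the-envelope check on these estimates, the per-time-step work of a standard LSTM layer is proportional to units × (units + n_features), because each of the four gates multiplies a weight matrix by the concatenated input and previous hidden state. The helper names below are hypothetical, and the values (50 units, 1 feature, 1000 samples, sequence length 10) are illustrative assumptions, not settings taken from this PR.

```r
# Trainable parameters of a standard LSTM layer: four gates, each with
# input weights, recurrent weights, and a bias vector.
lstm_param_count <- function(units, n_features) {
  4 * (units * (units + n_features) + units)
}

# Rough multiply-accumulate count for one forward pass over all samples
# (ignores biases, activations, and the dense output layer).
lstm_forward_macs <- function(n_samples, seq_length, units, n_features) {
  n_samples * seq_length * 4 * units * (units + n_features)
}

lstm_param_count(units = 50, n_features = 1)  # 10400
lstm_forward_macs(n_samples = 1000, seq_length = 10, units = 50, n_features = 1)
```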

The approach demonstrates how LSTMs can model nonlinear and long-range sequential dependencies that traditional regression or moving-average methods do not capture directly.

@Erennn7 Erennn7 requested a review from siriak as a code owner October 18, 2025 10:40
@Copilot Copilot AI review requested due to automatic review settings October 18, 2025 10:40
@Erennn7 Erennn7 requested a review from acylam as a code owner October 18, 2025 10:40

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces a comprehensive implementation of Long Short-Term Memory (LSTM) neural networks for time series forecasting in R, providing a complete workflow from data preprocessing to prediction visualization.

  • Implements LSTM model architecture with configurable parameters for time series prediction
  • Provides comprehensive data preprocessing including normalization and sequence generation
  • Includes evaluation metrics, visualization capabilities, and multi-step ahead prediction functionality
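For reference, the four evaluation metrics named in the PR description (MSE, RMSE, MAE, R-squared) can be computed as in the sketch below. The function name `evaluate_forecast` is hypothetical and the input vectors are made-up numbers; the PR's own evaluation code may differ.

```r
# Regression metrics on numeric vectors of equal length (illustrative helper).
evaluate_forecast <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  err <- actual - predicted
  mse <- mean(err^2)
  list(
    mse = mse,
    rmse = sqrt(mse),
    mae = mean(abs(err)),
    # R-squared: 1 minus residual sum of squares over total sum of squares.
    r_squared = 1 - sum(err^2) / sum((actual - mean(actual))^2)
  )
}

m <- evaluate_forecast(actual = c(1, 2, 3, 4), predicted = c(1.5, 2, 2.5, 4))
m$mse        # 0.125
m$mae        # 0.25
m$r_squared  # 0.9
```

Metrics are easier to interpret when computed on denormalized values, since they are then expressed in the units of the original series.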

Comment on lines +233 to +234
# Reshape for ggplot
plot_data_long <- reshape2::melt(plot_data, id.vars = "Index")

Copilot AI Oct 18, 2025

reshape2::melt() is called without checking that the reshape2 package is installed, so the plotting step fails on systems where it is missing. Consider tidyr::pivot_longer() as a more modern alternative, or guard the call with requireNamespace("reshape2", quietly = TRUE).
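One way to act on this suggestion is sketched below: use tidyr::pivot_longer() when tidyr is installed and fall back to base R otherwise. The `plot_data` frame here is a made-up stand-in, and its `Actual` and `Predicted` column names are assumptions; only `Index`, `variable`, and `value` appear in the code under review.

```r
# Made-up stand-in for the PR's plot_data; the value column names are assumptions.
plot_data <- data.frame(
  Index = 1:3,
  Actual = c(0.1, 0.2, 0.3),
  Predicted = c(0.12, 0.18, 0.33)
)

if (requireNamespace("tidyr", quietly = TRUE)) {
  # Long format with the same column names that melt() would produce.
  plot_data_long <- tidyr::pivot_longer(
    plot_data,
    cols = -Index,
    names_to = "variable",
    values_to = "value"
  )
} else {
  # Base R fallback, so the reshape step needs no extra package.
  value_cols <- setdiff(names(plot_data), "Index")
  plot_data_long <- data.frame(
    Index = rep(plot_data$Index, times = length(value_cols)),
    variable = rep(value_cols, each = nrow(plot_data)),
    value = unlist(plot_data[value_cols], use.names = FALSE)
  )
}
```

The two branches order the rows differently (by observation versus by variable), which does not affect a line plot grouped by `variable`.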

Comment on lines +142 to +146
indices <- sample(1:dim(X)[1])

train_indices <- indices[1:train_size]
test_indices <- indices[(train_size + 1):length(indices)]

Copilot AI Oct 18, 2025

Random sampling for the train/test split breaks the temporal order of the time series and lets the model train on windows that come after, or overlap with, the test windows, which can inflate test metrics. For time series, use a sequential split to preserve temporal order: train_indices <- 1:train_size and test_indices <- (train_size + 1):dim(X)[1].

Suggested change
- indices <- sample(1:dim(X)[1])
- train_indices <- indices[1:train_size]
- test_indices <- indices[(train_size + 1):length(indices)]
+ train_indices <- 1:train_size
+ test_indices <- (train_size + 1):dim(X)[1]
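A self-contained version of the sequential split might look like the sketch below. The function name `split_sequential` and the `train_ratio` parameter are hypothetical, and `X` is assumed to be a `[samples, timesteps, features]` array whose samples are already ordered by time.

```r
# Chronological train/test split for windowed data (illustrative sketch).
# Assumes `X` is a [samples, timesteps, features] array and `y` a vector,
# both ordered by time.
split_sequential <- function(X, y, train_ratio = 0.8) {
  n <- dim(X)[1]
  train_size <- floor(n * train_ratio)
  stopifnot(train_size >= 1, train_size < n)
  train_idx <- seq_len(train_size)
  test_idx <- (train_size + 1):n
  list(
    # drop = FALSE keeps the 3-D shape even when a slice has one sample.
    X_train = X[train_idx, , , drop = FALSE], y_train = y[train_idx],
    X_test = X[test_idx, , , drop = FALSE], y_test = y[test_idx]
  )
}

X <- array(seq_len(10 * 3), dim = c(10, 3, 1))
y <- seq_len(10)
parts <- split_sequential(X, y, train_ratio = 0.8)
dim(parts$X_train)  # 8 3 1
parts$y_test        # 9 10
```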

# Create plot
p <- ggplot(plot_data_long, aes(x = Index, y = value, color = variable)) +
geom_line(size = 1) +

Copilot AI Oct 18, 2025

Using 'size' to set line width is deprecated as of ggplot2 3.4.0. Use 'linewidth' instead for line geometries.

Suggested change
- geom_line(size = 1) +
+ geom_line(linewidth = 1) +
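If the script needs to run on both older and newer ggplot2 installations, one option is to pick the argument by package version, as sketched below. The data frame is made up for illustration, and whether backward compatibility is needed at all is an assumption; on ggplot2 3.4.0 or later, plain `geom_line(linewidth = 1)` is enough.

```r
# Made-up long-format data; column names mirror the snippet under review.
plot_data_long <- data.frame(
  Index = rep(1:3, times = 2),
  variable = rep(c("Actual", "Predicted"), each = 3),
  value = c(0.1, 0.2, 0.3, 0.12, 0.18, 0.33)
)

p <- NULL
if (requireNamespace("ggplot2", quietly = TRUE)) {
  # linewidth is available from ggplot2 3.4.0 onward; older versions use size.
  line_layer <- if (utils::packageVersion("ggplot2") >= "3.4.0") {
    ggplot2::geom_line(linewidth = 1)
  } else {
    ggplot2::geom_line(size = 1)
  }
  p <- ggplot2::ggplot(
    plot_data_long,
    ggplot2::aes(x = Index, y = value, color = variable)
  ) + line_layer
}
```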
