This repository contains the implementation of a Deep Reinforcement Learning (DRL) approach to the Stochastic Capacitated Vehicle Routing Problem with Service Times and Deadlines (SCVRPSTD). The work is part of a paper by my team that is currently under revision; it extends the Policy Optimization with Multiple Optima (POMO) algorithm to handle the stochastic and time-sensitive nature of the problem.
The SCVRPSTD is a variant of the Vehicle Routing Problem (VRP) in which:
- One or more vehicles with fixed capacities must complete a series of deliveries distributed throughout a city.
- Each route begins and ends at a central distribution depot.
- Time constraints and uncertain travel times between locations are considered.
- The goal is to minimize total travel time while ensuring that all deliveries are made within their deadlines.
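The constraints listed above can be made concrete with a small route checker. This is a minimal sketch, not the repository's environment code: the `Customer` class, the `evaluate_route` function, Euclidean travel at unit speed, and the reading of a deadline as a latest arrival time are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Customer:
    # Hypothetical container for one delivery; field names are illustrative.
    x: float
    y: float
    demand: float
    service_time: float
    deadline: float  # assumed to be the latest allowed arrival time


def travel_time(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    # Assumption: travel time equals Euclidean distance (unit speed).
    return math.hypot(a[0] - b[0], a[1] - b[1])


def evaluate_route(
    depot: Tuple[float, float],
    customers: Sequence[Customer],
    route: List[int],
    capacity: float,
) -> Tuple[float, bool]:
    """Return (total travel time, feasible) for one depot-to-depot route."""
    load = 0.0
    clock = 0.0
    total_travel = 0.0
    position = depot
    feasible = True
    for idx in route:
        c = customers[idx]
        leg = travel_time(position, (c.x, c.y))
        total_travel += leg
        clock += leg
        load += c.demand
        # A late arrival or an overloaded vehicle makes the route infeasible.
        if clock > c.deadline or load > capacity:
            feasible = False
        clock += c.service_time
        position = (c.x, c.y)
    # Each route ends back at the depot.
    total_travel += travel_time(position, depot)
    return total_travel, feasible
```

With two customers at `(3, 4)` and `(3, 0)`, visiting them in either order costs the same travel time, but only one order may respect both deadlines, which is why the visiting sequence matters even when the distance is identical.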
- Algorithm: Extends the POMO algorithm by introducing modifications to the environment and integrating a new dynamic context within the Neural Network Model (NNM).
- Environment: Models a delivery system with stochastic travel times and time-sensitive constraints.
- Performance Evaluation: Compared against Google OR-Tools and various metaheuristics as benchmarks.
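To illustrate what stochastic travel times mean for deadlines, the sketch below estimates by Monte Carlo how often a fixed route arrives on time at every stop. It is an illustration under stated assumptions rather than the environment used in this repository: the mean-one lognormal noise model, the `sigma` parameter, and the function names `sample_travel_time` and `on_time_probability` are all hypothetical.

```python
import math
import random
from typing import Sequence


def sample_travel_time(distance: float, sigma: float, rng: random.Random) -> float:
    # Assumed noise model: nominal time scaled by a lognormal multiplier.
    # Choosing mu = -sigma^2 / 2 gives the multiplier an expected value of 1.
    mu = -0.5 * sigma ** 2
    return distance * rng.lognormvariate(mu, sigma)


def on_time_probability(
    leg_distances: Sequence[float],
    service_times: Sequence[float],
    deadlines: Sequence[float],
    sigma: float,
    n_samples: int = 2000,
    seed: int = 0,
) -> float:
    """Estimate the probability that every stop on a route is reached by its deadline."""
    rng = random.Random(seed)
    on_time = 0
    for _ in range(n_samples):
        clock = 0.0
        ok = True
        for dist, service, deadline in zip(leg_distances, service_times, deadlines):
            clock += sample_travel_time(dist, sigma, rng)
            if clock > deadline:
                ok = False
                break
            clock += service
        on_time += ok
    return on_time / n_samples
```

With `sigma=0.0` the multiplier is exactly 1 and the check reduces to the deterministic case; larger values of `sigma` spread the arrival times, so a route that is feasible on nominal travel times can still miss deadlines in some scenarios.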
The effectiveness of the proposed method is evaluated through extensive experiments. Below are some key performance metrics:
The training process shows convergence in loss and reward over epochs.
The following picture is a solution plot for
- Python (version 3.8+)
- PyTorch (for training and inference)
- NumPy
- Matplotlib (for visualization)
- ortools (Google OR-Tools Python wrapper)
pip install torch numpy matplotlib ortools



