This repository contains the Python code and Jupyter notebooks supporting our publication on using Graph Convolutional Networks to Predict Genotoxicity Outcomes from Simulated Metabolic Networks:
Understanding the relationship between xenobiotic metabolism and genotoxicity is crucial for chemical risk assessment. In this study, we explored the utility of metabolic graph representations in predicting genotoxicity outcomes, leveraging a dataset of 5,403 chemicals. Our approach integrates computationally predicted xenobiotic metabolism pathways with graph convolutional networks (GCNs) to learn biologically relevant representations that enhance predictive accuracy.
We generated metabolism networks using the rat liver models within the commercial expert system, TIssue MEtabolism Simulator (TIMES), and the phase I and II xenobiotic metabolism modules within the freely available BioTransformer system. To standardize the outputs from all of these tools, we developed a metabolic simulation framework called MetSim (https://pubs.acs.org/doi/10.1021/acs.chemrestox.3c00398). The predicted pathways were then converted into graph-based representations, in which metabolites are nodes and their transformations are edges, forming structured metabolic networks.
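The conversion from a predicted pathway to a graph can be sketched roughly as follows. This is a minimal illustration, not the MetSim or MetGraphDataset implementation: the JSON field names (`smiles`, `metabolites`, `reaction`) and the example record are assumptions made up for this sketch, and the actual MetSim schema may differ.

```python
import json

# Hypothetical MetSim-style record. The field names "smiles", "metabolites"
# and "reaction" are assumptions for illustration, not the real schema.
EXAMPLE_JSON = """
{
  "smiles": "c1ccccc1",
  "metabolites": [
    {"smiles": "Oc1ccccc1", "reaction": "aromatic hydroxylation",
     "metabolites": [
       {"smiles": "Oc1ccccc1O", "reaction": "aromatic hydroxylation",
        "metabolites": []}
     ]},
    {"smiles": "C1=CC2OC2C=C1", "reaction": "epoxidation", "metabolites": []}
  ]
}
"""


def tree_to_graph(record):
    """Flatten a nested parent -> metabolites tree into node and edge lists."""
    nodes = []   # node index -> SMILES string
    index = {}   # SMILES string -> node index, so repeated structures share a node
    edges = []   # (parent index, child index, reaction label)

    def node_id(smiles):
        if smiles not in index:
            index[smiles] = len(nodes)
            nodes.append(smiles)
        return index[smiles]

    # Iterative traversal avoids recursion limits on deep pathways.
    stack = [record]
    while stack:
        parent = stack.pop()
        p = node_id(parent["smiles"])
        for child in parent.get("metabolites", []):
            c = node_id(child["smiles"])
            edges.append((p, c, child.get("reaction", "unknown")))
            stack.append(child)
    return nodes, edges


nodes, edges = tree_to_graph(json.loads(EXAMPLE_JSON))
print(nodes)
print(edges)
```

Keying nodes by structure means a metabolite reached by two different routes becomes a single node with two incoming edges, which turns the pathway tree into a proper graph.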
To analyze these metabolic graphs, we developed a deep learning workflow using the following modules:
- MetGraphDataset: A dataset loader for metabolism graphs stored in JSON format, generated by MetSim.
- MetGCNModel: A graph convolutional network designed to extract meaningful embeddings from metabolic graphs and predict genotoxicity outcomes.
- GNNTuner: A hyperparameter optimization framework using the Optuna package to refine the GCN architecture and training strategy.
These steps are provided in 001-metgraph-genetox-gcn-tune.ipynb:
- Loading and preprocessing thousands of metabolism networks using MetGraphDataset.
- Building and training a GCN model, MetGCNModel, to generate chemical embeddings and predict genotoxicity.
- Using GNNTuner to optimize model hyperparameters and improve predictive performance.
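To make the idea of a graph convolution producing a chemical embedding concrete, here is a dependency-free sketch of a single GCN propagation step followed by mean pooling. It is not MetGCNModel, which builds on PyTorch Geometric. The toy graph, the node features, and the identity weight matrix are all assumptions for illustration, and the graph is treated as undirected for simplicity.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by the degrees of both endpoints.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    hidden = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in hidden]


def mean_pool(node_embeddings):
    """Average node embeddings into one graph-level embedding."""
    n = len(node_embeddings)
    return [sum(col) / n for col in zip(*node_embeddings)]


# Toy pathway: parent (node 0) -> metabolite (node 1) -> metabolite (node 2).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
# Hypothetical 2-d node features: [is_parent, is_metabolite].
feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
# Identity weights stand in for parameters that would normally be learned.
weight = [[1.0, 0.0], [0.0, 1.0]]

node_out = gcn_layer(adj, feats, weight)
graph_embedding = mean_pool(node_out)
print(graph_embedding)
```

In a trained model the weight matrix is learned, several such layers are stacked, and the pooled vector feeds a classifier that outputs the genotoxicity prediction.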
Comparing Morgan Fingerprint and GCN-Generated Metabolic Graph Embeddings for Genotoxicity Prediction
These steps are provided in 002-metgraph-genetox-case-study.ipynb:
- Load and preprocess the Morgan fingerprint and GCN-generated metabolic graph embeddings.
- Generate t-SNE visualizations to compare the clustering patterns of the two methods.
- Highlight false negatives from the Morgan FP approach and corresponding true positives captured by the GCN model.
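The last step above amounts to a simple comparison of per-chemical predictions. The sketch below shows the selection logic only. The function name and the label vectors are made up for illustration and are not results from the study.

```python
def rescued_false_negatives(y_true, pred_morgan, pred_gcn):
    """Indices of genotoxic chemicals missed by Morgan FP but caught by the GCN."""
    return [
        i
        for i, (truth, morgan, gcn) in enumerate(zip(y_true, pred_morgan, pred_gcn))
        if truth == 1 and morgan == 0 and gcn == 1
    ]


# Toy labels (1 = genotoxic, 0 = non-genotoxic); illustrative values only.
y_true      = [1, 1, 0, 1, 0, 1]
pred_morgan = [1, 0, 0, 0, 1, 0]
pred_gcn    = [1, 1, 0, 0, 0, 1]

rescued = rescued_false_negatives(y_true, pred_morgan, pred_gcn)
print(rescued)  # [1, 5]
```

The returned indices can then be used to highlight the corresponding points in the two t-SNE plots.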
These steps are provided in 003-metgraph-genetox-alert-comparison.ipynb
This guide provides two methods for setting up the metgraph-1 repository:
- Using Conda (recommended, ensures package compatibility)
- Using pip with requirements.txt
Run the following command to create a new Conda environment using metgraph-conda-env.yml:
conda env create -f metgraph-conda-env.yml
Once the installation is complete, activate the environment:
conda activate metgraph
You can now run the notebooks using JupyterLab or VS Code.
It is recommended to use a virtual environment to manage dependencies:
python -m venv metgraph-env
source metgraph-env/bin/activate # On macOS/Linux
metgraph-env\Scripts\activate # On Windows
Run the following command to install all required packages:
pip install -r requirements.txt
Edit startup.py and update the TOP variable:
TOP = os.environ.get('HOME')+'/ipynb/metgraph-1/'
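The HOME environment variable is often unset on Windows, in which case the expression above raises a TypeError. A possible alternative, assuming the repository is cloned to ipynb/metgraph-1 under your home directory as in the line above, is to build the path with pathlib:

```python
from pathlib import Path

# Assumption: the repository lives at <home>/ipynb/metgraph-1, as in startup.py.
# Path.home() also resolves on Windows, where HOME may not be defined.
TOP = (Path.home() / "ipynb" / "metgraph-1").as_posix() + "/"
print(TOP)
```

Adjust the path components if you cloned the repository elsewhere.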
After installation, verify that the setup is correct by running:
python -c "import torch; import torch_geometric; print('Installation successful')"
This should output:
Installation successful
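If the check fails, it can help to see which package is missing without triggering a full import. The sketch below uses only the standard library. The package list is an assumption based on the tools named in this README, and requirements.txt remains the authoritative list.

```python
import importlib.util

# Assumption: illustrative subset of dependencies; see requirements.txt for all.
REQUIRED = ["torch", "torch_geometric", "optuna"]


def missing_packages(names):
    """Return the top-level packages that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```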
Now you’re ready to use metgraph-1! 🚀