neo-chem-synth-wave/atom-to-atom-mapping

Atom-to-atom Mapping

Welcome to the chemical reaction compound atom-to-atom mapping research project!

A chemical reaction can be defined as the transformation of one set of chemical compounds into another. Accompanied by a change in energy, the atoms of the reactant compounds are rearranged to form the product compounds, with or without the assistance of spectator compounds. Correctly mapping this rearrangement of atoms is paramount for capturing the essence of the chemical reaction. This task, commonly referred to as atom-to-atom mapping or atom mapping, has proven challenging as it is a generalization of the well-known subgraph isomorphism problem. Consequently, the primary objective of the Atom-to-atom Mapping research project is to systematically curate and facilitate access to relevant chemical reaction compound atom-to-atom mapping resources.

atom_to_atom_mapping_example.png
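To make the idea of a mapping concrete, the following minimal sketch reads the atom map numbers from a mapped reaction SMILES string and checks that every mapped product atom can be traced back to a reactant atom. The reaction is a hand-mapped SN2 substitution chosen for illustration only, and the `atom_map_numbers` helper is hypothetical; neither is part of the atom_to_atom_mapping package.

```python
import re
from typing import Set

# Matches the atom map number inside a bracket atom, e.g. the "1" in "[CH3:1]".
ATOM_MAP_PATTERN = re.compile(r"\[[^\]]*:(\d+)\]")


def atom_map_numbers(smiles: str) -> Set[int]:
    """Return the set of atom map numbers found in a SMILES string (hypothetical helper)."""
    return {int(number) for number in ATOM_MAP_PATTERN.findall(smiles)}


# Illustrative, hand-mapped SN2 substitution: bromomethane + hydroxide -> methanol + bromide.
mapped_reaction_smiles = "[CH3:1][Br:2].[OH-:3]>>[CH3:1][OH:3].[Br-:2]"

# A reaction SMILES string has the form "reactants>agents>products".
reactants, _, products = mapped_reaction_smiles.split(">")

reactant_maps = atom_map_numbers(reactants)
product_maps = atom_map_numbers(products)

print("Reactant atom map numbers:", sorted(reactant_maps))
print("Product atom map numbers:", sorted(product_maps))
print("Every product atom is traceable:", product_maps <= reactant_maps)
```

Only heavy atoms carry map numbers in this example. A check of this kind says nothing about whether a mapping is chemically correct; it only confirms that the map numbers are consistent between the two sides.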

Installation

An environment can be created using the git and conda commands as follows:

git clone https://github.com/neo-chem-synth-wave/atom-to-atom-mapping.git

cd atom-to-atom-mapping

conda env create -f environment.yaml

conda activate atom-to-atom-mapping-env

The atom_to_atom_mapping package can be installed using the pip command as follows:

pip install .

Environment Troubleshooting

According to GitHub Issue 4 and GitHub Issue 5 on the LocalMapper repository, potential conflicts between the PyTorch, CUDA, and DGL libraries may arise. To resolve the conflicts, the appropriate version of the DGL library can be re-installed as follows:

# Re-install the DGL library for the PyTorch and CUDA library versions 2.4 and 12.1, respectively.

pip uninstall dgl

pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/cu121/repo.html 
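Before re-installing, it may help to check which versions are actually present in the environment. The sketch below uses only the Python standard library. The `installed_version` and `dgl_wheel_index_url` helpers are hypothetical, and the URL builder assumes that other PyTorch and CUDA combinations follow the same layout as the URL in the command above, which should be verified against the DGL installation instructions.

```python
from importlib import metadata
from typing import Optional


def installed_version(package_name: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def dgl_wheel_index_url(torch_version: str, cuda_version: str) -> str:
    """Build a DGL wheel index URL (hypothetical helper).

    Assumption: the URL layout of the command above generalizes to other versions.
    """
    # Keep only "major.minor" and drop local suffixes such as "+cu121".
    torch_major_minor = ".".join(torch_version.split("+")[0].split(".")[:2])
    cuda_tag = "cu" + cuda_version.replace(".", "")
    return f"https://data.dgl.ai/wheels/torch-{torch_major_minor}/{cuda_tag}/repo.html"


for name in ("torch", "dgl"):
    print(name, installed_version(name))

print(dgl_wheel_index_url("2.4.0+cu121", "12.1"))
```

The resulting URL can then be passed to the `-f` option of the pip install command.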

Utilization

The purpose of the scripts directory is to illustrate how to map chemical reaction compounds using the following approaches:

  1. Indigo [1]
  2. RXNMapper [2]
  3. Chytorch RxnMap [3]
  4. LocalMapper [4]

The map_reaction_smiles_strings script can be utilized as follows:

# Map a chemical reaction SMILES string.

python scripts/map_reaction_smiles_strings.py \
  --atom_to_atom_mapping_approach "indigo" \
  --reaction_smiles "OCN1C(=O)Cc2ccccc12.c1nc2ccccc2[nH]1>>O=C1Cc2ccccc2N1Cn1cnc2ccccc12"

# Map the chemical reaction SMILES strings from a .csv file.

python scripts/map_reaction_smiles_strings.py \
  --atom_to_atom_mapping_approach "rxnmapper" \
  --input_csv_file_path "/path/to/the/input/file.csv" \
  --reaction_smiles_column_name "name_of_the_reaction_smiles_column" \
  --output_csv_file_path "/path/to/the/output/file.csv"
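As a rough sketch of the .csv workflow, the snippet below writes a one-column input file and assembles the corresponding command from the options shown above. The column name `reaction_smiles`, the temporary file locations, and the assumption that the script needs no other columns are illustrative only; the script itself remains the authority on the expected input format.

```python
import csv
import shlex
import tempfile
from pathlib import Path

# Hypothetical column name; any name works as long as it is passed to the script.
COLUMN_NAME = "reaction_smiles"

reaction_smiles_strings = [
    "OCN1C(=O)Cc2ccccc12.c1nc2ccccc2[nH]1>>O=C1Cc2ccccc2N1Cn1cnc2ccccc12",
]

# Temporary locations used for illustration only.
work_dir = Path(tempfile.mkdtemp())
input_path = work_dir / "input.csv"
output_path = work_dir / "output.csv"

# Write the input file with a header row and one reaction per line.
with input_path.open("w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=[COLUMN_NAME])
    writer.writeheader()
    writer.writerows({COLUMN_NAME: smiles} for smiles in reaction_smiles_strings)

# Assemble the command from the options documented above.
command = [
    "python", "scripts/map_reaction_smiles_strings.py",
    "--atom_to_atom_mapping_approach", "rxnmapper",
    "--input_csv_file_path", str(input_path),
    "--reaction_smiles_column_name", COLUMN_NAME,
    "--output_csv_file_path", str(output_path),
]

# Print a shell-safe version of the command.
print(shlex.join(command))
```

The printed command can be run from the repository root, or the `command` list can be passed to `subprocess.run` directly.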

License Information

The contents of this repository are published under the MIT license. Please refer to the individual references for more details regarding the license information of external resources utilized within the repository.

Contact

If you are interested in contributing to this research project by reporting bugs, suggesting improvements, or submitting feedback, feel free to do so using GitHub Issues.

Acknowledgements

Marvin was used for drawing, displaying and characterizing chemical structures, substructures and reactions. [5]

References

[1] EPAM Indigo: https://lifescience.opensource.epam.com/indigo/index.html. Accessed on: 2025/05/04.

[2] Schwaller, P., Hoover, B., Reymond, J., Strobelt, H., and Laino, T. Extraction of Organic Chemistry Grammar from Unsupervised Learning of Chemical Reactions. Sci. Adv., 7, eabe4166, 2021.

[3] Nugmanov, R., Dyubankova, N., Gedich, A., and Wegner, J.K. Bidirectional Graphormer for Reactivity Understanding: Neural Network Trained to Reaction Atom-to-atom Mapping Task. J. Chem. Inf. Model., 62, 14, 3307–3315, 2022.

[4] Chen, S., An, S., Babazade, R., and Jung, Y. Precise Atom-to-atom Mapping for Organic Reactions via Human-in-the-loop Machine Learning. Nat. Commun., 15, 2250, 2024.

[5] Marvin 24.3.1, 2024, ChemAxon: https://chemaxon.com. Accessed on: 2025/05/04.