The Next Million Names for Archaea and Bacteria, and the nomenclator Python package
To generate a large number of new names, we apply a combinatorial approach that starts with two or three sets of curated roots and produces all their possible combinations, keeping track of each root's grammatical metadata so that a valid etymology can be drafted.
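The combinatorial step can be illustrated with a minimal sketch. This is not GAN's actual implementation: the `Root` fields, the toy roots and the `combine` helper are assumptions made for illustration, and the real tool applies grammatical rules that are omitted here.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Root:
    """A curated root plus simplified metadata (hypothetical fields)."""
    stem: str      # form used when joining roots
    language: str  # abbreviated source language, e.g. "Gr." or "L."
    gloss: str     # English meaning


def combine(*root_sets: Iterable[Root], connector: str = "") -> List[Tuple[str, str]]:
    """Return (name, draft etymology) for every combination of one root per set."""
    results = []
    for combo in product(*root_sets):
        # Join the stems in order and capitalise the result like a genus name.
        name = connector.join(r.stem for r in combo).capitalize()
        # Carry each root's metadata along so an etymology can be drafted.
        etymology = "; ".join(f"{r.language} {r.stem}, {r.gloss}" for r in combo)
        results.append((name, etymology))
    return results


# Toy data for illustration only, not taken from GAN's curated tables.
first = [Root("halo", "Gr.", "salt"), Root("thermo", "Gr.", "heat")]
second = [Root("bacter", "Gr.", "rod"), Root("coccus", "Gr.", "berry"), Root("spira", "L.", "coil")]

names = combine(first, second)
print(len(names))  # 2 x 3 = 6 combinations
print(names[0])
```

The number of names grows as the product of the set sizes, which is why a few modest root tables are enough to yield a very large list.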
GAN is available on PyPI as gan-nomenclature and installs with Python 3.8+:
pip install gan-nomenclature
This command installs the library together with its dependencies (pandas, openpyxl, ...).
To work in an isolated environment, you can create one with conda and then install the package from PyPI:
conda create -c conda-forge -n gan python=3.10 pandas pip ipython
conda activate gan
pip install gan-nomenclature
Installing the package provides a small suite of CLI helpers:
- gan-genus: generate JSON/HTML/LaTeX outputs from two or three curated root tables.
- gan-validate: validate the input Excel files for correct format and content.
- gan-init: scaffold Excel templates (optionally populated with example rows) for use with gan-genus.
- gan-aidraft: generate draft etymologies using OpenRouter-hosted LLMs, starting from a text file used as context (e.g. a draft of a paper describing the biome where the new taxa were isolated).
- xls2tsv: convert each worksheet of a workbook into a separate TSV file.
- tsv2xls: convert TSV files back into Excel format.
Each command offers --help for additional options and usage examples.
A set of two (or three) Excel tables formatted as shown below is used to generate the list of combinations in JSON, HTML and LaTeX format.
Synopsis:
usage: gan-genus [-h] -1 FIRST -2 SECOND [-3 THIRD] -o OUTDIR [-p PREFIX] [-c CONNECTOR] [-v]
For full usage and installation instructions, please check the documentation.
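When gan-genus is driven from a script, the synopsis maps onto an argument list in a straightforward way. The helper below is a sketch that only assembles the command; the function name and the file names are hypothetical, and the `-v` flag is left out because the synopsis does not say what it does.

```python
import shlex
from typing import List, Optional


def gan_genus_command(first: str, second: str, outdir: str,
                      third: Optional[str] = None,
                      prefix: Optional[str] = None,
                      connector: Optional[str] = None) -> List[str]:
    """Build an argument list that follows the gan-genus synopsis."""
    cmd = ["gan-genus", "-1", first, "-2", second]
    if third is not None:
        cmd += ["-3", third]       # optional third root table
    cmd += ["-o", outdir]          # required output directory
    if prefix is not None:
        cmd += ["-p", prefix]
    if connector is not None:
        cmd += ["-c", connector]
    return cmd


# Hypothetical file names, for illustration only.
cmd = gan_genus_command("first.xlsx", "second.xlsx", "gan_out", third="third.xlsx")
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.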
Using three small files in the input_test directory (8, 11 and 8 words, respectively), GAN produced 968 (8 x 11 x 8) combinations:
- in PDF format
- in HTML format
"The great automatic nomenclaturer" is a reference to a short story ("The Great Automatic Grammatizator") written by the British author Roald Dahl.
Mark J. Pallen et al. The Next Million Names for Archaea and Bacteria, Trends in Microbiology (2020). DOI: 10.1016/j.tim.2020.10.009


