📜 the Great Automatic Nomenclator: The Next Million Names for Archaea and Bacteria

telatin/gan


GAN: The Great Automatic Nomenclator

The Next Million Names for Archaea and Bacteria, and the nomenclator Python package

Principle

To generate a large number of new names, we apply a combinatorial approach: starting from two or three sets of curated roots, we produce all their possible combinations while keeping track of each root's grammatical metadata, so that a valid etymology can be drafted for every name.
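The idea can be illustrated with a minimal sketch. This is not GAN's actual implementation: the `combine` function, the field names (`root`, `origin`, `source`, `meaning`) and the example roots are assumptions chosen for illustration only.

```python
from itertools import product

# Hypothetical root tables. Each root carries the grammatical metadata
# needed to draft an etymology. The field names are illustrative
# assumptions, not the schema used by the GAN Excel templates.
FIRST = [
    {"root": "galli", "origin": "L. masc. n.", "source": "gallus", "meaning": "chicken"},
    {"root": "equi", "origin": "L. masc. n.", "source": "equus", "meaning": "horse"},
]
SECOND = [
    {"root": "bacter", "origin": "N.L. masc. n.", "source": "bacter", "meaning": "rod"},
    {"root": "coccus", "origin": "N.L. masc. n.", "source": "coccus", "meaning": "coccus"},
    {"root": "monas", "origin": "L. fem. n.", "source": "monas", "meaning": "unit"},
]


def combine(tables, connector=""):
    """Yield one record per combination, taking one root from each table."""
    for parts in product(*tables):
        # Join the roots into a candidate name and capitalise the first letter.
        name = connector.join(p["root"] for p in parts).capitalize()
        # Carry the metadata of every root along to draft the etymology.
        etymology = "; ".join(
            f'{p["origin"]} {p["source"]}, {p["meaning"]}' for p in parts
        )
        yield {"name": name, "etymology": etymology}


records = list(combine([FIRST, SECOND]))
for record in records:
    print(f'{record["name"]}: {record["etymology"]}')
```

Because the metadata travels with each root, the etymology is assembled at the same time as the name and never has to be reconstructed afterwards. Passing a third table to `combine` extends the same logic to three-part names.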

[Figure: GAN flowchart]

Installation

GAN is available on PyPI as gan-nomenclature and installs with Python 3.8+:

pip install gan-nomenclature

This command installs the library together with its dependencies (pandas, openpyxl, ...).

To work in an isolated environment, you can create one with conda and then install the package from PyPI:

conda create -c conda-forge -n gan python=3.10 pandas pip ipython
conda activate gan
pip install gan-nomenclature

Command-line tools

Installing the package provides a small suite of CLI helpers:

  • gan-genus: generate JSON/HTML/LaTeX outputs from two or three curated root tables.
  • gan-validate: validate the input Excel files for correct format and content.
  • gan-init: scaffold Excel templates (optionally populated with example rows) for use with gan-genus.
  • gan-aidraft: generate draft etymologies with OpenRouter-hosted LLMs, using a text file as context (e.g. a draft of a paper describing the biome where the new taxa were isolated).
  • xls2tsv: convert each worksheet of a workbook into a separate TSV file.
  • tsv2xls: convert TSV files back into Excel format.

Each command offers --help for additional options and usage examples.

Genera generator

A set of two (or three) Excel tables formatted as shown below is used to generate the list of combinations in JSON, HTML and LaTeX format.

[Figure: Excel input format]

Synopsis:

usage: gan-genus [-h] -1 FIRST -2 SECOND [-3 THIRD] -o OUTDIR [-p PREFIX] [-c CONNECTOR] [-v]

For full usage and installation instructions, please check the documentation.

Example output

Using three small files in the input_test directory (8, 11 and 8 words, respectively), GAN produced 704 (8 x 11 x 8) combinations.
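Since each name takes one root from every table, the number of combinations is the product of the table sizes. A quick check for the sizes quoted above (the helper name `expected_combinations` is hypothetical, not part of the package):

```python
from itertools import product
from math import prod


def expected_combinations(sizes):
    """Number of names obtained by picking one root from each table."""
    return prod(sizes)


sizes = [8, 11, 8]

# Cross-check the formula by enumerating every index combination.
enumerated = sum(1 for _ in product(*(range(n) for n in sizes)))

print(expected_combinations(sizes), enumerated)
```

The same helper shows how quickly the output grows: adding a third table multiplies the number of two-part names by the size of that table.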

Etymology

"The Great Automatic Nomenclator" is a reference to "The Great Automatic Grammatizator", a short story by the British author Roald Dahl.

Citation

Mark J. Pallen et al. The Next Million Names for Archaea and Bacteria, Trends in Microbiology (2020). DOI: 10.1016/j.tim.2020.10.009
