LexicMap v0.8.0

@shenwei356 shenwei356 released this 10 Sep 01:17
· 1 commit to main since this release
234959c

v0.8.0 - 2025-09-10

No changes to the index format (see Index format changelog).

  • New commands:
    • lexicmap utils merge-search-results: Merge a query's search results from multiple indexes.
    • lexicmap utils edit-genome-ids: Edit genome IDs in the index via a regular expression.
      This is helpful when the flag -N/--ref-name-regexp was not used
      to extract the genome ID from the sequence file during indexing.
      The command fixes the IDs without rebuilding the index.
  • lexicmap index:
    • Reduce memory usage in the merge step by up to 25%.
      Memory usage is also somewhat lower for huge datasets, such as long reads or contigs from the Logan project.
  • lexicmap search:
    • Reduce memory usage, particularly for batch searching (by up to 50%).
    • Improve search speed, mainly for batch searching.
    • Support limiting search by TaxId(s) via -t/--taxids or --taxid-file.
      Only genomes whose TaxIds are the specified ones or their descendants are searched,
      similar to BLAST+ 2.15.0 and later.
      Negative values act as an exclusion list.
      For example, -t 543,-561 searches all genera of the family Enterobacteriaceae (543) except Escherichia (561).
      Users only need to provide NCBI-format taxdump files (-T/--taxdump; these can also be created from
      any taxonomy data with TaxonKit)
      and a genome-ID-to-TaxId mapping file (-G/--genome2taxid).
      There's no need to rebuild the index.
    • Check if the output file and the log file are the same.
    • Reduce the time of seed matching when using -w.
    • Change the default value of --max-query-conc from 12 to 8.
    • New flag --gc-interval (default 64, 0 to disable) forces garbage collection every N queries, which substantially decreases memory usage.
  • lexicmap utils subseq:
    • Accept the output file of lexicmap search as the input.
      This makes it possible to extract matched sequences (including flanking regions) from the index after running lexicmap search, with or without the flag -a/--all.
    • Support extending aligned regions with -U/--upstream and/or -D/--downstream.
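
The TaxId filter described under lexicmap search (-t/--taxids) can be pictured with a small Python sketch. This is not LexicMap's implementation, only an illustration of the stated semantics: a genome is kept when its TaxId is one of the requested TaxIds or a descendant of one, and dropped when it falls under a negative (excluded) TaxId. The function names, the toy parent map (a heavily simplified stand-in for nodes.dmp from a taxdump) and the genome IDs are all hypothetical. The behaviour when only negative TaxIds are given (keep everything that is not excluded) is an assumption, not something the release notes state.

```python
# Toy child -> parent map, a simplified stand-in for taxdump's nodes.dmp.
# Intermediate ranks are omitted; the root (1) is its own parent.
PARENT = {
    1: 1,
    543: 1,      # Enterobacteriaceae (family)
    561: 543,    # Escherichia (genus)
    562: 561,    # Escherichia coli
    590: 543,    # Salmonella (genus)
    28901: 590,  # Salmonella enterica
    1279: 1,     # Staphylococcus (genus)
    1280: 1279,  # Staphylococcus aureus
}

# Hypothetical genome-ID-to-TaxId mapping (the role of -G/--genome2taxid).
GENOME2TAXID = {
    "G_ecoli": 562,
    "G_salmonella": 28901,
    "G_staph": 1280,
}


def lineage(taxid, parent):
    """Yield taxid and all of its ancestors by walking up the parent map."""
    seen = set()
    while taxid not in seen:
        yield taxid
        seen.add(taxid)
        nxt = parent.get(taxid)
        if nxt is None:  # unknown TaxId: stop walking
            break
        taxid = nxt


def select_genomes(genome2taxid, parent, taxids):
    """Return genome IDs that pass the include/exclude TaxId filter.

    Positive values are inclusions, negative values are exclusions.
    Assumption: with no positive values, every non-excluded genome is kept.
    """
    include = {t for t in taxids if t > 0}
    exclude = {-t for t in taxids if t < 0}
    kept = []
    for genome, taxid in genome2taxid.items():
        ancestors = set(lineage(taxid, parent))
        if include and not (ancestors & include):
            continue  # not under any requested TaxId
        if ancestors & exclude:
            continue  # under an excluded TaxId
        kept.append(genome)
    return kept


# Equivalent in spirit to "-t 543,-561": Enterobacteriaceae except Escherichia.
print(select_genomes(GENOME2TAXID, PARENT, [543, -561]))  # ['G_salmonella']
```

Walking each genome's lineage upward keeps the sketch short. A real tool would more likely precompute the set of allowed TaxIds once and reuse it for every query.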