README.md
+12 -14: 12 additions & 14 deletions
Original file line number
Diff line number
Diff line change
@@ -1,4 +1,4 @@
1
-
## <a href="https://bioinf.shenwei.me/LexicMap"><img src="logo.svg" width="30"/></a> LexicMap: efficient sequence alignment against millions of prokaryotic genomes
1
+
## <a href="https://bioinf.shenwei.me/LexicMap"><img src="logo.svg" width="36"/></a> LexicMap: efficient sequence alignment against millions of prokaryotic genomes
- [ LexicMap: efficient sequence alignment against millions of prokaryotic genomes](#-lexicmap-efficient-sequence-alignment-against-millions-of-prokaryotic-genomes)
23
-
- [Table of contents](#table-of-contents)
24
22
- [Features](#features)
25
23
- [Introduction](#introduction)
26
24
- [Quick start](#quick-start)
@@ -37,19 +35,19 @@ Preprint:
37
35
## Features
38
36
39
37
1. **The accuracy of LexicMap is comparable with Blastn, MMseqs2, and Minimap2**. It
40
-
- performs **base-level alignment**, with `qcovGnm`, `qcovHSP`, `pident`, `evalue` and `bitscore` returned,
38
+
- **performs base-level alignment**, with `qcovGnm`, `qcovHSP`, `pident`, `evalue` and `bitscore` returned,
41
39
both in TSV and pairwise alignment format ([output format](https://bioinf.shenwei.me/LexicMap/tutorials/search/#output)).
42
40
- provides a genome-wide query coverage metric (`qcovGnm`),
43
41
which enables accurate interpretation of search results - particularly for [circular queries (such as plasmid, virus, and mtDNA)](https://bioinf.shenwei.me/LexicMap/tutorials/search/#searching-with-plasmids-or-other-longer-queries)
44
42
against both complete and fragmented assemblies.
45
-
- returns all possible matches, including multiple copies of a gene in a genome.
43
+
- **returns all possible matches**, including multiple copies of a gene in a genome.
46
44
1. **The alignment is fast and memory-efficient, scalable to up to millions of prokaryotic genomes**.
47
45
1. LexicMap is **easy to [install](http://bioinf.shenwei.me/LexicMap/installation/),
48
46
we provide [binary files](https://github.com/shenwei356/LexicMap/releases/)** with no dependencies for Linux, Windows, MacOS (x86 and arm CPUs).
49
47
2. LexicMap is **easy to use** (see [tutorials](http://bioinf.shenwei.me/LexicMap/tutorials/index/), [usages](http://bioinf.shenwei.me/LexicMap/usage/lexicmap/), and [FAQs](https://bioinf.shenwei.me/LexicMap/faqs/)).
50
48
- [Database building](https://bioinf.shenwei.me/LexicMap/tutorials/index/) requires only a simple command, accepting input from files, a file list, or even a directory.
51
49
- [Sequence searching](https://bioinf.shenwei.me/LexicMap/tutorials/search/) supports limiting search by TaxId(s), provides a progress bar.
52
-
- [Several utility commands](https://bioinf.shenwei.me/LexicMap/usage/utils/) are available to resume unfinished indexing, and explore the index data, extract indexed subsequences.
50
+
- [Several utility commands](https://bioinf.shenwei.me/LexicMap/usage/utils/) are available to resume unfinished indexing, explore the index data, merge search results, extract matched subsequences and more.
53
51
54
52
## Introduction
55
53
@@ -76,7 +74,7 @@ However, given the increasing rate at which genomes are sequenced, **existing to
76
74
1. LexicMap enables efficient indexing and searching of both RefSeq+GenBank and the [AllTheBacteria](https://www.biorxiv.org/content/10.1101/2024.03.08.584059v1) datasets (**2.3 and 1.9 million prokaryotic assemblies** respectively).
77
75
1. When searching in all **2,340,672 Genbank+Refseq prokaryotic genomes**, *Blastn is unable to run with this dataset on common servers as it requires >2000 GB RAM*. (see [performance](#performance)).
78
76
79
-
**With LexicMap v0.7.0** (48 CPUs),
77
+
**With LexicMap v0.7.0** (48 CPUs, indexes and queries on HDDs),
docs/content/introduction/_index.md
+12 -13: 12 additions & 13 deletions
Original file line number
Diff line number
Diff line change
@@ -11,19 +11,18 @@ weight: 10
11
11
12
12
LexicMap is a **nucleotide sequence alignment** tool for efficiently querying **gene, plasmid, viral, or long-read sequences (>150 bp)** against up to **millions of prokaryotic genomes**.
1. **The accuracy of LexicMap is comparable with Blastn, MMseqs2, and Minimap2**. It
43
-
- performs **base-level alignment**, with `qcovGnm`, `qcovHSP`, `pident`, `evalue` and `bitscore` returned,
42
+
- **performs base-level alignment**, with `qcovGnm`, `qcovHSP`, `pident`, `evalue` and `bitscore` returned,
44
43
both in TSV and pairwise alignment format ([output format](https://bioinf.shenwei.me/LexicMap/tutorials/search/#output)).
45
44
- provides a genome-wide query coverage metric (`qcovGnm`),
46
45
which enables accurate interpretation of search results - particularly for [circular queries (such as plasmid, virus, and mtDNA)](https://bioinf.shenwei.me/LexicMap/tutorials/search/#searching-with-plasmids-or-other-longer-queries)
47
46
against both complete and fragmented assemblies.
48
-
- returns all possible matches, including multiple copies of a gene in a genome.
47
+
- **returns all possible matches**, including multiple copies of a gene in a genome.
49
48
1. **The alignment is fast and memory-efficient, scalable to up to millions of prokaryotic genomes**.
50
49
1. LexicMap is **easy to [install](http://bioinf.shenwei.me/LexicMap/installation/),
51
50
we provide [binary files](https://github.com/shenwei356/LexicMap/releases/)** with no dependencies for Linux, Windows, MacOS (x86 and arm CPUs).
52
51
2. LexicMap is **easy to use** (see [tutorials](http://bioinf.shenwei.me/LexicMap/tutorials/index/), [usages](http://bioinf.shenwei.me/LexicMap/usage/lexicmap/), and [FAQs](https://bioinf.shenwei.me/LexicMap/faqs/)).
53
52
- [Database building](https://bioinf.shenwei.me/LexicMap/tutorials/index/) requires only a simple command, accepting input from files, a file list, or even a directory.
54
53
- [Sequence searching](https://bioinf.shenwei.me/LexicMap/tutorials/search/) supports limiting search by TaxId(s), provides a progress bar.
55
-
- [Several utility commands](https://bioinf.shenwei.me/LexicMap/usage/utils/) are available to resume unfinished indexing, and explore the index data, extract indexed subsequences.
54
+
- [Several utility commands](https://bioinf.shenwei.me/LexicMap/usage/utils/) are available to resume unfinished indexing, explore the index data, merge search results, extract matched subsequences and more.
56
55
57
56
## Introduction
58
57
@@ -79,7 +78,7 @@ However, given the increasing rate at which genomes are sequenced, **existing to
79
78
1. LexicMap enables efficient indexing and searching of both RefSeq+GenBank and the [AllTheBacteria](https://www.biorxiv.org/content/10.1101/2024.03.08.584059v1) datasets (**2.3 and 1.9 million prokaryotic assemblies** respectively).
80
79
1. When searching in all **2,340,672 Genbank+Refseq prokaryotic genomes**, *Blastn is unable to run with this dataset on common servers as it requires >2000 GB RAM*. (see [performance](#performance)).
81
80
82
-
**With LexicMap v0.7.0** (48 CPUs),
81
+
**With LexicMap v0.7.0** (48 CPUs, indexes and queries on HDDs),