Commit 9e9d781

Jake Bradford authored
Merge pull request #7 from bmds-lab/dev/v2.0.0b
Dev/v2.0.0b
2 parents 413a391 + 5af57e7 commit 9e9d781

36 files changed, with 1061 additions and 398 deletions.

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
*.egg-info
2+
*__pycache__*

LICENSE

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
1+
BSD 3-Clause License
2+
3+
Copyright (c) 2021-, Jake Bradford, Timothy Chappell, Dimitri Perrin
4+
All rights reserved.
5+
6+
Redistribution and use in source and binary forms, with or without
7+
modification, are permitted provided that the following conditions are met:
8+
9+
1. Redistributions of source code must retain the above copyright notice, this
10+
list of conditions and the following disclaimer.
11+
12+
2. Redistributions in binary form must reproduce the above copyright notice,
13+
this list of conditions and the following disclaimer in the documentation
14+
and/or other materials provided with the distribution.
15+
16+
3. Neither the name of the copyright holder nor the names of its
17+
contributors may be used to endorse or promote products derived from
18+
this software without specific prior written permission.
19+
20+
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
21+
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
22+
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
23+
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
24+
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
25+
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
26+
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
27+
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
28+
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29+
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Makefile

Lines changed: 6 additions & 6 deletions
@@ -5,15 +5,15 @@ CC = g++
55
CFLAGS = -O3 -std=c++11 -fopenmp -mpopcnt
66

77
# define any directories containing header files other than /usr/include
8-
INCLUDES = -Iparallel_hashmap
8+
INCLUDES = -Isrc/ISSL/include
99

1010
all : isslScoreOfftargets isslCreateIndex
1111

12-
isslScoreOfftargets : isslScoreOfftargets.cpp
13-
$(CC) $(CFLAGS) $(INCLUDES) -o $@ $^
12+
isslScoreOfftargets : src/ISSL/isslScoreOfftargets.cpp
13+
$(CC) $(CFLAGS) $(INCLUDES) -o bin/$@ $^
1414

15-
isslCreateIndex : isslCreateIndex.cpp
16-
$(CC) $(CFLAGS) $(INCLUDES) -o $@ $^
15+
isslCreateIndex : src/ISSL/isslCreateIndex.cpp
16+
$(CC) $(CFLAGS) $(INCLUDES) -o bin/$@ $^
1717

1818
clean:
19-
$(RM) isslScoreOfftargets isslCreateIndex
19+
$(RM) bin/isslScoreOfftargets bin/isslCreateIndex

README.md

Lines changed: 194 additions & 44 deletions
@@ -8,7 +8,11 @@ bioRxiv 2020.02.14.950261; doi: https://doi.org/10.1101/2020.02.14.950261
88

99
## Preamble
1010

11-
We present Crackling, a new method for whole-genome identification of suitable CRISPR targets. The method maximises the efficiency of the guides by combining the results of multiple scoring approaches. On experimental data, the set of guides it selects are better than those produced by existing tools. The method also incorporates a new approach for faster off-target scoring, based on Inverted Signature Slice Lists (ISSL). This approach provides a gain of an order of magnitude in speed, while preserving the same level of accuracy.
11+
> The design of CRISPR-Cas9 guide RNAs is not trivial, and is a computationally demanding task. Design tools need to identify target sequences that will maximise the likelihood of obtaining the desired cut, whilst minimising off-target risk. There is a need for a tool that can meet both objectives while remaining practical to use on large genomes.
12+
>
13+
> Here, we present Crackling, a new method that is more suitable for meeting these objectives. We test its performance on 12 genomes and on data from validation studies. Crackling maximises guide efficiency by combining multiple scoring approaches. On experimental data, the guides it selects are better than those selected by others. It also incorporates Inverted Signature Slice Lists (ISSL) for faster off-target scoring. ISSL provides a gain of an order of magnitude in speed compared to other popular tools, such as Cas-OFFinder, Crisflash and FlashFry, while preserving the same level of accuracy. Overall, this makes Crackling a faster and better method to design guide RNAs at scale.
14+
>
15+
> Crackling is available at https://github.com/bmds-lab/Crackling under the Berkeley Software Distribution (BSD) 3-Clause license.
1216
1317
## Dependencies
1418

@@ -22,39 +26,101 @@ We present Crackling, a new method for whole-genome identification of suitable C
2226

2327
- Python v3.6+
2428

25-
## Installation
29+
## Installation & Usage
2630

2731
1. Clone or [download](https://github.com/bmds-lab/Crackling/archive/master.zip) the source.
2832

29-
2. Configure the pipeline. See `config.py`.
33+
```bash
34+
git clone https://github.com/bmds-lab/Crackling.git ~/Crackling/
35+
cd ~/Crackling
36+
```
37+
38+
2. Install using pip
39+
40+
```bash
41+
python3.6 -m pip install -e .
42+
```
43+
44+
Important: the dot `.` indicates that *pip* will run `setup.py` from the current working directory.
45+
46+
The `-e` flag stands for *editable*:
47+
48+
> -e Install a project in editable mode (i.e. setuptools "develop mode") from a local project path or a VCS url.
49+
50+
3. Configure the pipeline. See `config.ini`.
3051

31-
3. Ensure Bowtie2 and RNAFold are reachable from the installation directory.
52+
4. Ensure Bowtie2 and RNAfold are reachable system-wide by adding them to your environment's *PATH* variable.
3253

33-
4. Compile the off-target scoring function. An index of off-targets is required: to prepare this, read the next section (*Off-target Indexing*).
54+
Check these are reachable by typing (the version numbers and directories may differ slightly):
3455

3556
```
36-
g++ -o isslScoreOfftargets isslScoreOfftargets.cpp -O3 -std=c++11 -fopenmp -mpopcnt -Iparallel_hashmap
57+
$ bowtie2 --version
58+
/home/<user>/bowtie2-2.3.4.1/bowtie2-align-s version 2.3.4.1
59+
64-bit
60+
Built on UbuntuDesktopMachine
61+
Monday 25 June 09:17:27 AEST 2018
62+
Compiler: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9)
63+
Options: -O3 -m64 -msse2 -funroll-loops -g3 -std=c++98 -DPOPCNT_CAPABILITY
64+
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}
65+
66+
67+
$ RNAfold --version
68+
RNAfold 2.4.14
3769
```
3870

39-
5. Run the pipeline:
71+
5. Compile the off-target indexing and scoring functions. An index of off-targets is required: to prepare this, see *Off-target Indexing* in the *Utilities* section.
4072

73+
```bash
74+
make
75+
```
76+
77+
6. Create a Bowtie2 index
78+
79+
The Bowtie2 manual can be found [here](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml).
80+
81+
Our recommended usage:
82+
83+
```
84+
bowtie2-build --threads 128 input-file output-file
4185
```
42-
python Crackling.py -c config
86+
87+
For example:
88+
89+
```bash
90+
bowtie2-build --threads 128 ~/genomes/mouse.fa ~/genomes/mouse.fa.bowtie2
4391
```
92+
93+
Bowtie2 produces multiple files for its index. When referring to the index, use the base-name (i.e. `output-file`) that you provided `bowtie2-build`.
94+
95+
7. Configure the Crackling pipeline by editing `config.ini`.
96+
97+
8. Run the pipeline:
98+
99+
```bash
100+
Crackling -c config.ini
101+
```
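The PATH and Bowtie2 index steps above can be sanity-checked before running the pipeline. The following is a minimal sketch, not part of Crackling: the function names `missing_tools` and `bowtie2_index_exists` are hypothetical, and the list of suffixes assumes the standard file names that `bowtie2-build` appends to the base-name (`.bt2` for small indexes, `.bt2l` for large ones).

```python
import shutil
from pathlib import Path

# Suffixes that bowtie2-build appends to the base-name.
# Large indexes use the same names with a trailing "l" (".bt2l").
INDEX_SUFFIXES = (
    ".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2",
)


def missing_tools(tools=("bowtie2", "RNAfold")):
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def bowtie2_index_exists(base_name):
    """Return True if every index file for `base_name` is present."""
    base = str(Path(base_name).expanduser())
    for large_suffix in ("", "l"):  # try a small index, then a large one
        if all(Path(base + s + large_suffix).is_file() for s in INDEX_SUFFIXES):
            return True
    return False


if __name__ == "__main__":
    print("Missing tools:", missing_tools() or "none")
    print("Index found:", bowtie2_index_exists("~/genomes/mouse.fa.bowtie2"))
```

Note that the check takes the base-name (for example `~/genomes/mouse.fa.bowtie2`), not one of the individual index files.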
102+
103+
# Utilities
104+
105+
The Crackling package provides a number of utilities:
106+
107+
- Off-target indexing (including extracting target sites and generating the ISSL index)
108+
- Counting targeted transcripts per guide RNA
109+
- Retraining the provided sgRNAScorer 2.0 model (if needed)
44110

45111
## Off-target Indexing
46112

47113
1. Extract off-target sites:
48114

49-
```
50-
python extractOfftargets.py <output-file> {<input-files>... | input-dir>}
51-
```
115+
```bash
116+
extractOfftargets <output-file> {<input-files>... | input-dir>}
117+
```
52118

53-
For example:
119+
For example:
54120

55-
```
56-
python extractOfftargets.py ~/genomes/mouse_offtargets.txt ~/genomes/mouse.fa
57-
```
121+
```
122+
extractOfftargets ~/genomes/mouse_offtargets.txt ~/genomes/mouse.fa
123+
```
58124
59125
The input provided can be:
60126
@@ -64,62 +130,146 @@ We present Crackling, a new method for whole-genome identification of suitable C
64130
65131
Note: Unlike previous versions, sorting the extracted off-targets is no longer required, as extractOfftargets.py now does this automatically.
66132
133+
2. Generate the index:
67134
135+
```
136+
usage: createIsslIndex [-h] -t OFFTARGETS -l GUIDELENGTH -w SLIDEWIDTH -o
137+
OUTPUT [-b BINARY]
138+
139+
optional arguments:
140+
-h, --help show this help message and exit
141+
-t OFFTARGETS, --offtargets OFFTARGETS
142+
A text file containing off-target sites
143+
-l GUIDELENGTH, --guidelength GUIDELENGTH
144+
The length of an off-target site
145+
-w SLIDEWIDTH, --slidewidth SLIDEWIDTH
146+
The ISSL slice width in bits
147+
-o OUTPUT, --output OUTPUT
148+
A filepath to save the ISSL index
149+
-b BINARY, --binary BINARY
150+
A filepath to the createIsslIndex binary (optional)
151+
```
68152
69-
2. Build the ISSL index
153+
For example:
70154
71-
Compile the indexer first:
72-
73-
```
74-
g++ -o isslCreateIndex isslCreateIndex.cpp -O3 -std=c++11 -fopenmp -mpopcnt
75-
```
76-
77-
Generate the index:
78-
79-
*For a 20bp sgRNA where up to four mismatches are allowed, use a slice width of eight*
80-
81-
```
82-
./isslCreateIndex <offtargets-sorted> <guide-length> <slice-width-bits> <index-name>
83-
```
84-
85-
For example:
86-
87-
```
88-
./isslCreateIndex ~/genomes/mouse_offtargets-sorted.txt 20 8 ~/genomes/mouse_offtargets-sorted.txt.issl
89-
```
155+
*For a 20bp sgRNA where up to four mismatches are allowed, use a slice width of eight (4 mismatches \* 2 bits per mismatch)*
156+
157+
```
158+
createIsslIndex -t ~/genomes/mouse_offtargets.txt -l 20 -w 8 -o ~/genomes/mouse_offtargets-sorted.txt.issl
159+
```
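The slice-width rule of thumb above can be worked through as arithmetic. This is a sketch under stated assumptions: each base is taken to occupy 2 bits (as the "2 bits per mismatch" note suggests), and the helper names `slice_layout` and `guaranteed_clean_slices` are hypothetical, not part of Crackling. The pigeonhole reading in the second function is an interpretation, not a statement about the ISSL source code.

```python
def slice_layout(guide_length, slice_width_bits, bits_per_base=2):
    """Return (number of slices, bases per slice) for a guide.

    This sketch only handles widths that split the guide evenly.
    """
    total_bits = guide_length * bits_per_base
    if total_bits % slice_width_bits != 0:
        raise ValueError("slice width does not divide the encoded guide evenly")
    return total_bits // slice_width_bits, slice_width_bits // bits_per_base


def guaranteed_clean_slices(slice_count, max_mismatches):
    """Minimum number of slices that contain no mismatch.

    Each mismatched base falls in exactly one slice, so at most
    `max_mismatches` slices can be affected (pigeonhole principle).
    """
    return max(0, slice_count - max_mismatches)


if __name__ == "__main__":
    slices, bases = slice_layout(guide_length=20, slice_width_bits=8)
    print(slices, "slices of", bases, "bases each")
    print("slices guaranteed mismatch-free:", guaranteed_clean_slices(slices, 4))
```

For a 20bp guide and a slice width of 8 bits, this gives 5 slices of 4 bases each, so with up to four mismatches at least one slice still matches exactly.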
160+
161+
A progress indicator is printed to *stderr*, like so:
90162
163+
> 8576/8583 : 6548
164+
>
165+
> 8577/8583 : 6549
166+
>
167+
> 8578/8583 : 6549
168+
>
169+
> 8579/8583 : 6549
170+
>
171+
> 8580/8583 : 6549
172+
>
173+
> 8581/8583 : 6549
174+
>
175+
> 8582/8583 : 6549
176+
>
177+
> 8583/8583 : 6550
91178
179+
formatted as `<current line of input file> / <number of lines in input file> : <running total of distinct sites>`.
92180
93-
## Bowtie2 index
181+
In this example, the running total stays at 6549 from line 8577 through line 8582, meaning those lines all contain the 6549th distinct site.
94182
95-
The Bowtie2 manual can be found [here](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml).
183+
The indicator is printed once for every 10,000 input lines processed, and for each of the last 100 input lines.
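If you capture *stderr* to a log, the progress lines can be parsed with a short script. This is a minimal sketch that assumes only the `<current line> / <number of lines> : <distinct sites>` format described above; the function name `parse_progress` is hypothetical.

```python
import re

# Matches lines such as "8583/8583 : 6550".
PROGRESS_RE = re.compile(r"^\s*(\d+)\s*/\s*(\d+)\s*:\s*(\d+)\s*$")


def parse_progress(line):
    """Return (current_line, total_lines, distinct_sites), or None if no match."""
    match = PROGRESS_RE.match(line)
    if match is None:
        return None
    current, total, distinct = (int(group) for group in match.groups())
    return current, total, distinct


if __name__ == "__main__":
    current, total, distinct = parse_progress("8583/8583 : 6550")
    print(f"{100 * current / total:.1f}% done, {distinct} distinct sites")
```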
96184
97-
Crackling requires a Bowtie2 index to be provided.
98185
99-
Our recommended usage:
186+
## Counting targeted transcripts per guide RNA
100187
188+
Using the CLI command `countHitTranscripts`:
189+
190+
```bash
191+
usage: countHitTranscripts [-h] [-a ANNOTATION] [-c CRACKLING] [-o OUTPUT]
192+
[-s]
193+
194+
optional arguments:
195+
-h, --help show this help message and exit
196+
-s, --sample Run sample
197+
198+
group:
199+
-a ANNOTATION, --annotation ANNOTATION
200+
The GFF3 annotation file
201+
-c CRACKLING, --crackling CRACKLING
202+
The Crackling output file
203+
-o OUTPUT, --output OUTPUT
204+
The output file
101205
```
102-
bowtie2-build --threads 128 input-file output-file
206+
207+
For example, suppose two guides, *A* and *B*, have been selected by Crackling as safe and efficient. How many transcripts of a gene does each guide target?
208+
209+
Exons are represented by `|||||`.
210+
211+
Chromosome 1:
212+
213+
 (Target A)    (Target B)  (Target C)    (Target D)
214+
      *             *           *             *
215+
----||*|||-------|||*|||------||*|||----------*--- (Gene 1 - Transcript 1)
216+
----||*|||----------*---------||*|||----------*--- (Gene 1 - Transcript 2)
217+
------*----------|||*|||------||*|||----------*--- (Gene 1 - Transcript 3)
218+
------*-----------------------||*|||----------*--- (Gene 1 - Transcript 4)
219+
      *             *           *             *
220+
221+
Use `--sample` to run the utility for the example above:
222+
223+
```bash
224+
$ countHitTranscripts --sample
225+
Writing test data to file.
226+
The expected results from the test are:
227+
AAAA 2/4
228+
AAAT 2/4
229+
AATA 4/4
230+
ATAA 0/0
231+
232+
Pickled to: /tmp/tmp68qd5n6y.p
233+
['seq', 'bowtieChr', 'bowtieStart', 'bowtieEnd', 'hits']
234+
['AAAA', 'Chr1', '60', '83', '2/4']
235+
['AAAT', 'Chr1', '200', '223', '2/4']
236+
['AATA', 'Chr1', '320', '343', '4/4']
237+
['ATAA', 'Chr1', '460', '483', '0/0']
103238
```
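The counting in the sample can be illustrated with a small sketch. It is not the `countHitTranscripts` implementation. The exon coordinates are hypothetical and chosen to mirror the diagram, and two rules are assumed: a target "hits" a transcript when it lies entirely inside one of its exons, and the denominator counts the transcripts of genes with at least one hit, which would explain the `0/0` for the last target. Chromosome names are ignored for brevity.

```python
# Hypothetical exon coordinates (start, end) that mirror the diagram.
# They are NOT taken from the Crackling test data.
GENES = {
    "Gene 1": {
        "Transcript 1": [(50, 100), (190, 240), (310, 360)],
        "Transcript 2": [(50, 100), (310, 360)],
        "Transcript 3": [(190, 240), (310, 360)],
        "Transcript 4": [(310, 360)],
    }
}

# Target coordinates as printed by `countHitTranscripts --sample`.
TARGETS = {
    "AAAA": (60, 83),
    "AAAT": (200, 223),
    "AATA": (320, 343),
    "ATAA": (460, 483),
}


def count_hit_transcripts(target, genes):
    """Return "<hit transcripts>/<transcripts of hit genes>" for one target."""
    start, end = target
    hit = 0
    total = 0
    for transcripts in genes.values():
        # Assumed rule: the target must lie entirely inside an exon.
        hits_in_gene = sum(
            any(ex_start <= start and end <= ex_end for ex_start, ex_end in exons)
            for exons in transcripts.values()
        )
        if hits_in_gene:
            hit += hits_in_gene
            total += len(transcripts)
    return f"{hit}/{total}"


if __name__ == "__main__":
    for seq, coords in TARGETS.items():
        print(seq, count_hit_transcripts(coords, GENES))
```

Under these assumptions the sketch reproduces the expected results listed in the sample output.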
104239

105-
For example:
240+
## Training the sgRNAScorer 2.0 model (if needed)
241+
242+
We provide a pre-trained model. However, depending on your environment (Python and package versions), you may need to retrain it using the CLI command `trainModel`. All arguments to this command are optional, as the utility will compute default values for you.
106243

107244
```bash
108-
bowtie2-build --threads 128 ~/genomes/mouse.fa ~/genomes/mouse.fa.bowtie2
245+
Using user specified arguments
246+
usage: trainModel [-h] -g GOOD -b BAD -s SPACERLENGTH -p PAMORIENTATION -l
247+
PAMLENGTH -o SVMOUTPUT
248+
249+
optional arguments:
250+
-h, --help show this help message and exit
251+
-g GOOD, --good GOOD
252+
-b BAD, --bad BAD
253+
-s SPACERLENGTH, --spacerLength SPACERLENGTH
254+
-p PAMORIENTATION, --pamOrientation PAMORIENTATION
255+
-l PAMLENGTH, --pamLength PAMLENGTH
256+
-o SVMOUTPUT, --svmOutput SVMOUTPUT
109257
```
110258

111259

112260

113261
## References
114262

115-
Ben Langmead and Steven L Salzberg. Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4):357, 2012.
263+
Ben Langmead and Steven L Salzberg. Fast gapped-read alignment with Bowtie2. Nature Methods, 9(4):357, 2012.
116264

117265
Bradford, J., & Perrin, D. (2019). A benchmark of computational CRISPR-Cas9 guide design methods. PLoS computational biology, 15(8), e1007274.
118266

119267
Bradford, J., & Perrin, D. (2019). Improving CRISPR guide design with consensus approaches. BMC genomics, 20(9), 931.
120268

121269
Chari, R., Yeo, N. C., Chavez, A., & Church, G. M. (2017). sgRNA Scorer 2.0: a species-independent model to predict CRISPR/Cas9 activity. ACS synthetic biology, 6(5), 902-904.
122270

271+
Lorenz, R., Bernhart, S. H., Zu Siederdissen, C. H., Tafer, H., Flamm, C., Stadler, P. F., & Hofacker, I. L. (2011). ViennaRNA Package 2.0. *Algorithms for molecular biology*, *6*(1), 1-14.
272+
123273
Montague, T. G., Cruz, J. M., Gagnon, J. A., Church, G. M., & Valen, E. (2014). CHOPCHOP: a CRISPR/Cas9 and TALEN web tool for genome editing. Nucleic acids research, 42(W1), W401-W407.
124274

125-
Sunagawa, G. A., Sumiyama, K., Ukai-Tadenuma, M., Perrin, D., Fujishima, H., Ukai, H., ... & Shimizu, Y. (2016). Mammalian reverse genetics without crossing reveals Nr3a as a short-sleeper gene. Cell reports, 14(3), 662-677.
275+
Sunagawa, G. A., Sumiyama, K., Ukai-Tadenuma, M., Perrin, D., Fujishima, H., Ukai, H., ... & Shimizu, Y. (2016). Mammalian reverse genetics without crossing reveals Nr3a as a short-sleeper gene. Cell reports, 14(3), 662-677.
