@@ -64,19 +64,19 @@ The sort order is defined by the barcode indices, lowest first.

*Lima* offers the following features:
* Process both, CLR subreads and CCS reads
- * BAM in- and output
+ * BAM, FASTA, FASTQ in- and output
* Extensive reports that allow in-depth quality control
* Clip barcode sequences and annotate `bq` and `bc` tags
* Agnostic of input barcode sequence orientation
- * Split output BAM files by barcode
+ * Split output files by barcode
* Full PacBio dataset support
* Peek into the first N ZMWs and get average barcode score
* Guess the subset of barcodes used in an input Barcode Set given a mean barcode score threshold
* Enhanced filtering options to remove ambiguous calls
* Double demux to remove PCR primers after barcode demultiplexing

## Latest Version
- Version **1.11.0**: [Full changelog here](#full-changelog)
+ Version **2.0.0**: [Full changelog here](#full-changelog)

## Execution

@@ -86,13 +86,13 @@ Version **1.11.0**: [Full changelog here](#full-changelog)

Run on CLR subread data:

- lima movie.subreads.bam barcodes.fasta prefix.bam
- lima movie.subreadset.xml barcodes.barcodeset.xml prefix.subreadset.xml
+ $ lima movie.subreads.bam barcodes.fasta prefix.bam
+ $ lima movie.subreadset.xml barcodes.barcodeset.xml prefix.subreadset.xml

Run on CCS data:

- lima --ccs movie.ccs.bam barcodes.fasta prefix.bam
- lima --ccs movie.consensusreadset.xml barcodes.barcodeset.xml prefix.consensusreadset.xml
+ $ lima --ccs movie.ccs.bam barcodes.fasta prefix.bam
+ $ lima --ccs movie.consensusreadset.xml barcodes.barcodeset.xml prefix.consensusreadset.xml

If you do not need to import the demultiplexed data into SMRT Link, it is advised
to use `--no-pbi`, omit the pbi index file, to minimize time to result.
@@ -109,8 +109,8 @@ to use `--no-pbi`, omit the pbi index file, to minimize time to result.

### Example execution

- lima m54317_180718_075644.subreadset.xml Sequel_RSII_384_barcodes_v1.barcodeset.xml \
- m54317_180718_075644.demux.subreadset.xml --different --peek-guess
+ $ lima m54317_180718_075644.subreadset.xml Sequel_RSII_384_barcodes_v1.barcodeset.xml \
+ m54317_180718_075644.demux.subreadset.xml --different --peek-guess


## Input data
@@ -119,6 +119,8 @@ unaligned CCS reads, generated by [CCS](https://github.com/PacificBiosciences/cc
both in the PacBio enhanced BAM format. If you want to demux RSII data, first
use SMRT Link or bax2bam to convert h5 to BAM. In addition, a `datastore.json`
with one file entry, either a SubreadSet or ConsensusReadSet, is also allowed.
+ In addition, CCS reads input are also supported as FASTA or FASTQ, optionally
+ gzipped.

Barcodes are provided as a FASTA file, one entry per barcode sequence,
**no duplicate** sequences, only upper-case bases,
@@ -159,14 +161,46 @@ prefix as the output file, omitting suffixes `.bam`, `.subreadset.xml`, and
`.consensusreadset.xml`. The report infix is `lima`.
Example:

- lima m54007_170702_064558.subreads.bam barcode.fasta /my/path/m54007_170702_064558_demux.subreadset.xml
+ $ lima m54007_170702_064558.subreads.bam barcode.fasta /my/path/m54007_170702_064558_demux.subreadset.xml

For all output files, the prefix will be `/my/path/m54007_170702_064558_demux.`

### BAM
The first file `prefix.bam` contains clipped records, annotated with
barcode tags, that passed filters.

+ ### FASTA/Q
+ Alternatively, if output file is fasta or fastq, the header of each sequence
+ contains all tags, separated by a single whitespace, that would be present in
+ the BAM format. Example FASTQ header:
+
+ @m54006_171006_044150/4588126/ccs bc=3,3 bl=CGCGCGTGTGTGCGTG bq=100 bt=CGCGCGTGTGTGCGTG bx=16,16 cx=12 qe=2235 ql=p\tttropqorrtnnH qs=16 qt=G^\IGR]K8S>>^\^p
+
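For downstream scripts, the tags in such a header can be recovered by splitting on whitespace. The following is a minimal sketch, not part of *lima*: the function name `parse_lima_header` is hypothetical, and it assumes that tag values never contain whitespace and that each tag has the form `key=value`, as in the example header above.

```python
def parse_lima_header(header: str) -> tuple[str, dict[str, str]]:
    """Split a FASTA/FASTQ header line into the read name and its tags."""
    fields = header.strip().split()
    if not fields:
        raise ValueError("empty header line")
    # Drop the leading '@' (FASTQ) or '>' (FASTA) record marker, if present.
    name = fields[0][1:] if fields[0][0] in "@>" else fields[0]
    tags = {}
    for field in fields[1:]:
        # Split on the first '=' only; values such as quality strings
        # may themselves contain '='.
        key, _, value = field.partition("=")
        tags[key] = value
    return name, tags


# Raw string so the backslashes in the quality tags are kept literally.
header = r"@m54006_171006_044150/4588126/ccs bc=3,3 bl=CGCGCGTGTGTGCGTG bq=100 bt=CGCGCGTGTGTGCGTG bx=16,16 cx=12 qe=2235 ql=p\tttropqorrtnnH qs=16 qt=G^\IGR]K8S>>^\^p"
name, tags = parse_lima_header(header)
print(name, tags["bc"], tags["bq"])
```

All values are kept as strings; converting, for example, `bc` into a pair of barcode indices or `bq` into an integer is left to the caller.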
+ ### In- and output compatibility matrix:
+
+ For CLR data, only XML and BAM are valid in- and output file types.
+
+ For CCS data, use following compatibility matrix:
+
+ | In/Out | XML | BAM | FASTQ | FASTA |
+ | ------ | :-: | :-: | :---: | :---: |
+ | XML | YES | YES | YES | YES |
+ | BAM | YES | YES | YES | YES |
+ | FASTQ | no | no | YES | YES |
+ | FASTA | no | no | no | YES |
+
+ This means, you can use CCS FASTQ reads as input and FASTA as output, but
+ not BAM as output.
+
+ Working example:
+
+ $ lima movie.Q20.fastq Sequel_RSII_384_barcodes_v1.fasta demuxed.fastq --same
+
+ Failing example:
+
+ $ lima movie.Q20.fastq Sequel_RSII_384_barcodes_v1.fasta demuxed.bam --same
+ FATAL -|- Unsupported combination of FASTQ input and BAM output.
+
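A pipeline that wraps *lima* may want to check a combination before launching a run. The sketch below encodes the matrix above as a lookup table; it is an illustration only and not how *lima* itself implements the check. The names `CCS_OUTPUTS`, `CLR_TYPES`, and `is_supported` are hypothetical.

```python
# Allowed output types per input type for CCS data, copied from the matrix.
CCS_OUTPUTS = {
    "XML": {"XML", "BAM", "FASTQ", "FASTA"},
    "BAM": {"XML", "BAM", "FASTQ", "FASTA"},
    "FASTQ": {"FASTQ", "FASTA"},
    "FASTA": {"FASTA"},
}

# For CLR data, only XML and BAM are valid in- and output file types.
CLR_TYPES = {"XML", "BAM"}


def is_supported(input_type: str, output_type: str, ccs: bool = True) -> bool:
    """Return True if the in-/output combination is listed as valid."""
    input_type, output_type = input_type.upper(), output_type.upper()
    if not ccs:
        return input_type in CLR_TYPES and output_type in CLR_TYPES
    # Unknown input types map to an empty set, so they are never supported.
    return output_type in CCS_OUTPUTS.get(input_type, set())


print(is_supported("FASTQ", "FASTA"))  # working example above
print(is_supported("FASTQ", "BAM"))    # failing example above
```

Keeping the matrix as data rather than nested conditionals makes it easy to compare against the table when the supported combinations change.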
### Report
The second file is `prefix.lima.report`, a tab-separated file about each ZMW, unfiltered.
This report contains any information necessary to investigate the demultiplexing
@@ -1069,7 +1103,10 @@ any parameters now, but worth future investigation.

## Full Changelog

- * **1.11.0**:
+ * **2.0.0**:
+   * Add support for FASTA and FASTQ
+   * Fix `-k` with by-strand HiFi reads
+ * 1.11.0:
  * Add barcode to read groups, use one barcode pair per RG
  * Fix double demux, used to clip wrongly for the second round of demuxing
* 1.10.0: