Commit fe09b8f

Version 2.0.0
1 parent bad52ab commit fe09b8f

1 file changed: +48 -11 lines changed

README.md

Lines changed: 48 additions & 11 deletions
@@ -64,19 +64,19 @@ The sort order is defined by the barcode indices, lowest first.
6464

6565
*Lima* offers the following features:
6666
* Process both CLR subreads and CCS reads
67-
* BAM in- and output
67+
* BAM, FASTA, FASTQ in- and output
6868
* Extensive reports that allow in-depth quality control
6969
* Clip barcode sequences and annotate `bq` and `bc` tags
7070
* Agnostic of input barcode sequence orientation
71-
* Split output BAM files by barcode
71+
* Split output files by barcode
7272
* Full PacBio dataset support
7373
* Peek into the first N ZMWs and get average barcode score
7474
* Guess the subset of barcodes used in an input Barcode Set given a mean barcode score threshold
7575
* Enhanced filtering options to remove ambiguous calls
7676
* Double demux to remove PCR primers after barcode demultiplexing
7777

7878
## Latest Version
79-
Version **1.11.0**: [Full changelog here](#full-changelog)
79+
Version **2.0.0**: [Full changelog here](#full-changelog)
8080

8181
## Execution
8282

@@ -86,13 +86,13 @@ Version **1.11.0**: [Full changelog here](#full-changelog)
8686

8787
Run on CLR subread data:
8888

89-
lima movie.subreads.bam barcodes.fasta prefix.bam
90-
lima movie.subreadset.xml barcodes.barcodeset.xml prefix.subreadset.xml
89+
$ lima movie.subreads.bam barcodes.fasta prefix.bam
90+
$ lima movie.subreadset.xml barcodes.barcodeset.xml prefix.subreadset.xml
9191

9292
Run on CCS data:
9393

94-
lima --ccs movie.ccs.bam barcodes.fasta prefix.bam
95-
lima --ccs movie.consensusreadset.xml barcodes.barcodeset.xml prefix.consensusreadset.xml
94+
$ lima --ccs movie.ccs.bam barcodes.fasta prefix.bam
95+
$ lima --ccs movie.consensusreadset.xml barcodes.barcodeset.xml prefix.consensusreadset.xml
9696

9797
If you do not need to import the demultiplexed data into SMRT Link, it is advised
9898
to use `--no-pbi`, omit the pbi index file, to minimize time to result.
@@ -109,8 +109,8 @@ to use `--no-pbi`, omit the pbi index file, to minimize time to result.
109109

110110
### Example execution
111111

112-
lima m54317_180718_075644.subreadset.xml Sequel_RSII_384_barcodes_v1.barcodeset.xml \
113-
m54317_180718_075644.demux.subreadset.xml --different --peek-guess
112+
$ lima m54317_180718_075644.subreadset.xml Sequel_RSII_384_barcodes_v1.barcodeset.xml \
113+
m54317_180718_075644.demux.subreadset.xml --different --peek-guess
114114

115115

116116
## Input data
@@ -119,6 +119,8 @@ unaligned CCS reads, generated by [CCS](https://github.com/PacificBiosciences/cc
119119
both in the PacBio enhanced BAM format. If you want to demux RSII data, first
120120
use SMRT Link or bax2bam to convert h5 to BAM. In addition, a `datastore.json`
121121
with one file entry, either a SubreadSet or ConsensusReadSet, is also allowed.
122+
CCS reads are also supported as FASTA or FASTQ input, optionally
123+
gzipped.
122124

123125
Barcodes are provided as a FASTA file, one entry per barcode sequence,
124126
**no duplicate** sequences, only upper-case bases,
@@ -159,14 +161,46 @@ prefix as the output file, omitting suffixes `.bam`, `.subreadset.xml`, and
159161
`.consensusreadset.xml`. The report infix is `lima`.
160162
Example:
161163

162-
lima m54007_170702_064558.subreads.bam barcode.fasta /my/path/m54007_170702_064558_demux.subreadset.xml
164+
$ lima m54007_170702_064558.subreads.bam barcode.fasta /my/path/m54007_170702_064558_demux.subreadset.xml
163165

164166
For all output files, the prefix will be `/my/path/m54007_170702_064558_demux.`
165167

166168
### BAM
167169
The first file `prefix.bam` contains clipped records, annotated with
168170
barcode tags, that passed filters.
169171

172+
### FASTA/Q
173+
Alternatively, if the output file is FASTA or FASTQ, the header of each sequence
174+
contains all tags that would be present in the BAM format, separated by a
175+
single space. Example FASTQ header:
176+
177+
@m54006_171006_044150/4588126/ccs bc=3,3 bl=CGCGCGTGTGTGCGTG bq=100 bt=CGCGCGTGTGTGCGTG bx=16,16 cx=12 qe=2235 ql=p\tttropqorrtnnH qs=16 qt=G^\IGR]K8S>>^\^p
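
As a rough illustration of how such a header could be consumed downstream, the following Python sketch splits it into the read name and a tag dictionary. The function name `parse_header` is hypothetical and not part of lima. The sketch assumes only what is stated above: fields are separated by a single space and each tag has the form `key=value`. It splits on the first `=` only, since quality strings may themselves contain that character.

```python
from __future__ import annotations


def parse_header(header: str) -> tuple[str, dict[str, str]]:
    """Split a FASTA/FASTQ header into (read name, {tag: value}).

    Hypothetical helper for illustration; not part of lima.
    """
    # Drop the leading record marker: '@' for FASTQ, '>' for FASTA.
    if header.startswith(("@", ">")):
        header = header[1:]
    fields = header.split(" ")
    name = fields[0]
    tags = {}
    for field in fields[1:]:
        # Split on the first '=' only; values may contain '=' themselves.
        key, _, value = field.partition("=")
        tags[key] = value
    return name, tags


# The example header from above, as a raw string to keep backslashes literal.
example = (
    r"@m54006_171006_044150/4588126/ccs bc=3,3 bl=CGCGCGTGTGTGCGTG bq=100 "
    r"bt=CGCGCGTGTGTGCGTG bx=16,16 cx=12 qe=2235 ql=p\tttropqorrtnnH "
    r"qs=16 qt=G^\IGR]K8S>>^\^p"
)
name, tags = parse_header(example)
print(name)        # m54006_171006_044150/4588126/ccs
print(tags["bc"])  # 3,3
print(tags["bq"])  # 100
```

All values are kept as strings; converting, for example, `bq` to an integer is left to the caller.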
178+
179+
### In- and output compatibility matrix:
180+
181+
For CLR data, only XML and BAM are valid in- and output file types.
182+
183+
For CCS data, use the following compatibility matrix:
184+
185+
| In/Out | XML | BAM | FASTQ | FASTA |
186+
| ------ | :-: | :-: | :---: | :---: |
187+
| XML | YES | YES | YES | YES |
188+
| BAM | YES | YES | YES | YES |
189+
| FASTQ | no | no | YES | YES |
190+
| FASTA | no | no | no | YES |
191+
192+
This means you can use CCS FASTQ reads as input with FASTA as output, but
193+
not with BAM as output.
194+
195+
Working example:
196+
197+
$ lima movie.Q20.fastq Sequel_RSII_384_barcodes_v1.fasta demuxed.fastq --same
198+
199+
Failing example:
200+
201+
$ lima movie.Q20.fastq Sequel_RSII_384_barcodes_v1.fasta demuxed.bam --same
202+
FATAL -|- Unsupported combination of FASTQ input and BAM output.
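
To check a CCS input and output combination before launching a run, the matrix can be encoded as a small lookup. The Python sketch below is illustrative only and is not how lima itself performs the check. The names `CCS_COMPATIBILITY`, `SUFFIX_TO_TYPE`, `file_type`, and `is_supported` are hypothetical, and detecting the file type from the suffix (ignoring a trailing `.gz`) is an assumption of this sketch.

```python
# Illustrative sketch only; not lima's implementation.
# Keys are CCS input types; values are the output types marked YES in the matrix.
CCS_COMPATIBILITY = {
    "XML": {"XML", "BAM", "FASTQ", "FASTA"},
    "BAM": {"XML", "BAM", "FASTQ", "FASTA"},
    "FASTQ": {"FASTQ", "FASTA"},
    "FASTA": {"FASTA"},
}

# Hypothetical mapping from file suffix to type name (assumption of this sketch).
SUFFIX_TO_TYPE = {
    ".xml": "XML",
    ".bam": "BAM",
    ".fastq": "FASTQ",
    ".fasta": "FASTA",
}


def file_type(path: str) -> str:
    """Guess the file type from its suffix, ignoring a trailing .gz."""
    name = path.lower()
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    for suffix, kind in SUFFIX_TO_TYPE.items():
        if name.endswith(suffix):
            return kind
    raise ValueError(f"Unrecognized file type: {path}")


def is_supported(input_path: str, output_path: str) -> bool:
    """Return True if the CCS input/output pair is marked YES in the matrix."""
    return file_type(output_path) in CCS_COMPATIBILITY[file_type(input_path)]


print(is_supported("movie.Q20.fastq", "demuxed.fastq"))  # True
print(is_supported("movie.Q20.fastq", "demuxed.bam"))    # False
```

The two calls mirror the working and failing examples above. The lookup covers CCS data only; for CLR data, only XML and BAM are valid.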
203+
170204
### Report
171205
The second file is `prefix.lima.report`, a tab-separated file about each ZMW, unfiltered.
172206
This report contains any information necessary to investigate the demultiplexing
@@ -1069,7 +1103,10 @@ any parameters now, but worth future investigation.
10691103

10701104
## Full Changelog
10711105

1072-
* **1.11.0**:
1106+
* **2.0.0**:
1107+
* Add support for FASTA and FASTQ
1108+
* Fix `-k` with by-strand HiFi reads
1109+
* 1.11.0:
10731110
* Add barcode to read groups, use one barcode pair per RG
10741111
* Fix double demux, used to clip wrongly for the second round of demuxing
10751112
* 1.10.0:
