Commit 7ec7289

Version 2.5.0
1 parent ddeb080 commit 7ec7289

13 files changed

+145
-32
lines changed

docs/changelog.md

Lines changed: 11 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -6,7 +6,17 @@ nav_order: 99
66

77
# Version changelog
88

9-
* **2.4.0**:
9+
* **2.5.0**:
10+
* Upcoming SMRT Link release
11+
* Add [`lima-undo` functionality](/faq/undo)
12+
* Support methylation tag clipping
13+
* Add progress and ETA for `--log-level INFO`
14+
* Rename `--preset` to [`--hifi-preset`](/faq/hifi-presets)
15+
* Add barcoded adapter `--hifi-preset SYMMETRIC-ADAPTERS`
16+
* Fixes to support stranded HiFi BAM input
17+
* Do not abort on empty input, but warn only
18+
19+
* 2.4.0:
1020
* Fix fasta/q input and `--guess`
1121
* Output empty files for missing barcode pairs `--output-missing-pairs`
1222
* Output each barcode into its own sub-directory `--split-subdirs`

docs/faq/Speed.md

Lines changed: 11 additions & 14 deletions
@@ -5,22 +5,19 @@ title: Speed
55
---
66

77
## How fast is fast?
8-
Example: 200 barcodes, asymmetric mode (try each barcode forward and
9-
reverse-complement), 300,000 CCS reads. On my 2014 iMac with 4 cores + HT:
8+
Example: 64 barcodes / asymmetric mode / 1.9M HiFi reads on a dual 64c EPYC system:
109

11-
503.57s user 11.74s system 725% cpu 1:11.01 total
10+
Processed : 1912155
11+
Throughput: 2393135/min
12+
Run Time : 48s 306ms
13+
CPU Time : 2h 14m
1214

13-
Those 1:11 minutes translate into 0.233 milliseconds per ZMW,
14-
1.16 microseconds per barcode for both sides aligning forward and reverse-complement,
15-
and 291 nanoseconds per alignment. This includes IO.
15+
That's 2.4M HiFi reads processed per minute on 128 physical CPU cores, including
16+
IO.
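
The rounded figure can be checked from the wall-clock numbers in the log above. This is a minimal sketch that only redoes the arithmetic; the function name is hypothetical and nothing here is part of *lima*:

```python
def reads_per_minute(reads: int, seconds: float) -> float:
    """Average throughput over the whole run, in reads per minute."""
    return reads / seconds * 60.0


# Values copied from the example log: "Processed" and "Run Time" (48s 306ms).
processed = 1_912_155
run_time_s = 48.306

rate = reads_per_minute(processed, run_time_s)
print(f"{rate / 1e6:.1f}M reads/min")
```

The result is an average over the full run time, so it can differ slightly from the `Throughput` value that the tool itself reports.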
1617

17-
## Why doesn't *lima* utilize the maximum number of provided cores?
18-
This might be a simple IO bottleneck. With a barcode.fasta containing only a few
19-
barcodes, most of the time is spent reading and writing BAM files, as the barcode
20-
identification is too fast. Starting version 2.2.0, you can enable multi-threaded
21-
BAM reading by setting the number of threads via an environment variable
18+
## Is there a way to show the progress?
19+
Yes, please use `--log-level INFO`. If there is a `.pbi` file present, the
20+
estimated time will be shown. Otherwise, it will show progress as number of
21+
reads every 5 seconds.
2222

23-
export PB_BAMREADER_THREADS=2
2423

25-
## Is there a way to show the progress?
26-
No. Please run `wc -l prefix.report` to get the number of already processed ZMWs.

docs/faq/barcoded-adapter.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
---
2+
layout: default
3+
parent: FAQ
4+
title: Barcoded Adapter
5+
---
6+
7+
## Barcoded Adapter
8+
The most convenient way to barcode a sample is the use of barcoded adapters, as
9+
depicted in the [barcode design overview](barcode-design). One minor
10+
disadvantage is that the ligation might not be as efficient as with standard
11+
SMRTbell adapters, leaving some molecules with only one adapter. As barcoded
12+
adapter designs are inherently symmetric, we implemented ways to recover the
13+
demultiplexed yield from one-sided barcoded molecules with ease.
14+
15+
As the first step, generate HiFi data with *ccs* v6.3.0 or later. This version
16+
will store [additional tags per
17+
record](https://ccs.how/faq/missing-adapters.html), indicating if the molecule
18+
has missing adapters on either side. The second step is to use the new
19+
`--hifi-preset SYMMETRIC-ADAPTERS` introduced with *lima* v2.5.0, [described
20+
here](/faq/hifi-presets). That's it.

docs/faq/biosample.md

Lines changed: 15 additions & 0 deletions
@@ -22,3 +22,18 @@ relevant. Example:
2222
Provide this CSV to lima via `--biosample-csv input.csv`.
2323

2424
This will associate the bio sample name to the read group using the `SM` tag.
25+
26+
## UUID passthrough
27+
Since *lima* v2.5.0, the functionality has been enhanced to allow specifying
28+
UUIDs for the resulting XML files; for this, use `--reuse-uuids` in addition to
29+
the extended CSV for `--biosample-csv`. Example:
30+
31+
Barcodes,UUID,Bio Sample
32+
bc1001--bc1001,11111111-1111-1aaa-0111-111111111111,Alfred
33+
bc1002--bc1002,22222222-2222-2bbb-8222-222222222222,Berthold
34+
bc1003--bc1003,33333333-3333-3ccc-9222-333333333333,Constantin
35+
bc1008--bc1008,e04f12c9-7b2e-45fd-ab49-1bc2f75d653a,Holger
36+
37+
Ensure that the UUID matches the regex
38+
39+
[0-9a-f]{8}-[0-9a-f]{4}-[0-5][0-9a-f]{3}-[089ab][0-9a-f]{3}-[0-9a-f]{12}
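
Before handing the CSV to *lima*, the UUID column can be checked against this regex. The following is a minimal sketch, assuming the three-column layout shown above and that the whole field must match; the function name and the last example row are hypothetical, and this is not how *lima* itself validates the file:

```python
import csv
import io
import re

# The regex from the documentation, applied to the whole field.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-5][0-9a-f]{3}-[089ab][0-9a-f]{3}-[0-9a-f]{12}"
)


def invalid_uuid_rows(csv_text: str) -> list[str]:
    """Return the barcode pairs whose UUID does not match the regex."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["Barcodes"] for row in reader if not UUID_RE.fullmatch(row["UUID"])]


example = """Barcodes,UUID,Bio Sample
bc1001--bc1001,11111111-1111-1aaa-0111-111111111111,Alfred
bc1002--bc1002,22222222-2222-2bbb-8222-222222222222,Berthold
bc1009--bc1009,not-a-uuid,Hypothetical
"""

print(invalid_uuid_rows(example))
```

Only the hypothetical third row is reported. Note that the regex accepts lowercase hexadecimal digits only.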

docs/faq/hifi-presets.md

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
1+
---
2+
layout: default
3+
parent: FAQ
4+
title: HiFi Presets
5+
---
6+
7+
## HiFi presets
8+
With v2.5.0 we introduced the concept of recommended parameter presets called
9+
`--hifi-preset`. All presets use
10+
11+
--ccs --min-score 80 --min-end-score 50 --min-ref-span 0.75
12+
13+
In addition, they differ as follows:
14+
15+
| Preset | Definition |
16+
| -------------------- | ------------------------------------- |
17+
| `SYMMETRIC` | `--same` |
18+
| `SYMMETRIC-ADAPTERS` | `--same --ignore-missing-adapters` |
19+
| `ASYMMETRIC` | `--different --min-scoring-regions 2` |
20+
21+
For barcoded adapter libraries, `SYMMETRIC-ADAPTERS` will increase demultiplexed
22+
yield. More information is in the [barcoded adapter FAQ](/faq/barcoded-adapter).
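
To make the table concrete, each preset can be read as the shared options plus its preset-specific ones. This is an illustrative sketch that only restates the documentation; the names `COMMON`, `PRESET_EXTRAS`, and `expand_preset` are hypothetical and do not reflect how *lima* is implemented:

```python
# Options shared by every preset, as listed in the documentation.
COMMON = ["--ccs", "--min-score", "80", "--min-end-score", "50", "--min-ref-span", "0.75"]

# Preset-specific additions, copied from the table.
PRESET_EXTRAS = {
    "SYMMETRIC": ["--same"],
    "SYMMETRIC-ADAPTERS": ["--same", "--ignore-missing-adapters"],
    "ASYMMETRIC": ["--different", "--min-scoring-regions", "2"],
}


def expand_preset(name: str) -> list[str]:
    """Return the documented option list that a preset name stands for."""
    return COMMON + PRESET_EXTRAS[name]


print(" ".join(expand_preset("SYMMETRIC-ADAPTERS")))
```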

docs/faq/how-to-run.md

Lines changed: 2 additions & 5 deletions
@@ -20,18 +20,15 @@ Run on CCS / HiFi data:
2020
$ lima <movie>.ccs.bam <barcodes>.fasta <demux>.bam
2121
$ lima <movie>.consensusreadset.xml <barcodes>.barcodeset.xml <demux>.consensusreadset.xml
2222

23-
If you do not need to import the demultiplexed data into SMRT Link, it is advised
24-
to use `--no-pbi`, omit the pbi index file, to minimize time to result.
25-
2623
### *Symmetric* or *Tailed* options
2724

2825
CLR: --same
29-
CCS: --same --ccs
26+
CCS: --hifi-preset SYMMETRIC
3027

3128
### *Asymmetric* options
3229

3330
CLR: --different
34-
CCS: --different --ccs
31+
CCS: --hifi-preset ASYMMETRIC
3532

3633
### Example execution
3734

docs/faq/primer.md

Lines changed: 2 additions & 1 deletion
@@ -5,4 +5,5 @@ title: Primer removal
55
---
66

77
## Can I remove PCR primers after demultiplexing?
8-
Yes! After demultiplexing, just lima on the output again with your PCR primer(s).
8+
Yes! After demultiplexing, just call *lima* on the output again with your PCR
9+
primer(s).

docs/faq/split-output.md

Lines changed: 9 additions & 1 deletion
@@ -9,7 +9,7 @@ You can either iterate over the `prefix.bam` file N times or use
99
`--split-bam`. Each barcode has its own BAM file called
1010
`prefix.idxBest--idxCombined.bam`, e.g., `prefix.0--0.bam`.
1111

12-
The optional parameter `--split-bam-named`, names the files by their barcode names instead
12+
The optional parameter `--split-named` names the files by their barcode names instead
1313
of their barcode indices. Non-word characters, anything except [A-Za-z0-9_],
1414
in barcode names are replaced with an underscore in the file name.
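
The replacement rule can be sketched as follows. This is an assumption-laden illustration, not *lima*'s code: the function name is hypothetical, and it assumes each offending character is replaced individually rather than runs being collapsed.

```python
import re


def sanitize_barcode_name(name: str) -> str:
    """Replace every character outside [A-Za-z0-9_] with an underscore."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


# Hypothetical barcode name containing a space and a '#'.
print(sanitize_barcode_name("sample A#1"))  # sample_A_1
```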
1515

@@ -26,3 +26,11 @@ sequence is barcode `0` and the last barcode `numBarcodes - 1`.
2626
If you use output BAM splitting, it can happen that you get a lot of output files.
2727
Using `--files-per-directory N` creates subdirectories and outputs at most `N`
2828
barcodes per directory.
29+
30+
## Split barcodes into own sub-directories
31+
Since v2.5.0 each barcode can be stored in its own sub-directory: `--split-subdirs`.
32+
A parent XML will point to all of the barcoded files.
33+
34+
## Output missing barcodes
35+
If you have provided bio samples with barcode pairs, option `--output-missing-pairs`
36+
allows creating empty barcode files in all split modes.

docs/faq/undo.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
1+
---
2+
layout: default
3+
parent: FAQ
4+
title: Undo
5+
---
6+
7+
## Undo demultiplexing
8+
With the introduction of *lima* v2.5.0, it is possible to undo all
9+
demultiplexing steps for **HiFi data**. For this, the bioconda package contains a
10+
new `lima-undo` binary.
11+
12+
Example:
13+
14+
lima movie.hifi_reads.bam barcodes.fasta demux.consensusreadset.xml --hifi-preset SYMMETRIC --store-unbarcoded
15+
lima-undo demux.consensusreadset.xml undo.bam
16+
17+
Let's unroll what's happening. In the first line, we explicitly request to store
18+
the unbarcoded reads. Without this, we would not be able to recover unbarcoded
19+
reads. The `XML` contains all the file paths to the `BAM` files. The second call is
20+
to the new *lima-undo* binary that takes an `XML` or `BAM` file as input and
21+
output.
22+
23+
Optionally, you can also provide multiple input `BAM` files with one output `BAM`:
24+
25+
lima-undo demux.bam demux.unbarcoded.bam undo.bam
26+
27+
This also works with split BAM files:
28+
29+
lima-undo demux.bc1001-bc1001.bam demux.bc1002-bc1002.bam demux.unbarcoded.bam undo.bam
30+
31+
## How does it work?
32+
*lima* v2.5.0 and later stores everything that got clipped in an internal binary
33+
structure in the `ls` tag. Multiple demultiplexing rounds are supported. Once
34+
*lima-undo* gets called, for each read the individual demultiplexing steps get
35+
reverted until the read is identical to the original HiFi read.
36+
37+
## How can I check if undo results are correct?
38+
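Sort the undone reads by their `zm` tag and compare them against the original HiFi reads; an empty `diff` output means the files are identical: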
39+
40+
samtools sort --no-PG -t "zm" undo.bam -o sorted.undo.bam
41+
samtools view --no-PG sorted.undo.bam > undo.sam
42+
samtools view --no-PG movie.hifi_reads.bam > original.sam
43+
diff original.sam undo.sam

docs/get-started.md

Lines changed: 2 additions & 2 deletions
@@ -73,11 +73,11 @@ For CCS / HiFi data, use following compatibility matrix:
7373

7474
HiFi run from *BAM* with **symmetric** barcodes:
7575

76-
lima <movie>.hifi_reads.bam barcodes.fasta <movie>.demux.bam --same --ccs --min-score 80
76+
lima <movie>.hifi_reads.bam barcodes.fasta <movie>.demux.bam --hifi-preset SYMMETRIC
7777

7878
HiFi run from *FASTQ* with **asymmetric** barcodes:
7979

80-
lima <movie>.hifi_reads.fq.gz barcodes.fasta <movie>.demux.fastq --different --ccs --min-score 80
80+
lima <movie>.hifi_reads.fq.gz barcodes.fasta <movie>.demux.fastq --hifi-preset ASYMMETRIC
8181

8282
CLR run from *XML* with **symmetric** barcodes:
8383
