@@ -18,6 +18,17 @@ convert your data, basically providing different levels of
convenience and flexibility corresponding to what you might
need for small, intermediate and large datasets.

+ :::{warning}
+ The documentation of vcf2zarr is under development, and
+ some bits are more polished than others. This "tutorial"
+ is experimental and will likely evolve into a slightly
+ different format in the near future. It is also incomplete.
+ The {ref}`sec-vcf2zarr-cli-ref` should, however, be complete
+ and authoritative.
+ :::
+
+
## Small dataset

The simplest way to convert VCF data to Zarr is to use the
@@ -229,11 +240,33 @@ granularity). You should be careful to use this value in your scripts

Once `dexplode-init` is done and we know how many partitions we have,
- we need to call `dexplode-partition` this number of times.
+ we need to call
+ {ref}`dexplode-partition<cmd-vcf2zarr-dexplode-partition>` this number of times:

```{code-cell}
vcf2zarr dexplode-partition sample-dist.icf 0
vcf2zarr dexplode-partition sample-dist.icf 1
vcf2zarr dexplode-partition sample-dist.icf 2
```
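
With more than a handful of partitions, typing one command per partition gets tedious. The following is a minimal shell sketch, not part of vcf2zarr itself: the function name `run_partitions` is hypothetical, and the partition count passed to it is assumed to be the value reported by `dexplode-init`. It runs partitions `0` to `n - 1` one after another and stops at the first failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vcf2zarr): run every partition of an
# ICF directory sequentially, stopping at the first partition that fails.
#   $1: path to the ICF directory created by dexplode-init
#   $2: number of partitions reported by dexplode-init
run_partitions() {
    local icf_path="$1"
    local num_partitions="$2"
    local i=0
    while [ "$i" -lt "$num_partitions" ]; do
        vcf2zarr dexplode-partition "$icf_path" "$i" || return 1
        i=$((i + 1))
    done
}
```

Calling `run_partitions sample-dist.icf 3` issues the same three commands as the cell above.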

+ This is not how it would be done in practice, of course: you would
+ use your cluster scheduler of choice to dispatch these operations.
+ :::{todo}
+ Document how to do this conveniently with some popular schedulers.
+ :::
+
+ :::{tip}
+ Use the `--one-based` argument when it is more convenient
+ to index the partitions from 1 to n, rather than 0 to n - 1.
+ :::
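
As one possible illustration of dispatching partitions through a scheduler together with the `--one-based` tip, here is a sketch of a Slurm array job script. It assumes Slurm is the scheduler, that `dexplode-init` reported 3 partitions, and that `sample-dist.icf` lives on a filesystem visible to every compute node; the script layout and the function name `convert_partition` are hypothetical. Slurm starts one copy of the script per array index and sets `SLURM_ARRAY_TASK_ID` in each, so with `--array=1-3` the indexes run from 1 to 3 and `--one-based` maps them onto partitions 0 to 2.

```shell
#!/usr/bin/env bash
#SBATCH --job-name=dexplode-partition
#SBATCH --array=1-3
# Hypothetical Slurm batch script; assumes dexplode-init reported 3
# partitions and that sample-dist.icf is on a filesystem shared by all nodes.
# Slurm runs one copy of this script per array index (1, 2 and 3 here) and
# sets SLURM_ARRAY_TASK_ID accordingly; --one-based tells vcf2zarr that these
# indexes count from 1, so they map onto partitions 0, 1 and 2.

# Convert a single partition, given the ICF path and a one-based index.
convert_partition() {
    vcf2zarr dexplode-partition --one-based "$1" "$2"
}

# Only dispatch when running as a Slurm array task.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    convert_partition sample-dist.icf "$SLURM_ARRAY_TASK_ID"
fi
```

Submitting the script with `sbatch` lets Slurm run the tasks concurrently where resources allow. `dexplode-finalise` must only run after every task has succeeded, which Slurm can enforce by submitting it with `sbatch --dependency=afterok:<jobid>`, where `<jobid>` is the ID of the array job.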
+
+ Finally, we need to call
+ {ref}`dexplode-finalise<cmd-vcf2zarr-dexplode-finalise>`:
+ ```{code-cell}
+ vcf2zarr dexplode-finalise sample-dist.icf
+ ```
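
On a single multi-core machine without a scheduler, one alternative is to run all partitions as background jobs and call `dexplode-finalise` only after every one of them has exited successfully. The sketch below assumes the machine has enough memory to process all partitions at the same time; the function name `explode_and_finalise` is hypothetical and not part of vcf2zarr.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vcf2zarr): run all partitions concurrently
# as background jobs, then finalise only if every partition succeeded.
# Assumes the machine can process all partitions at the same time.
explode_and_finalise() {
    local icf_path="$1"
    local num_partitions="$2"
    local pids=""
    local i=0
    while [ "$i" -lt "$num_partitions" ]; do
        vcf2zarr dexplode-partition "$icf_path" "$i" &
        pids="$pids $!"
        i=$((i + 1))
    done
    local failed=0
    local pid
    for pid in $pids; do
        # `wait PID` returns the exit status of that specific background job.
        wait "$pid" || failed=1
    done
    if [ "$failed" -ne 0 ]; then
        echo "explode_and_finalise: a partition failed; not finalising" >&2
        return 1
    fi
    vcf2zarr dexplode-finalise "$icf_path"
}
```

Waiting on every job, rather than returning at the first failure, ensures that no partition is still writing to the ICF directory when the function returns.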
+
+ :::{todo}
+ Document the process for dencode, noting the information it outputs
+ about memory requirements.
+ :::