-
Notifications
You must be signed in to change notification settings - Fork 10
VCF indexes without record counts lead to assertion #150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Great catch, thanks @shz9. The problem with old phase 1 files is that the ancient version of tabix here doesn't store the record counts in the index that we use. It works fine if you re-index. I'll add an update that either checks for these missing bits of information in the index and raises and informative error, or just prints a warning and handles the fact we can't show progress. |
I've been working with these without problems - can you show the backtrace please? |
AssertionError
when exploding 1000G genotype data
It's the same as above: Explode: 1.87Mvars [57:06, 547vars/s]
total_records: 1872741
metadata.num_records: 0
0
Traceback (most recent call last):
File "/home/szabad/bio2zarr_env/bin/vcf2zarr", line 8, in <module>
sys.exit(vcf2zarr())
File "/home/szabad/bio2zarr_env/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/szabad/bio2zarr_env/lib/python3.9/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/szabad/bio2zarr_env/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/szabad/bio2zarr_env/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/szabad/bio2zarr_env/lib/python3.9/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/szabad/bio2zarr_env/lib/python3.9/site-packages/bio2zarr/cli.py", line 178, in explode
vcf.explode(
File "/home/szabad/bio2zarr_env/lib/python3.9/site-packages/bio2zarr/vcf.py", line 1176, in explode
writer.finalise()
File "/home/szabad/bio2zarr_env/lib/python3.9/site-packages/bio2zarr/vcf.py", line 1142, in finalise
assert total_records == self.metadata.num_records
AssertionError The VCF files are here. I think this is probably the same issue you identified for the older 1000G data, because during conversion, the [W::hts_idx_load3] The index file is older than the data file: data/WGS/chr22.vcf.gz.tbi And indeed, on the website linked to above, it seems that the |
Closed in #164 |
Uh oh!
There was an error while loading. Please reload this page.
I'm using
bio2zarr v0.0.6
and trying to explode 1000G genotype data (NOTE: not recent NYGC WGS data; but older genotype data from ~2013):It seems there's a mismatch between the metadata and the number of records in each of the partitions? This error still comes up when I use a single worker
-p1
, so it's not due to multiprocessing. I printedtotal_records
andself.metadata.num_records
and got the following:Any ideas what might be going on? It seems like it could be an issue in
scan
due to corrupted or outdated VCF format?UPDATE:
This issue also affects the newer NYGC WGS VCF files. Could be a bug that was introduced in recent updates? May be related to #144 .
The text was updated successfully, but these errors were encountered: