Skip to content

Iterate alleles and genotypes at the same time #363

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 29, 2025

Conversation

benjeffery
Copy link
Contributor

@benjeffery benjeffery commented Apr 25, 2025

In preparation for tskit.
Stacked on #362 clean diff at 33d1d31

@coveralls
Copy link
Collaborator

coveralls commented Apr 25, 2025

Coverage Status

coverage: 98.015% (+0.04%) from 97.976%
when pulling f19420c on benjeffery:iterate_alleles_genotypes
into d30940e on sgkit-dev:main.

@jeromekelleher
Copy link
Contributor

Looks good overall, but I don't like the include_genotypes parameter - the only reason we wouldn't include genotypes is if they don't exist, right? So we can just check this, rather than passing responsibility to client code.

@benjeffery
Copy link
Contributor Author

Looks good overall, but I don't like the include_genotypes parameter - the only reason we wouldn't include genotypes is if they don't exist, right? So we can just check this, rather than passing responsibility to client code.

The edge case here is if genotypes exist, but they aren't in the schema. If we're ok with wasting work fetching them in this odd case then we can remove it.

@jeromekelleher
Copy link
Contributor

The edge case here is if genotypes exist, but they aren't in the schema. If we're ok with wasting work fetching them in this odd case then we can remove it.

I'm fine with that

@benjeffery benjeffery force-pushed the iterate_alleles_genotypes branch 2 times, most recently from 5d6c024 to 32d6362 Compare April 29, 2025 13:44
@benjeffery
Copy link
Contributor Author

Ok, should be good to go now.

@benjeffery benjeffery mentioned this pull request Apr 29, 2025
Copy link
Contributor

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're over complicating this - can't you define the VCF case using the existing functions?

bio2zarr/icf.py Outdated
for field in self.metadata.fields:
self.fields[field.full_name] = IntermediateColumnarFormatField(self, field)
if field.name == "GT":
self.gt_field = field
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the continue for?

alt_field = self.fields["ALT"]

for ref, alt, gt in zip(
ref_field.iter_values(start, stop),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too much in one zip - let's break it up into simpler bits

What's wrong with

if self.gt_field is None:
     for alleles in self.iter_alleles(start, stop):
            yield alleles, None, None
else:
        yield from zip(self.iter_alleles(start, stop), self.iter_genotypes(start, stop)):

and just keep the old definitions?

@benjeffery benjeffery force-pushed the iterate_alleles_genotypes branch from 32d6362 to f19420c Compare April 29, 2025 15:40
@benjeffery
Copy link
Contributor Author

Comments addressed!

@jeromekelleher jeromekelleher merged commit 3d8ad3e into sgkit-dev:main Apr 29, 2025
16 checks passed
@benjeffery benjeffery deleted the iterate_alleles_genotypes branch April 29, 2025 18:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants