Add standard_dimensions for VCZ #389

jeromekelleher · 2025-05-14T16:09:02Z

Fixes #368 and also consolidates dimension handling to some degree.

Sets default chunk size to min(size, default) for large dimensions. Closes sgkit-dev#368
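For illustration, the min(size, default) rule could look roughly like the sketch below. The `Dimension` class shape and the `DEFAULT_CHUNK_SIZE` value are assumptions made up for this example and are not taken from the actual diff.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default, chosen only for this sketch; the project's real
# per-dimension defaults are not stated in this conversation.
DEFAULT_CHUNK_SIZE = 1000


@dataclass
class Dimension:
    size: int
    chunk_size: Optional[int] = None

    def __post_init__(self):
        # If no chunk size is given, never chunk larger than the dimension
        # itself: small dimensions get chunk_size == size, large ones get
        # the default.
        if self.chunk_size is None:
            self.chunk_size = min(self.size, DEFAULT_CHUNK_SIZE)


small = Dimension(size=9)
large = Dimension(size=10_000)
explicit = Dimension(size=10_000, chunk_size=250)
```

With these assumed values, `small` ends up with a chunk size of 9, `large` is capped at the default, and `explicit` keeps the chunk size it was given.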

coveralls · 2025-05-15T09:33:01Z

coverage: 98.185% (+0.006%) from 98.179%
when pulling c314c2f on jeromekelleher:change-default-chunksize
into 21d0224 on sgkit-dev:main.

jeromekelleher · 2025-05-15T10:31:19Z

Note that this changes the schema output so that we always explicitly list the chunk size for all dimensions:

{
    "format_version": "0.6",
    "dimensions": {
        "variants": {
            "size": 9,
            "chunk_size": 9
        },
        "samples": {
            "size": 3,
            "chunk_size": 3
        },
        "alleles": {
            "size": 4,
            "chunk_size": 4
        },
        "alt_alleles": {
            "size": 3,
            "chunk_size": 3
        },
        "filters": {
            "size": 3,
            "chunk_size": 3
        },
        "ploidy": {
            "size": 2,
            "chunk_size": 2
        },
        "genotypes": {
            "size": 10,
            "chunk_size": 10
        },
        "INFO_AC_dim": {
            "size": 2,
            "chunk_size": 2
        },
        "INFO_AF_dim": {
            "size": 2,
            "chunk_size": 2
        },
        "FORMAT_HQ_dim": {
            "size": 2,
            "chunk_size": 2
        }
    },
    "fields": [
        {
            "name": "variant_contig",
            "dtype": "i1",
            "dimensions": [
                "variants"
            ],
            "description": "An identifier from the reference genome or an angle-bracketed ID string pointing to a contig in the assembly file",
            "compressor": null,
            "filters": null,
            "source": null
        },

I think this is better for now, as the logic around initialisation and defaults was quite tricky. We can always revert to "no chunk size means chunk_size = size" for JSON deserialisation only, if we later want to make the schema a bit more concise.
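As a rough sketch of that possible fallback (not what this PR implements), a JSON loader could fill in a missing chunk_size from size. The `load_dimensions` helper below is hypothetical and only assumes the "dimensions" layout shown in the schema excerpt.

```python
import json


def load_dimensions(schema_json):
    """Parse the "dimensions" section, defaulting chunk_size to size.

    Hypothetical helper for illustration only; it is not part of the
    project's API.
    """
    schema = json.loads(schema_json)
    dimensions = {}
    for name, spec in schema["dimensions"].items():
        size = spec["size"]
        # A missing chunk_size is read as "one chunk spanning the dimension".
        dimensions[name] = {
            "size": size,
            "chunk_size": spec.get("chunk_size", size),
        }
    return dimensions


concise = '{"dimensions": {"variants": {"size": 9}, "samples": {"size": 3, "chunk_size": 2}}}'
loaded = load_dimensions(concise)
```

Here `variants` would be read back with a chunk size of 9, while `samples` keeps its explicit chunk size of 2.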

benjeffery

Looks good - I don't think that we need a schema version bump right? These changes look backward compatible.

jeromekelleher · 2025-05-15T11:53:00Z

> Looks good - I don't think that we need a schema version bump right? These changes look backward compatible.

Looks like the released version is still on 0.4, without the dimensions, so I think we're OK.

Add standard_dimensions for VCZ and fix chunk size

b338e64

Sets default chunk size to min(size, default) for large dimensions. Closes sgkit-dev#368

jeromekelleher force-pushed the change-default-chunksize branch from a7dce49 to b338e64 May 15, 2025 09:28

Simplify logic around Dimension init

c314c2f

Centralise logic around default chunk sizes

jeromekelleher marked this pull request as ready for review May 15, 2025 10:23

benjeffery approved these changes May 15, 2025

jeromekelleher added this pull request to the merge queue May 15, 2025

Merged via the queue into sgkit-dev:main with commit 5d2bd3b May 15, 2025
15 checks passed
