chrom1, chrom2 and pair_type fields are now required in pairs file header #264

@js2264

Description

  • Up to v1.0.3, pairtools sort accepted a header line listing the column names chr1 and chr2 (as given in the official 4DN specs).
  • Starting with v1.1.0, pairtools sort expects the #columns header line to list chrom1 and chrom2, and fails if that line is #columns: readID chr1 pos1 chr2 pos2 strand1 strand2.
  • It also seems to require pair_type to be listed in the #columns header line, as well as present as a column.

I understand that the chr1/chr2 naming can be worked around by specifying the -c1 and -c2 fields on the command line, but if a pair_type column is not included, pairtools sort no longer works at all. Is this intended behavior? Sorry if I missed something or if this issue has already been raised.

Reproducible example

  1. Here is an unsorted pairs file I created by hand, with chr1/chr2 in header:
echo -e "## pairs format v1.0
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
NS500150:497:HWH2WBGXC:4:23605:21900:3336\tNODE_1404\t461\tNODE_1404\t246\t --
NS500150:497:HWH2WBGXC:4:23603:4102:4882\tNODE_522\t6855\tNODE_1404\t1035\t--
NS500150:497:HWH2WBGXC:4:23606:10802:17906\tNODE_1404\t1441\tNODE_1814\t4433\t--" > tmp.pairs

This works:

pip install pairtools==1.0.3
pairtools sort tmp.pairs 
## pairs format v1.0
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
NS500150:497:HWH2WBGXC:4:23605:21900:3336       NODE_1404       461     NODE_1404       246     --
NS500150:497:HWH2WBGXC:4:23606:10802:17906      NODE_1404       1441    NODE_1814       4433    --
NS500150:497:HWH2WBGXC:4:23603:4102:4882        NODE_522        6855    NODE_1404       1035    --

This fails:

pip install pairtools==1.1.1   ## pairtools 1.1.0 errors with `circular import` 
pairtools sort tmp.pairs 
## pairs format v1.0
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
Traceback (most recent call last):
  File "/home/rsg/micromamba/envs/metator/bin/pairtools", line 8, in <module>
    sys.exit(cli())
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/__init__.py", line 183, in wrapper
    return func(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 128, in sort
    sort_py(
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 199, in sort_py
    colindex = int(col) if col.isnumeric() else column_names.index(col) + 1
ValueError: 'chrom1' is not in list

  2. Now, changing chr1/chr2 to chrom1/chrom2 in the header:
echo -e "## pairs format v1.0
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
NS500150:497:HWH2WBGXC:4:23605:21900:3336\tNODE_1404\t461\tNODE_1404\t246\t --
NS500150:497:HWH2WBGXC:4:23603:4102:4882\tNODE_522\t6855\tNODE_1404\t1035\t--
NS500150:497:HWH2WBGXC:4:23606:10802:17906\tNODE_1404\t1441\tNODE_1814\t4433\t--" > tmp2.pairs

This works:

pip install pairtools==1.0.3
pairtools sort tmp2.pairs 
# sorted pairs...

This fails:

pip install pairtools==1.1.1
pairtools sort tmp2.pairs 
## pairs format v1.0
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2
Traceback (most recent call last):
  File "/home/rsg/micromamba/envs/metator/bin/pairtools", line 8, in <module>
    sys.exit(cli())
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/__init__.py", line 183, in wrapper
    return func(*args, **kwargs)
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 128, in sort
    sort_py(
  File "/home/rsg/micromamba/envs/metator/lib/python3.10/site-packages/pairtools/cli/sort.py", line 199, in sort_py
    colindex = int(col) if col.isnumeric() else column_names.index(col) + 1
ValueError: 'pair_type' is not in list
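
Both tracebacks end on the same lookup in sort.py, so the two failures can be reproduced without pairtools installed. The sketch below is only an illustration: the lookup expression is copied from the traceback, while the helper name `resolve_columns`, the `REQUESTED` list and its order are assumptions and not taken from the pairtools source.

```python
# Expression copied from the traceback (pairtools/cli/sort.py, line 199):
#     colindex = int(col) if col.isnumeric() else column_names.index(col) + 1
# ASSUMPTION: the set and order of requested columns is a guess for illustration.
REQUESTED = ["chrom1", "chrom2", "pos1", "pos2", "pair_type"]


def resolve_columns(columns_line, requested=REQUESTED):
    """Map each requested column to its 1-based position in a '#columns:' line."""
    column_names = columns_line.split()[1:]  # drop the leading '#columns:' token
    indices = {}
    for col in requested:
        # list.index raises ValueError when the name is absent from the header
        indices[col] = int(col) if col.isnumeric() else column_names.index(col) + 1
    return indices


headers = [
    "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2",      # tmp.pairs
    "#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2",  # tmp2.pairs
]

for header in headers:
    try:
        print(resolve_columns(header))
    except ValueError as err:
        print("ValueError:", err)
```

With the first header the lookup already fails on chrom1. With the second it gets past the chromosome and position columns and fails on pair_type, which matches the order of the two errors reported here.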

  3. Now, adding pair_type:
echo -e "## pairs format v1.0
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
NS500150:497:HWH2WBGXC:4:23605:21900:3336\tNODE_1404\t461\tNODE_1404\t246\t --
NS500150:497:HWH2WBGXC:4:23603:4102:4882\tNODE_522\t6855\tNODE_1404\t1035\t--
NS500150:497:HWH2WBGXC:4:23606:10802:17906\tNODE_1404\t1441\tNODE_1814\t4433\t--" > tmp3.pairs

This works:

pip install pairtools==1.0.3
pairtools sort tmp3.pairs 
# sorted pairs...

This works:

pip install pairtools==1.1.1
pairtools sort tmp3.pairs 
## pairs format v1.0
#sorted: readID
#shape: upper triangle
#chromsize: NODE_522 22786
#chromsize: NODE_1404 15015
#chromsize: NODE_1814 13236
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type
NS500150:497:HWH2WBGXC:4:23605:21900:3336       NODE_1404       461     NODE_1404       246      --
NS500150:497:HWH2WBGXC:4:23606:10802:17906      NODE_1404       1441    NODE_1814       4433    --
NS500150:497:HWH2WBGXC:4:23603:4102:4882        NODE_522        6855    NODE_1404       1035    --
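
For anyone hitting this with existing files, the manual header edits from steps 2 and 3 could be scripted. This is a rough sketch, not something provided by pairtools: the function names are hypothetical, and like tmp3.pairs it only touches the #columns line and leaves the data rows unchanged. Whether other tools accept a header that declares a pair_type column the rows do not contain has not been checked.

```python
# Hypothetical helper: rewrite only the '#columns:' line of a pairs file header.
RENAMES = {"chr1": "chrom1", "chr2": "chrom2"}


def normalize_columns_line(line):
    """Rename chr1/chr2 to chrom1/chrom2 and append pair_type if it is missing."""
    if not line.startswith("#columns:"):
        return line  # every other header or data line passes through unchanged
    names = [RENAMES.get(name, name) for name in line.split()[1:]]
    if "pair_type" not in names:
        names.append("pair_type")
    return "#columns: " + " ".join(names) + "\n"


def normalize_pairs(lines):
    """Yield the lines of a pairs file with a normalized '#columns:' line."""
    for line in lines:
        yield normalize_columns_line(line)


example = [
    "## pairs format v1.0\n",
    "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2\n",
    "#sorted: readID\n",
]
print("".join(normalize_pairs(example)), end="")
```

Applied to the header of tmp.pairs, this produces the same #columns line as tmp3.pairs.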
