Feature request: add an argument to the `join_overlap_intersect()` function that additionally requires matching metadata values, not only overlapping ranges.
For context, here is a hypothetical example:
I have two GRanges objects, one for introns and one for transcripts.
```r
## intron GRanges
intron
# GRanges object with 1 range and 2 metadata columns:
# seqnames ranges strand | type transcript_id
# <Rle> <IRanges> <Rle> | <factor> <character>
# 1 100149098-100152384 - | intron ENST00000370137.6

## transcript GRanges
trans
# GRanges object with 2 ranges and 2 metadata columns:
# seqnames ranges strand | transcript_name gene_name
# <Rle> <IRanges> <Rle> | <character> <character>
# 1 100148448-100178256 - | ENST00000370137.6 LRRC39
# 1 100133163-100150496 - | ENST00000370141.8 TRMT13
```
I want to join these GRanges objects so that I can annotate the `intron` GRanges with the `gene_name` metadata.
However, when I use `join_overlap_left()`, the range of the `intron` row overlaps both rows of `trans`, so the result contains two rows:
```r
intron <- join_overlap_left(intron, trans)
intron
# GRanges object with 2 ranges and 4 metadata columns:
# seqnames ranges strand | type transcript_id transcript_name gene_name
# <Rle> <IRanges> <Rle> | <factor> <character> <character> <character>
# 1 100149098-100152384 - | intron ENST00000370137.6 ENST00000370137.6 LRRC39
# 1 100149098-100152384 - | intron ENST00000370137.6 ENST00000370141.8 TRMT13
```
The desired output would keep only the match with the `trans` row where `trans$transcript_name == "ENST00000370137.6"`.
In other words, the join should require both overlapping ranges and equal values in these metadata columns:
- `intron$transcript_id`
- `trans$transcript_name`
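Until such an argument exists, one possible workaround is to run the range join first and then keep only the rows whose IDs agree. The sketch below is an illustration under stated assumptions, not an existing plyranges option for this request: it rebuilds the toy objects from the printouts above with `as_granges()`, uses `join_overlap_inner()` instead of `join_overlap_left()` so that no `NA` rows are produced, and then applies `filter()` to the two ID columns.

```r
library(plyranges)

# Rebuild the toy objects shown in this issue (assumed: seqname "1", minus strand)
intron <- as_granges(data.frame(
  seqnames = "1", start = 100149098L, end = 100152384L, strand = "-",
  type = factor("intron"), transcript_id = "ENST00000370137.6"
))
trans <- as_granges(data.frame(
  seqnames = "1",
  start = c(100148448L, 100133163L),
  end = c(100178256L, 100150496L),
  strand = "-",
  transcript_name = c("ENST00000370137.6", "ENST00000370141.8"),
  gene_name = c("LRRC39", "TRMT13")
))

# Step 1: range-based join; one row per overlapping (intron, transcript) pair
joined <- join_overlap_inner(intron, trans)

# Step 2: keep only pairs whose metadata IDs also match
annotated <- filter(joined, transcript_id == transcript_name)
annotated
```

Because the ID filter runs after an inner join, an intron with no overlapping transcript of the same ID is dropped rather than kept with `NA` values, so this is not a drop-in replacement for a left join. It also materialises every overlapping pair before filtering, which may be costly for genome-wide data; that cost is part of the motivation for the requested argument.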
R session information
```r
options(width = 120)
sessioninfo::session_info()
```

```
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.3.2 (2023-10-31)
os macOS Sonoma 14.3
system x86_64, darwin20
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Toronto
date 2024-02-08
rstudio 2023.06.1+524 Mountain Hydrangea (desktop)
pandoc NA
```