Hi there,
I only just discovered your package - I'm excited to start being tidy with my GRanges!
Apologies if I missed something, but I think I am requesting an enhancement.
With tibbles, when I group on a factor, I can summarize and still include empty groups by passing .drop=FALSE to group_by(). But for GRanges, I don't see a way to include the empty groups. Again, sorry if I missed it - I have searched but didn't find anything.
I've provided code below that I think is a nice small example.
thanks very much,
Janet Young
Malik lab,
Fred Hutch Cancer Research Center,
Seattle, WA
## here's how I include empty groups when summarizing a tibble
library(tidyverse)
fruit_tbl <- data.frame(fruit=factor(c("apple","apple","orange","pear"),
                                     levels=c("apple","orange","pear","banana")),
                        weight=c(3,4,5,3)) %>%
    as_tibble()
# we DO get output for 'banana', the empty group:
fruit_tbl %>%
    group_by(fruit, .drop=FALSE) %>%
    summarise(numFruits=n(),
              mean=mean(weight))
# A tibble: 4 × 3
#   fruit  numFruits  mean
#   <fct>      <int> <dbl>
# 1 apple          2   3.5
# 2 orange         1   5
# 3 pear           1   3
# 4 banana         0 NaN
But I don't see a way to include empty groups in plyranges. Is that true? Sorry if I missed it. I am using plyranges_1.14.0 (release version). Here's what I tried (after restarting R to make sure tidyverse packages aren't loaded):
library(plyranges)
## make GRanges where not all factor levels are represented (for seqnames, also for regionType)
grng2 <- data.frame(seqnames = sample(c("chr1", "chr2"), 7, replace = TRUE),
                    strand = sample(c("+", "-"), 7, replace = TRUE),
                    gc = runif(7),
                    start = 1:7,
                    width = 10) %>%
    mutate(seqnames=factor(seqnames, levels=c("chr1", "chr2", "chr3"))) %>%
    mutate(regionType=factor(sample(c("a", "b"), 7, replace = TRUE),
                             levels=c("a", "b", "c"))) %>%
    as_granges()
## works, but we don't get summaries for the empty levels of seqnames (chr3) or regionType (c):
grng2 %>%
    group_by(seqnames) %>%
    summarize(numRegions=n(),
              meanGC=mean(gc))
# DataFrame with 2 rows and 3 columns
#   seqnames numRegions    meanGC
#      <Rle>  <integer> <numeric>
# 1     chr1          6  0.592756
# 2     chr2          1  0.664616
grng2 %>%
    group_by(regionType) %>%
    summarize(numRegions=n(),
              meanGC=mean(gc))
# DataFrame with 2 rows and 3 columns
#   regionType numRegions    meanGC
#     <factor>  <integer> <numeric>
# 1          a          6  0.646677
# 2          b          1  0.341085
## can't use .drop like I would with a tibble
grng2 %>%
    group_by(regionType, .drop=FALSE) %>%
    summarize(numRegions=n(),
              meanGC=mean(gc))
# Error in new_grouping(.data, ...) : Column `.drop` is unknown
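In case it helps frame the request, here is a rough sketch of a possible stopgap: padding the summary afterwards with one row per unused factor level. This is only an illustration, not a plyranges feature. `pad_empty_groups()` is a hypothetical helper, and the small data frame stands in for the summarize() output shown above, on the assumption that the DataFrame result can be converted with as.data.frame(). The full set of levels would come from the original object, e.g. levels(grng2$regionType).

```r
# Hypothetical helper (not part of plyranges): add one row per unused factor
# level to a summary table. Count columns are filled with 0, others with NA.
pad_empty_groups <- function(summary_df, group_col, all_levels,
                             count_cols = character()) {
    # one row per level, in level order
    skeleton <- data.frame(factor(all_levels, levels = all_levels))
    names(skeleton) <- group_col
    # position of each level in the summary (NA when the level is absent)
    idx <- match(all_levels, as.character(summary_df[[group_col]]))
    other_cols <- setdiff(names(summary_df), group_col)
    padded <- cbind(skeleton, summary_df[idx, other_cols, drop = FALSE])
    rownames(padded) <- NULL
    # empty groups contain zero ranges, so counts become 0 instead of NA
    for (col in count_cols) {
        padded[[col]][is.na(padded[[col]])] <- 0L
    }
    padded
}

# Stand-in for as.data.frame() of the regionType summary shown above
res <- data.frame(regionType = factor(c("a", "b"), levels = c("a", "b", "c")),
                  numRegions = c(6L, 1L),
                  meanGC = c(0.646677, 0.341085))

padded <- pad_empty_groups(res, "regionType",
                           all_levels = c("a", "b", "c"),
                           count_cols = "numRegions")
print(padded)
```

Note that this leaves meanGC as NA for the empty group, whereas dplyr with .drop=FALSE reports NaN, and it has to be repeated for every summary, which is why built-in support would be nicer.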
And here's my R session information:
library(sessioninfo)
sessioninfo::session_info()
─ Session info ─────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.1.2 (2021-11-01)
os macOS Monterey 12.2.1
system x86_64, darwin17.0
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Los_Angeles
date 2022-02-23
rstudio 1.4.1717 Juliet Rose (desktop)
pandoc NA
─ Packages ─────────────────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
Biobase 2.54.0 2021-10-26 [1] Bioconductor
BiocGenerics * 0.40.0 2021-10-26 [1] Bioconductor
BiocIO 1.4.0 2021-10-26 [1] Bioconductor
BiocParallel 1.28.3 2021-12-09 [1] Bioconductor
Biostrings 2.62.0 2021-10-26 [1] Bioconductor
bitops 1.0-7 2021-04-24 [1] CRAN (R 4.1.0)
cli 3.1.1 2022-01-20 [1] CRAN (R 4.1.2)
crayon 1.4.2 2021-10-29 [1] CRAN (R 4.1.0)
DBI 1.1.2 2021-12-20 [1] CRAN (R 4.1.1)
DelayedArray 0.20.0 2021-10-26 [1] Bioconductor
dplyr 1.0.7 2021-06-18 [1] CRAN (R 4.1.0)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
fansi 1.0.2 2022-01-14 [1] CRAN (R 4.1.2)
generics 0.1.2 2022-01-31 [1] CRAN (R 4.1.2)
GenomeInfoDb * 1.30.0 2021-10-26 [1] Bioconductor
GenomeInfoDbData 1.2.7 2021-11-16 [1] Bioconductor
GenomicAlignments 1.30.0 2021-10-26 [1] Bioconductor
GenomicRanges * 1.46.1 2021-11-18 [1] Bioconductor
glue 1.6.1 2022-01-22 [1] CRAN (R 4.1.2)
IRanges * 2.28.0 2021-10-26 [1] Bioconductor
lattice 0.20-45 2021-09-22 [1] CRAN (R 4.1.2)
lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.0)
magrittr 2.0.2 2022-01-26 [1] CRAN (R 4.1.2)
Matrix 1.4-0 2021-12-08 [1] CRAN (R 4.1.0)
MatrixGenerics 1.6.0 2021-10-26 [1] Bioconductor
matrixStats 0.61.0 2021-09-17 [1] CRAN (R 4.1.0)
pillar 1.7.0 2022-02-01 [1] CRAN (R 4.1.2)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
plyranges * 1.14.0 2021-10-26 [1] Bioconductor
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.0)
RCurl 1.98-1.5 2021-09-17 [1] CRAN (R 4.1.0)
restfulr 0.0.13 2017-08-06 [1] CRAN (R 4.1.0)
rjson 0.2.21 2022-01-09 [1] CRAN (R 4.1.2)
rlang 1.0.1 2022-02-03 [1] CRAN (R 4.1.2)
Rsamtools 2.10.0 2021-10-26 [1] Bioconductor
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
rtracklayer 1.54.0 2021-10-26 [1] Bioconductor
S4Vectors * 0.32.3 2021-11-21 [1] Bioconductor
sessioninfo * 1.2.2 2021-12-06 [1] CRAN (R 4.1.0)
SummarizedExperiment 1.24.0 2021-10-26 [1] Bioconductor
tibble 3.1.6 2021-11-07 [1] CRAN (R 4.1.0)
tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
XML 3.99-0.8 2021-09-17 [1] CRAN (R 4.1.0)
XVector 0.34.0 2021-10-26 [1] Bioconductor
yaml 2.2.2 2022-01-25 [1] CRAN (R 4.1.2)
zlibbioc 1.40.0 2021-10-26 [1] Bioconductor
[1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library