Migrate plyr->dplyr #268

MichaelChirico · 2025-05-31T06:46:31Z

{plyr} is long-superseded. The package itself only uses plyr::count(); a vignette also uses round_any().

I was pursuing just dropping {plyr} and re-implementing in base, but gave up for three main reasons:

table(<data.frame>) may fail if there are very few duplicates and each column is of high cardinality, meaning table(x) would have a very large number of 0 entries that need to be computed and dropped (plyr::count() skips them).
We can use something like interaction(..., drop=TRUE) + tapply() to imitate this, but it's hard to generically reconstruct the un-interacted levels needed to build an equivalent data.frame -- basically, we'd need to, for full generality, use a sep=<str> where <str> is not present in any of the unique values of any of the columns of x in order for strsplit(<level>, <sep>) to uniquely map back.
Something like vapply(split(x, x), nrow, integer(1L)) is also appealingly simple, but split() always drops missing levels (https://bugs.r-project.org/show_bug.cgi?id=18899) --> we'd need an onerous/ugly loop over the columns to replace missing observations with a unique NA-equivalent, end-sorting sentinel.
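
The three obstacles above can be reproduced on toy data in base R. This is only an illustrative sketch: the data frames and values are made up, and paste() stands in for the level labels that interaction() builds with the same sep.

```r
# 1. table() allocates one cell per combination of values, observed or not.
n <- 200L
x <- data.frame(a = seq_len(n), b = seq_len(n))
cells <- table(x)
length(cells)   # n^2 cells in total
sum(cells > 0)  # only n of them are non-zero

# 2. A separator that also occurs inside the values makes pasted keys ambiguous.
y <- data.frame(a = c("u.v", "u"), b = c("w", "v.w"))
keys <- paste(y$a, y$b, sep = ".")
keys            # two distinct rows, one shared key "u.v.w"
pieces <- strsplit(keys[1L], ".", fixed = TRUE)[[1L]]
length(pieces)  # 3 pieces for 2 columns, so the split cannot map back

# 3. split() drops observations whose grouping value is NA.
z <- c("a", NA, "a", "b")
counts <- vapply(split(z, z), length, integer(1L))
counts          # a = 2, b = 1; the NA observation is not counted
```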

Thus the move to {dplyr}, despite it being a non-lightweight choice.
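
For orientation, a replacement for plyr::count() on a whole data.frame might look roughly like the sketch below. It is an assumption about the shape of the change, not the code in this PR: the input df is made up, and the call is guarded so the snippet still runs where {dplyr} is not installed.

```r
# Toy input (hypothetical); note the NA in column a.
df <- data.frame(a = c("x", "x", NA), b = c(1, 1, 2))

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Group by every column and count rows per observed combination only.
  # name = "freq" mirrors the column name that plyr::count() returns.
  res <- dplyr::count(df, dplyr::across(dplyr::everything()), name = "freq")
  print(res)
}
```

Only combinations that actually occur are materialized, and the row with a missing value is kept as its own group rather than dropped.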

I also applied some code quality fixes to nearby lines:

T/F --> TRUE/FALSE.
1:<n> loops replaced by seq_len()/seq_along(), as appropriate.
Loops like x <- c(); for (i in seq_along(y)) x[i] <- foo(y[i]) now pre-initialize x to length(y).
Move some lines around to avoid creating variables just prior to a possible early return().
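
A small base R illustration of why the 1:<n> and pre-initialization fixes matter. The values are made up, and sqrt() stands in for the placeholder foo():

```r
# 1:n counts down when n is 0, so a loop over it runs twice instead of never.
n <- 0L
length(1:n)         # 2 (the values 1 and 0)
length(seq_len(n))  # 0

# Pre-allocate the result instead of growing it from c().
y <- c(1, 4, 9)
x <- numeric(length(y))
for (i in seq_along(y)) x[i] <- sqrt(y[i])
x
```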

jdblischak · 2025-11-03T17:05:05Z

R/MainBar.R

-    for(i in 1:nrow(Freqs)){
-      Freqs$degree[i] <- rowSums(Freqs[ i ,1:num_sets])
+    for(i in seq_len(nrow(Freqs))){
+      Freqs$degree[i] <- rowSums(Freqs[ i ,seq_len(num_sets)])


I think you can remove the for loop altogether. It shouldn't be necessary to iterate through the rows when using rowSums().
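
A loop-free version could look like the sketch below. The Freqs data frame here is a made-up stand-in (the real one comes from the package), and drop = FALSE is my addition so that the subset stays a data.frame even when num_sets is 1:

```r
# Hypothetical stand-in for Freqs: two set-membership columns plus a count.
Freqs <- data.frame(A = c(1, 0, 1), B = c(1, 1, 0), freq = c(5, 3, 2))
num_sets <- 2L

# Loop version from the diff, with degree pre-allocated.
looped <- Freqs
looped$degree <- numeric(nrow(looped))
for (i in seq_len(nrow(looped))) {
  looped$degree[i] <- rowSums(looped[i, seq_len(num_sets)])
}

# Vectorized version: rowSums() already works on all rows at once.
Freqs$degree <- unname(rowSums(Freqs[, seq_len(num_sets), drop = FALSE]))

Freqs$degree
```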

I made similar changes back in 2021 in #199 to fix some computational bottlenecks that were causing us problems.

Unfortunately this repository hasn't been updated since 2020 (b14854a), so I don't have any expectation that any of these PRs will ever be merged.

MichaelChirico added 3 commits May 30, 2025 23:43

Migrate plyr->dplyr

eec4e77

move very long comment to GitHub

fa80b8e

also respect row ordering

51ce1b5

MichaelChirico mentioned this pull request May 31, 2025

Refactor plyr::count to use dplyr ggobi/ggally#520

Merged

jdblischak reviewed Nov 3, 2025
