
[Docker]: Retooled the Dockerfile to use a base image from r-minimal. #108

Merged
BethanyG merged 2 commits into main from docker-slimdown-attempt
Feb 22, 2026

Conversation

@BethanyG
Member

@BethanyG BethanyG commented Feb 21, 2026

I think this succeeds in getting the image down to ~210 MB. But we probably want to do more testing than the tests in run-tests-in-docker.sh.

Let me know if there are more libs needed than the ones listed. More information on r-minimal can be found here.

I'm not quite sure it used the latest image, which uses R 4.5.2, so we can be explicit about it if needed.

@BethanyG BethanyG requested a review from a team as a code owner February 21, 2026 02:19
@depial
Contributor

depial commented Feb 21, 2026

Docker image stuff is where I must defer to @colinleach.

The one thing I can comment on is that the Run Tests in Docker part of the CI / Tests took a really long time: 8m 27s versus 55s in the last merged PR. Granted, I don't know whether that is because this is the first time the image was built or because the runs differ in other ways.

@colinleach
Contributor

I'm not into the detail of this yet, but building Tidyverse stuff takes forever. It relies on g++ to compile what seems like millions of lines of C++ source. I've seen 13 mins or more to install it locally (recent-model Ryzen 9, 32 GB RAM, high-end NVMe drive).

GitHub CI doesn't cache the build between runs.

@BethanyG
Member Author

This image takes forever to build from scratch. But from testing on my machine, it takes very little time once the initial image gets done. I think the time is because everything has to be manually compiled and built ... and then all the compilers and other things like RStudio and docs and a bunch of other stuff are removed.

@colinleach
Contributor

This sort of size reduction (2.87 GB to 209 MB for my local images) instinctively feels too good to be true! However, so far it has passed every test I've thrown at it.
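
For anyone wanting to reproduce that comparison, here is a rough sketch. It assumes the Docker CLI is available locally; the image tags in the usage comment are placeholders, not the tags used in this repo.

```shell
#!/bin/sh
# Percentage reduction between two sizes given in the same unit.
pct_reduction() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (1 - after / before) * 100 }'
}

# Size of a local image in bytes, as reported by the Docker CLI.
image_bytes() {
  docker image inspect --format '{{.Size}}' "$1"
}

# Usage (placeholder tags):
#   pct_reduction "$(image_bytes r-test-runner:old)" "$(image_bytes r-test-runner:slim)"
# Or with the sizes quoted above, both expressed in MB:
#   pct_reduction 2870 209
```

With the two figures quoted above, that works out to a reduction of roughly 93%.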

For comparison, I looked at my ~/R/ directory where R itself and all the packages are installed. That's 341.7 MB on disk, so substantially larger than the entire docker image. However, the biggest part of that is the ragg library, with all the graphics back end drivers, and that isn't included in this PR.

I'll give my approval for now, and keep my fingers crossed as I keep testing...

@colinleach
Contributor

colinleach commented Feb 21, 2026

Still looking good!

In case anyone wonders which R packages are available in the image, this is a listing of /usr/local/lib/R/library:

DBI            blob           cpp11          evaluate       googlesheets4  isoband        modelr         ps             rprojroot      sys            tools          xml2
R6             brio           crayon         farver         grDevices      jquerylib      openssl        purrr          rstudioapi     systemfonts    tzdb           yaml
RColorBrewer   broom          curl           fastmap        graphics       jsonlite       parallel       ragg           rvest          tcltk          utf8
S7             bslib          data.table     fontawesome    grid           knitr          pillar         rappdirs       sass           testthat       utils
_cache         cachem         datasets       forcats        gtable         labeling       pkgbuild       readr          scales         textshaping    uuid
askpass        callr          dbplyr         fs             haven          lifecycle      pkgconfig      readxl         selectr        tibble         vctrs
backports      cellranger     desc           gargle         highr          lubridate      pkgload        rematch        splines        tidyr          viridisLite
base           cli            diffobj        generics       hms            magrittr       praise         rematch2       stats          tidyselect     vroom
base64enc      clipr          digest         ggplot2        htmltools      memoise        prettyunits    reprex         stats4         tidyverse      waldo
bit            compiler       dplyr          glue           httr           methods        processx       rlang          stringi        timechange     withr
bit64          conflicted     dtplyr         googledrive    ids            mime           progress       rmarkdown      stringr        tinytex        xfun

There are things there we don't directly need, but I suspect they may be dependencies of important stuff, so removing them will break stuff.
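
One way to test that suspicion before removing anything is to ask R for a package's recursive dependencies. The sketch below assumes `Rscript` is on the PATH and that a CRAN mirror is reachable; the helper names are made up for illustration.

```shell
#!/bin/sh
# Print the recursive hard dependencies (Depends, Imports, LinkingTo) of an
# R package, one per line. Assumes Rscript is on PATH and CRAN is reachable.
r_recursive_deps() {
  Rscript -e '
    pkg  <- commandArgs(trailingOnly = TRUE)[1]
    db   <- utils::available.packages(repos = "https://cloud.r-project.org")
    deps <- tools::package_dependencies(pkg, db = db, recursive = TRUE)[[pkg]]
    cat(sort(deps), sep = "\n")
  ' "$1"
}

# Succeed if the package named in $1 appears (as a whole line) on stdin.
in_list() {
  grep -qxF -- "$1"
}

# Usage:
#   r_recursive_deps tidyverse | in_list googledrive && echo "still required"
```

If a package shows up in that list, something we install pulls it in, and removing it by hand is likely to break loading.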

@BethanyG
Member Author

BethanyG commented Feb 21, 2026

Happy that the testing is going well! Thrilled that we might be on the right track.
The build time is a concern tho, so I would caution against merging this right away.

I couldn't sleep, so I did some rough research early this morning. There look to be some techniques for caching binaries and layers to speed up build times for images. I am going to do some reading and see if any of it makes a difference. A 4 min build + test seems bearable, but 8+ min is ..... sorta ugly for CI if you are going to be iterating on a bunch of changes. And it would be really, really nice if CI took 2 min or less.

Here are some resources I've found:

  1. Docker's guidance
  2. OneUptime
  3. Reddit (might not be all that useful, but still)
  4. Stack Overflow (4 years old, but maybe still good?)
  5. Posit (they might have some prebuilt stuff - haven't dug in yet)
  6. renv (not sure it's relevant)
  7. a multi-year JupyterHub discussion on R builds
  8. RStats general package install guidance
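
As a rough idea of what layer caching could look like, here is a minimal sketch using a local BuildKit cache directory. It is not what this repo's CI does. It assumes a buildx builder that supports cache export (for example the `docker-container` driver), and in CI the cache directory would also have to be saved and restored between runs by some other mechanism.

```shell
#!/bin/sh
# Print buildx flags that read from and write to a local cache directory.
# mode=max also caches intermediate layers, such as a slow package-compile step.
cache_flags() {
  printf '%s\n' \
    "--cache-from=type=local,src=$1" \
    "--cache-to=type=local,dest=$1,mode=max"
}

# Build the image in the current directory with layer caching.
# The cache directory must not contain spaces, because the flags
# are deliberately word-split below.
cached_build() {
  tag="$1"
  cache_dir="$2"
  # shellcheck disable=SC2046
  docker buildx build $(cache_flags "$cache_dir") --tag "$tag" --load .
}

# Usage (placeholder tag and path):
#   cached_build r-test-runner:dev /tmp/buildx-cache
```

Caching only pays off if the slow install step sits in a layer that rarely changes, so the order of instructions in the Dockerfile matters as much as the flags.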

@colinleach
Contributor

No big surprise, but I also confirmed the image uses the latest: R version 4.5.2 (2025-10-31)
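
If we ever want to be explicit about the version, a check like this could run after the build. It is only a sketch: it assumes the image allows the entrypoint to be overridden so that `R` can be called directly, and the image tag in the usage comment is a placeholder.

```shell
#!/bin/sh
# Extract the version number from the first line of `R --version` output.
r_version_from_banner() {
  sed -n '1s/^R version \([0-9][0-9.]*\).*/\1/p'
}

# Fail unless the image reports the expected R version.
# Assumes the entrypoint can be overridden to call R directly.
check_r_version() {
  image="$1"
  expected="$2"
  actual="$(docker run --rm --entrypoint R "$image" --version | r_version_from_banner)"
  [ "$actual" = "$expected" ]
}

# Usage (placeholder tag):
#   check_r_version r-test-runner:dev 4.5.2 || echo "unexpected R version"
```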

@BethanyG
Member Author

It would appear that using a prebuilt binary is the way to cut down on build time. Small problem: both Posit and CRAN do these with glibc. Alpine uses musl, and no one is publishing binaries for that. I had hope with devxy, but they don't look like they're really baked yet.

We can do our own...but .... details, details. Sigh.

@colinleach
Contributor

I'm not sure that maintaining our own binaries is really more appealing than accepting 8-minute builds? Base R is pretty stable, but there's quite a bit of development on the dozens of packages we're installing.

@BethanyG
Member Author

Yeah. Maintaining the binaries would be ... ugh. And I think that we are building and pushing the image to a repo for production - so the time sink is here in the test-runner repo.

@colinleach
Contributor

Assuming that @depial is now happy with the v3 test runner, we might only be rebuilding the docker image every month or two, for dependabot stuff. I can't get too upset about something I can walk away from for a few minutes while it runs.

Though that reminds me, I need to check how dependabot is set up. It regularly updates the Julia test runner with new versions; I think R is less active.

@BethanyG
Member Author

ooh. so tempted to try and rip out googledrive and googlesheets from this. But it also ... works? So maybe we don't.

@depial
Contributor

depial commented Feb 21, 2026

Assuming that @depial is now happy with the v3 test runner,

Yep! I tried to get the important refactoring out of the way in the last PR for run.R so it will be stable unless I learn something major. So I'm happy with its current form, and I'm going to try to channel my perfectionistic tendencies elsewhere for the time being... 😅

@colinleach
Contributor

colinleach commented Feb 21, 2026

so tempted to try and rip out googledrive and googlesheets

A thought I already had! At least they're only 2.2MB and 0.7MB respectively, and I suspect something will break if we remove them.

The Tidyverse collection is designed to all work together, and there is a complex web of dependencies.

@BethanyG
Member Author

So I'm happy with its current form, and I'm going to try to channel my perfectionistic tendencies elsewhere for the time being

Wanna write analysis rules for Python exercises? (JOKING)

@colinleach
Contributor

The Tidyverse collection is designed to all work together

I should note that this is really quite the innovation in the R world! Traditionally, things just sort of ... happened.

@BethanyG
Member Author

@colinleach @depial - I'll let one of the two of you do the honors...... 😄

@colinleach
Contributor

Bethany, this is your (non-trivial!) achievement. Press the button!

@BethanyG
Member Author

Pushing the button....hoping we don't have to revert......

@BethanyG BethanyG merged commit 4149863 into main Feb 22, 2026
1 check passed
@BethanyG
Member Author

YAY it works!! 🎉

Putting a note here so we don't forget. One of the major changes that r-minimal makes is to forgo worldwide timezone support. So when we get to time and timezones, we need to remember to install/enable more than just UTC and America/NY....or not. But it's not there right now. 😄
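
A quick way to see which zones are actually present is to look for their zone files inside the container. This sketch assumes R in the image reads the system tz database from `/usr/share/zoneinfo`, which is the usual Linux location but has not been verified for r-minimal.

```shell
#!/bin/sh
# Succeed if a zone file for the given tz name exists under the zoneinfo root.
# The root defaults to /usr/share/zoneinfo, the usual location on Linux.
tz_available() {
  zone="$1"
  root="${2:-/usr/share/zoneinfo}"
  [ -f "$root/$zone" ]
}

# Usage, run inside the container:
#   for z in UTC America/New_York Europe/Paris; do
#     if tz_available "$z"; then echo "ok      $z"; else echo "missing $z"; fi
#   done
```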

@colinleach
Contributor

Something to explore when we get to writing a dates-times concept. This is now handled best by the lubridate library, and I have no idea how this will be affected.

I'm currently writing up string stuff, some of which is locale-dependent (things like upper/lower case). I don't know if that's also impacted.

@BethanyG
Member Author

The best place to check is at the r-minimal project. But the good news is that the container is small enough that if we need to layer in 50 MB of date and locale stuff ... well, we can. 😄

@keiravillekode
Contributor

But we probably want to do more testing than the tests in run-tests-in-docker.sh.

generic-track has a bin/verify-exercises-in-docker which lets us test all a track's exercises against the test runner image.

Examples of how it can be adapted for R and Python:

exercism/r@main...keiravillekode:r:verify-exercises-in-docker

exercism/python@main...keiravillekode:python:verify-exercises-in-docker

@colinleach colinleach mentioned this pull request Feb 22, 2026
@BethanyG
Member Author

BethanyG commented Feb 22, 2026

Examples of how it can be adapted for R and Python:

exercism/r@main...keiravillekode:r:verify-exercises-in-docker

exercism/python@main...keiravillekode:python:verify-exercises-in-docker

Thank you! I think we should definitely set something up for R, and your script looks like a good place to start.

The Python track already does this in the content repo CI as well as the Runner CI. 😄 And the runner repo has a run-tests-in-docker.sh that runs specific runner tests to make sure that the JSON output doesn't deviate based on PyTest options. Ditto for the Representer and Analyzer. The Analyzer tests could use some love tho - they're sparse compared to the other tooling. I think the scripts in the content repo predate the generic track script. I do need to add better markdown linting and more Analyzer tests tho. Always work to do..... 😅

One of the issues with verifying exercises in the R CI is that currently the image takes 8 minutes to build, which is an eternity. So then waiting to verify all the exercises is just .... mean. 😂 But triggering things manually with a verify-exercises-in-docker.sh could be good.


5 participants