Design (+implementation) to switch to support/use OCI containers to produce singularity images #153
Open
yarikoptic wants to merge 3 commits into master from use-oci (+1,267 −0)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| # CLAUDE.md | ||
|
|
||
| This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. | ||
|
|
||
| ## Build/Test Commands | ||
| - Run all tests: `bats -t scripts/tests` | ||
| - Run a single test: `bats -t scripts/tests/test_singularity_cmd.bats` | ||
| - Lint shell scripts: `shellcheck scripts/*` | ||
|
|
||
| ## Code Style Guidelines | ||
| - Follow DataLad/Git-Annex conventions for repository structure | ||
| - Shell scripts should pass shellcheck validation | ||
| - Maintain YODA principles (store all dependencies within the dataset) | ||
| - Tests use the bats framework with helpers in `scripts/tests/test_helpers.bash` | ||
| - Use snake_case for function and variable names | ||
| - Scripts should include proper error handling and validate inputs | ||
| - Document environment variables that affect script behavior | ||
| - Maintain backward compatibility with DataLad commands | ||
| - Follow proper Singularity image naming: `name--version.sing` format |
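The naming guideline above can be checked mechanically. The following is a minimal sketch, not code from this repository: the function name `is_valid_image_name` is hypothetical, and accepting `.sif` alongside `.sing` is an assumption based on the OCI migration guide in this same pull request.

```bash
#!/bin/bash
# Hypothetical helper (not part of the repository): succeed only if the file
# name follows the `name--version.ext` convention, e.g.
# bids-validator--1.2.3.sing. Accepting `.sif` is an assumption.
is_valid_image_name() {
    local file_name
    file_name="${1##*/}"    # drop any leading directories
    case "$file_name" in
        # at least one character before `--` and one between `--` and the extension
        ?*--?*.sing|?*--?*.sif) return 0 ;;
        *) return 1 ;;
    esac
}
```

A caller could use it as an input check, for example `is_valid_image_name "$image" || { echo "bad image name: $image" >&2; exit 1; }`.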
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,267 @@ | ||
| # OCI-Based Container Workflow Migration Guide | ||
|
|
||
| This guide documents the implementation of the OCI-based container workflow as described in [use-oci-1.md](use-oci-1.md). | ||
|
|
||
| ## Overview | ||
|
|
||
| The new workflow migrates from building Singularity containers directly from Docker images to using OCI containers as an intermediate step. This provides better reproducibility and URL-based availability for all container components. | ||
|
|
||
| ## Components | ||
|
|
||
| ### 1. `scripts/oci_cmd` | ||
|
|
||
| A simple wrapper script that passes commands to `apptainer`. This script is registered with DataLad containers as the command wrapper for OCI containers. | ||
|
|
||
| **Usage:** | ||
| ```bash | ||
| scripts/oci_cmd <apptainer-command> [arguments...] | ||
| ``` | ||
|
|
||
| **Example:** | ||
| ```bash | ||
| scripts/oci_cmd run container.oci/ | ||
| scripts/oci_cmd build output.sif input.oci/ | ||
| ``` | ||
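The design document keeps this wrapper as a bare `apptainer "$@"`. As a hedged sketch of how it might grow, the block below picks the first available runtime before delegating. The fallback to a `singularity` binary and the function names `pick_runtime` and `oci_cmd_main` are assumptions, not part of this pull request.

```bash
#!/bin/bash
# Sketch only: falling back to `singularity` is an assumption; the design
# document's wrapper simply runs `apptainer "$@"`.

# Print the first candidate command found on PATH; fail if none is found.
pick_runtime() {
    local candidate
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Hypothetical entry point: forward all arguments to the chosen runtime.
oci_cmd_main() {
    local runtime
    runtime=$(pick_runtime apptainer singularity) || {
        echo "oci_cmd: neither apptainer nor singularity found on PATH" >&2
        return 127
    }
    "$runtime" "$@"
}
```

In a real script the last line would be `oci_cmd_main "$@"`; it is left out here so the functions can be sourced and tested without a container runtime installed.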
|
|
||
| ### 2. `scripts/migrate_to_oci` | ||
|
|
||
| Migration script that converts existing auto-generated Singularity containers to the OCI-based workflow. | ||
|
|
||
| **Features:** | ||
| - Identifies auto-generated Singularity files (marked with "Automagically prepared") | ||
| - Creates OCI images in `images-oci/` subdataset | ||
| - Builds SIF files from OCI images | ||
| - Updates `.datalad/config` to point to new SIF files | ||
| - Removes old Singularity recipe and `.sing` files | ||
| - Verifies all annex files are available from URLs | ||
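The first feature, identifying auto-generated recipes, amounts to a marker search. A minimal sketch, assuming the marker string is exactly "Automagically prepared" as in the design document; the function name `filter_auto_generated` is hypothetical and the actual script may do this differently:

```bash
#!/bin/bash
# Hypothetical helper: print only those recipe paths that carry the
# "Automagically prepared" marker, one per line.
filter_auto_generated() {
    local recipe
    for recipe in "$@"; do
        if grep -q 'Automagically prepared' "$recipe"; then
            printf '%s\n' "$recipe"
        fi
    done
}
```

For the whole repository, the design document's `git grep -l 'Automagically prepared' images` gives the same list without a loop; the function form is mainly useful when recipes are passed explicitly on the command line.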
|
|
||
| **Usage:** | ||
| ```bash | ||
| # Migrate all auto-generated containers | ||
| scripts/migrate_to_oci | ||
|
|
||
| # Migrate specific containers | ||
| scripts/migrate_to_oci images/bids/Singularity.bids-validator--1.2.3 | ||
|
|
||
| # Continue even if some migrations fail | ||
| scripts/migrate_to_oci --skip-failures | ||
|
|
||
| # Log failures to a file | ||
| scripts/migrate_to_oci --log-file migration_failures.log | ||
| ``` | ||
|
|
||
| ## Workflow | ||
|
|
||
| ### Creating New OCI-Based Containers | ||
|
|
||
| 1. **Add OCI container** (in `images-oci/` subdataset): | ||
| ```bash | ||
| cd images-oci/ | ||
| datalad containers-add \ | ||
| --url oci:docker://bids/validator:1.2.3 \ | ||
| -i bids/bids-validator--1.2.3.oci \ | ||
| bids-validator | ||
| ``` | ||
|
|
||
| 2. **Verify annex URLs**: | ||
| ```bash | ||
| git annex find --not --in datalad --and --not --in web bids/bids-validator--1.2.3.oci | ||
| ``` | ||
| This should return empty output (all files have URLs). | ||
|
|
||
| 3. **Build SIF image** (from repository root): | ||
| ```bash | ||
| datalad run \ | ||
| -m "Build SIF image for bids/bids-validator--1.2.3.sif" \ | ||
| --output images/bids/bids-validator--1.2.3.sif \ | ||
| scripts/oci_cmd build \ | ||
| images/bids/bids-validator--1.2.3.sif \ | ||
| images-oci/bids/bids-validator--1.2.3.oci/ | ||
| ``` | ||
|
|
||
| 4. **Register container** (if needed): | ||
| ```bash | ||
| datalad containers-add \ | ||
| bids-validator \ | ||
| -i images/bids/bids-validator--1.2.3.sif \ | ||
| --update \ | ||
| --call-fmt "{img_dspath}/scripts/singularity_cmd run {img} {cmd}" | ||
| ``` | ||
|
|
||
| ### Migrating Existing Containers | ||
|
|
||
| The migration process for a single container involves: | ||
|
|
||
| 1. **Parse Singularity file** - Extract Docker image URL from `From:` line | ||
| 2. **Create OCI image** - Use `datalad containers-add` with `oci:docker://` URL | ||
| 3. **Verify URLs** - Ensure all annex files are available from web | ||
| 4. **Build SIF** - Convert OCI to SIF using `scripts/oci_cmd build` | ||
| 5. **Update config** - Point `.datalad/config` to new SIF file | ||
| 6. **Remove old files** - Delete Singularity recipe and `.sing` file | ||
| 7. **Commit changes** - Create a commit documenting the migration | ||
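Steps 1 and 2 depend on turning a recipe path into the names used by the later steps. The sketch below shows one way to derive them with `sed` and parameter expansion. All three function names are hypothetical, and the path layout (`images/<org>/Singularity.<name>--<version>`) is assumed from the examples in this guide.

```bash
#!/bin/bash
# Hypothetical helpers; path layout assumed from the examples in this guide.

# Print the Docker reference from the first `From:` line of a recipe.
docker_ref_from_recipe() {
    sed -n 's/^From:[[:space:]]*//p' "$1" | head -n 1
}

# images/bids/Singularity.bids-aa--0.2.0 -> bids/bids-aa--0.2.0
image_stem_from_recipe() {
    local recipe="$1" dir name
    dir="${recipe%/*}"                 # images/bids
    name="${recipe##*/Singularity.}"   # bids-aa--0.2.0
    printf '%s/%s\n' "${dir#images/}" "$name"
}

# bids/bids-aa--0.2.0 -> bids-aa
container_name_from_stem() {
    local name="${1##*/}"
    printf '%s\n' "${name%%--*}"
}
```

With a stem in hand, the remaining paths follow the guide's layout: `images-oci/$stem.oci` for the OCI image, `images/$stem.sif` for the new SIF, and `images/$stem.sing` for the file to remove.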
|
|
||
| **Example migration workflow:** | ||
| ```bash | ||
| # Test on simple cases first | ||
| scripts/migrate_to_oci \ | ||
| images/bids/Singularity.bids-validator--1.2.3 \ | ||
| images/bids/Singularity.bids-rshrf--1.0.0 | ||
|
|
||
| # If successful, migrate all | ||
| scripts/migrate_to_oci --skip-failures --log-file migration.log | ||
| ``` | ||
|
|
||
| ## Repository Structure | ||
|
|
||
| ``` | ||
| . | ||
| ├── images/ # SIF files (final container images) | ||
| │ ├── bids/ | ||
| │ │ ├── bids-validator--1.2.3.sif | ||
| │ │ └── bids-aa--0.2.0.sif | ||
| │ └── neurodesk/ | ||
| │ └── neurodesk-afni--21.2.00.sif | ||
| │ | ||
| ├── images-oci/ # OCI containers (subdataset) | ||
| │ ├── bids/ | ||
| │ │ ├── bids-validator--1.2.3.oci/ | ||
| │ │ └── bids-aa--0.2.0.oci/ | ||
| │ └── neurodesk/ | ||
| │ └── neurodesk-afni--21.2.00.oci/ | ||
| │ | ||
| ├── scripts/ | ||
| │ ├── oci_cmd # Apptainer wrapper | ||
| │ ├── migrate_to_oci # Migration script | ||
| │ └── singularity_cmd # Existing Singularity wrapper | ||
| │ | ||
| └── .datalad/ | ||
| └── config # Container registrations | ||
| ``` | ||
|
|
||
| ## Verification | ||
|
|
||
| After migration, verify that: | ||
|
|
||
| 1. **All annex files have URLs:** | ||
| ```bash | ||
| git annex find --not --in datalad --and --not --in web images-oci/ | ||
| ``` | ||
| Should return empty. | ||
|
|
||
| 2. **SIF files exist:** | ||
| ```bash | ||
| ls -lh images/bids/*.sif | ||
| ``` | ||
|
|
||
| 3. **Container configuration updated:** | ||
| ```bash | ||
| git config -f .datalad/config --get-regexp '^datalad\.containers\..*\.image$' | grep '\.sif$' | ||
| ``` | ||
|
|
||
| 4. **Old files removed** (recent history should include the removal commits): | ||
| ```bash | ||
| git log --all -- 'images/*/Singularity.*' | head -20 | ||
| ``` | ||
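Check 3 lists the entries that already point at `.sif` files; the inverse view (entries that still point elsewhere) is often more useful after a migration. A minimal sketch, assuming the `key value` line format printed by `git config --get-regexp`; the function name `list_non_sif_images` is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: read `key value` lines on stdin and print the
# container image entries whose value does not end in `.sif`.
list_non_sif_images() {
    awk '$1 ~ /^datalad\.containers\..*\.image$/ && $NF !~ /\.sif$/ { print }'
}
```

It could be fed with `git config -f .datalad/config --get-regexp '^datalad\.containers\.' | list_non_sif_images`. Remaining `.sing` entries are not necessarily errors, since containers with custom recipes are not migrated.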
|
|
||
| ## Testing | ||
|
|
||
| ### Unit Tests | ||
|
|
||
| **BATS tests for `oci_cmd`:** | ||
| ```bash | ||
| bats -t scripts/tests/test_oci_cmd.bats | ||
| ``` | ||
|
|
||
| **Python tests for migration script:** | ||
| ```bash | ||
| python -m pytest scripts/tests/test_migrate_to_oci.py -v | ||
| ``` | ||
|
|
||
| ### Integration Testing | ||
|
|
||
| Test the full workflow on a simple container: | ||
|
|
||
| ```bash | ||
| # Create test OCI container | ||
| cd images-oci/ | ||
| datalad containers-add \ | ||
| --url oci:docker://alpine:latest \ | ||
| -i test/test-alpine.oci \ | ||
| test-alpine | ||
|
|
||
| # Verify URLs | ||
| git annex find --not --in datalad --and --not --in web test/test-alpine.oci | ||
|
|
||
| # Build SIF | ||
| cd .. | ||
| datalad run \ | ||
| -m "Build test SIF" \ | ||
| --output images/test/test-alpine.sif \ | ||
| scripts/oci_cmd build images/test/test-alpine.sif images-oci/test/test-alpine.oci/ | ||
|
|
||
| # Test container | ||
| scripts/oci_cmd exec images-oci/test/test-alpine.oci/ echo "Hello from OCI" | ||
| scripts/singularity_cmd exec images/test/test-alpine.sif echo "Hello from SIF" | ||
| ``` | ||
|
|
||
| ## Troubleshooting | ||
|
|
||
| ### OCI container creation fails | ||
|
|
||
| **Issue:** `datalad containers-add` fails with OCI URL | ||
|
|
||
| **Solution:** Ensure you have: | ||
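| - The datalad-container extension with OCI support (per the design document, the skopeo branch proposed in datalad/datalad-container pull request 277) | ||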
| - Skopeo installed | ||
| - Network access to Docker Hub | ||
|
|
||
| ### Annex files without URLs | ||
|
|
||
| **Issue:** `git annex find --not --in datalad --and --not --in web` returns files | ||
|
|
||
| **Solution:** | ||
| ```bash | ||
| # For each file, register the URL manually | ||
| git annex registerurl <key> <url> | ||
| ``` | ||
|
|
||
| ### SIF build fails | ||
|
|
||
| **Issue:** `scripts/oci_cmd build` fails | ||
|
|
||
| **Solution:** | ||
| - Ensure apptainer/singularity is installed | ||
| - Check disk space (SIF files can be large) | ||
| - Verify OCI directory exists and is valid | ||
|
|
||
| ### Migration script fails mid-process | ||
|
|
||
| **Issue:** Script fails partway through migration | ||
|
|
||
| **Solution:** | ||
| - Use `--skip-failures` flag to continue past failures | ||
| - Check `--log-file` output for specific errors | ||
| - Manually fix failed migrations and re-run | ||
|
|
||
| ## Benefits of OCI-Based Workflow | ||
|
|
||
| 1. **URL Availability** - All container components are available via URLs (no special remotes needed) | ||
| 2. **Reproducibility** - OCI format is standardized and widely supported | ||
| 3. **Flexibility** - Can use either OCI or SIF format depending on needs | ||
| 4. **Better Tracking** - DataLad tracks all steps of container creation | ||
| 5. **Easier Maintenance** - Updates only need to touch OCI layer, SIF can be rebuilt | ||
|
|
||
| ## Future Enhancements | ||
|
|
||
| 1. **Automated Updates** - Script to check for updated Docker images and rebuild | ||
| 2. **Parallel Migration** - Process multiple containers concurrently | ||
| 3. **Rollback Support** - Ability to revert failed migrations | ||
| 4. **CI/CD Integration** - Automated testing of migrated containers | ||
| 5. **Cache Management** - Tools to manage OCI cache and temporary files | ||
|
|
||
| ## References | ||
|
|
||
| - [Original Design Document](use-oci-1.md) | ||
| - [DataLad Container Documentation](https://docs.datalad.org/projects/container/) | ||
| - [Apptainer Documentation](https://apptainer.org/docs/) | ||
| - [OCI Specification](https://github.com/opencontainers/image-spec) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,68 @@ | ||
| Plan to refactor how the codebase creates Singularity/Apptainer containers. | ||
|
|
||
| We do not want to create them directly from Docker images. Instead we first rely | ||
| on functionality in https://github.com/datalad/datalad-container/pull/277 | ||
| (skopeo branch of the https://github.com/yarikoptic/datalad-container/ fork) to | ||
| create an OCI container locally using `datalad containers-add oci:docker://...` | ||
| under the `images-oci/` subdataset, at a similar path (e.g. | ||
| repronim/repronim-reproin--0.13.1.oci for | ||
| images/repronim/repronim-reproin--0.13.1.sing in this one), registering it to | ||
| be run with `{img_dspath}/scripts/oci_cmd run`, which we are to provide as well. | ||
| E.g. | ||
| datalad containers-add --url oci:docker://bids/aa:v0.2.0 -i bids/bids-aa--0.2.0.oci bids-aa | ||
|
|
||
| under the images-oci/ subdataset. | ||
|
||
|
|
||
| While generating such an OCI image we need to ensure that all produced | ||
| files are either under annex with a URL or directly in git (if text files), e.g. | ||
|
|
||
| git annex find --not --in datalad --and --not --in web bids/bids-aa--0.2.0.oci | ||
|
|
||
| (could be under web directly or via datalad downloader!) | ||
|
|
||
| `scripts/oci_cmd` could be simple for now: | ||
|
|
||
| #!/bin/bash | ||
|
|
||
| apptainer "$@" | ||
|
|
||
| Then, after generating the OCI image, we would need to produce the Singularity SIF file using | ||
| (assuming that {image} is replaced with the image's path stem, e.g. repronim/repronim-reproin--0.13.1) | ||
|
|
||
| datalad run -m "Build SIF image for {image}.sif" --output images/{image}.sif scripts/oci_cmd build images/{image}.sif images-oci/{image}.oci/ | ||
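The `{image}` substitution above can be sketched as a small dry-run helper that prints the command for a given stem instead of executing it. The function name `sif_build_command` is hypothetical, and the sketch assumes stems without spaces or shell metacharacters, as in the examples here.

```bash
#!/bin/bash
# Hypothetical dry-run helper: print the `datalad run` invocation from the
# template above for one image stem (e.g. repronim/repronim-reproin--0.13.1).
# Assumes the stem contains no spaces or shell metacharacters.
sif_build_command() {
    local image="$1"
    printf 'datalad run -m "Build SIF image for %s.sif" --output images/%s.sif scripts/oci_cmd build images/%s.sif images-oci/%s.oci/\n' \
        "$image" "$image" "$image" "$image"
}
```

Printing rather than running keeps the sketch usable for review and logging; a real implementation would more likely invoke `datalad run` directly with properly quoted arguments.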
|
|
||
|
|
||
| Once all of that is done and working, we would need migration | ||
| functionality that produces .sif files to replace all images for which we had Singularity* files without custom commands, i.e. just basic wrappers. The full list could be obtained using | ||
|
|
||
| git grep -l 'Automagically prepared' images | ||
|
|
||
| and files would look like | ||
|
|
||
| ❯ head images/bids/Singularity.bids-aa--0.2.0 | ||
| # | ||
| # Automagically prepared for ReproNim/containers distribution. | ||
| # See http://github.com/ReproNim/containers for more info | ||
| # | ||
| Bootstrap: docker | ||
| From: bids/aa:v0.2.0 | ||
|
|
||
| so the goal would be to produce an OCI image, taking that "From:" as pointing to Docker Hub. For the above example (run under the images-oci/ subdataset), the "containers-add" command would be similar to the earlier example: | ||
|
|
||
| datalad containers-add --url oci:docker://bids/aa:v0.2.0 -i bids/bids-aa--0.2.0.oci bids-aa | ||
|
|
||
| and then verifying that all annex files are available from URLs: | ||
|
|
||
| git annex find --not --in datalad --and --not --in web bids/bids-aa--0.2.0.oci | ||
|
|
||
| should come out empty (so we need a generic, reusable helper function for this check). | ||
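One possible shape for such a reusable helper is sketched below. It is an assumption, not code from this pull request: the name `expect_no_output` is hypothetical, and it treats both a non-zero exit status and any stdout output as failure.

```bash
#!/bin/bash
# Hypothetical helper: run the given command and succeed only if it exits 0
# and prints nothing on stdout. Any unexpected output is echoed to stderr.
expect_no_output() {
    local output
    output=$("$@") || return 1
    if [ -n "$output" ]; then
        printf '%s\n' "$output" >&2
        return 1
    fi
}
```

The check from the text would then read `expect_no_output git annex find --not --in datalad --and --not --in web bids/bids-aa--0.2.0.oci`, with any listed files indicating content that has no registered URL.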
|
|
||
| The original images and their corresponding recipes, in this case | ||
| images/bids/Singularity.bids-aa--0.2.0 (where "From:" was found) and the corresponding image images/bids/bids-aa--0.2.0.sing, should be "git rm"ed and committed with an informative message. The path to the image within .datalad/config should be updated to point to the .sif instead of the original .sing version. | ||
|
|
||
| While developing, try migration first on some simpler cases like | ||
|
|
||
| images/bids/bids-validator--1.2.3.sing | ||
| images/bids/bids-rshrf--1.0.0.sing | ||
|
|
||
| For migration, add an option to skip failures; we would also need a log file listing the images that failed to convert. | ||
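The skip-and-log behaviour can be sketched as a loop around a per-recipe migration command. Everything here is illustrative: `migrate_many` and its argument order are hypothetical, and the per-recipe command is passed in by name so the loop can be exercised without DataLad.

```bash
#!/bin/bash
# Hypothetical sketch: migrate_many SKIP_FAILURES LOG_FILE MIGRATE_CMD RECIPE...
# Runs MIGRATE_CMD on each recipe and appends failed recipes to LOG_FILE.
# With SKIP_FAILURES=yes it continues past failures; otherwise it stops at
# the first one. Returns non-zero if anything failed.
migrate_many() {
    local skip_failures="$1" log_file="$2" migrate_cmd="$3"
    shift 3
    local recipe failed=0
    for recipe in "$@"; do
        if ! "$migrate_cmd" "$recipe"; then
            printf '%s\n' "$recipe" >> "$log_file"
            failed=$((failed + 1))
            if [ "$skip_failures" != yes ]; then
                return 1
            fi
        fi
    done
    [ "$failed" -eq 0 ]
}
```

Returning non-zero even when failures are skipped is a design choice in this sketch: it lets a caller or CI job notice that the log file is non-empty without having to inspect it.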