Add DatasetType="project" and rework existing "layout" example into a proper BIDS dataset #1972

Open
wants to merge 2 commits into
base: master
from

Conversation

yarikoptic
Collaborator

This PR was initially submitted as #1861, but I made the mistake of combining it with a discussion of transforming existing projects' layouts into such a BIDS project dataset. Please refer to that PR for examples, but otherwise let's concentrate here on discussing this specific proposed change.

  • Rationale 1 (major): The BIDS standard already provides a reasonable structure to formalize the organization of the various components of a neuroscientific data project: where to place code, original (source) data, derivative data, README, CHANGES. Many projects (e.g. nipoppy, YODA, etc.) propose similar, and often perhaps even "inspired", templates. If we explicitly allow the BIDS standard to prescribe study-level organization, IMHO it would help many people and projects decide how to organize their studies/projects.
  • Rationale 2: IMHO the BIDS standard should describe only what the standard prescribes and should not recommend potential "non-standardized" layouts. That is why I "reworked" that example into a legitimate BIDS dataset merely by adding dataset_description.json.
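
To make the "merely by adding dataset_description.json" step concrete, here is a minimal sketch in Python. It assumes the "project" value proposed in this PR (it is not part of a released BIDS specification), and the helper name `make_project_description` and the `BIDSVersion` value are placeholders chosen for illustration.

```python
import json
from pathlib import Path


def make_project_description(root: Path, name: str) -> Path:
    """Write a minimal dataset_description.json at the top of `root`."""
    description = {
        "Name": name,              # required by BIDS
        "BIDSVersion": "1.10.0",   # placeholder; use the version you target
        "DatasetType": "project",  # value proposed in this PR, not yet in the spec
    }
    root.mkdir(parents=True, exist_ok=True)
    path = root / "dataset_description.json"
    path.write_text(json.dumps(description, indent=2) + "\n")
    return path
```

Calling `make_project_description(Path("my_study"), "My study")` would create `my_study/dataset_description.json` with those three fields; everything else in the layout stays where it is.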

TODOs:

@yarikoptic yarikoptic added the opinions wanted Please read and offer your opinion on this matter label Oct 29, 2024
@yarikoptic yarikoptic requested a review from effigies October 29, 2024 21:01
@yarikoptic yarikoptic requested a review from tsalo October 29, 2024 21:01
@yarikoptic yarikoptic changed the title Enh project datasettype Add DatasetType="project" and rework existing "layout" example into a proper BIDS dataset Oct 30, 2024
@yarikoptic yarikoptic force-pushed the enh-project-datasettype branch from d0d5c37 to fb4f5a4 Compare January 23, 2025 02:40

codecov bot commented Jan 23, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.45%. Comparing base (f0e14a2) to head (fb4f5a4).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1972   +/-   ##
=======================================
  Coverage   82.45%   82.45%           
=======================================
  Files          17       17           
  Lines        1499     1499           
=======================================
  Hits         1236     1236           
  Misses        263      263           


@yarikoptic
Collaborator Author

yarikoptic commented Mar 1, 2025

FWIW, I conversed with @effigies and he brought up an interesting argument, which IMHO does not contradict this proposal per se: ATM any BIDS dataset (raw or derivative) which already contains some subdatasets under derivatives/ could be considered a "project" BIDS dataset. That is indeed true. But IMHO it is just a legacy side effect of the BIDS recommendation to place derivatives under derivatives/, which somewhat invalidates the claim that the BIDS dataset as a whole is a "raw" dataset.

edit: related, and linked below, is #2103, which highlights the same situation with a "raw" dataset containing "derivatives/"

@yarikoptic
Collaborator Author

@effigies I wonder if we should extend DatasetType to become a list, or replace or complement it with DatasetTypes, which could then indeed be just "project" when the dataset contains other derivative BIDS datasets or sourcedata, ["raw", "project"] when it is both a raw dataset and a project, or ["derivative", "project"] when it is a derivative dataset that points to sourcedata/ for the original "raw" data ... ?

@effigies
Collaborator

I'm skeptical of that need. I would expect your ["raw", "project"] to look like:

project/
  dataset_description.json  # "DatasetType": "project"
  rawdata/
    dataset_description.json  # "DatasetType": "raw"
  derivatives/
    preproc/
      dataset_description.json  # "DatasetType": "derivative"

And ["derivative", "project"]:

project/
  dataset_description.json  # "DatasetType": "project"
  sourcedata/
    dataset_description.json  # "DatasetType": "raw"
  rawdata/
    dataset_description.json  # "DatasetType": "derivative"
  deriv/
    analysis/
      dataset_description.json  # "DatasetType": "derivative"

Subdatasets should be validatable BIDS datasets in their own right, avoiding the need for a top-level dataset_description.json to modify how they are intended to be validated.
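
As a rough illustration of what "validatable in their own right" could mean for tooling, the following Python sketch walks a project tree and reports the DatasetType each dataset_description.json declares for itself. The function name `find_datasets` is hypothetical and this is not how any existing validator works; it only assumes the layouts drawn above and the BIDS rule that a missing DatasetType defaults to "raw".

```python
import json
from pathlib import Path


def find_datasets(root: Path) -> dict[str, str]:
    """Map each directory (relative to `root`) that holds a
    dataset_description.json to the DatasetType it declares."""
    found = {}
    for desc in sorted(root.rglob("dataset_description.json")):
        meta = json.loads(desc.read_text())
        rel = desc.parent.relative_to(root).as_posix()
        # BIDS treats a missing DatasetType as "raw"
        found[rel] = meta.get("DatasetType", "raw")
    return found
```

Each directory in the result could then be handed to a validator on its own, without consulting the top-level description.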

@effigies
Collaborator

I think this overall needs more specification. What are valid directories in a project-type dataset? We should add them to https://github.com/bids-standard/bids-specification/blob/master/src/schema/rules/directories.yaml.

I think a project dataset is barely worth specifying if we don't validate at least the raw data subdataset. Possibly we should have rules for indicating where validators should look for subdatasets.

In OpenNeuroDerivatives, we use sourcedata/* as a mixture of BIDS and non-BIDS (FreeSurfer) inputs. Nipoppy uses derivatives/<pipeline>/<version>/<output> as derivative datasets; presumably this could also be a mixture of BIDS and non-BIDS.

@yarikoptic
Collaborator Author

I have yet to "process" this, but a quick side idea inspired by #1928: I wonder if there is a hierarchy here: project (everything common) -> raw (current default, requires having sub- folder(s)) -> derivative (more stuff could be added), where every next level adds capabilities but includes all of the prior one, since a derivative could include raw in it? Or do we already have something which invalidates that?
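
One possible reading of that chain, sketched in Python purely for discussion: each type permits everything the types before it permit. The names `LEVELS` and `includes` are made up for this sketch, and nothing like this exists in the schema today.

```python
# Order taken from the chain above: project -> raw -> derivative
LEVELS = ("project", "raw", "derivative")


def includes(dataset_type: str, other: str) -> bool:
    """True if `dataset_type` sits at or after `other` in the chain,
    i.e. it would permit everything `other` permits."""
    return LEVELS.index(dataset_type) >= LEVELS.index(other)
```

Under this reading `includes("derivative", "raw")` holds while `includes("project", "raw")` does not, so a rule written for "project" would apply to all three types.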

Labels
opinions wanted Please read and offer your opinion on this matter
2 participants