Add DatasetType="project" and rework existing "layout" example into a proper BIDS dataset #1972

yarikoptic · 2024-10-29T21:01:22Z

This PR was initially submitted as #1861, but I made the mistake of combining it with a discussion of transforming existing projects' layouts into such a BIDS project dataset. Please refer to that PR for examples; otherwise, let's concentrate here on this specific proposed change.

Rationale 1 (major): The BIDS standard already provides a reasonable structure for organizing the components of a neuroscientific data project: where to place code, original (source) data, derivative data, README, and CHANGES. Many projects (e.g. nipoppy, YODA) propose similar, and possibly even "inspired", templates. If we explicitly allow the BIDS standard to prescribe study-level organization, IMHO it would help many people and projects decide how to organize their studies/projects.
- See #1861 for some examples and initial discussion of possible "transformations". But this PR has merit regardless of whether individual existing projects adopt it partially or fully.
Rationale 2: IMHO the BIDS standard should describe only what the standard prescribes and not recommend potential "non-standardized" layouts. That is why I "reworked" that example into a legitimate BIDS dataset merely by adding dataset_description.json.
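
As a rough illustration of the "merely by adding dataset_description.json" step, here is a minimal Python sketch. `Name` and `BIDSVersion` are the fields BIDS requires in that file; the `"project"` value is only what this PR proposes and is not part of a released BIDS version. The helper name `make_project_dataset`, the dataset name, and the version string are hypothetical placeholders.

```python
import json
from pathlib import Path


def make_project_dataset(root, name, bids_version="1.10.0"):
    """Write a minimal dataset_description.json at the top of a layout.

    Hypothetical helper for illustration; not part of any BIDS tooling.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    description = {
        "Name": name,
        "BIDSVersion": bids_version,  # placeholder version string
        "DatasetType": "project",  # value proposed in this PR, not yet standard
    }
    path = root / "dataset_description.json"
    path.write_text(json.dumps(description, indent=2) + "\n")
    return path
```

Calling `make_project_dataset("my_study", "My study")` on an existing layout would leave code/, sourcedata/, derivatives/ and the rest untouched and only add the description file at the top level.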

TODOs:

- provided more relation to existing approaches (mostly by @snastase in #1861)
- crafted an example for bids-examples: Add examples with DatasetType = "project" (bids-examples#451)
- ensure bids-validator with modified schema passes its validation

This reverts commit a3c12f8, where I tried to introduce it in bids-standard#1741, but it required a little more detailing.

codecov · 2025-01-23T02:44:24Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.45%. Comparing base (f0e14a2) to head (fb4f5a4).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1972   +/-   ##
=======================================
  Coverage   82.45%   82.45%           
=======================================
  Files          17       17           
  Lines        1499     1499           
=======================================
  Hits         1236     1236           
  Misses        263      263


yarikoptic · 2025-03-01T01:41:16Z

FWIW, @effigies and I talked, and he brought up an interesting argument, which IMHO does not contradict this proposal per se: ATM any BIDS dataset (raw or derivative) that already contains subdatasets under derivatives/ could be considered a "project" BIDS dataset. That is indeed true. But IMHO it is just a legacy side effect of the BIDS recommendation to place derivatives under derivatives/, which somewhat invalidates the claim that the BIDS dataset as a whole is a "raw" dataset.

edit: related, and linked below, is #2103, which highlights the same situation of a "raw" dataset containing "derivatives/".

yarikoptic · 2025-04-23T14:55:40Z

@effigies I wonder if we should extend DatasetType to become a list, or replace or complement it with DatasetTypes, which could then be just "project" when the dataset contains other derivative BIDS datasets or sourcedata, ["raw", "project"] when it is both raw and a project, or ["derivative", "project"] when it is a derivative that points to the original "raw" data under sourcedata/ ... ?

effigies · 2025-04-25T12:56:23Z

I'm skeptical of that need. I would expect your ["raw", "project"] to look like:

project/
  dataset_description.json  # "DatasetType": "project"
  rawdata/
    dataset_description.json  # "DatasetType": "raw"
  derivatives/
    preproc/
      dataset_description.json  # "DatasetType": "derivative"

And ["derivative", "project"]:

project/
  dataset_description.json  # "DatasetType": "project"
  sourcedata/
    dataset_description.json  # "DatasetType": "raw"
  rawdata/
    dataset_description.json  # "DatasetType": "derivative"
  deriv/
    analysis/
      dataset_description.json  # "DatasetType": "derivative"

Subdatasets should be validatable BIDS datasets in their own right, avoiding the need for a top-level dataset_description.json to modify how they are intended to be validated.
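
To make the "validatable in their own right" idea concrete, here is a hedged Python sketch that walks a tree like the ones above and reports every directory that carries its own dataset_description.json, together with its declared DatasetType. The function name `find_datasets` is hypothetical, and treating every directory with a description file as a dataset root is an assumption of this sketch, not a rule from the specification. A missing DatasetType falls back to "raw", which is the default BIDS defines for backwards compatibility.

```python
import json
from pathlib import Path


def find_datasets(root):
    """Map each directory holding a dataset_description.json to its DatasetType.

    Keys are POSIX-style paths relative to `root` ("." is the top level).
    Hypothetical helper; assumes any directory with a description file
    is a dataset root.
    """
    root = Path(root)
    found = {}
    for desc in sorted(root.rglob("dataset_description.json")):
        with desc.open() as f:
            meta = json.load(f)
        rel = desc.parent.relative_to(root).as_posix()
        # BIDS treats an absent DatasetType as "raw"
        found[rel] = meta.get("DatasetType", "raw")
    return found
```

Each returned path could then be handed to a validator separately, so that no top-level file needs to change how a subdataset is interpreted.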

effigies · 2025-04-25T13:07:38Z

I think this overall needs more specification. What are valid directories in a project-type dataset? We should add them to https://github.com/bids-standard/bids-specification/blob/master/src/schema/rules/directories.yaml.

I think a project dataset is barely worth specifying if we don't validate at least the raw data subdataset. Possibly we should have rules for indicating where validators should look for subdatasets.

In OpenNeuroDerivatives, we use sourcedata/* as a mixture of BIDS and non-BIDS (FreeSurfer) inputs. Nipoppy uses derivatives/<pipeline>/<version>/<output> as derivative datasets; presumably this could also be a mixture of BIDS and non-BIDS.
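
One way a validator might handle such a mixture is to split the immediate children of a directory like sourcedata/ into BIDS and non-BIDS inputs. The Python sketch below does this under the assumption that the presence of dataset_description.json is what marks a child as a BIDS dataset; the function name `split_children` is hypothetical, and nothing here reflects how bids-validator currently behaves.

```python
from pathlib import Path


def split_children(parent):
    """Split immediate subdirectories into (bids, non_bids) name lists.

    Assumption of this sketch: a child is a BIDS dataset if and only if
    it contains a dataset_description.json file.
    """
    bids, non_bids = [], []
    for child in sorted(Path(parent).iterdir()):
        if not child.is_dir():
            continue  # ignore loose files next to the subdirectories
        if (child / "dataset_description.json").is_file():
            bids.append(child.name)
        else:
            non_bids.append(child.name)
    return bids, non_bids
```

Only the first list would be passed on for validation; the second (for example FreeSurfer outputs) would be left alone.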

yarikoptic · 2025-04-25T20:25:27Z

I have yet to "process" this, but a quick side idea inspired by #1928: I wonder if there is a hierarchy here: project (everything common) -> raw (the current default; requires sub- folder(s)) -> derivative (more stuff could be added). Each next level adds capabilities but includes all of the prior one, since a derivative could include raw in it. Or do we already have something that invalidates that?

yarikoptic added the "opinions wanted" label Oct 29, 2024

yarikoptic requested a review from effigies October 29, 2024 21:01

yarikoptic requested review from erdalkaraca and DimitriPapadopoulos as code owners October 29, 2024 21:01

yarikoptic requested a review from tsalo October 29, 2024 21:01

yarikoptic changed the title ~~Enh project datasettype~~ Add DatasetType="project" and rework existing "layout" example into a proper BIDS dataset Oct 30, 2024

yarikoptic added 2 commits January 22, 2025 21:40

Add the notion that example layout can in fact be a valid BIDS dataset

009c8b6

This reverts commit a3c12f8, where I tried to introduce it in bids-standard#1741, but it required a little more detailing.

Move and extend description and definition of DatasetType "project"

fb4f5a4

yarikoptic force-pushed the enh-project-datasettype branch from d0d5c37 to fb4f5a4 January 23, 2025 02:40

This was referenced Apr 16, 2025

DOC: derivatives datasets vs derivatives data #2103

Open

[ENH] Formalize presence of optional docs/ folder #2104

Merged
