Set up RTD and Documentation Infrastructure #288

@srivarra

Description

This issue tracks the requirements and feedback we need before setting up Read the Docs and Sphinx. Some of the necessary changes will affect multiple areas of the repo, so we need alignment on key decisions first.

Package Structure & Documentation Files

Scripts folder location: Currently the scripts folder is bundled within the viscy package (VisCy/viscy/scripts). Should we move it to the root level (VisCy/scripts) instead? This would separate utility scripts from the core package.

Documentation file locations:

  • data_organization.md currently lives in viscy/data_organization.md
  • preprocessing.md currently lives in viscy/preprocessing/preprocessing.md

Should these move into a dedicated docs source tree once Sphinx is set up?

Docstring Standardization

We already use ruff for line length and import sorting. Should we extend it to enforce docstring standards via `lint.pydocstyle`? We'd need to choose between:

  • NumPy (numpydoc) convention
  • Google style convention
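For reference, enabling pydocstyle checks in ruff is a small pyproject.toml change; this is a sketch, not a final config (the rule selection and any per-file ignores would need tuning for this repo):

```toml
# pyproject.toml — sketch, not a final config
[tool.ruff.lint]
# Enable the pydocstyle ("D") rules alongside the existing checks.
extend-select = ["D"]

[tool.ruff.lint.pydocstyle]
# Or "google", depending on which convention we pick.
convention = "numpy"
```

Setting `convention` automatically disables the D rules that conflict with the chosen style, so we wouldn't need to hand-pick individual rule codes.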

The codebase has mixed docstring styles. For example:

  • NumPy style

VisCy/viscy/data/hcs.py

Lines 282 to 329 in a2186d6

```python
class HCSDataModule(LightningDataModule):
    """
    Lightning data module for a preprocessed HCS NGFF Store.

    Parameters
    ----------
    data_path : str
        Path to the data store.
    source_channel : str or Sequence[str]
        Name(s) of the source channel, e.g. 'Phase'.
    target_channel : str or Sequence[str]
        Name(s) of the target channel, e.g. ['Nuclei', 'Membrane'].
    z_window_size : int
        Z window size of the 2.5D U-Net, 1 for 2D.
    split_ratio : float, optional
        Split ratio of the training subset in the fit stage,
        e.g. 0.8 means an 80/20 split between training/validation,
        by default 0.8.
    batch_size : int, optional
        Batch size, defaults to 16.
    num_workers : int, optional
        Number of data-loading workers, defaults to 8.
    target_2d : bool, optional
        Whether the target is 2D (e.g. in a 2.5D model),
        defaults to False.
    yx_patch_size : tuple[int, int], optional
        Patch size in (Y, X), defaults to (256, 256).
    normalizations : list of MapTransform, optional
        MONAI dictionary transforms applied to selected channels,
        defaults to ``[]`` (no normalization).
    augmentations : list of MapTransform, optional
        MONAI dictionary transforms applied to the training set,
        defaults to ``[]`` (no augmentation).
    caching : bool, optional
        Whether to decompress all the images and cache the result,
        will store in `/tmp/$SLURM_JOB_ID/` if available,
        defaults to False.
    ground_truth_masks : Path or None, optional
        Path to the ground truth masks,
        used in the test stage to compute segmentation metrics,
        defaults to None.
    persistent_workers : bool, optional
        Whether to keep the workers alive between fitting epochs,
        defaults to False.
    prefetch_factor : int or None, optional
        Number of samples loaded in advance by each worker during fitting,
        defaults to None (2 per PyTorch default).
    """
```

  • reST (Sphinx field list) style

VisCy/viscy/data/hcs.py

Lines 99 to 109 in a2186d6

```python
class SlidingWindowDataset(Dataset):
    """Torch dataset where each element is a window of
    (C, Z, Y, X) where C=2 (source and target) and Z is ``z_window_size``.

    :param list[Position] positions: FOVs to include in dataset
    :param ChannelMap channels: source and target channel names,
        e.g. ``{'source': 'Phase', 'target': ['Nuclei', 'Membrane']}``
    :param int z_window_size: Z window size of the 2.5D U-Net, 1 for 2D
    :param DictTransform | None transform:
        a callable that transforms data, defaults to None
    """
```

Picking one will be necessary for consistent documentation generation. There's already a related issue, #133, and I lean towards NumPy-style docstrings, consistent with the rest of our projects.

Examples & Tutorials Organization

Duplicate content: The examples/virtual_staining/ directory contains both Jupyter notebooks and their converted Python scripts. Do we need both formats, or should we keep just one?

Directory structure: Should applications be merged into examples, or do they serve different purposes that warrant separate directories?


There may be things I've missed so far; feel free to add them or let me know.

Labels: documentation, maintenance