Description
This issue tracks the requirements and feedback needed before we set up Sphinx documentation on Read the Docs. Some of the required changes affect multiple areas of the repo, so we need to agree on key decisions first.
Package Structure & Documentation Files
Scripts folder location: Currently the scripts folder is bundled within the viscy package (VisCy/viscy/scripts). Should we move it to the root level (VisCy/scripts) instead? This would separate utility scripts from the core package.
Documentation file locations:
- `data_organization.md` currently lives in `viscy/data_organization.md`
- `preprocessing.md` currently lives in `viscy/preprocessing/preprocessing.md`
Docstring Standardization
We already use ruff for line length and import sorting. Should we extend it to enforce docstring standards by enabling the pydocstyle (`D`) rules and setting `lint.pydocstyle.convention`? We'd need to choose between:
- `numpy` (numpydoc) convention
- `google` style convention
The codebase has mixed docstring styles. For example:
- NumPy style
Lines 282 to 329 in a2186d6
```python
class HCSDataModule(LightningDataModule):
    """
    Lightning data module for a preprocessed HCS NGFF Store.

    Parameters
    ----------
    data_path : str
        Path to the data store.
    source_channel : str or Sequence[str]
        Name(s) of the source channel, e.g. 'Phase'.
    target_channel : str or Sequence[str]
        Name(s) of the target channel, e.g. ['Nuclei', 'Membrane'].
    z_window_size : int
        Z window size of the 2.5D U-Net, 1 for 2D.
    split_ratio : float, optional
        Split ratio of the training subset in the fit stage,
        e.g. 0.8 means an 80/20 split between training/validation,
        by default 0.8.
    batch_size : int, optional
        Batch size, defaults to 16.
    num_workers : int, optional
        Number of data-loading workers, defaults to 8.
    target_2d : bool, optional
        Whether the target is 2D (e.g. in a 2.5D model),
        defaults to False.
    yx_patch_size : tuple[int, int], optional
        Patch size in (Y, X), defaults to (256, 256).
    normalizations : list of MapTransform, optional
        MONAI dictionary transforms applied to selected channels,
        defaults to ``[]`` (no normalization).
    augmentations : list of MapTransform, optional
        MONAI dictionary transforms applied to the training set,
        defaults to ``[]`` (no augmentation).
    caching : bool, optional
        Whether to decompress all the images and cache the result,
        will store in `/tmp/$SLURM_JOB_ID/` if available,
        defaults to False.
    ground_truth_masks : Path or None, optional
        Path to the ground truth masks,
        used in the test stage to compute segmentation metrics,
        defaults to None.
    persistent_workers : bool, optional
        Whether to keep the workers alive between fitting epochs,
        defaults to False.
    prefetch_factor : int or None, optional
        Number of samples loaded in advance by each worker during fitting,
        defaults to None (2 per PyTorch default).
    """
```
- reST (Sphinx field list) style, using `:param type name:` fields rather than Google style's `Args:` section
Lines 99 to 109 in a2186d6
```python
class SlidingWindowDataset(Dataset):
    """Torch dataset where each element is a window of
    (C, Z, Y, X) where C=2 (source and target) and Z is ``z_window_size``.

    :param list[Position] positions: FOVs to include in dataset
    :param ChannelMap channels: source and target channel names,
        e.g. ``{'source': 'Phase', 'target': ['Nuclei', 'Membrane']}``
    :param int z_window_size: Z window size of the 2.5D U-Net, 1 for 2D
    :param DictTransform | None transform:
        a callable that transforms data, defaults to None
    """
```
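For comparison, here is a sketch of how the `SlidingWindowDataset` docstring might read after conversion to NumPy style. The `Dataset` base class is dropped so the snippet runs on its own, and the type names (`Position`, `ChannelMap`, `DictTransform`) appear only as text; the docstring layout is the point, not the class itself.

```python
class SlidingWindowDataset:  # base class (Dataset) omitted so the sketch is self-contained
    """
    Torch dataset where each element is a window of (C, Z, Y, X),
    where C=2 (source and target) and Z is ``z_window_size``.

    Parameters
    ----------
    positions : list[Position]
        FOVs to include in the dataset.
    channels : ChannelMap
        Source and target channel names,
        e.g. ``{'source': 'Phase', 'target': ['Nuclei', 'Membrane']}``.
    z_window_size : int
        Z window size of the 2.5D U-Net, 1 for 2D.
    transform : DictTransform or None, optional
        A callable that transforms data, defaults to None.
    """
```

The information is unchanged; only the field-list syntax is replaced by a `Parameters` section, which matches the `HCSDataModule` example.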
We need to pick one to generate consistent documentation. There's already a related issue, #133, and I lean towards NumPy-style docstrings, in line with the rest of the projects.
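If NumPy style is adopted, Sphinx can parse it through its bundled `sphinx.ext.napoleon` extension. Below is a minimal sketch of a Sphinx `conf.py`; the `docs/conf.py` location and the Read the Docs theme are assumptions to adjust, and `sphinx_rtd_theme` would need to be installed separately.

```python
# docs/conf.py -- minimal Sphinx configuration sketch (location and values are assumptions)
project = "VisCy"

extensions = [
    "sphinx.ext.autodoc",   # pull API documentation from docstrings
    "sphinx.ext.napoleon",  # convert NumPy/Google-style sections to reST
]

# Only convert NumPy-style sections; Google-style sections are left unparsed.
napoleon_numpy_docstring = True
napoleon_google_docstring = False

html_theme = "sphinx_rtd_theme"  # assumption: the Read the Docs theme
```

Sphinx renders reST `:param:` fields natively, so converting docstrings like `SlidingWindowDataset` is mainly about consistency rather than about getting them to render at all.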
Examples & Tutorials Organization
Duplicate content: The examples/virtual_staining/ directory contains both Jupyter notebooks and their converted Python scripts. Do we need both formats, or should we keep just one?
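To gauge how much is duplicated before deciding, a hypothetical stdlib-only helper like the one below could list which notebooks have a same-named `.py` counterpart. It assumes a converted script shares its notebook's name and directory (e.g. `demo.ipynb` and `demo.py`), which may not match the actual layout of `examples/virtual_staining/`.

```python
from pathlib import Path


def _stems(root: Path, pattern: str) -> set:
    """Relative paths (suffix removed) of files matching ``pattern``, skipping Jupyter checkpoints."""
    return {
        p.relative_to(root).with_suffix("").as_posix()
        for p in root.rglob(pattern)
        if ".ipynb_checkpoints" not in p.parts
    }


def find_notebook_script_pairs(root):
    """Return sorted lists of (paired, notebook-only, script-only) names under ``root``.

    Hypothetical helper: a notebook and a script count as a pair when they
    have the same name and live in the same directory.
    """
    root = Path(root)
    notebooks = _stems(root, "*.ipynb")
    scripts = _stems(root, "*.py")
    return (
        sorted(notebooks & scripts),
        sorted(notebooks - scripts),
        sorted(scripts - notebooks),
    )


if __name__ == "__main__":
    paired, nb_only, py_only = find_notebook_script_pairs("examples/virtual_staining")
    print(f"{len(paired)} paired, {len(nb_only)} notebook-only, {len(py_only)} script-only")
```

If most files turn out to be paired, keeping a single format would halve what has to be maintained and rendered in the docs.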
Directory structure: Should applications be merged into examples, or do they serve different purposes that warrant separate directories?
I may have missed some things so far; feel free to add them or let me know.