Set up RTD and Documentation Infrastructure #288

@srivarra

Description

This issue tracks the requirements and feedback we need before setting up Read the Docs and Sphinx. Some of the necessary changes will affect multiple areas of the repo, so we need alignment on key decisions first.

Package Structure & Documentation Files

Scripts folder location: Currently the scripts folder is bundled within the viscy package (VisCy/viscy/scripts). Should we move it to the root level (VisCy/scripts) instead? This would separate utility scripts from the core package.

Documentation file locations:

  • data_organization.md currently lives in viscy/data_organization.md
  • preprocessing.md currently lives in viscy/preprocessing/preprocessing.md

Should these move into a dedicated docs source tree once Sphinx is set up?

Docstring Standardization

We already use ruff for line length and import sorting. Should we extend it to enforce docstring standards via `lint.pydocstyle`? We'd need to choose between:

  • NumPy (numpydoc) convention
  • Google style convention
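For reference, enabling pydocstyle checks in ruff is a small pyproject.toml change; this is a sketch, not a final config (the rule selection and any per-file ignores would need tuning for this repo):

```toml
# pyproject.toml — sketch, not a final config
[tool.ruff.lint]
# Enable the pydocstyle ("D") rules alongside the existing checks.
extend-select = ["D"]

[tool.ruff.lint.pydocstyle]
# Or "google", depending on which convention we pick.
convention = "numpy"
```

Setting `convention` automatically disables the D rules that conflict with the chosen style, so we wouldn't need to hand-pick individual rule codes.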

The codebase has mixed docstring styles. For example:

  • NumPy style

VisCy/viscy/data/hcs.py

Lines 282 to 329 in a2186d6

```python
class HCSDataModule(LightningDataModule):
    """
    Lightning data module for a preprocessed HCS NGFF Store.

    Parameters
    ----------
    data_path : str
        Path to the data store.
    source_channel : str or Sequence[str]
        Name(s) of the source channel, e.g. 'Phase'.
    target_channel : str or Sequence[str]
        Name(s) of the target channel, e.g. ['Nuclei', 'Membrane'].
    z_window_size : int
        Z window size of the 2.5D U-Net, 1 for 2D.
    split_ratio : float, optional
        Split ratio of the training subset in the fit stage,
        e.g. 0.8 means an 80/20 split between training/validation,
        by default 0.8.
    batch_size : int, optional
        Batch size, defaults to 16.
    num_workers : int, optional
        Number of data-loading workers, defaults to 8.
    target_2d : bool, optional
        Whether the target is 2D (e.g. in a 2.5D model),
        defaults to False.
    yx_patch_size : tuple[int, int], optional
        Patch size in (Y, X), defaults to (256, 256).
    normalizations : list of MapTransform, optional
        MONAI dictionary transforms applied to selected channels,
        defaults to ``[]`` (no normalization).
    augmentations : list of MapTransform, optional
        MONAI dictionary transforms applied to the training set,
        defaults to ``[]`` (no augmentation).
    caching : bool, optional
        Whether to decompress all the images and cache the result,
        will store in `/tmp/$SLURM_JOB_ID/` if available,
        defaults to False.
    ground_truth_masks : Path or None, optional
        Path to the ground truth masks,
        used in the test stage to compute segmentation metrics,
        defaults to None.
    persistent_workers : bool, optional
        Whether to keep the workers alive between fitting epochs,
        defaults to False.
    prefetch_factor : int or None, optional
        Number of samples loaded in advance by each worker during fitting,
        defaults to None (2 per PyTorch default).
    """
```

  • reST (Sphinx field list) style

VisCy/viscy/data/hcs.py

Lines 99 to 109 in a2186d6

```python
class SlidingWindowDataset(Dataset):
    """Torch dataset where each element is a window of
    (C, Z, Y, X) where C=2 (source and target) and Z is ``z_window_size``.

    :param list[Position] positions: FOVs to include in dataset
    :param ChannelMap channels: source and target channel names,
        e.g. ``{'source': 'Phase', 'target': ['Nuclei', 'Membrane']}``
    :param int z_window_size: Z window size of the 2.5D U-Net, 1 for 2D
    :param DictTransform | None transform:
        a callable that transforms data, defaults to None
    """
```

Picking one will be necessary for consistent documentation generation. There's already a related issue, #133, and I lean towards NumPy-style docstrings, consistent with the rest of our projects.

Examples & Tutorials Organization

Duplicate content: The examples/virtual_staining/ directory contains both Jupyter notebooks and their converted Python scripts. Do we need both formats, or should we keep just one?

Directory structure: Should applications be merged into examples, or do they serve different purposes that warrant separate directories?


There may be things I've missed so far; feel free to add them or let me know.

Labels: documentation, maintenance