ClarID currently provides a reference implementation through ClarID-Tools, implemented as a CLI in Perl 5.
While this works well as a reference implementation, feedback received during peer review suggested that many users in biomedical data science primarily work in Python-based environments (e.g., notebooks, pipelines, ETL workflows). Providing a Python implementation could lower the barrier for adoption and facilitate integration with existing tools.
Proposed scope
A Python implementation could reproduce the core functionality of the current CLI:
- Encode ClarID identifiers
- Decode identifiers
- Validate identifiers
- Support both human and stub formats
- Support both subject and biosample entities
Possible design
- Python library implementing the ClarID specification
- Optional CLI wrapper (e.g., argparse or typer)
- YAML codebook parsing
- Base62 transformations for stub encoding
- Batch processing from CSV/TSV
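
As a rough illustration of the Base62 item above, a minimal sketch in Python follows. The alphabet order (digits, then uppercase, then lowercase), the function names, and the `width` padding parameter are all assumptions made for illustration; the ClarID specification and the Perl reference implementation define the actual stub encoding, and a port would need to match them exactly.

```python
import string

# ASSUMPTION: alphabet order is digits, uppercase, lowercase. The real
# ordering must be taken from the ClarID specification / Perl reference.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 62
_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def base62_encode(number: int, width: int = 0) -> str:
    """Encode a non-negative integer as Base62 (hypothetical helper).

    The result is left-padded with the zero symbol to at least `width`
    characters, which is one way a fixed-width stub field could be filled.
    """
    if number < 0:
        raise ValueError("number must be non-negative")
    digits = []
    while True:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
        if number == 0:
            break
    return "".join(reversed(digits)).rjust(width, ALPHABET[0])


def base62_decode(text: str) -> int:
    """Decode a Base62 string back to an integer (hypothetical helper)."""
    if not text:
        raise ValueError("cannot decode an empty string")
    value = 0
    for ch in text:
        if ch not in _INDEX:
            raise ValueError(f"invalid Base62 character: {ch!r}")
        value = value * BASE + _INDEX[ch]
    return value
```

A round trip such as `base62_decode(base62_encode(n)) == n` for a range of integers would be a natural first test, ideally run against values produced by the Perl CLI to confirm that both implementations agree.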
Notes
This would not replace the Perl implementation, which remains the reference implementation. It would add a second language implementation to improve usability and integration.
Because this repository is structured as a Perl distribution (CPAN, CI workflows), it may be preferable to develop the Python implementation in a separate repository (e.g., clarid-python) rather than within this one.