Python implementation of the ClarID specification #1

@mrueda

The current reference implementation of ClarID is ClarID-Tools, a CLI written in Perl 5.

While this works well as a reference implementation, feedback received during peer review suggested that many users in biomedical data science primarily work in Python-based environments (e.g., notebooks, pipelines, ETL workflows). Providing a Python implementation could lower the barrier for adoption and facilitate integration with existing tools.

Proposed scope

A Python implementation could reproduce the core functionality of the current CLI:

  • Encode ClarID identifiers
  • Decode identifiers
  • Validate identifiers
  • Support both human and stub formats
  • Support both subject and biosample entities
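To make the scope concrete, here is a minimal sketch of what the public surface of such a library might look like. Everything in it is hypothetical: the names `Entity`, `Format`, `ClarIDCodec` and `validate` are placeholders chosen for illustration and do not mirror the ClarID-Tools CLI. The round-trip check in `validate` is one possible strategy, not necessarily how the reference implementation validates identifiers.

```python
from enum import Enum
from typing import Dict, Mapping, Protocol


class Entity(Enum):
    """The two entity types named in the proposed scope."""
    SUBJECT = "subject"
    BIOSAMPLE = "biosample"


class Format(Enum):
    """The two identifier formats named in the proposed scope."""
    HUMAN = "human"
    STUB = "stub"


class ClarIDCodec(Protocol):
    """Hypothetical interface; method names are placeholders, not an existing API."""

    def encode(self, fields: Mapping[str, str], entity: Entity, fmt: Format) -> str:
        ...

    def decode(self, identifier: str, entity: Entity, fmt: Format) -> Dict[str, str]:
        ...


def validate(codec: ClarIDCodec, identifier: str, entity: Entity, fmt: Format) -> bool:
    """Assumed strategy: an identifier is valid if it decodes and re-encodes to itself."""
    try:
        fields = codec.decode(identifier, entity, fmt)
    except ValueError:
        # Assumption: decoders signal malformed input with ValueError.
        return False
    return codec.encode(fields, entity, fmt) == identifier
```

Keeping `encode` and `decode` behind one interface would let the optional CLI wrapper and notebook users share the same code path.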

Possible design

  • Python library implementing the ClarID specification
  • Optional CLI wrapper (e.g., argparse / typer)
  • YAML codebook parsing
  • Base62 transformations for stub encoding
  • Batch processing from CSV/TSV
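As a rough illustration of two of the design items above, the following sketch shows a generic Base62 integer codec and a small batch helper that reads TSV input with the standard library. The alphabet order (digits, then uppercase, then lowercase) is an assumption and may differ from what the ClarID specification uses. The function names `base62_encode`, `base62_decode` and `encode_column` are hypothetical, and the sketch does not attempt to reproduce the actual stub layout.

```python
import csv
import io
import string
from typing import List

# Assumed alphabet order; the ClarID specification may define a different one.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 62
_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("base62_encode expects a non-negative integer")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    # Digits were collected least-significant first.
    return "".join(reversed(digits))


def base62_decode(text: str) -> int:
    """Decode a Base62 string back into an integer."""
    if not text:
        raise ValueError("cannot decode an empty string")
    n = 0
    for ch in text:
        if ch not in _INDEX:
            raise ValueError(f"invalid Base62 character: {ch!r}")
        n = n * BASE + _INDEX[ch]
    return n


def encode_column(tsv_text: str, column: str) -> List[str]:
    """Hypothetical batch helper: Base62-encode one integer column of a TSV table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [base62_encode(int(row[column])) for row in reader]
```

A real implementation would read from file handles rather than strings and would apply the codebook mappings before any Base62 step. The YAML codebook parsing is omitted here because it depends on a third-party parser.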

Notes

This would not replace the Perl implementation, which remains the reference implementation, but would provide an additional language implementation to improve usability and integration.

Note: Given that this repository is structured as a Perl distribution (CPAN, CI workflows), it may be preferable to develop a Python implementation in a separate repository (e.g., clarid-python) rather than within this one.

Labels

enhancement (New feature or request)
