-
Notifications
You must be signed in to change notification settings - Fork 9
Connecting units to slots with annotations #2599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Connecting units to slots with annotations #2599
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds storage_units
annotations to hundreds of measurement slots to provide unambiguous unit specifications for data storage. The changes ensure that slots with clearly defined measurement units have consistent UCUM-compatible storage unit annotations.
- Adds
storage_units
annotations to 115+ measurement slots across the schema with unambiguous unit requirements - Updates core schema slots for nitrogen measurements with explicit storage unit specifications
- Documents the systematic approach for adding storage units to slots with single, uncontested preferred units
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
File | Description |
---|---|
src/schema/mixs.yaml | Adds storage_units annotations to measurement slots with clear unit requirements |
src/schema/core.yaml | Adds storage_unit annotations to ammonium_nitrogen, nitrate_nitrogen, and nitrite_nitrogen slots |
assets/yq-for-mixs_subset_modified.txt | Documents yq commands for systematically adding storage_units annotations |
assets/pref-unit-slots-claude-ucum-expanded-curated-with-has-problem.tsv | Reference data showing unit analysis and problem identification |
|
This The query results are in the next comment box. No further action is required.
|
See the This list of historical annotations was used to construct No further action is required.
the class the |
slots with a MIxS
|
|
@@ -404,30 +414,46 @@ enums: | |||
description: The Unified Code for Units of Measure (UCUM) representation of atmosphere. | |||
mV: | |||
description: The Unified Code for Units of Measure (UCUM) representation of millivolt. | |||
PSU: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should removing this PV have a migrator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And all other PV changes.
@samobermiller to help write a migrator for this |
waiting to work on this until additional schema changes mentioned in slack channel by @turbomam can be discussed at next meeting |
…tyvalue-slots-that-only-have-one-clear-cut-mixs---ucum-mapping
Cleaned up test data file by removing obsolete JGI DNA properties that were causing validation failures after recent schema changes: - dna_cont_well: C2 - dna_absorb2: 2.02 - dna_absorb1: 2.02 These properties were removed from the schema in commits 2558ab9 "remove jgi only slots" but remained in the test data causing validation errors. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
Recent schema changes added storage_units annotations to multiple slots: - duration - final_concentration - input_mass - lbc_thirty - lbceq - mass - source_concentration - subsurface_depth - temperature - volume Updated test data files to comply with new storage_units constraints: - lbc_thirty and lbceq: changed from mg/kg to [ppm] in both Biosample-possibly-exhaustive.yaml and Database-interleaved.yaml - final_concentration: changed from ug/uL to umol/L in Extraction-proteomics.yaml and ChemicalConversionProcess-digest.yaml Test data for other annotated slots (duration, input_mass, mass, source_concentration, subsurface_depth, temperature, volume) was already compliant with the new storage_units constraints. Resolves unit validation test failures and ensures all test data aligns with the storage_units feature implementation. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
…idation This commit implements the storage_units feature for slots with clear MIXS→UCUM mappings: Schema changes: - Add storage_units annotations to 16 QuantityValue slots in core.yaml: * zinc, manganese, ammonium_nitrogen, nitrate_nitrogen, nitrite_nitrogen: [ppm]|mg/kg|mg/L * lbc_thirty, lbceq: [ppm] * subsurface_depth: m * mass: g * final_concentration: %|mmol/L|umol/L * source_concentration: %|mmol/L * duration: h|min * temperature: Cel * volume: mL|uL * bulk_elect_conductivity: mS/cm * input_mass: g Test data fixes: - Fix unit validation errors caused by new storage_units constraints: * lbc_thirty/lbceq: changed from mg/kg to [ppm] * final_concentration: changed from ug/uL to umol/L * photon_flux: changed from umol/s/m2 to umol/m2/s * tot_org_carb: changed from % to mg/L All tests now pass (25/25) with proper unit validation enforcement.
I don't think it's really possible to assert a unit for compound slots like these
So I have added a new annotation yq '.slots | to_entries | map(select(.value.annotations.preferred_unit and (.value.annotations.storage_units == null) and (.value.annotations.not_measurement_like == null))) | map(.key)' nmdc_schema/nmdc_materialized_patterns.yaml | sort
That yq command has been implemented as |
🔴 Major MIxS Specification Problems:
🟡 Database vs MIxS Discrepancies:
✅ MIxS specifies No NMDC for either of these yet
|
TO DO
One option for dealing with legacy data using units that we just don't want to approve of: a bertron-style key/value slot
DONE
annotations.storage_units
to slots where MIxS specifies a single preferred unitannotations.storage_units
to slots where MIxS specifies multiple preferred unitsannotations.storage_units
values are inUnitEnum
src/data/valid/
and checks the assertedhas_unit
values against theannotations.storage_units
value(s) for that the owning slottitle
s relating a display unit like 'years' for cryptic UCUM units likea
. THis PR is not thorough on that task yet.WON'T DO