@jpmckinney jpmckinney commented Dec 15, 2024

Notes:

  • get_schema_fields() could run 5-15% faster if it supported only what OCDS uses.
  • get_schema_fields() uses more memory than libcove, though less than an order of magnitude more (to be expected, as it returns the schema of each field).

Expected warnings when creating the versioned release schema in manage.py:

… name, deprecated_self, pattern, merge_by_id to Field.

- Rename definition_path to definition
- Remove support for null schemas, which are invalid JSON Schema and prohibited by OCDS
- Remove definition_pointer, definition_pointer_components, definition_path_components from Field
- Remove support for adding fields to Field.__dict__

test: Add benchmark and regression tests. Add manage.py script to generate regression scenarios.
@jpmckinney

Here's the version of get_schema_fields that supported only what OCDS uses:

    multilingual = set()
    nonmultilingual = []

    if definitions := schema.get('definitions'):
        for name, subschema in definitions.items():
            # Not all definitions set ``properties``, e.g. ``FiscalBreakdownFieldMapping`` in budget_and_spend.
            yield from get_schema_fields(subschema, pointer=f'/definitions/{name}', definition=name)

    if pattern_properties := schema.get('patternProperties'):
        for pattern, subschema in pattern_properties.items():
            # The pattern might have an extra set of parentheses (OCDS 1.1). Assumes the final character is $.
            for offset in (2, 1):
                end = -LANGUAGE_CODE_SUFFIX_LEN - offset
                # The pattern must be anchored and the suffix must occur at the end.
                if (
                    pattern[end:-offset] == LANGUAGE_CODE_SUFFIX
                    and pattern[:offset] == '^('[:offset]
                    and pattern[-offset:] == ')$'[-offset:]
                ):
                    multilingual.add(pattern[offset:end])
                    break
            # Set ``multilingual`` on corresponding ``properties``, instead of yielding these ``patternProperties``.
            else:
                # ``patternProperties`` never sets ``patternProperties``, ``properties``, ``items`` or ``oneOf``.
                deprecated_self = _deprecated(subschema)
                nonmultilingual.append(
                    Field(
                        name=pattern,
                        schema=subschema,
                        pointer=f'{pointer}/patternProperties/{pattern}',
                        path_components=(*path_components, pattern),
                        definition=definition,
                        deprecated_self=deprecated_self,
                        deprecated=deprecated or deprecated_self,
                        pattern=True,
                    )
                )

    if properties := schema.get('properties'):
        for name, subschema in properties.items():
            prop_pointer = f'{pointer}/properties/{name}'
            prop_path_components = (*path_components, name)
            prop_deprecated_self = _deprecated(subschema)
            prop_deprecated = deprecated or prop_deprecated_self
            prop_whole_list_merge = whole_list_merge or subschema.get('wholeListMerge', False)

            yield Field(
                name=name,
                schema=subschema,
                pointer=prop_pointer,
                path_components=prop_path_components,
                definition=definition,
                deprecated_self=prop_deprecated_self,
                deprecated=prop_deprecated,
                multilingual=name in multilingual,
                required=name in schema.get('required', []),
                merge_by_id=name == 'id' and array and not prop_whole_list_merge,
            )

            # If an extension removes a property.
            if subschema is None:
                continue

            # This guard isn't necessary, but it improves performance.
            if 'properties' in subschema or 'patternProperties' in subschema:
                yield from get_schema_fields(
                    subschema, prop_pointer, prop_path_components, definition, prop_deprecated, prop_whole_list_merge
                )

            # To date, ``definitions``, ``patternProperties`` and ``items`` never set ``items``.
            if (items := subschema.get('items')) and ('properties' in items or 'patternProperties' in items):
                yield from get_schema_fields(
                    items,
                    f'{prop_pointer}/items',
                    prop_path_components,
                    definition,
                    prop_deprecated,
                    prop_whole_list_merge,
                    array=True,
                )

            # To date, ``definitions``, ``patternProperties`` and ``items`` never set ``oneOf``.
            if one_ofs := subschema.get('oneOf'):
                for i, one_of in enumerate(one_ofs):
                    # To date, ``oneOf`` never sets ``patternProperties`` or ``properties``.
                    if (items := one_of.get('items')) and ('properties' in items or 'patternProperties' in items):
                        yield from get_schema_fields(
                            items,
                            f'{prop_pointer}/oneOf/{i}/items',
                            prop_path_components,
                            definition,
                            prop_deprecated,
                            prop_whole_list_merge,  # To date, ``oneOf`` never sets ``wholeListMerge``
                            array=True,
                        )

    # Yield ``patternProperties`` last, to be interpreted in the context of ``properties``.
    yield from nonmultilingual
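
The slicing in the `patternProperties` loop is the least obvious part of the code above, so here is a standalone sketch of just that check. The `LANGUAGE_CODE_SUFFIX` value is a shortened stand-in chosen for readability (an assumption; the real constant is not shown in this thread), and `multilingual_name` is a hypothetical helper name.

```python
# Shortened stand-in for the real constant (assumption, for illustration only).
LANGUAGE_CODE_SUFFIX = '_([A-Za-z]{2,3})'
LANGUAGE_CODE_SUFFIX_LEN = len(LANGUAGE_CODE_SUFFIX)


def multilingual_name(pattern):
    """Return the field name if ``pattern`` is an anchored multilingual pattern, else None."""
    # offset=2 handles "^(name<suffix>)$"; offset=1 handles "^name<suffix>$".
    for offset in (2, 1):
        end = -LANGUAGE_CODE_SUFFIX_LEN - offset
        if (
            pattern[end:-offset] == LANGUAGE_CODE_SUFFIX
            and pattern[:offset] == '^('[:offset]
            and pattern[-offset:] == ')$'[-offset:]
        ):
            return pattern[offset:end]
    return None


print(multilingual_name('^(title_([A-Za-z]{2,3}))$'))  # title
print(multilingual_name('^title_([A-Za-z]{2,3})$'))    # title
print(multilingual_name('title_([A-Za-z]{2,3})$'))     # None (not anchored at the start)
```

Comparing fixed slices avoids compiling or parsing each pattern, at the cost of recognizing only these two exact shapes.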

@coveralls

coveralls commented Dec 15, 2024

Coverage Status

coverage: 98.212% (+0.8%) from 97.458%
when pulling 8e26a41 on get-schema-fields
into b026e18 on main.

@jpmckinney jpmckinney merged commit b753cab into main Dec 15, 2024
28 checks passed
@jpmckinney jpmckinney deleted the get-schema-fields branch December 15, 2024 06:48