Skip to content

[IMPROVEMENT] Improving the readability of the code by adding @overload #1112

@Laurynas-Jagutis

Description

@Laurynas-Jagutis

Some functions in the PGM code base allow multiple different types for a single input parameter, often because of that there are multiple available output parameter types. Currently, our functions use single signatures. We would like to start using @overload where possible, since it greatly improves readability of the code by adding more function signatures to a single function.

Some examples are presented below.

Initial version:

def _create_contents_buffer(shape, dtype, columns: list[AttributeType] | None) -> SingleComponentData | DenseBatchData:
    if columns is None:
        return np.empty(shape=shape, dtype=dtype)

    return {attribute: np.empty(shape=shape, dtype=dtype[attribute]) for attribute in columns}

Suggested solution:

@overload
def _create_contents_buffer(shape, dtype, columns: None) -> SingleArray | DenseBatchArray: ...
@overload
def _create_contents_buffer(
    shape, dtype, columns: list[AttributeType]
) -> SingleColumnarData | DenseBatchColumnarData: ...
def _create_contents_buffer(shape, dtype, columns):
    if columns is None:
        return np.empty(shape=shape, dtype=dtype)

    return {attribute: np.empty(shape=shape, dtype=dtype[attribute]) for attribute in columns}

Initial version:

def _check_sparse_dense(component_data: ComponentData, err_msg_suffixed: str) -> ComponentData:
    if is_sparse(component_data):
        indptr = component_data["indptr"]
        if not isinstance(indptr, np.ndarray):
            raise TypeError(err_msg_suffixed.format(f"Invalid indptr type {type(indptr).__name__}. "))
        sub_data = component_data["data"]
    elif isinstance(component_data, dict) and ("indptr" in component_data or "data" in component_data):
        missing_element = "indptr" if "indptr" not in component_data else "data"
        raise KeyError(err_msg_suffixed.format(f"Missing '{missing_element}' in sparse batch data. "))
    else:
        sub_data = component_data
    return sub_data

Suggested solution:

@overload
def _check_sparse_dense(component_data: SingleArray, err_msg_suffixed: str) -> SingleArray: ...
@overload
def _check_sparse_dense(component_data: DenseBatchArray, err_msg_suffixed: str) -> DenseBatchArray: ...  # type: ignore[overload-cannot-match]
@overload
def _check_sparse_dense(component_data: DenseBatchColumnarData, err_msg_suffixed: str) -> DenseBatchColumnarData: ...  # type: ignore[overload-overlap]
@overload
def _check_sparse_dense(component_data: SparseBatchArray, err_msg_suffixed: str) -> SingleArray: ...  # type: ignore[overload-cannot-match, overload-overlap]
@overload
def _check_sparse_dense(component_data: SingleColumnarData, err_msg_suffixed: str) -> SingleColumnarData: ...  # type: ignore[overload-cannot-match]
@overload
def _check_sparse_dense(component_data: SparseBatchColumnarData, err_msg_suffixed: str) -> SingleColumnarData: ...
def _check_sparse_dense(component_data, err_msg_suffixed):
    if is_sparse(component_data):
        indptr = component_data["indptr"]
        if not isinstance(indptr, np.ndarray):
            raise TypeError(err_msg_suffixed.format(f"Invalid indptr type {type(indptr).__name__}. "))
        sub_data = component_data["data"]
    elif isinstance(component_data, dict) and ("indptr" in component_data or "data" in component_data):
        missing_element = "indptr" if "indptr" not in component_data else "data"
        raise KeyError(err_msg_suffixed.format(f"Missing '{missing_element}' in sparse batch data. "))
    else:
        sub_data = component_data
    return sub_data

If you pick up this issue, you can ignore the #type: ignore comments, since they should be taken care of in #1110. You yourself might need to use such comments to ignore issues that Mypy might raise and that is allowed if #1110 has not been completed yet. More information on why Mypy complains can be found in #1110.

Recommendation on how to start:
Browse through the code base looking for functions that have individual input parameters that can take many types and depending on those types you can see that the type of the return also changes.
Best places to start with are function signatures that have "|" in them, which means that it is a union and can take multiple types, however we also have some aliases that represent a union of multiple types, such as ComponentData. However, keep in mind that even if you find such features in function signatures, it might not be possible to use @overload there.

We expect that you will deliver multiple @overload implementations just as shown in the examples above. Make sure that when you do the changes Mypy and Pytest still pass without any issues.

Metadata

Metadata

Assignees

No one assigned

    Labels

    good first issueIndicates a good issue for first-time contributorsimprovementImprovement on internal implementation

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions