[IMPROVEMENT] Improving the readability of the code by adding @overload

Some functions in the PGM code base allow multiple different types for a single input parameter, often because of that there are multiple available output parameter types. Currently, our functions use single signatures. We would like to start using [@overload](https://typing.python.org/en/latest/spec/overload.html) where possible, since it greatly improves readability of the code by adding more function signatures to a single function.

Some examples are presented below.

Initial version:
```
def _create_contents_buffer(shape, dtype, columns: list[AttributeType] | None) -> SingleComponentData | DenseBatchData:
    if columns is None:
        return np.empty(shape=shape, dtype=dtype)

    return {attribute: np.empty(shape=shape, dtype=dtype[attribute]) for attribute in columns}
```
Suggested solution:
```
@overload
def _create_contents_buffer(shape, dtype, columns: None) -> SingleArray | DenseBatchArray: ...
@overload
def _create_contents_buffer(
    shape, dtype, columns: list[AttributeType]
) -> SingleColumnarData | DenseBatchColumnarData: ...
def _create_contents_buffer(shape, dtype, columns):
    if columns is None:
        return np.empty(shape=shape, dtype=dtype)

    return {attribute: np.empty(shape=shape, dtype=dtype[attribute]) for attribute in columns}
```

Initial version:
```
def _check_sparse_dense(component_data: ComponentData, err_msg_suffixed: str) -> ComponentData:
    if is_sparse(component_data):
        indptr = component_data["indptr"]
        if not isinstance(indptr, np.ndarray):
            raise TypeError(err_msg_suffixed.format(f"Invalid indptr type {type(indptr).__name__}. "))
        sub_data = component_data["data"]
    elif isinstance(component_data, dict) and ("indptr" in component_data or "data" in component_data):
        missing_element = "indptr" if "indptr" not in component_data else "data"
        raise KeyError(err_msg_suffixed.format(f"Missing '{missing_element}' in sparse batch data. "))
    else:
        sub_data = component_data
    return sub_data
```
Suggested solution:
```
@overload
def _check_sparse_dense(component_data: SingleArray, err_msg_suffixed: str) -> SingleArray: ...
@overload
def _check_sparse_dense(component_data: DenseBatchArray, err_msg_suffixed: str) -> DenseBatchArray: ...  # type: ignore[overload-cannot-match]
@overload
def _check_sparse_dense(component_data: DenseBatchColumnarData, err_msg_suffixed: str) -> DenseBatchColumnarData: ...  # type: ignore[overload-overlap]
@overload
def _check_sparse_dense(component_data: SparseBatchArray, err_msg_suffixed: str) -> SingleArray: ...  # type: ignore[overload-cannot-match, overload-overlap]
@overload
def _check_sparse_dense(component_data: SingleColumnarData, err_msg_suffixed: str) -> SingleColumnarData: ...  # type: ignore[overload-cannot-match]
@overload
def _check_sparse_dense(component_data: SparseBatchColumnarData, err_msg_suffixed: str) -> SingleColumnarData: ...
def _check_sparse_dense(component_data, err_msg_suffixed):
    if is_sparse(component_data):
        indptr = component_data["indptr"]
        if not isinstance(indptr, np.ndarray):
            raise TypeError(err_msg_suffixed.format(f"Invalid indptr type {type(indptr).__name__}. "))
        sub_data = component_data["data"]
    elif isinstance(component_data, dict) and ("indptr" in component_data or "data" in component_data):
        missing_element = "indptr" if "indptr" not in component_data else "data"
        raise KeyError(err_msg_suffixed.format(f"Missing '{missing_element}' in sparse batch data. "))
    else:
        sub_data = component_data
    return sub_data
```
If you pick up this issue, you can ignore the `#type: ignore` comments, since they should be taken care of in #1110. You yourself might need to use such comments to ignore issues that Mypy might raise and that is allowed if #1110 has not been completed yet. More information on why Mypy complains can be found in #1110.


Recommendation on how to start:
Browse through the code base looking for functions that have individual input parameters that can take many types and depending on those types you can see that the type of the return also changes. 
Best places to start with are function signatures that have "|" in them, which means that it is a union and can take multiple types, however we also have some aliases that represent a union of multiple types, such as `ComponentData`. However, keep in mind that even if you find such features in function signatures, it might not be possible to use @overload there.


We expect that you will deliver multiple @overload implementations just as shown in the examples above. Make sure that when you do the changes Mypy and Pytest still pass without any issues.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[IMPROVEMENT] Improving the readability of the code by adding @overload #1112

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[IMPROVEMENT] Improving the readability of the code by adding @overload #1112

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions