-
Couldn't load subscription status.
- Fork 44
Description
Some functions in the PGM code base allow multiple different types for a single input parameter, often because of that there are multiple available output parameter types. Currently, our functions use single signatures. We would like to start using @overload where possible, since it greatly improves readability of the code by adding more function signatures to a single function.
Some examples are presented below.
Initial version:
def _create_contents_buffer(shape, dtype, columns: list[AttributeType] | None) -> SingleComponentData | DenseBatchData:
if columns is None:
return np.empty(shape=shape, dtype=dtype)
return {attribute: np.empty(shape=shape, dtype=dtype[attribute]) for attribute in columns}
Suggested solution:
@overload
def _create_contents_buffer(shape, dtype, columns: None) -> SingleArray | DenseBatchArray: ...
@overload
def _create_contents_buffer(
shape, dtype, columns: list[AttributeType]
) -> SingleColumnarData | DenseBatchColumnarData: ...
def _create_contents_buffer(shape, dtype, columns):
if columns is None:
return np.empty(shape=shape, dtype=dtype)
return {attribute: np.empty(shape=shape, dtype=dtype[attribute]) for attribute in columns}
Initial version:
def _check_sparse_dense(component_data: ComponentData, err_msg_suffixed: str) -> ComponentData:
if is_sparse(component_data):
indptr = component_data["indptr"]
if not isinstance(indptr, np.ndarray):
raise TypeError(err_msg_suffixed.format(f"Invalid indptr type {type(indptr).__name__}. "))
sub_data = component_data["data"]
elif isinstance(component_data, dict) and ("indptr" in component_data or "data" in component_data):
missing_element = "indptr" if "indptr" not in component_data else "data"
raise KeyError(err_msg_suffixed.format(f"Missing '{missing_element}' in sparse batch data. "))
else:
sub_data = component_data
return sub_data
Suggested solution:
@overload
def _check_sparse_dense(component_data: SingleArray, err_msg_suffixed: str) -> SingleArray: ...
@overload
def _check_sparse_dense(component_data: DenseBatchArray, err_msg_suffixed: str) -> DenseBatchArray: ... # type: ignore[overload-cannot-match]
@overload
def _check_sparse_dense(component_data: DenseBatchColumnarData, err_msg_suffixed: str) -> DenseBatchColumnarData: ... # type: ignore[overload-overlap]
@overload
def _check_sparse_dense(component_data: SparseBatchArray, err_msg_suffixed: str) -> SingleArray: ... # type: ignore[overload-cannot-match, overload-overlap]
@overload
def _check_sparse_dense(component_data: SingleColumnarData, err_msg_suffixed: str) -> SingleColumnarData: ... # type: ignore[overload-cannot-match]
@overload
def _check_sparse_dense(component_data: SparseBatchColumnarData, err_msg_suffixed: str) -> SingleColumnarData: ...
def _check_sparse_dense(component_data, err_msg_suffixed):
if is_sparse(component_data):
indptr = component_data["indptr"]
if not isinstance(indptr, np.ndarray):
raise TypeError(err_msg_suffixed.format(f"Invalid indptr type {type(indptr).__name__}. "))
sub_data = component_data["data"]
elif isinstance(component_data, dict) and ("indptr" in component_data or "data" in component_data):
missing_element = "indptr" if "indptr" not in component_data else "data"
raise KeyError(err_msg_suffixed.format(f"Missing '{missing_element}' in sparse batch data. "))
else:
sub_data = component_data
return sub_data
If you pick up this issue, you can ignore the #type: ignore comments, since they should be taken care of in #1110. You yourself might need to use such comments to ignore issues that Mypy might raise and that is allowed if #1110 has not been completed yet. More information on why Mypy complains can be found in #1110.
Recommendation on how to start:
Browse through the code base looking for functions that have individual input parameters that can take many types and depending on those types you can see that the type of the return also changes.
Best places to start with are function signatures that have "|" in them, which means that it is a union and can take multiple types, however we also have some aliases that represent a union of multiple types, such as ComponentData. However, keep in mind that even if you find such features in function signatures, it might not be possible to use @overload there.
We expect that you will deliver multiple @overload implementations just as shown in the examples above. Make sure that when you do the changes Mypy and Pytest still pass without any issues.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status