_sparse_nanmean is inefficient #1894

Open
@ivirshup

_sparse_nanmean makes two copies of the data matrix and performs a set-index operation on a sparse array. It could be much faster if it avoided both.

Noticed while reviewing #1890.

Possible solution
from numba import njit, prange
import numpy as np

@njit(parallel=True)
def nanmean_lowlevel(data, indices, indptr, shape):
    # Row-wise nanmean over CSR arrays; implicit zeros count as 0.
    N, M = shape
    sums = np.zeros(N, dtype=np.float64)
    nans = np.zeros(N, dtype=np.int64)
    for i in prange(N):
        # Stored values of row i.
        start = indptr[i]
        stop = indptr[i+1]
        window = data[start:stop]
        n_nan = np.int64(0)
        i_sum = np.float64(0.)
        for j_val in window:
            if np.isnan(j_val):
                n_nan += 1
            else:
                i_sum += j_val
        sums[i] = i_sum
        nans[i] = n_nan
    # Divide each row sum by its number of non-NaN entries.
    sums /= (M - nans)
    return sums

This has a larger error relative to the dense reference than the current solution does, and I'm not sure why. The sums seem to come out differently.
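For checking a candidate against the dense reference, here is a minimal NumPy-only sketch of the same row-wise computation. The function name `nanmean_csr_numpy` and the small example matrix are hypothetical, made up for illustration; it assumes CSR input and a mean along axis 1, and is not the current `_sparse_nanmean` implementation.

```python
import numpy as np

def nanmean_csr_numpy(data, indices, indptr, shape):
    """Row-wise nanmean of a CSR matrix; implicit zeros count as 0.0.

    `indices` is unused and only kept so the signature matches
    nanmean_lowlevel above.
    """
    n_rows, n_cols = shape
    # Row index of every stored value, recovered from indptr.
    rows = np.repeat(np.arange(n_rows), np.diff(indptr))
    is_nan = np.isnan(data)
    # Per-row sum of the stored non-NaN values.
    sums = np.bincount(rows[~is_nan], weights=data[~is_nan], minlength=n_rows)
    # Per-row count of stored NaNs.
    n_nan = np.bincount(rows[is_nan], minlength=n_rows)
    return sums / (n_cols - n_nan)

# Hypothetical 3x4 example, written out both densely and as CSR arrays.
dense = np.array([
    [1.0, 0.0, np.nan, 3.0],
    [0.0, 0.0, 0.0, 0.0],
    [np.nan, 2.0, 0.0, np.nan],
])
data = np.array([1.0, np.nan, 3.0, np.nan, 2.0, np.nan])
indices = np.array([0, 2, 3, 0, 1, 3])
indptr = np.array([0, 3, 3, 6])

result = nanmean_csr_numpy(data, indices, indptr, dense.shape)
expected = np.nanmean(dense, axis=1)
```

Comparing `result` and `expected` with `np.allclose` rather than exact equality is the safer check, since floating-point addition is not associative and different summation orders can disagree in the last bits.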
