_sparse_nanmean is inefficient #1894

Open
ivirshup opened this issue Jun 22, 2021 · 0 comments · May be fixed by #3570
Labels
Area – Performance 🐌 good first issue easy first issue to get started in OSS community contribution!

Comments

Member

ivirshup commented Jun 22, 2021

_sparse_nanmean makes two copies of the data matrix and performs a set index operation on a sparse array. It could be much faster if it avoided both.

Noticed while reviewing #1890.

Possible solution
from numba import njit, prange
import numpy as np

@njit(parallel=True)
def nanmean_lowlevel(data, indices, indptr, shape):
    # Row-wise NaN-ignoring mean over raw CSR buffers; `indices` is unused.
    N, M = shape
    sums = np.zeros(N, dtype=np.float64)
    nans = np.zeros(N, dtype=np.int64)
    for i in prange(N):
        # Stored values of row i live in data[indptr[i]:indptr[i+1]].
        start = indptr[i]
        stop = indptr[i+1]
        window = data[start:stop]
        n_nan = np.int64(0)
        i_sum = np.float64(0.)
        for j_val in window:
            if np.isnan(j_val):
                n_nan += 1
            else:
                i_sum += j_val
        sums[i] = i_sum
        nans[i] = n_nan
    # Implicit zeros count toward the mean; NaNs do not.
    sums /= (M - nans)
    return sums

This has more error relative to the dense reference than the current solution does; I'm not sure why. It seems to be something about the sums being different.
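
One possible contributor, offered here as an assumption and not a confirmed diagnosis, is summation order. A plain loop accumulates values sequentially, while NumPy's `np.sum` uses pairwise summation when reducing along a contiguous axis, so the two can round differently. The sketch below measures both against `math.fsum`, which returns a correctly rounded sum; the input values are arbitrary random numbers chosen only for illustration.

```python
import math

import numpy as np

# Arbitrary illustrative data; not taken from the issue.
rng = np.random.default_rng(0)
values = rng.random(100_000)

# Correctly rounded reference sum.
exact = math.fsum(values)

# Sequential accumulation, as in a hand-written loop.
sequential = 0.0
for v in values:
    sequential += v

# NumPy's reduction (pairwise summation on a contiguous 1-D array).
pairwise = float(np.sum(values))

err_sequential = abs(sequential - exact)
err_pairwise = abs(pairwise - exact)
print(f"sequential error: {err_sequential:.3e}")
print(f"pairwise error:   {err_pairwise:.3e}")
```

If the two errors differ by an amount similar to the discrepancy seen against the dense reference, summation order is a plausible explanation; if not, the cause lies elsewhere.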

@ivirshup ivirshup added enhancement Area – Performance 🐌 good first issue easy first issue to get started in OSS community contribution! labels Jun 22, 2021