
RCTree cannot handle data that consists of only one unique value #88

@kongwilson

Description


I ran into an issue when a subset of my sample data points contains only ONE unique value. How should we handle this case?

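The error message indicates a NaN value in the probability vector, caused by division by zero: when all points in the subset are identical, every dimension has zero range, so normalizing the ranges into probabilities computes 0/0. I tried replacing it with a uniform distribution, but that caused a subsequent issue: after a cut, the right side contains no points. I think this violates a basic principle of the RRCF algorithm, since each cut is supposed to split the points into two non-empty sides. Is there a better way to resolve such cases?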
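Both failure modes can be reproduced with plain NumPy. The sketch below assumes that the cut dimension is chosen with probability proportional to each dimension's range (as the `p=l` in the traceback suggests) and that the cut value is drawn uniformly between that dimension's min and max. `cut_probabilities` and `random_cut` are hypothetical helpers, not rrcf functions. `random_cut` shows one possible resolution: a set with zero total range cannot be split, so it returns `None` and the caller would treat the set as a single leaf instead of cutting it.

```python
import numpy as np


def cut_probabilities(X):
    """Per-dimension cut probabilities, proportional to each dimension's range.

    Assumption: mirrors the normalization implied by the traceback (p=l);
    this is not a copy of rrcf's code.
    """
    span = X.max(axis=0) - X.min(axis=0)
    with np.errstate(invalid="ignore"):  # silence the 0/0 warning for the demo
        return span / span.sum()


def random_cut(X, rng):
    """Return (dimension, value) for a random cut, or None if X cannot be split.

    Hypothetical helper: identical points have zero range in every dimension,
    so no cut can separate them; the caller should make a leaf instead.
    """
    span = X.max(axis=0) - X.min(axis=0)
    total = span.sum()
    if total == 0:
        return None
    q = rng.choice(X.shape[1], p=span / total)
    p = rng.uniform(X[:, q].min(), X[:, q].max())
    return q, p


rng = np.random.default_rng(0)
same = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])

# All ranges are zero, so the normalization computes 0/0 -> NaN.
print(cut_probabilities(same))  # [nan nan]

# Forcing a uniform distribution over dimensions does not help: the cut
# value equals the only coordinate, so every point satisfies x <= p and
# the right side is empty.
q = rng.choice(same.shape[1])
p = rng.uniform(same[:, q].min(), same[:, q].max())
left = same[:, q] <= p
print(left.sum(), (~left).sum())  # 3 0

print(random_cut(same, rng))  # None
```

Returning `None` rather than picking an arbitrary dimension keeps every real cut non-empty on both sides, which is the property the uniform fallback breaks.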

```
  File "<ipython-input-2-b3a957a401e5>", line 139, in <listcomp>
    rrcf.RCTree(x[ix], index_labels=ix) for ix in ixs]
  File "C:\ProgramData\Anaconda3\lib\site-packages\rrcf-0.4.3-py3.8.egg\rrcf\rrcf.py", line 106, in __init__
    self._mktree(X, S, N, I, parent=self)
  File "C:\ProgramData\Anaconda3\lib\site-packages\rrcf-0.4.3-py3.8.egg\rrcf\rrcf.py", line 177, in _mktree
    S1, S2, branch = self._cut(X, S, parent=parent, side=side)
  File "C:\ProgramData\Anaconda3\lib\site-packages\rrcf-0.4.3-py3.8.egg\rrcf\rrcf.py", line 159, in _cut
    q = self.rng.choice(self.ndim, p=l)
  File "mtrand.pyx", line 928, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities contain NaN
```
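Until the library handles this case, one workaround on the caller side is to check each index subset before building a tree from it. The sketch below uses `numpy.unique(..., axis=0)` to count distinct rows; `n_unique_rows` and `usable_subsets` are hypothetical helpers, and the sample `x` and `ixs` are made-up data standing in for the ones in the traceback.

```python
import numpy as np


def n_unique_rows(X):
    """Number of distinct points (rows) in a 2-D array."""
    return np.unique(X, axis=0).shape[0]


def usable_subsets(x, ixs):
    """Split index subsets into ones that can be cut and ones that cannot.

    Hypothetical helper: a subset with fewer than two distinct points has
    zero range in every dimension, which is what produces the NaN above.
    """
    ok, degenerate = [], []
    for ix in ixs:
        if n_unique_rows(x[ix]) >= 2:
            ok.append(ix)
        else:
            degenerate.append(ix)
    return ok, degenerate


x = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 3.0], [0.0, 0.0]])
ixs = [np.array([0, 1, 3]), np.array([0, 2]), np.array([1, 2, 3])]

ok, degenerate = usable_subsets(x, ixs)
print(len(ok), len(degenerate))  # 2 1

# With rrcf installed, trees would then be built only from `ok`:
# forest = [rrcf.RCTree(x[ix], index_labels=ix) for ix in ok]
```

Skipping degenerate subsets shrinks the forest; redrawing those subsets until each has at least two distinct points is an alternative that keeps the number of trees fixed.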
