I ran into an issue when a subset of my sample data points contains only ONE unique value. How should such a case be handled?
The error message indicates a NaN value in the cut probabilities (caused by division by zero). I tried substituting a uniform distribution, but that caused a subsequent issue: after a cut, the right side contains no values. I think this violates the principle of the RRCF algorithm. Is there a better way to resolve such cases?
  File "<ipython-input-2-b3a957a401e5>", line 139, in <listcomp>
    rrcf.RCTree(x[ix], index_labels=ix) for ix in ixs]
  File "C:\ProgramData\Anaconda3\lib\site-packages\rrcf-0.4.3-py3.8.egg\rrcf\rrcf.py", line 106, in __init__
    self._mktree(X, S, N, I, parent=self)
  File "C:\ProgramData\Anaconda3\lib\site-packages\rrcf-0.4.3-py3.8.egg\rrcf\rrcf.py", line 177, in _mktree
    S1, S2, branch = self._cut(X, S, parent=parent, side=side)
  File "C:\ProgramData\Anaconda3\lib\site-packages\rrcf-0.4.3-py3.8.egg\rrcf\rrcf.py", line 159, in _cut
    q = self.rng.choice(self.ndim, p=l)
  File "mtrand.pyx", line 928, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities contain NaN
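For anyone hitting the same error, here is a minimal NumPy-only sketch of what seems to be going on and one possible workaround. It assumes the cut dimension is drawn with probability proportional to each dimension's range, as the RRCF algorithm describes. When every point in a subsample is identical, all ranges are zero and the normalization becomes 0/0. The helpers `cut_probabilities` and `usable_subsamples` are hypothetical names for illustration and are not part of the rrcf package.

```python
import numpy as np


def cut_probabilities(points):
    """Range-proportional probabilities for picking the cut dimension.

    Returns None when all points are identical, because no valid cut
    exists in that case (normalizing would divide zero by zero).
    """
    span = points.max(axis=0) - points.min(axis=0)
    total = span.sum()
    if total == 0:
        return None
    return span / total


def usable_subsamples(x, ixs):
    """Keep only index sets that select at least two distinct rows of x."""
    return [ix for ix in ixs if len(np.unique(x[ix], axis=0)) > 1]


# Three identical points: no dimension has any spread, so no cut is possible.
same = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
print(cut_probabilities(same))

# Only the second dimension varies, so it receives all of the probability.
mixed = np.array([[1.0, 2.0], [1.0, 5.0], [1.0, 3.0]])
print(cut_probabilities(mixed))
```

A uniform fallback can select a dimension whose range is zero, and any cut along that dimension leaves one side empty, which matches the follow-up problem described above. Treating an all-identical subsample as a single leaf, or filtering it out with something like `usable_subsamples(x, ixs)` before the list comprehension in the traceback, avoids drawing a cut that cannot separate anything.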