Add option to resample features at nodes without replacement

Hello, thanks for the nice package.

I was working on an application where I wanted perfect prediction in a classification task. I found that I could not get it with `partial_frac = 1.0`, which I did not expect. After some investigation, it appears that instances are sampled with replacement when constructing forests. As a result, although N samples are included in each individual tree fit, they almost always contain duplicates and miss other instances. See, e.g.:

https://github.com/JuliaAI/DecisionTree.jl/blob/master/src/regression/main.jl#L104

```julia
julia> rand(1:5, 5)
5-element Vector{Int64}:
 5
 5
 2
 2
 3
```
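The `rand(1:5, 5)` draw above generalizes. When n indices are drawn from `1:n` with replacement, the expected fraction of distinct indices is 1 - (1 - 1/n)^n, which approaches 1 - 1/e (about 63.2%) for large n. The sketch below checks this empirically. The helper name `unique_fraction_with_replacement` is hypothetical and only illustrates the sampling behaviour. It is not code from DecisionTree.jl.

```julia
using Random

# Hypothetical helper: draw n indices from 1:n with replacement (as rand(1:n, n)
# does) and return the fraction of distinct indices that were drawn.
function unique_fraction_with_replacement(n::Int; rng = Random.default_rng())
    inds = rand(rng, 1:n, n)
    return length(unique(inds)) / n
end

rng = MersenneTwister(1)
frac = unique_fraction_with_replacement(100_000; rng = rng)
println(frac)          # approximately 0.632
println(1 - exp(-1))   # theoretical large-n limit, 1 - 1/e
```

So even with a sampling fraction of 1.0, each tree sees only about two thirds of the distinct training instances under this scheme.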

I think it would be preferable to sample without replacement, so that the `partial_frac = 1.0` limit is exact and every instance is used once per tree. I don't know whether this is the standard convention for random forests, though.
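Sampling without replacement could look roughly like the sketch below. It assumes the sampling fraction is a `Float64` in (0, 1] and picks `round(Int, frac * n)` rows from a random permutation. The function `subsample_indices` is hypothetical and is not part of DecisionTree.jl's API.

```julia
using Random

# Hypothetical helper (not part of DecisionTree.jl): draw round(Int, frac * n)
# row indices from 1:n WITHOUT replacement. With frac = 1.0 every index is
# returned exactly once, in shuffled order.
function subsample_indices(n::Int, frac::Float64; rng = Random.default_rng())
    0.0 < frac <= 1.0 || throw(ArgumentError("frac must be in (0, 1]"))
    k = max(1, round(Int, frac * n))
    return randperm(rng, n)[1:k]
end

rng = MersenneTwister(42)
all_rows = subsample_indices(5, 1.0; rng = rng)
println(sort(all_rows) == collect(1:5))   # true: no duplicates, nothing missing
some_rows = subsample_indices(10, 0.7; rng = rng)
println(length(some_rows) == 7 && allunique(some_rows))   # true
```

One consequence of this choice: with a fraction of 1.0, every tree is fit on the same set of rows, so the only remaining source of variation between trees is the random choice of features at each node.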

I would be happy to contribute a PR if it's agreed that sampling without replacement is preferred.

Thank you!

Add option to resample features at nodes without replacement #192

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add option to resample features at nodes without replacement #192

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions