Some minor differences in random forest implementations #160

@tecosaur

I've recently been comparing some random forest implementations (https://github.com/tecosaur/TreeComparison). One result of that is #159, but I also have some other findings which may be of interest.

For starters, here's the colour coding I use:
image

Error rates mostly converge across the implementations I tested, though ranger sometimes does slightly better:
image

image

Precision-recall and ROC curves generally look near-identical, as they should.
image
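
One way to put a number on "near-identical" is to compare the area under each ROC curve. The following is a minimal sketch in plain Python, not the method used in the comparison repository: `roc_auc` is a hypothetical helper that computes AUC by pairwise comparison of positive and negative scores, and the labels and scores are made-up illustrative values.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs where the positive is scored higher.
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical class-1 scores from two implementations on the same test rows.
labels = [0, 0, 1, 1, 1]
scores_a = [0.1, 0.4, 0.35, 0.8, 0.9]
scores_b = [0.2, 0.3, 0.45, 0.7, 0.95]

auc_a = roc_auc(labels, scores_a)
auc_b = roc_auc(labels, scores_b)
print(f"AUC difference: {abs(auc_a - auc_b):.3f}")
```

The pairwise form is quadratic in the number of rows, so it only suits small test sets, but it is easy to check by hand.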

I've also noticed some larger differences in the depth and size of the trees each forest builds. Across a number of datasets, DecisionTree.jl and randomForest produce narrower, deeper trees than ranger and sklearn.

image

image

image

image
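
For anyone wanting to reproduce the depth and size measurements, here is a rough sketch of how they could be computed once each tree is exported as a pair of child-index arrays. It assumes the layout sklearn uses for `tree_.children_left` and `tree_.children_right`, where `-1` marks a missing child and node 0 is the root; trees from the other libraries would first need converting to that form. `tree_shape` and `forest_summary` are hypothetical helper names, and the two example trees are invented for illustration.

```python
from statistics import mean

LEAF = -1  # sentinel for "no child", matching sklearn's tree_ arrays


def tree_shape(children_left, children_right):
    """Return (depth, n_nodes, n_leaves) for one tree.

    Assumes node 0 is the root and every node in the arrays is reachable.
    """
    depth = 0
    n_leaves = 0
    stack = [(0, 0)]  # (node index, depth of that node)
    while stack:
        node, d = stack.pop()
        left, right = children_left[node], children_right[node]
        if left == LEAF and right == LEAF:
            n_leaves += 1
            depth = max(depth, d)
        else:
            # Internal node: descend into whichever children exist.
            for child in (left, right):
                if child != LEAF:
                    stack.append((child, d + 1))
    return depth, len(children_left), n_leaves


def forest_summary(trees):
    """Average depth, node count, and leaf count over a list of trees."""
    shapes = [tree_shape(left, right) for left, right in trees]
    return {
        "mean_depth": mean(s[0] for s in shapes),
        "mean_nodes": mean(s[1] for s in shapes),
        "mean_leaves": mean(s[2] for s in shapes),
    }


# Two made-up 7-node trees: one balanced, one narrow and deep.
balanced = ([1, 3, 5, -1, -1, -1, -1], [2, 4, 6, -1, -1, -1, -1])
narrow = ([1, 3, -1, 5, -1, -1, -1], [2, 4, -1, 6, -1, -1, -1])

print(tree_shape(*balanced))
print(tree_shape(*narrow))
print(forest_summary([balanced, narrow]))
```

Both example trees have the same number of nodes and leaves, so the only thing that differs is depth, which is the "narrower/deeper" pattern described above.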
