Using multiple classifier visualizers #829

Closed
amueller opened this issue May 2, 2019 · 4 comments
Labels
type: question more information is required

Comments

@amueller

amueller commented May 2, 2019

I feel like I'm still missing something wrt the interface.
It looks like PrecisionRecallCurve and ROCAUC fit the model before plotting, and the convenience function does so as well.
How would I plot both of them for a given classifier?

Thanks!

@bbengfort bbengfort added the type: question more information is required label May 3, 2019
@bbengfort
Member

This is still one of our primary pain points that we're trying to resolve (e.g. #297, #623, #498, visual pipelines, model reports, etc.) and we haven't quite figured it out yet, but this is absolutely on our radar to handle soon! Many of our visualizers do require access to the training data in order to make visualization decisions and we haven't had the opportunity to dive in depth into the check_is_fitted logic to see how that would work for visualizers.

There are a couple of workarounds that I use in practice, though they tend to be specific to particular visualizers. For example, with PrecisionRecallCurve and ROCAUC, the PR curve needs access to fit and ROCAUC does not, so you could simply bypass fit for ROCAUC and go directly to score, e.g.:

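import matplotlib.pyplot as plt
from yellowbrick.classifier import PrecisionRecallCurve, ROCAUC

# clf, X_train, y_train, X_test and y_test are assumed to be defined already
_, axes = plt.subplots(ncols=2)

prcurve = PrecisionRecallCurve(clf, ax=axes[0])
rocauc = ROCAUC(clf, ax=axes[1])

# Fitting through the PR curve visualizer also fits the shared clf
prcurve.fit(X_train, y_train)
prcurve.score(X_test, y_test)
# ROCAUC skips fit and scores the already fitted clf
rocauc.score(X_test, y_test)

prcurve.finalize()
rocauc.finalize()
plt.show()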

Of course, this requires some knowledge of what fit does in each visualizer, so it's neither ideal nor a long-term solution. Another hack that I use fairly often is to create a fitted model wrapper (and yes, this is not a long-term solution either):

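class FittedEstimator(object):
    """Wraps an already fitted model so that fit() does not retrain it."""

    def __init__(self, model):
        self.model = model

    def fit(self, X, y=None):
        # No-op: the wrapped model is assumed to be fitted already
        return self

    def predict(self, X):
        return self.model.predict(X)

    def score(self, X, y):
        return self.model.score(X, y)

    def get_params(self, deep=True):
        # Accept deep like scikit-learn estimators do
        return self.model.get_params(deep=deep)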

But potentially they may be helpful in the interim!
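
The wrapper above forwards only predict and score, so a visualizer that calls predict_proba or decision_function would not find them. A minimal sketch of a variant that forwards every other attribute to the wrapped model follows. The names PrefitWrapper and ConstantModel are hypothetical and exist only to keep the example self-contained; none of this is part of yellowbrick.

```python
class PrefitWrapper:
    """Wrap an already fitted model so that fit() becomes a no-op.

    Hypothetical helper, not part of yellowbrick or scikit-learn.
    """

    def __init__(self, model):
        self.model = model

    def fit(self, X, y=None, **kwargs):
        # Deliberately skip training: the wrapped model is already fitted.
        return self

    def __getattr__(self, name):
        # Called only when normal lookup fails, so predict, predict_proba,
        # score, classes_ and so on are all forwarded to the wrapped model.
        if name == "model":
            # Guard against infinite recursion before __init__ has run
            # (for example during copying or unpickling).
            raise AttributeError(name)
        return getattr(self.model, name)


class ConstantModel:
    """Toy stand-in for a fitted classifier (assumption for this demo)."""

    classes_ = [0, 1]

    def __init__(self):
        self.fit_calls = 0

    def fit(self, X, y=None):
        self.fit_calls += 1
        return self

    def predict(self, X):
        return [1 for _ in X]

    def predict_proba(self, X):
        return [[0.2, 0.8] for _ in X]


fitted = ConstantModel().fit([[0], [1]], [0, 1])
wrapped = PrefitWrapper(fitted)
wrapped.fit([[2]], [1])  # no-op, the wrapped model is not refit
```

Whether a given visualizer accepts such a wrapper depends on what it inspects on the estimator, so this remains a stopgap in the same spirit as the hack above.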

Our ideal solutions are:

  1. Visualizers check if the model is fitted before fitting it using check_is_fitted
  2. Visual pipelines compose multiple visualizers for one model evaluation

However, if you think that this is critical, we could add an is_fitted=True flag onto the ModelVisualizers and quick methods as a temporary solution.
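
The first ideal solution and the proposed flag could be combined roughly as follows. This is a sketch, not yellowbrick's implementation: fit_if_needed and its is_fitted argument are hypothetical names, and it assumes a scikit-learn version in which check_is_fitted can be called with just the estimator.

```python
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.utils.validation import check_is_fitted


def fit_if_needed(model, X, y, is_fitted="auto"):
    """Fit ``model`` only when it does not already look fitted.

    Hypothetical helper. ``is_fitted`` may be True, False, or "auto";
    "auto" defers to scikit-learn's check_is_fitted.
    """
    if is_fitted == "auto":
        try:
            check_is_fitted(model)
            is_fitted = True
        except NotFittedError:
            is_fitted = False
    if not is_fitted:
        model.fit(X, y)
    return model


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

clf = LogisticRegression()
fit_if_needed(clf, X, y)  # unfitted, so this call trains clf
coef_before = clf.coef_.copy()
fit_if_needed(clf, [[5.0], [6.0]], [0, 1])  # already fitted, left untouched
```

Passing is_fitted=True or is_fitted=False explicitly would override the automatic check, which is roughly what a temporary flag on the visualizers would do.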

@amueller
Author

amueller commented May 3, 2019

Thanks for your input. Can you say why the PrecisionRecallCurve needs access to fit? I guess the current behavior is just not very coherent with my mental model of plotting. I don't see why you decided to include fitting into the plotting process in the first place. I'm trying to see if I can reuse some of yellowbrick for dabl but it seems easier to rewrite everything given the limitations of the API.

@amueller
Author

amueller commented May 3, 2019

The interface I would use for any plotting function would be fitted_estimator, X_val, y_val. The main use case I care about is not so much computing multiple metrics (though that's an important one); it's just plotting anything on a fitted estimator. I don't see why you would decide on the visualization before the fitting, or why you would decide to use a single visualizer.
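
For illustration, a function with the fitted_estimator, X_val, y_val signature might look like the sketch below. It uses scikit-learn's metric functions and matplotlib directly rather than yellowbrick; plot_roc_and_pr is a hypothetical name, and the sketch assumes a binary classifier that exposes predict_proba.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, roc_curve
from sklearn.model_selection import train_test_split


def plot_roc_and_pr(fitted_estimator, X_val, y_val, axes=None):
    """Draw ROC and precision-recall curves without refitting the model."""
    if axes is None:
        _, axes = plt.subplots(ncols=2, figsize=(9, 4))
    # Probability of the positive class (second column for binary problems).
    scores = fitted_estimator.predict_proba(X_val)[:, 1]

    fpr, tpr, _ = roc_curve(y_val, scores)
    axes[0].plot(fpr, tpr)
    axes[0].set(xlabel="False positive rate", ylabel="True positive rate", title="ROC")

    precision, recall, _ = precision_recall_curve(y_val, scores)
    axes[1].plot(recall, precision)
    axes[1].set(xlabel="Recall", ylabel="Precision", title="Precision-recall")
    return axes


# Toy data and model, fitted once outside the plotting function.
X, y = make_classification(n_samples=200, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
axes = plot_roc_and_pr(clf, X_val, y_val)
```

Calling plt.show() afterwards displays both panels. The model is trained once, and any number of plots can then be drawn from it.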

@amueller
Author

amueller commented May 3, 2019

I think this can be closed as a duplicate of #297. Thanks!

@amueller amueller closed this as completed May 3, 2019

2 participants