Description
Hi, as far as I understand, Datascope is compatible with any scikit-learn pipeline. I'm using PyTorch with skorch (a library that wraps PyTorch) to make my classifier scikit-learn compatible.
I'm currently getting the following error when trying to compute the score:
ValueError Traceback (most recent call last)
<ipython-input-49-2e03ddd68d36> in <module>()
----> 1 importances.score(test_data, test_labels)
/usr/local/lib/python3.7/dist-packages/datascope-0.0.3-py3.7-linux-x86_64.egg/datascope/importance/importance.py in score(self, X, y, **kwargs)
38 if isinstance(y, DataFrame):
39 y = y.values
---> 40 return self._score(X, y, **kwargs)
/usr/local/lib/python3.7/dist-packages/datascope-0.0.3-py3.7-linux-x86_64.egg/datascope/importance/shapley.py in _score(self, X, y, **kwargs)
285 units = np.delete(units, np.where(units == -1))
286 world = kwargs.get("world", np.zeros_like(units, dtype=int))
--> 287 return self._shapley(self.X, self.y, X, y, self.provenance, units, world)
288
289 def _shapley(
/usr/local/lib/python3.7/dist-packages/datascope-0.0.3-py3.7-linux-x86_64.egg/datascope/importance/shapley.py in _shapley(self, X, y, X_test, y_test, provenance, units, world)
314 )
315 elif self.method == ImportanceMethod.NEIGHBOR:
--> 316 return self._shapley_neighbor(X, y, X_test, y_test, provenance, units, world, self.nn_k, self.nn_distance)
317 else:
318 raise ValueError("Unknown method '%s'." % self.method)
/usr/local/lib/python3.7/dist-packages/datascope-0.0.3-py3.7-linux-x86_64.egg/datascope/importance/shapley.py in _shapley_neighbor(self, X, y, X_test, y_test, provenance, units, world, k, distance)
507 assert isinstance(X_test, spmatrix)
508 X_test = X_test.todense()
--> 509 distances = distance(X, X_test)
510
511 # Compute the utilitiy values between training and test labels.
sklearn/metrics/_dist_metrics.pyx in sklearn.metrics._dist_metrics.DistanceMetric.pairwise()
ValueError: Buffer has wrong number of dimensions (expected 2, got 4)
Here's a snippet of my code:
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score
net = reset_model(seed = 0) # gives scikit-learn compatible skorch model
pipeline = Pipeline([("model", net)])
pipeline.fit(train_dataset, train_labels)
y_pred = pipeline.predict(test_dataset)
plot_loss(net)
accuracy_dirty = accuracy_score(test_labels, y_pred) # accuracy_score expects (y_true, y_pred)
print("Pipeline accuracy in the beginning:", accuracy_dirty)
The above works fine, and I'm able to compute the accuracy of my baseline model.
However, when I run importances.score(test_data, test_labels), I get the error shown above.
from datascope.importance.common import SklearnModelAccuracy
from datascope.importance.shapley import ShapleyImportance
net = reset_model(seed = 0)
pipeline = Pipeline([("model", net)])
utility = SklearnModelAccuracy(pipeline)
importance = ShapleyImportance(method="neighbor", utility=utility)
importances = importance.fit(train_data, train_labels)
importances.score(test_data, test_labels)
Here's the shape of my data:
train_data.shape, train_labels.shape
((2067, 3, 224, 224), (2067,))
test_data.shape, test_labels.shape
((813, 3, 224, 224), (813,))
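The dimension mismatch itself can be reproduced outside Datascope. Below is a minimal sketch, assuming the "neighbor" method ends up passing the raw arrays to a pairwise distance routine that expects 2-D input of shape (n_samples, n_features). The small random arrays and the use of sklearn.metrics.pairwise_distances are stand-ins chosen for illustration, not what Datascope calls internally:

```python
import numpy as np
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)

# Small stand-ins for the real data, same layout: (N, C, H, W).
train_small = rng.random((5, 3, 8, 8))
test_small = rng.random((4, 3, 8, 8))

# 4-D image batches are rejected by a distance routine that expects 2-D input.
try:
    pairwise_distances(train_small, test_small)
    raised = False
except ValueError as err:
    raised = True
    print("4-D input fails:", err)

# Flattening each image into one feature vector gives shape (N, C*H*W),
# which the distance routine accepts.
train_flat = train_small.reshape(len(train_small), -1)
test_flat = test_small.reshape(len(test_small), -1)

distances = pairwise_distances(train_flat, test_flat)
print(distances.shape)  # (5, 4)
```

If flattening is the way to go, the skorch model would presumably have to reshape its input back to (N, 3, 224, 224) before the first convolution. I have not verified that this works with Datascope.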
I would be happy if someone could point me in the right direction! I'm not sure whether this error is skorch-related or whether image data is not supported yet. Thanks :)