Description
Motivation
@TobyBoyne recently added support for categorical dimensions to optimize_mixed_alternating. This implementation assumes that the categorical dimensions are integer-encoded. As a consequence, it is directly usable with the MixedSingleTaskGP, but not with GPs that, for example, assume one-hot encoded categoricals. A solution for this case would be a NumericToOneHot input transform that encodes the categorical feature(s) within the model.
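To illustrate the idea, here is a minimal sketch of what such a NumericToOneHot transform could look like (hypothetical, not existing BoTorch API; the `categorical_features` argument is assumed to map the column index of each integer-encoded categorical to its cardinality):

```python
import torch
from torch import Tensor
from torch.nn import Module
from botorch.models.transforms.input import InputTransform


class NumericToOneHot(InputTransform, Module):
    """Sketch: expand integer-encoded categorical columns into one-hot blocks."""

    def __init__(self, dim: int, categorical_features: dict[int, int]) -> None:
        super().__init__()
        self.dim = dim
        self.categorical_features = categorical_features
        self.transform_on_train = True
        self.transform_on_eval = True
        self.transform_on_fantasize = True

    def transform(self, X: Tensor) -> Tensor:
        # Pass numerical columns through; replace each categorical column
        # by its one-hot block.
        blocks = []
        for i in range(self.dim):
            col = X[..., i]
            if i in self.categorical_features:
                num_classes = self.categorical_features[i]
                blocks.append(
                    torch.nn.functional.one_hot(col.long(), num_classes).to(X)
                )
            else:
                blocks.append(col.unsqueeze(-1))
        return torch.cat(blocks, dim=-1)
```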
Currently, BoTorch features a OneHotToNumeric input transform, including an untransform functionality. It should be relatively straightforward to come up with a NumericToOneHot transform based on this. This would solve the issue for models that expect one-hot encoded features, but not for other possible encodings. For example, in chemistry one often uses descriptor encodings, in which the categorical feature is mapped into some kind of descriptor space based on chemical descriptors. From a software engineering point of view, these descriptor encodings of categoricals are very much the same as one-hot encodings: one transforms a vector into a matrix, only the mapping differs. For this reason, I would like to implement a generic NumericToCategoricalEncoding input transform which takes, upon instantiation, information regarding the dimensionality of the encoding space and a (non-differentiable) callable that performs the transformation. This would allow optimize_mixed_alternating to be used with any kind of categorical encoding.
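A rough sketch of how this generic transform could look, generalizing the one-hot version above by swapping the hard-coded one-hot expansion for a user-supplied encoder callable (all names and arguments are hypothetical):

```python
from typing import Callable

import torch
from torch import Tensor
from torch.nn import Module
from botorch.models.transforms.input import InputTransform


class NumericToCategoricalEncoding(InputTransform, Module):
    """Sketch: map integer-encoded categorical columns to an arbitrary encoding."""

    def __init__(
        self,
        dim: int,  # number of columns in the integer-encoded space
        categorical_dims: list[int],  # columns holding integer-encoded categoricals
        encoder: Callable[[Tensor], Tensor],  # maps `... x n_cat` codes to `... x encoded_dim`
        encoded_dim: int,  # dimensionality of the encoding space
    ) -> None:
        super().__init__()
        self.dim = dim
        self.categorical_dims = categorical_dims
        self.numerical_dims = [i for i in range(dim) if i not in categorical_dims]
        self.encoder = encoder
        self.encoded_dim = encoded_dim
        self.transform_on_train = True
        self.transform_on_eval = True
        self.transform_on_fantasize = True

    def transform(self, X: Tensor) -> Tensor:
        # Numerical columns are passed through unchanged; the categorical
        # columns are replaced by the block returned by the encoder callable.
        encoded = self.encoder(X[..., self.categorical_dims].long()).to(X)
        return torch.cat([X[..., self.numerical_dims], encoded], dim=-1)
```

A possible usage with a descriptor encoding, where each level of a single categorical (column 2, three levels) is mapped to two chemical descriptors via a lookup table:

```python
descriptors = torch.tensor([[1.2, 0.3], [0.7, 1.1], [0.0, 2.4]])
tf = NumericToCategoricalEncoding(
    dim=3,
    categorical_dims=[2],
    encoder=lambda codes: descriptors[codes.squeeze(-1)],
    encoded_dim=2,
)
X = torch.tensor([[0.1, 0.5, 2.0], [0.9, 0.2, 0.0]])
print(tf.transform(X).shape)  # torch.Size([2, 4])
```

Note that this sketch moves the encoded block to the end of the feature vector; one could also keep the original column ordering if that matters for the downstream model.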
What do you think? Does this sound reasonable to you?
Best,
Johannes
Describe the solution you'd like to see implemented in BoTorch.
see above
Describe any alternatives you've considered to the above solution.
No response
Is this related to an existing issue in BoTorch or another repository? If so please include links to those Issues here.
No response
Pull Request
Yes
Code of Conduct
- I agree to follow BoTorch's Code of Conduct