
"Parameters" are reported for non-parametric distribution (GaussianKDE); it is just a copy of the data #470


Open
npatki opened this issue May 14, 2025 · 0 comments
Labels
bug There is an error in the code that needs to be fixed


npatki commented May 14, 2025

Environment Details

  • Copulas version: 0.12.2
  • Python version: 3.11
  • Operating System: Linux

Error Description

As first described in #469, it seems that whenever Copulas is asked to print parameters for a fitted GaussianKDE distribution, it just prints out a copy of the data that was fitted.

In the code below, the final column (column z) is fitted to a GaussianKDE distribution.

from copulas.datasets import sample_trivariate_xyz
from copulas.multivariate import GaussianMultivariate

data = sample_trivariate_xyz()
dist = GaussianMultivariate()
dist.fit(data)
parameters = dist.to_dict()
univariates = parameters['univariates']
print(univariates[2])
{'dataset': [0.638689008563623, 1.058121237066397, 0.3725063445214631, 0.687369594994837, -0.8810681732344304, -0.7121672205062004, 5.050261904362624, ...
  'type': 'copulas.univariate.gaussian_kde.GaussianKDE'

The data seems to be just the exact values in column z.

Expected Behavior

It's unexpected that the entire column's data would be reported at this step.

I would expect that when printing out the distribution, it would only show the 'type' of distribution and nothing else.

print(univariates[2])
{'type': 'copulas.univariate.gaussian_kde.GaussianKDE'}
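In the meantime, a caller could get the same view by filtering the dictionary on their own side. The helper below is only a sketch: `without_dataset` is a hypothetical name and not part of Copulas, and it assumes nothing beyond the entry being a plain dict with the 'dataset' and 'type' keys shown in the reported output.

```python
def without_dataset(univariate_params):
    """Return a copy of a univariate parameter dict with the 'dataset' key dropped.

    Hypothetical helper for illustration; not part of Copulas.
    """
    return {key: value for key, value in univariate_params.items() if key != 'dataset'}


# Shortened example entry that mirrors the structure of the reported output.
example = {
    'dataset': [0.638689008563623, 1.058121237066397, 0.3725063445214631],
    'type': 'copulas.univariate.gaussian_kde.GaussianKDE',
}
print(without_dataset(example))
```

The original dict is left untouched, so the full entry stays available if the data is still needed elsewhere.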

It seems like the "parameters" are set to the data in the _fit method:

def _fit(self, X):
    if self._sample_size:
        X = gaussian_kde(X, bw_method=self.bw_method, weights=self.weights).resample(
            self._sample_size
        )
    self._params = {'dataset': X.tolist()}
    self._model = self._get_model()

Ideally, the _params assigned to the GaussianKDE should be None, since GaussianKDE is a non-parametric distribution. Whatever info we need to save the state of the GaussianKDE should be stored under a different name and not exposed as parameters.
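One way to picture this separation is the toy class below. It is a rough sketch, not the Copulas implementation: `SketchKDE` and the `_fitted_data` attribute are hypothetical names chosen for illustration, and the class does no density estimation at all. It only shows the state being kept apart from what is reported.

```python
class SketchKDE:
    """Toy stand-in for a non-parametric distribution (not the Copulas GaussianKDE)."""

    def __init__(self):
        self._params = None       # nothing to report: the distribution is non-parametric
        self._fitted_data = None  # hypothetical name for the internal state

    def fit(self, X):
        # Keep the data the model needs, but outside of _params.
        self._fitted_data = list(X)

    def to_dict(self):
        # Report only the type of the distribution, never the fitted data.
        return {'type': type(self).__name__}


dist = SketchKDE()
dist.fit([0.64, 1.06, 0.37])
print(dist.to_dict())
```

A real change would also have to decide how the saved state travels when a fitted distribution is serialized and reloaded; the sketch leaves that question open.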

npatki added the bug label May 14, 2025