@lei-1126

Hi, I use the copulas package to simulate real data. I fit a GaussianMultivariate model and then sample from it with GaussianMultivariate.sample(), but it was very slow when I fitted on 20,000 samples (10 features) and generated 20,000 simulated rows. Most of the time was spent computing the cumulative distribution function, so I modified the code in the gaussian_kde.py module to speed it up. With this change, simulation is about 100 times faster on my data.

My data are all integers, so each variable has many samples with the same value, and I can count the occurrences of each distinct value and use the counts as weights. Although copulas are meant for continuous data, in practice integer-typed data is often used for fitting, and a variable usually has many repeated values, especially when there is a lot of training data. In that situation, the modified code makes simulation much faster.
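
To illustrate the idea, here is a minimal sketch of the counting trick. It is not the actual change to gaussian_kde.py. It assumes a Gaussian kernel with a fixed bandwidth, and the function names `kde_cdf_naive` and `kde_cdf_grouped` are hypothetical. The KDE CDF at a point is the average of the kernel CDFs centred on every sample, so repeated values can be collapsed into one term weighted by its count:

```python
import numpy as np
from scipy.special import ndtr  # standard normal CDF


def kde_cdf_naive(points, data, bandwidth):
    """Gaussian KDE CDF: average of Phi((x - x_i) / h) over all n samples."""
    z = (points[:, None] - data[None, :]) / bandwidth
    return ndtr(z).mean(axis=1)


def kde_cdf_grouped(points, data, bandwidth):
    """Same CDF, but each distinct value is evaluated once, weighted by its count."""
    values, counts = np.unique(data, return_counts=True)
    weights = counts / counts.sum()
    z = (points[:, None] - values[None, :]) / bandwidth
    return ndtr(z) @ weights


# Synthetic integer-valued column (illustrative data, not from the PR).
rng = np.random.default_rng(0)
data = rng.integers(0, 50, size=20_000).astype(float)
points = np.linspace(-5.0, 55.0, 200)
bandwidth = 1.0  # fixed here for illustration only

naive = kde_cdf_naive(points, data, bandwidth)
grouped = kde_cdf_grouped(points, data, bandwidth)

max_abs_diff = float(np.max(np.abs(naive - grouped)))
n_unique = int(np.unique(data).size)
```

Both functions return the same CDF up to floating-point rounding, but the grouped version evaluates the kernel once per distinct value instead of once per sample, so the saving grows with the number of repeated values.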

Best wishes

@lei-1126 closed this on Apr 2, 2021
@lei-1126 reopened this on Apr 2, 2021