Large standard deviation when reproducing experiment results

Has anyone tried this model on benchmark datasets such as Arrhythmia or Thyroid?

I ran DAGMM with ten different seeds [000, 111, 222, ..., 999], keeping the autoencoder structure, learning rate, and batch size exactly the same across runs. Below are the AUC and precision results on Thyroid:

AUC: 0.5562	0.5546	0.9403	0.9439	0.5592	0.6733	0.9156	0.7703	0.6353	0.8264
Precision: 0.0968	0.0108	0.6129	0.4301	0.0538	0.3226	0.4731	0.2366	0.1505	0.2366

Three of the precision values are close to, or even better than, the one reported in the original paper. However, the standard deviation over the 10 independent trials is quite large...
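To put a number on the spread, here is a minimal sketch that computes the mean and sample standard deviation of the ten values listed above, using only Python's standard library. The variable and function names are my own, and it assumes the sample (n-1) standard deviation is the quantity of interest.

```python
from statistics import mean, stdev

# The ten per-seed results on Thyroid, copied from the lists above.
auc = [0.5562, 0.5546, 0.9403, 0.9439, 0.5592,
       0.6733, 0.9156, 0.7703, 0.6353, 0.8264]
precision = [0.0968, 0.0108, 0.6129, 0.4301, 0.0538,
             0.3226, 0.4731, 0.2366, 0.1505, 0.2366]


def summarize(values):
    """Return (mean, sample standard deviation) of a list of floats."""
    return mean(values), stdev(values)


for name, values in [("AUC", auc), ("Precision", precision)]:
    m, s = summarize(values)
    print(f"{name}: mean={m:.4f}, std={s:.4f}, "
          f"min={min(values):.4f}, max={max(values):.4f}")
```

Printing the min and max next to the mean and standard deviation makes it easier to see whether the runs split into a "good" group and a "bad" group rather than scattering evenly.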

I'm not sure whether there is something wrong with my experiment code, or whether the model is inherently unstable.

Has anyone else observed such a large standard deviation?

Thanks :-)


Large standard deviation when reproducing experiment results #18
