Hi!
I have a binary classification dataset with a highly imbalanced label distribution (pos : neg = 1 : 200).
I tried applying the BERT code from the Neural Network Quick Start Tutorial
directly on this dataset, with the validation metric set to "Macro-F1", but the trained model mostly predicts all negatives.
I am wondering if there are parameters or configurations in LibMultiLabel that I could tune to improve the model's performance on such an imbalanced dataset?
For your reference:
I also tried the linear methods, where using train_cost_sensitive instead of train_1vsrest noticeably improved this issue: with train_cost_sensitive, the model predicts about 4 times as many positive samples as with train_1vsrest. Both methods reach Micro-F1 and P@1 close to 0.99 (due to the dominating negative samples), but Macro-F1 stays around 0.5.
Thanks!