Some confusions about AUCM loss #67
Thank you for your interest in our library. Let me answer your questions below.
Please let us know if you have any additional questions.
Thanks for your reply. As you said, the gradient of the mini-batch loss is not an unbiased estimator of the true gradient, and converting this into an SPP helps address this issue. Is there any theoretical support for that? I mean, why is the gradient of the mini-batch loss not an unbiased estimator, and how does the SPP work here? In addition, it has been proved in the paper One-Pass AUC Optimization that the square loss converges without conversion to an SPP. And in the last term, $m+E(f(x_j))-E(f(x_i))$, don't $E(f(x_j))$ and $E(f(x_i))$ mean the scores of the positive and negative samples? Thanks for your patient reply again.
Hi Rickey,
I am sorry for the late reply.
The reason is this: the third term in the objective is $(E f(n) - E f(p) + c)^2$, where $E$ denotes the expectation, $n$ is a random negative sample, $p$ is a random positive sample, and $c$ is a constant margin. When you calculate the gradient, you get $(E f(n) - E f(p) + c) \cdot \nabla (E f(n) - E f(p) + c)$. You need a mini-batch to estimate each expectation, but if you use the same batch for both factors, you get a biased estimator. You can address this by using a different batch for each of the two factors (the one before and the one after the product), but this reduces the effective batch size.
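As a quick numerical check of this (a toy setup I am assuming purely for illustration, not LibAUC code: a scalar model f(x; w) = w*x with Gaussian positive and negative scores), the same-batch gradient estimator of the third term is systematically off, while using two independent batches is not:

```python
import numpy as np

rng = np.random.default_rng(0)
w, c = 1.0, 1.0
mu_p, mu_n = 2.0, 0.5             # mean scores of positive / negative samples
batch, trials = 8, 100_000

# True gradient of L(w) = (E[w*n] - E[w*p] + c)^2 with respect to w.
true_grad = 2 * (w * (mu_n - mu_p) + c) * (mu_n - mu_p)

same_batch, two_batches = [], []
for _ in range(trials):
    p1 = rng.normal(mu_p, 1.0, batch)     # positive mini-batch
    n1 = rng.normal(mu_n, 1.0, batch)     # negative mini-batch
    p2 = rng.normal(mu_p, 1.0, batch)     # an independent second mini-batch
    n2 = rng.normal(mu_n, 1.0, batch)

    g1 = w * (n1.mean() - p1.mean()) + c  # estimate of E f(n) - E f(p) + c
    d1 = n1.mean() - p1.mean()            # estimate of its gradient w.r.t. w
    d2 = n2.mean() - p2.mean()            # same estimate from the second batch

    same_batch.append(2 * g1 * d1)        # same batch in both factors -> biased
    two_batches.append(2 * g1 * d2)       # independent batches        -> unbiased

print("true gradient       :", true_grad)             # 1.5
print("same-batch estimate :", np.mean(same_batch))   # ~2.0, off by 2*w*(var_p + var_n)/batch
print("two-batch estimate  :", np.mean(two_batches))  # ~1.5
```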
If we reformulate it as a max, we have $\max_{\alpha} \; 2\alpha (E f(n) - E f(p) + c) - \alpha^2$. Then you no longer have this issue, because you update $\alpha$ using its unbiased gradient estimator and update $w$ in the same way.
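To make the equivalence explicit, writing $s = E f(n) - E f(p) + c$ for brevity, completing the square gives

$\max_{\alpha} \; 2\alpha s - \alpha^2 = s^2$, attained at $\alpha^\star = s$.

For a fixed $\alpha$ the inner expression is linear in $E f(n)$ and $E f(p)$, so a single mini-batch already gives unbiased stochastic gradients with respect to both the model parameters and $\alpha$.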
You may want to check this paper https://arxiv.org/abs/2202.12396.
See below for the answer to the other question.
Regards
Tianbao
On Jan 7, 2025, at 10:05 PM, Rickey wrote:
And in the last term, $m+E(f(x_j))-E(f(x_i))$, don't $E(f(x_j))$ and $E(f(x_i))$ mean the scores of the positive and negative samples?
This is correct.
Dear developers,
After learning about your work, I have the following questions:
The AUCM loss with a squared last term can be written as

    loss = self.mean((y_pred - self.a)**2*pos_mask) + self.mean((y_pred - self.b)**2*neg_mask) + \
           (self.margin + self.mean(y_pred*neg_mask) - self.mean(y_pred*pos_mask))**2

The first two terms use self.a and self.b, while the last term uses self.mean(y_pred*neg_mask) - self.mean(y_pred*pos_mask). I would like to know the reason for that. The loss implemented in the library is instead

    loss = self.mean((y_pred - self.a)**2*pos_mask) + self.mean((y_pred - self.b)**2*neg_mask) + \
           2*self.alpha*(self.margin + self.mean(y_pred*neg_mask) - self.mean(y_pred*pos_mask)) - self.alpha**2
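For reference, here is a self-contained sketch of the two variants above (my own simplification with plain batch means and explicit a, b, alpha tensors, not the library's AUCMLoss class):

```python
import torch

def aucm_squared(y_pred, y_true, a, b, margin=1.0):
    # variant with the squared last term
    pos_mask = (y_true == 1).float()
    neg_mask = (y_true == 0).float()
    mean = lambda t: t.sum() / t.numel()
    return (mean((y_pred - a) ** 2 * pos_mask)
            + mean((y_pred - b) ** 2 * neg_mask)
            + (margin + mean(y_pred * neg_mask) - mean(y_pred * pos_mask)) ** 2)

def aucm_minmax(y_pred, y_true, a, b, alpha, margin=1.0):
    # variant with the auxiliary variable alpha (the min-max / SPP form)
    pos_mask = (y_true == 1).float()
    neg_mask = (y_true == 0).float()
    mean = lambda t: t.sum() / t.numel()
    return (mean((y_pred - a) ** 2 * pos_mask)
            + mean((y_pred - b) ** 2 * neg_mask)
            + 2 * alpha * (margin + mean(y_pred * neg_mask) - mean(y_pred * pos_mask))
            - alpha ** 2)

# Quick check: when alpha equals the bracketed term, the two variants coincide.
y_pred = torch.tensor([0.9, 0.2, 0.8, 0.1])
y_true = torch.tensor([1, 0, 1, 0])
a, b = torch.tensor(0.8), torch.tensor(0.2)
s = 1.0 + (y_pred * (y_true == 0).float()).mean() - (y_pred * (y_true == 1).float()).mean()
print(aucm_squared(y_pred, y_true, a, b).item())          # 0.4275
print(aucm_minmax(y_pred, y_true, a, b, alpha=s).item())  # 0.4275
```

So the two expressions define the same objective once alpha is maximized out; the difference is in how the stochastic gradients behave, as discussed above.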
Can I regard the design of the margin loss as transforming the square loss $(m+f(x_j)-f(x_i))^2$ into $\max[0, m+f(x_j)-f(x_i)]^2$, so that $m+f(x_j)$ is allowed to be less than or equal to $f(x_i)$, whereas the square loss only drives it to be exactly equal to $f(x_i)$? When the loss value is 0, is there a potential problem that the gradient vanishes and the model cannot be updated?
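Regarding the zero-loss case, a tiny autograd check (toy scalar scores that I am assuming for illustration, not library code):

```python
import torch

m = 1.0
f_i = torch.tensor(3.0, requires_grad=True)   # score of a positive sample
f_j = torch.tensor(1.0, requires_grad=True)   # score of a negative sample

square_loss = (m + f_j - f_i) ** 2                  # wants f_i - f_j to equal m exactly
hinge_sq = torch.clamp(m + f_j - f_i, min=0) ** 2   # zero once f_i - f_j >= m

print(square_loss.item(), hinge_sq.item())    # 1.0  0.0
hinge_sq.backward()
print(f_i.grad, f_j.grad)                     # tensor(0.)  tensor(0.)
```

Pairs that already satisfy the margin contribute zero loss and zero gradient under the squared hinge, whereas the plain square loss keeps pulling $f(x_i)-f(x_j)$ back toward exactly $m$.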
In the demo you provided, the AUC test score of AUCM based on PESG on CIFAR10 can reach 0.9245, while the value quoted in your paper Large-scale Robust Deep AUC Maximization is 0.715±0.008. Is this because some content has been updated?
I would be grateful if you could reply as soon as possible. Wishing you a happy new year.