Pull Request Notifier Plugin Blocking Threads on All HTTPS Nodes #290
Description
Hi Support Team,
We are using the Pull Request Notifier plugin on our Bitbucket Server cluster (8 nodes): 4 nodes serve HTTPS traffic and the other 4 serve SSH traffic. We had a production outage on the HTTPS nodes on June 12 at around 5 AM EDT. We restarted the nodes in a rolling fashion multiple times, but that didn't help. Only after we disabled the plugin did the situation improve, and the number of threads dropped immediately.
While working with Atlassian on the RCA, we found that most of our threads were blocked waiting on the same lock, held on the plugin's SettingsService:
Blocked thread (excerpt):

```
- waiting to lock <0x00000004cb556c08> (a se.bjurr.prnfb.service.SettingsService)
at se.bjurr.prnfb.service.SettingsService.getPrnfbSettings(SettingsService.java:192)
at se.bjurr.prnfb.service.SettingsService.findButton(SettingsService.java:115)
at se.bjurr.prnfb.service.SettingsService.getButton(SettingsService.java:123)
at se.bjurr.prnfb.service.ButtonsService.doGetButtons(ButtonsService.java:58)
at se.bjurr.prnfb.service.ButtonsService.getButtons(ButtonsService.java:130)
at se.bjurr.prnfb.presentation.ButtonServlet.get(ButtonServlet.java:146)
```

Thread holding the lock (excerpt):

```
at se.bjurr.prnfb.service.SettingsService$7.perform(SettingsService.java:318)
at com.atlassian.stash.internal.user.DefaultEscalatedSecurityContext.call(DefaultEscalatedSecurityContext.java:51)
at se.bjurr.prnfb.service.SettingsService.inSynchronizedTransaction(SettingsService.java:314)
- locked <0x00000004cb556c08> (a se.bjurr.prnfb.service.SettingsService)
at se.bjurr.prnfb.service.SettingsService.getPrnfbSettings(SettingsService.java:192)
at se.bjurr.prnfb.service.SettingsService.findButton(SettingsService.java:115)
at se.bjurr.prnfb.service.SettingsService.getButton(SettingsService.java:123)
at se.bjurr.prnfb.service.ButtonsService.doGetButtons(ButtonsService.java:58)
at se.bjurr.prnfb.service.ButtonsService.getButtons(ButtonsService.java:130)
at se.bjurr.prnfb.presentation.ButtonServlet.get(ButtonServlet.java:146)
```
I am attaching the thread dumps and logs.
- Plugin version used: 2.63
- Bitbucket Server version used: 4.12.1
Atlassian also suspects that the plugin stores data in the database, which would explain how the add-on kept causing this problem even after the nodes were restarted. Is there a table of jobs in the database that we can clear out so that we can use this add-on again? We would also like a complete RCA for this issue. The plugin is currently disabled, and a lot of users are affected.

Thanks,
Ankush Gupta
JPMorgan Chase