Conversation

servesh
Contributor

@servesh servesh commented Mar 26, 2025

On Aurora, when using kdreg2, the kernel logs are flooded with "Region database full" messages, which eventually overwhelms our backend log aggregation system. This PR proposes rate limiting the logging in these functions.

If similar behavior is expected in other parts of the code, it would be good to rate limit those as well.

d314159 and others added 2 commits January 30, 2025 12:21
Signed-off-by: dennis-c-josifovich <dennis.c.josifovich@hpe.com>
If the module load directory does not exist, then
the rpm post scriptlet will fail. Create the directory.

Signed-off-by: James Swaro <james.swaro@hpe.com>
@servesh
Contributor Author

servesh commented Mar 26, 2025

Sample messages being seen:

[Fri Mar 21 01:32:07 2025] [156430] kdreg2:kdreg2_monitor_region: Region database full, rejecting request to monitor region.
[Fri Mar 21 01:32:07 2025] [156426] kdreg2:kdreg2_monitor_region: Region database full, rejecting request to monitor region.
[Fri Mar 21 01:32:07 2025] [156430] kdreg2:kdreg2_ioctl: KDREG2_IOCTL_MONITOR: failure -28
[Fri Mar 21 01:32:07 2025] [156426] kdreg2:kdreg2_ioctl: KDREG2_IOCTL_MONITOR: failure -28
[Fri Mar 21 01:32:07 2025] [156433] kdreg2:kdreg2_monitor_region: Region database full, rejecting request to monitor region.
[Fri Mar 21 01:32:07 2025] [156433] kdreg2:kdreg2_ioctl: KDREG2_IOCTL_MONITOR: failure -28
[Fri Mar 21 01:32:07 2025] [156430] kdreg2:kdreg2_monitor_region: Region database full, rejecting request to monitor region.
[Fri Mar 21 01:32:07 2025] [156430] kdreg2:kdreg2_ioctl: KDREG2_IOCTL_MONITOR: failure -28
[Fri Mar 21 01:32:07 2025] [156433] kdreg2:kdreg2_monitor_region: Region database full, rejecting request to monitor region.
[Fri Mar 21 01:32:07 2025] [156433] kdreg2:kdreg2_ioctl: KDREG2_IOCTL_MONITOR: failure -28
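
For illustration only, here is a minimal userspace sketch of the interval/burst limiting that would collapse a flood like the one above. It is not the kdreg2 patch. In the kernel, helpers such as `pr_warn_ratelimited()` and `__ratelimit()` already provide this behavior. The names `struct ratelimit`, `ratelimit_allow`, and `simulate_flood` are hypothetical, and the 5 second / 10 message values are assumed here to mirror the kernel's `DEFAULT_RATELIMIT_INTERVAL` and `DEFAULT_RATELIMIT_BURST`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace model of an interval/burst rate limiter. */
struct ratelimit {
    uint64_t interval_ms;  /* length of one window */
    unsigned burst;        /* messages allowed per window */
    uint64_t window_start; /* start time of the current window */
    unsigned printed;      /* messages emitted in the current window */
    unsigned missed;       /* messages suppressed in the current window */
    bool started;          /* false until the first call */
};

/* Return true if a message arriving at now_ms may be emitted. */
static bool ratelimit_allow(struct ratelimit *rl, uint64_t now_ms)
{
    assert(rl != NULL);

    /* Open a new window on first use or once the interval has elapsed. */
    if (!rl->started || now_ms - rl->window_start >= rl->interval_ms) {
        if (rl->missed)
            fprintf(stderr, "%u messages suppressed\n", rl->missed);
        rl->window_start = now_ms;
        rl->printed = 0;
        rl->missed = 0;
        rl->started = true;
    }

    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }

    rl->missed++;
    return false;
}

/* Feed `count` messages spaced `gap_ms` apart; return how many get through. */
static unsigned simulate_flood(struct ratelimit *rl, unsigned count,
                               uint64_t gap_ms)
{
    unsigned emitted = 0;

    for (unsigned i = 0; i < count; i++)
        if (ratelimit_allow(rl, (uint64_t)i * gap_ms))
            emitted++;

    return emitted;
}
```

With a 5000 ms window and a burst of 10, calling `simulate_flood()` with 1000 messages arriving 1 ms apart lets only the first 10 through. The rest are counted as suppressed rather than written to the log.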

@servesh
Contributor Author

servesh commented Mar 27, 2025

Additional messages seen that need to be rate limited as well:

kdreg2:kdreg2_context_resize: resize to 1024 entities
kdreg2:kdreg2_open: Instance opened.
kdreg2:kdreg2_detect_fork: Fork() detected - monitoring not supported in child
kdreg2:kdreg2_release: Instance closed.
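
When several distinct messages are rate limited, it can help for each call site to keep its own limiter state, so that a flood of one message does not suppress an unrelated one. The kernel's `printk_ratelimited()` family works this way by declaring a static state inside the macro. The sketch below models that idea in userspace. `LOG_RATELIMITED`, `struct rl_state`, `rl_allow`, `fake_now_s`, and the two logging functions are hypothetical names for illustration, not kdreg2 code, and the message strings are copied from the samples in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-call-site limiter state (not kdreg2 code). */
struct rl_state {
    unsigned long window_start; /* start of current window, in seconds */
    unsigned printed;           /* messages emitted in current window */
    bool started;               /* false until the first call */
};

/* Fake clock in seconds, so the sketch is deterministic. */
static unsigned long fake_now_s;

static bool rl_allow(struct rl_state *s, unsigned long interval_s,
                     unsigned burst)
{
    assert(s != NULL);

    /* Open a new window on first use or once the interval has elapsed. */
    if (!s->started || fake_now_s - s->window_start >= interval_s) {
        s->window_start = fake_now_s;
        s->printed = 0;
        s->started = true;
    }
    if (s->printed >= burst)
        return false;
    s->printed++;
    return true;
}

/* Each expansion gets its own static state: 10 messages per 5 seconds. */
#define LOG_RATELIMITED(counter, msg)          \
    do {                                       \
        static struct rl_state _rs;            \
        if (rl_allow(&_rs, 5, 10)) {           \
            puts(msg);                         \
            (counter)++;                       \
        }                                      \
    } while (0)

/* Counters so a caller can see how many lines each site emitted. */
static unsigned full_logged, open_logged;

static void log_region_full(void)
{
    LOG_RATELIMITED(full_logged,
                    "kdreg2: Region database full, rejecting request to monitor region.");
}

static void log_instance_opened(void)
{
    LOG_RATELIMITED(open_logged, "kdreg2: Instance opened.");
}
```

In this model, 100 calls to `log_region_full()` within one window emit 10 lines, while 3 calls to `log_instance_opened()` in the same window still emit all 3, because the two call sites do not share a budget.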

@iziemba

iziemba commented Apr 10, 2025

Hi Servesh. Thanks for opening this. HPE is working on incorporating it on the main branch. Once it has landed on main, it will be cherry-picked to release/shs-12.0.

jswaro and others added 3 commits April 28, 2025 17:45
cleanWs was removed, and checkout can be performed at a higher
level within Jenkins. Remove it from the pipeline

Signed-off-by: James Swaro <james.swaro@hpe.com>
Signed-off-by: dennis-c-josifovich <dennis.c.josifovich@hpe.com>
@servesh servesh force-pushed the fix-ratelimit-monitor-region branch from 2bbdc29 to 1c8ab68 Compare April 30, 2025 17:25
@servesh
Contributor Author

servesh commented Apr 30, 2025

I noticed that the main branch didn't include the further changes noted in the comments above, so I updated this branch with a new commit that addresses the additional messages identified on Aurora.

4 participants