-
-
Notifications
You must be signed in to change notification settings - Fork 213
fix: Thread safety issues in nfprofile with TLS implementation (phaag/nfsen#40) #621
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
- Add thread_local.h header with portable TLS macro - Replace static buffer in GetSubDir() with thread-local storage - Make localtime() calls thread-safe using localtime_r() - Create thread-local copies of filenames in InitChannels() - Update Makefile.am to include new header This fixes race conditions that caused incorrect path/filename handling when multiple threads access these resources concurrently. Fixes phaag/nfsen#40
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
Thanks for this pull request.
Digging into the code, I cannot understand this patch, as all the code you changed is touched only by one thread at the time.
Nfprofile threads are multiple separate workers, which filter data blocks for individual channels. The file handling for all the channels as well as open/close - subdir names etc. are non concurrent and done before and after the filtering.
I suspect some building differences in the code. Do you or the package you install, has changed the building options, such as using OpenMP or other libraries/options, which do some artificial concurrent threading? If the thread local vars fix the problem, then this must be some parallel code execution, which was not intended by the original code .
That would explain the behaviour.
I am happy, to fix and improve the code, but I need to understand why something goes wrong.
The thread local storage is anyway a good hint, which I can add at some locations, for future use and code improvement, even if it's not needed right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As C11 is minimum standard to compile the code,, this would actually not being needed. _Thread_local should work everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The #if RRD_NEEDSCONST
needs to be reverted. Most likely this overlapped with your tests, as it was introduced recently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Hi,
for my understanding the InitChannels function runs outside the worker threads and processes the filenames strings before any threading begins. However, these filenames are then accessed by multiple worker threads when they process their assigned channels.
Not that i'm aware of, nfdump with nfsens is running inside a ubuntu 24.04 docker container and build as follows:
|
_Thread_local is enough
The worker threads only touch the
The effect you describe happens only, if I can propose some code changes, which cleanup old code anyway. |
Hi,
I'm happy to test the code changes, and I'll also try compiling the current code with older compilers on older distributions. Hopefully, this might reveal something that explains the my issue. Could you let me know if you are using nfdump with nfsen and nfprofiles, and which compiler and version were used to compile it? |
I am running just one instance for compatibility checking on a Debian, and using the clang compiler. In order to exclude other side effects, it is recommended to set the number of profilers in nfsen.conf to 1. This option resulted from old nfdump-1.6.x without threading. |
I also tried this while debugging, but unfortunately, it didn’t help. Next, I’ll try setting up two Bookworm containers: one with the number of profilers set to 1 and the other with it set to 2. |
Problem
nfprofile experiences sporadic file handling issues when running with multiple threads. These manifest as:
These issues were first introduced in commit 9289f6a when worker threads were added to nfprofile
Solution
This PR implements Thread Local Storage (TLS) for critical shared resources:
New
thread_local.h
header: Provides a portable TLS macro that works with both C11's_Thread_local
and GCC/Clang's__thread
Thread-safe static buffers:
GetSubDir()
with a TLS versionThread-safe string handling:
filename
andfilterfile
inInitChannels()
Thread-safe library functions:
localtime()
with thread-safelocaltime_r()
Testing
Related Issues