Skip to content

fix: Thread safety issues in nfprofile with TLS implementation (phaag/nfsen#40) #621

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Takalele
Copy link
Contributor

Problem

nfprofile experiences sporadic file handling issues when running with multiple threads. These manifest as:

  • Truncated filenames (e.g., "nfcapd.2025" instead of "nfcapd.202505090900")
  • "File not found" errors when files exist but at a different path
  • Subdirectory path issues where files are looked for in the wrong location

These issues were first introduced in commit 9289f6a when worker threads were added to nfprofile

Solution

This PR implements Thread Local Storage (TLS) for critical shared resources:

  1. New thread_local.h header: Provides a portable TLS macro that works with both C11's _Thread_local and GCC/Clang's __thread

  2. Thread-safe static buffers:

    • Replace the static buffer in GetSubDir() with a TLS version
    • Prevents corruption of subdirectory paths by concurrent threads
  3. Thread-safe string handling:

    • Create thread-local copies of filename and filterfile in InitChannels()
    • Pass these thread-local copies to functions called from within threads
  4. Thread-safe library functions:

    • Replace unsafe localtime() with thread-safe localtime_r()

Testing

  • Verifying that file paths and names are correctly handled under load

Related Issues

- Add thread_local.h header with portable TLS macro
- Replace static buffer in GetSubDir() with thread-local storage
- Make localtime() calls thread-safe using localtime_r()
- Create thread-local copies of filenames in InitChannels()
- Update Makefile.am to include new header

This fixes race conditions that caused incorrect path/filename handling
when multiple threads access these resources concurrently.

Fixes phaag/nfsen#40
Copy link
Owner

@phaag phaag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,
Thanks for this pull request.

Digging into the code, I cannot understand this patch, as all the code you changed is touched only by one thread at the time.
Nfprofile threads are multiple separate workers, which filter data blocks for individual channels. The file handling for all the channels as well as open/close - subdir names etc. are non concurrent and done before and after the filtering.

I suspect some building differences in the code. Do you or the package you install, has changed the building options, such as using OpenMP or other libraries/options, which do some artificial concurrent threading? If the thread local vars fix the problem, then this must be some parallel code execution, which was not intended by the original code .
That would explain the behaviour.

I am happy, to fix and improve the code, but I need to understand why something goes wrong.

The thread local storage is anyway a good hint, which I can add at some locations, for future use and code improvement, even if it's not needed right now.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As C11 is minimum standard to compile the code,, this would actually not being needed. _Thread_local should work everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The #if RRD_NEEDSCONST needs to be reverted. Most likely this overlapped with your tests, as it was introduced recently

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@Takalele
Copy link
Contributor Author

Hi,

Digging into the code, I cannot understand this patch, as all the code you changed is touched only by one thread at the time.
Nfprofile threads are multiple separate workers, which filter data blocks for individual channels. The file handling for all the channels as well as open/close - subdir names etc. are non concurrent and done before and after the filtering.

for my understanding the InitChannels function runs outside the worker threads and processes the filenames strings before any threading begins. However, these filenames are then accessed by multiple worker threads when they process their assigned channels.

I suspect some building differences in the code. Do you or the package you install, has changed the building options, such as using OpenMP or other libraries/options, which do some artificial concurrent threading? If the thread local vars fix the problem, then this must be some parallel code execution, which was not intended by the original code .
That would explain the behaviour.

Not that i'm aware of, nfdump with nfsens is running inside a ubuntu 24.04 docker container and build as follows:

sh autogen.sh

./configure --libdir=/usr/lib/ --enable-nsel --enable-nfprofile --enable-nftrack --enable-maxmind --enable-tor

----------------------------------
 Build Settings for nfdump v1.7.6
----------------------------------
  host type          = linux-gnu
  install dir        = /usr/local
  CC                 = gcc
  CFLAGS             =  -g -O3 -std=gnu17 -Wall -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wmissing-noreturn -fno-strict-aliasing -DNSEL -DBUILDTOR -pthread
  CPPFLAGS           =
  LDFLAGS            =
  LIBS               = -lpthread -latomic  -lrrd -llz4 -lbz2 -lzstd -lresolv
  Enable liblz4      = yes
  Enable libbz2      = yes
  Enable libzstd     = yes
  Enable ja4         = no
  Build geolookup    = yes
  Build torlookup    = yes
  Build sflow        = no
  Build nfpcapd      = no - with gzip pcap-reader: yes
  Build nfprofile    = yes
  Build ft2nfdump    = no
----------------------------------

make
make install

@phaag
Copy link
Owner

phaag commented May 19, 2025

Hi,

Digging into the code, I cannot understand this patch, as all the code you changed is touched only by one thread at the time.
Nfprofile threads are multiple separate workers, which filter data blocks for individual channels. The file handling for all the channels as well as open/close - subdir names etc. are non concurrent and done before and after the filtering.

for my understanding the InitChannels function runs outside the worker threads and processes the filenames strings before any threading begins. However, these filenames are then accessed by multiple worker threads when they process their assigned channels.

The worker threads only touch the channel struct for data processing - filtering the source data record against individual channel filters. The workers only do record processing. When all data records are processed, the worker threats exit themselves.
The main thread call InitChannels as well as does open all input/output files - and - after all records are processed, closes the files. The worker threads never touch the filenames, nor use them somewhere. The filenames could even propagated down as const (which I should change). The worker threads only need worker_param_t for their job.

I suspect some building differences in the code. Do you or the package you install, has changed the building options, such as using OpenMP or other libraries/options, which do some artificial concurrent threading? If the thread local vars fix the problem, then this must be some parallel code execution, which was not intended by the original code .
That would explain the behaviour.

Not that i'm aware of, nfdump with nfsens is running inside a ubuntu 24.04 docker container and build as follows:

sh autogen.sh

./configure --libdir=/usr/lib/ --enable-nsel --enable-nfprofile --enable-nftrack --enable-maxmind --enable-tor

----------------------------------
 Build Settings for nfdump v1.7.6
----------------------------------
  host type          = linux-gnu
  install dir        = /usr/local
  CC                 = gcc
  CFLAGS             =  -g -O3 -std=gnu17 -Wall -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wmissing-noreturn -fno-strict-aliasing -DNSEL -DBUILDTOR -pthread
  CPPFLAGS           =
  LDFLAGS            =
  LIBS               = -lpthread -latomic  -lrrd -llz4 -lbz2 -lzstd -lresolv
  Enable liblz4      = yes
  Enable libbz2      = yes
  Enable libzstd     = yes
  Enable ja4         = no
  Build geolookup    = yes
  Build torlookup    = yes
  Build sflow        = no
  Build nfpcapd      = no - with gzip pcap-reader: yes
  Build nfprofile    = yes
  Build ft2nfdump    = no
----------------------------------

make
make install

The effect you describe happens only, if InitChannels() -> SetupProfileChannels() is run concurrently, as all the file name processing, and file handling happens there. After that point in code, filenames have been set, files are open and processed. There is no touching of filenames after InitChannels().
Therefore I still do not understand that this fixes the problem.

I can propose some code changes, which cleanup old code anyway.

@Takalele
Copy link
Contributor Author

Hi,

I can propose some code changes, which cleanup old code anyway.

I'm happy to test the code changes, and I'll also try compiling the current code with older compilers on older distributions. Hopefully, this might reveal something that explains the my issue.

Could you let me know if you are using nfdump with nfsen and nfprofiles, and which compiler and version were used to compile it?

@phaag
Copy link
Owner

phaag commented May 20, 2025

I am running just one instance for compatibility checking on a Debian, and using the clang compiler.

In order to exclude other side effects, it is recommended to set the number of profilers in nfsen.conf to 1. This option resulted from old nfdump-1.6.x without threading.
By setting the number of profilers to 1, it's guaranteed, that only one instance is running, although the code should theoretically work with multiple profilers .. just to make sure.

@Takalele
Copy link
Contributor Author

I am running just one instance for compatibility checking on a Debian, and using the clang compiler.
bookworm?

In order to exclude other side effects, it is recommended to set the number of profilers in nfsen.conf to 1. This option resulted from old nfdump-1.6.x without threading.
By setting the number of profilers to 1, it's guaranteed, that only one instance is running, although the code should theoretically work with multiple profilers .. just to make sure.

I also tried this while debugging, but unfortunately, it didn’t help. Next, I’ll try setting up two Bookworm containers: one with the number of profilers set to 1 and the other with it set to 2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nfcapd / nfprofile nfcapd.JJJJMMDDHHMM: File not found!
2 participants