
Two slight differences when compiling code as C and C++ #1126

@malkia


Hi @daanx,

First, thanks again for this awesome library!

My team (COD devs @ Activision) has spotted some hangs akin to #890, but on Windows. They are very rare and really hard to repro, but given the size of our test infra fleet they happen quite a lot.

We found that they may be due to improper use of atomics, and that compiling as C++ possibly fixes them. Comparing the assembly listings (.cod files) generated by MSVC shows that the C++ version adds memory barriers and the nop (npad 1) instruction, and possibly makes other changes.

So this was hopeful, but I wanted to check what else changes by moving to C++, and I found that MSVCP140.dll now gets loaded too (which is not unexpected in general, but it wasn't loaded before). Here are the two functions it needed:

[Image: screenshot of the two functions imported from MSVCP140.dll]

(We have a special custom build of mimalloc with flavors that can be switched at runtime via an environment variable; each "mimalloc-flavor-xxxx.dll" is essentially mimalloc compiled with special flags, in case you are wondering about the naming.)

So from MSVCP140.dll it needs _Thrd_yield and std::get_new_handler. The latter is expected, I guess, but I got curious about the former.
Why would changing to C++ need this now, and not before? This led me to this discovery:

#if defined(__cplusplus)
#include <thread>
static inline void mi_atomic_yield(void) {
  std::this_thread::yield();
}
#elif defined(_WIN32)
static inline void mi_atomic_yield(void) {
  YieldProcessor();
}
#elif defined(__SSE2__)
#include <emmintrin.h>
...

and then to the std::this_thread::yield implementation:
https://github.com/microsoft/STL/blob/5f8b52546480a01d1d9be6c033e31dfce48d4f13/stl/src/cthread.cpp#L86

which seems to call SwitchToThread:

_CRTIMP2_PURE void __cdecl _Thrd_yield() noexcept { // surrender remainder of timeslice
    SwitchToThread();
}

So the difference is this: YieldProcessor emits a pause instruction on x86/x64 (the same as what _mm_pause would do), while std::this_thread::yield calls SwitchToThread(), which goes into the kernel and may reschedule the current thread. I asked ChatGPT about it here: https://chatgpt.com/share/689beeff-69c0-800a-be43-c16de7a405e3
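To make the distinction concrete, here is a minimal sketch of the two kinds of "yield" side by side. The names cpu_relax, os_yield and wait_until_set are hypothetical helpers invented for illustration and are not part of mimalloc; the spin limit is an arbitrary assumption.

```cpp
#include <atomic>
#include <thread>

#if defined(_WIN32)
#include <windows.h>     // YieldProcessor
#elif defined(__SSE2__)
#include <emmintrin.h>   // _mm_pause
#endif

// CPU-level hint: stays on the current thread and does not enter the kernel.
static inline void cpu_relax() {
#if defined(_WIN32)
  YieldProcessor();
#elif defined(__SSE2__)
  _mm_pause();
#else
  // Assumption: no spin hint known for this target; use a compiler-only fence.
  std::atomic_signal_fence(std::memory_order_seq_cst);
#endif
}

// OS-level yield: asks the scheduler to run another ready thread, if any.
static inline void os_yield() {
  std::this_thread::yield();
}

// Hypothetical wait loop: spin with the CPU hint up to spin_limit times,
// then fall back to one OS yield and start spinning again.
// Returns how many OS yields were needed before the flag was observed set.
static inline unsigned wait_until_set(const std::atomic<bool>& flag,
                                      unsigned spin_limit) {
  unsigned spins = 0;
  unsigned yields = 0;
  while (!flag.load(std::memory_order_acquire)) {
    if (spins < spin_limit) {
      cpu_relax();
      ++spins;
    } else {
      os_yield();
      ++yields;
      spins = 0;
    }
  }
  return yields;
}
```

A loop built only on cpu_relax never gives up its timeslice, while one built only on os_yield pays for a scheduler call on every iteration, so swapping one for the other can change timing noticeably even though both are called "yield".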

I didn't want to change the behavior too much, so for these two cases I decided to stick with the way the C build did it (I also did not want to pull in MSVCP140.dll for this; it gets loaded anyway, but I didn't want to disturb this part).
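One way to keep the C behavior when compiling as C++ might be to test the platform branches before the __cplusplus branch. This is only a sketch of the idea, not the actual patch; my_atomic_yield and my_yield_flavor are hypothetical names used for illustration.

```cpp
// Platform-specific spin hints are checked first, so a C++ build on
// Windows or x86 picks the same implementation a C build would.
#if defined(_WIN32)
#include <windows.h>
static inline void my_atomic_yield(void) {
  YieldProcessor();            // CPU spin hint, no call into MSVCP140.dll
}
static inline const char* my_yield_flavor(void) { return "YieldProcessor"; }
#elif defined(__SSE2__)
#include <emmintrin.h>
static inline void my_atomic_yield(void) {
  _mm_pause();                 // same pause instruction on x86 with SSE2
}
static inline const char* my_yield_flavor(void) { return "_mm_pause"; }
#else
// Fallback for other targets when compiled as C++.
#include <thread>
static inline void my_atomic_yield(void) {
  std::this_thread::yield();   // OS-level yield
}
static inline const char* my_yield_flavor(void) { return "std::this_thread::yield"; }
#endif
```

my_yield_flavor exists only so a build can report which branch was selected; whether the pause flavor or the OS yield is the right choice is exactly the open question in this issue.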

But it raises the question: what is really intended to be called here, the "pause" (YieldProcessor) or std::this_thread::yield (SwitchToThread)?

Thank you!
