Skip to content

itest: use notification profiler API instead of main profiler slot#232

Merged
korniltsev-grafanista merged 10 commits intomainfrom
vk/39cb-in-the-integrati
Mar 10, 2026
Merged

itest: use notification profiler API instead of main profiler slot#232
korniltsev-grafanista merged 10 commits intomainfrom
vk/39cb-in-the-integrati

Conversation

@korniltsev-grafanista
Copy link
Contributor

@korniltsev-grafanista korniltsev-grafanista commented Feb 27, 2026

What

Switch the integration test setup from registering Pyroscope as the main CLR profiler to registering it as a notification profiler.

Changed files:

  • itest.Dockerfile — Docker-based integration test environment
  • IntegrationTest/Makefile — local make run target

Before

CORECLR_ENABLE_PROFILING=1
CORECLR_PROFILER={BD1A650D-AC5D-4896-B64F-D6FA25D6B26A}
CORECLR_PROFILER_PATH=/path/to/Pyroscope.Profiler.Native.so

After

CORECLR_ENABLE_NOTIFICATION_PROFILERS=1
CORECLR_NOTIFICATION_PROFILERS=/path/to/Pyroscope.Profiler.Native.so={BD1A650D-AC5D-4896-B64F-D6FA25D6B26A}

Why

The .NET runtime only allows a single main profiler (CORECLR_PROFILER). Using the notification profiler API (CORECLR_NOTIFICATION_PROFILERS) allows Pyroscope to coexist with other profilers or tracers that already occupy the main profiler slot (e.g. OpenTelemetry, Datadog tracer). This makes the integration tests reflect real-world deployment scenarios more accurately.

LD_PRELOAD for Pyroscope.Linux.ApiWrapper.x64.so is unchanged — it is still required for the CPU/wall-time sampling to work correctly on Linux.

Replace CORECLR_ENABLE_PROFILING / CORECLR_PROFILER / CORECLR_PROFILER_PATH
with CORECLR_ENABLE_NOTIFICATION_PROFILERS / CORECLR_NOTIFICATION_PROFILERS
(path={GUID} format) in both itest.Dockerfile and IntegrationTest/Makefile.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@korniltsev-grafanista korniltsev-grafanista changed the title in the integration tests, replace using profiler as the main profiler and instead use as (vibe-kanban) itest: use notification profiler API instead of main profiler slot Feb 27, 2026
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 71cb235968

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…ofiler loading

The notification profiler API requires the profiler to implement
ICorProfilerCallback11 and return TRUE from LoadAsNotificationOnly.
Without this the runtime refuses to load the profiler via
CORECLR_NOTIFICATION_PROFILERS.

- Inherit CorProfilerCallback from ICorProfilerCallback11 (superset of 10)
- Add ICorProfilerCallback11 to QueryInterface
- Implement LoadAsNotificationOnly returning TRUE

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@korniltsev-grafanista
Copy link
Contributor Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6a89bade85

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


HRESULT STDMETHODCALLTYPE CorProfilerCallback::LoadAsNotificationOnly(BOOL* pbNotificationOnly)
{
*pbNotificationOnly = TRUE;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Return notification mode only when explicitly needed

This now marks every profiler load as notification-only, not just the new integration-test path. Existing deployment scripts still load the profiler as the main profiler via CORECLR_PROFILER (profiler/src/ProfilerEngine/DeployResources/DDProf-SetEnv.sh:15-17), and those runs can request capabilities like COR_PRF_ENABLE_REJIT when managed code cache is enabled (CorProfilerCallback.cpp:1517-1520); however COR_PRF_ENABLE_REJIT is not included in COR_PRF_ALLOWABLE_NOTIFICATION_PROFILER (shared/src/native-lib/coreclr/src/inc/corprof.idl:597-617). Forcing notification mode here can therefore make event-mask setup fail and stop profiling in non-itest configurations.

Useful? React with 👍 / 👎.

aleks-p added a commit that referenced this pull request Mar 4, 2026
Add an integration test that verifies Pyroscope works as a notification
profiler when OTEL .NET auto-instrumentation occupies the classic
profiler slot. This proves the coexistence story for PR #232.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@aleks-p
Copy link
Collaborator

aleks-p commented Mar 4, 2026

@korniltsev-grafanista

I tested this with OTEL auto-instrumentation running as the primary profiler and Pyroscope through the notification system. It worked, here it the testing code if you wanted to add it:

https://github.com/grafana/pyroscope-dotnet/compare/vk/39cb-in-the-integrati...test/otel-coexistence?expand=1

@korniltsev-grafanista
Copy link
Contributor Author

Thanks <3 more tests

I opened a draft from your branch just incase so that we do not forget #236

I recently prompted this integration test change #235

I willlikely merge #236 after #235 but before #232

korniltsev-grafanista and others added 3 commits March 5, 2026 12:42
* itest: add dual-profiler integration test (OTEL + Pyroscope)

Add an integration test that verifies Pyroscope works as a notification
profiler when OTEL .NET auto-instrumentation occupies the classic
profiler slot. This proves the coexistence story for PR #232.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Adapt OTEL integration test to Go testcontainers pattern

- Add OTEL support to itest/integration_test.go:
  - Add isOTEL() helper to check OTEL env var
  - Update rideshareImage() and serviceName() to append -otel suffix when OTEL=true
  - Add TestRideshareProfilesWithOTEL() function

- Rewrite itest/Makefile with explicit targets:
  - Remove flexible FLAVOUR/DOTNET_VERSION variables
  - Add explicit targets for all 8 combinations:
    - 4 main-profiler targets (glibc/musl × 6.0/8.0)
    - 4 notification-profiler targets (glibc/musl × 6.0/8.0)
  - Each target builds appropriate image and runs specific test

- Update .github/workflows/integration_test.yml:
  - Keep main-profiler matrix job (4 combinations)
  - Add notification-profiler matrix job (4 combinations)
  - Both use explicit make targets

- Remove deprecated bash-based infrastructure:
  - Delete Dockerfile.load-generator and load-generator.py
  - Mark docker-compose-itest.yml as manual-only

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Restore load generator files (used by docker-compose)

The load-generator.py and Dockerfile.load-generator files are still
referenced in docker-compose-itest.yml for manual testing scenarios.
Keep them available even though the Go tests use an in-process goroutine
instead of a Docker container for load generation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* profiler: support notification profiler mode (OTEL coexistence) (#238)

* itest: upload pprofs per vehicle type as GitHub Actions artifacts

After label verification, download a merged pprof for each vehicle
(bike, car, scooter) via profilecli query merge and save them to a
per-matrix directory. The integration_test workflow then uploads that
directory as a named artifact (pprofs-{flavour}-net{version}) so
profiles are available for inspection after each run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* itest: rewrite integration tests in Go with stacktrace assertions (#235)

* itest: rewrite integration tests in Go with stacktrace assertions

Replace the bash-based verify_profiles.sh with a Go test suite that:
- Queries Pyroscope via the Connect API (SelectMergeStacktraces)
- Deserializes tree-format profiles into collapsed stacks
- Asserts expected function names appear in stacktraces per vehicle label
  (OrderService.FindNearestVehicle for all, CheckDriverAvailability for car)
- Manages docker-compose lifecycle (up/down) from the test itself
- Provides a self-contained Makefile (build-profiler, build-app, itest)

The CI workflow is simplified to: checkout, setup Go, make itest.
Tree/minheap helpers are copied from grafana/otel-profiling-java.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* itest: switch to testcontainers-go, fix stacktrace assertions

Replace docker-compose orchestration with testcontainers-go so the test
manages container lifecycle directly (same pattern as otel-profiling-java).
This eliminates the need for docker-compose-itest.yml at test time.

Also fix stacktrace assertions: the .NET profiler emits frames in the format
  |ct:ClassName |cg:... |fn:MethodName |fg:... |sg:(...)
not as simple "ClassName.MethodName" strings. Add frameContains() helper
that finds a frame matching both class and method name within the same
semicolon-delimited entry.

Increase polling timeout to 3 minutes and go test timeout to 15 minutes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* itest: address review comments

- Replace load generator container with Go goroutine that cycles through
  /bike, /scooter, /car HTTP requests directly from the test process.
  No Docker build needed for the load generator.

- Simplify frameContains: split collapsed stacks by ";" and use a regex
  to match frames containing both |ct:ClassName and |fn:MethodName.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* itest: increase app startup timeout for .NET 6.0

Switch from HTTP health check on /bike to port-listening check with
120s timeout. The /bike endpoint is CPU-intensive and .NET 6.0 with
the profiler attached can take over 60s to become ready.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Adapt OTEL integration test to Go testcontainers pattern

- Add OTEL support to itest/integration_test.go:
  - Add isOTEL() helper to check OTEL env var
  - Update rideshareImage() and serviceName() to append -otel suffix when OTEL=true
  - Add TestRideshareProfilesWithOTEL() function

- Rewrite itest/Makefile with explicit targets:
  - Remove flexible FLAVOUR/DOTNET_VERSION variables
  - Add explicit targets for all 8 combinations:
    - 4 main-profiler targets (glibc/musl × 6.0/8.0)
    - 4 notification-profiler targets (glibc/musl × 6.0/8.0)
  - Each target builds appropriate image and runs specific test

- Update .github/workflows/integration_test.yml:
  - Keep main-profiler matrix job (4 combinations)
  - Add notification-profiler matrix job (4 combinations)
  - Both use explicit make targets

- Remove deprecated bash-based infrastructure:
  - Delete Dockerfile.load-generator and load-generator.py
  - Mark docker-compose-itest.yml as manual-only

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* Restore load generator files (used by docker-compose)

The load-generator.py and Dockerfile.load-generator files are still
referenced in docker-compose-itest.yml for manual testing scenarios.
Keep them available even though the Go tests use an in-process goroutine
instead of a Docker container for load generation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Tolya Korniltsev <anatoly.korniltsev@grafana.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "profiler: support notification profiler mode (OTEL coexistence) (#238)"

This reverts commit 94ee37d.

---------

Co-authored-by: Aleksandar Petrov <8142643+aleks-p@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Tolya Korniltsev YOLO vibecoder <anatoly.korniltsev+githubvibecoder@grafana.com>
* Remove unused COR_PRF_ENABLE_REJIT from event mask

COR_PRF_ENABLE_REJIT is not allowed for notification profilers
(not in COR_PRF_ALLOWABLE_NOTIFICATION_PROFILER). The profiler never
calls RequestReJIT/RequestRevert — the managed code cache only
passively observes JITCompilationFinished callbacks, which are
covered by COR_PRF_MONITOR_JIT_COMPILATION alone.

Without this fix, SetEventMask2 returns E_INVALIDARG when running as
a notification profiler with UseManagedCodeCache enabled (default on
ARM64), causing the profiler to fail to initialize.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add comment explaining COR_PRF_ENABLE_REJIT removal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@korniltsev-grafanista-yolo-vibecoder239
Copy link
Contributor

I've reviewed the changes in the context of the CoreCLR runtime source code.

The implementation of ICorProfilerCallback11 and LoadAsNotificationOnly looks correct for enabling the "Notification-Only" mode. This allows the profiler to be loaded alongside a main profiler (like a tracing APM) by signaling pbNotificationOnly = TRUE.

However, there is a potential blocking issue on ARM64.

1. ARM64 Incompatibility

On ARM64, the profiler forces UseManagedCodeCache to true in Configuration.cpp:

// Configuration.cpp
    bool defaultUseManagedCodeCache =
    #if ARM64
        true;
    #else
        false;
    #endif

When UseManagedCodeCache is enabled, CorProfilerCallback::Initialize adds COR_PRF_ENABLE_REJIT to the event mask:

// CorProfilerCallback.cpp
    if (_pConfiguration->UseManagedCodeCache())
    {
        eventMask |= COR_PRF_MONITOR_JIT_COMPILATION | COR_PRF_ENABLE_REJIT;
    }

The Problem:
In the CoreCLR runtime, COR_PRF_ENABLE_REJIT is not included in the allowable flags for a notification-only profiler (COR_PRF_ALLOWABLE_NOTIFICATION_PROFILER).

When SetEventMask is called by a notification-only profiler with COR_PRF_ENABLE_REJIT, the runtime will return E_INVALIDARG, and initialization will likely fail.

2. x64/x86 Safety

On x64/x86, UseManagedCodeCache defaults to false. The flags used (COR_PRF_MONITOR_THREADS, COR_PRF_ENABLE_STACK_SNAPSHOT, etc.) are all within the allowed set for notification profilers.
The profiler also correctly avoids calling restricted methods like SetILFunctionBody or SetEnterLeaveFunctionHooks.

Recommendation

You should verify the behavior on ARM64. If ManagedCodeCache (and thus COR_PRF_ENABLE_REJIT) is strictly required for ARM64 as the code suggests, this profiler cannot run as a notification-only profiler on that architecture.

You might need to add logic to LoadAsNotificationOnly to return FALSE on ARM64, or adjust the feature flags if possible.

@korniltsev-grafanista
Copy link
Contributor Author

@codex review

@chatgpt-codex-connector
Copy link

Codex Review: Didn't find any major issues. Swish!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@korniltsev-grafanista korniltsev-grafanista merged commit f88f310 into main Mar 10, 2026
21 of 23 checks passed
@korniltsev-grafanista korniltsev-grafanista deleted the vk/39cb-in-the-integrati branch March 10, 2026 02:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants