Skip to content

Reset write lock state to init after closing write #12215

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 13, 2025

Conversation

JosiahWI
Copy link
Contributor

@JosiahWI JosiahWI commented Apr 28, 2025

This commit mitigates the crash reported in #11700. That crash happens when a redirect is issued on a state machine that has already cached a response and closed the cache write VC.

After this patch, the state machine will likely open a new cache VC to cache the response from the origin it was redirected to. We will refer to the original origin as A, and the origin the state machine was redirected to after the response from A as B.

We have not yet reproduced this locally - the exact sequence of events that gets the state machine into this state are still unknown.

Some things to pay attention to for review:

  • Normal behavior is to cache B's response under A's URI. In the edge case this patch mitigates, A and B's responses will both be cached, possibly with B's response overwriting A's, or maybe not... this is still untested.
  • This does a second cache write when normally only one cache write occurs during a state machine's lifetime. Are both writes independent from each other's state, and properly cleaned up to prevent memory leaks?
  • Are there other places where the write lock should also be reset?
  • Can the escalate plugin force a redirect to happen after kill_this() has been called (we tried to detect this with a release assert, and did not).

Resolves #11700.

This commit mitigates the crash reported in apache#11700. That crash happens when
a redirect is issued on a state machine that has already cached a response
and closed the cache write VC.

After this patch, the state machine will likely open a new cache VC to cache
the response from the origin it was redirected to. We will refer to the
original origin as A, and the origin the state machine was redirected to after
the response from A as B.

We have not yet reproduced this locally - the exact sequence of events that
gets the state machine into this state are still unknown.

Some things to pay attention to for review:

* Normal behavior is to cache B's response under A's URI. In the edge case
this patch mitigates, A and B's responses will both be cached, possibly with
B's response overwriting A's, or maybe not... this is still untested.
* This does a second cache write when normally only one cache write occurs
during a state machine's lifetime. Are both writes independent from each
other's state, and properly cleaned up to prevent memory leaks?
* Are there other places where the write lock should also be reset?
* Can the escalate plugin force a redirect to happen after `kill_this()`
has been called (we tried to detect this with a release assert, and did not).
@JosiahWI JosiahWI requested a review from Copilot April 28, 2025 21:28
@JosiahWI JosiahWI self-assigned this Apr 28, 2025
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses a crash related to cache write closures by ensuring that the write lock state is reset to its initial value (CACHE_WL_INIT) after closing the cache write path.

  • Reset the write lock state after closing cache writes in two different functions.
  • Aims to mitigate crashes when a state machine reopens a cache VC after a redirect.
Comments suppressed due to low confidence (2)

src/proxy/http/HttpSM.cc:6463

  • Adding the write lock state reset immediately after closing the cache write appears correct, but please ensure that all code paths that perform a cache write closure are updated similarly to avoid any inconsistent state issues.
t_state.cache_info.write_lock_state = HttpTransact::CACHE_WL_INIT;

src/proxy/http/HttpSM.cc:6522

  • The write lock state reset here should be verified against all related code paths. Consider consolidating the write lock reset logic into a shared helper if it is repeated elsewhere to improve maintainability.
t_state.cache_info.write_lock_state = HttpTransact::CACHE_WL_INIT;

@bryancall bryancall added this to the 10.2.0 milestone Apr 28, 2025
@bryancall bryancall added the Bug label Apr 28, 2025
@bryancall bryancall requested a review from masaori335 April 28, 2025 22:12
Copy link
Contributor

@masaori335 masaori335 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The root cause of the crash is a bit unclear, but reseting the write_lock_state after HttpCacheSM::close_write() seems reasonable.

@masaori335
Copy link
Contributor

I'd point out one thing, if this only happens on following redirect, resetting the write_lock_state in HttpSM::redirect_request like object_read is another idea. However, it might have other side effects. Obviously, following redirect needs refactoring.

https://github.com/apache/trafficserver/blob/master/src/proxy/http/HttpSM.cc#L8437-L8439

@JosiahWI JosiahWI merged commit 5c0aaf2 into apache:master May 13, 2025
15 checks passed
@github-project-automation github-project-automation bot moved this to For v10.1.0 in ATS v10.1.x May 13, 2025
@JosiahWI JosiahWI deleted the fix/redirect-on-closed-cache-vc branch May 13, 2025 00:37
@cmcfarlen cmcfarlen moved this from For v10.1.0 to picked v10.1.0 in ATS v10.1.x May 15, 2025
@cmcfarlen cmcfarlen modified the milestones: 10.2.0, 10.1.0 May 15, 2025
@cmcfarlen
Copy link
Contributor

Cherry-picked to 10.1.x branch

cmcfarlen pushed a commit that referenced this pull request May 15, 2025
This commit mitigates the crash reported in #11700. That crash happens when
a redirect is issued on a state machine that has already cached a response
and closed the cache write VC.

After this patch, the state machine will likely open a new cache VC to cache
the response from the origin it was redirected to. We will refer to the
original origin as A, and the origin the state machine was redirected to after
the response from A as B.

We have not yet reproduced this locally - the exact sequence of events that
gets the state machine into this state are still unknown.

(cherry picked from commit 5c0aaf2)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: picked v10.1.0
Development

Successfully merging this pull request may close these issues.

10.1: Crash HttpSM::setup_cache_write_transfer while handling redirect
4 participants