Skip to content

Conversation

@huww98
Copy link
Contributor

@huww98 huww98 commented Oct 24, 2025

With the added test in the first commit:

% go test -race -trimpath ./schema
==================
WARNING: DATA RACE
Read at 0x00c0002a4078 by goroutine 25:
  sigs.k8s.io/structured-merge-diff/v6/schema.(*Map).CopyInto()
      sigs.k8s.io/structured-merge-diff/v6/schema/elements.go:178 +0x1d0
  sigs.k8s.io/structured-merge-diff/v6/schema.TestMapCopyInto.func2()
      sigs.k8s.io/structured-merge-diff/v6/schema/elements_test.go:235 +0x44

Previous write at 0x00c0002a4078 by goroutine 24:
  sigs.k8s.io/structured-merge-diff/v6/schema.(*Map).FindField.func1()
      sigs.k8s.io/structured-merge-diff/v6/schema/elements.go:154 +0x60
  sync.(*Once).doSlow()
      sync/once.go:78 +0x94
  sync.(*Once).Do()
      sync/once.go:69 +0x40
  sigs.k8s.io/structured-merge-diff/v6/schema.(*Map).FindField()
      sigs.k8s.io/structured-merge-diff/v6/schema/elements.go:153 +0x7c
  sigs.k8s.io/structured-merge-diff/v6/schema.TestMapCopyInto()
      sigs.k8s.io/structured-merge-diff/v6/schema/elements_test.go:238 +0x2f0
  testing.tRunner()
      testing/testing.go:1934 +0x164
  testing.(*T).Run.gowrap1()
      testing/testing.go:1997 +0x3c

Goroutine 25 (running) created at:
  sigs.k8s.io/structured-merge-diff/v6/schema.TestMapCopyInto()
      sigs.k8s.io/structured-merge-diff/v6/schema/elements_test.go:234 +0x2dc
  testing.tRunner()
      testing/testing.go:1934 +0x164
  testing.(*T).Run.gowrap1()
      testing/testing.go:1997 +0x3c

Goroutine 24 (finished) created at:
  testing.(*T).Run()
      testing/testing.go:1997 +0x6e0
  testing.runTests.func1()
      testing/testing.go:2477 +0x74
  testing.tRunner()
      testing/testing.go:1934 +0x164
  testing.runTests()
      testing/testing.go:2475 +0x734
  testing.(*M).Run()
      testing/testing.go:2337 +0xaf4
  main.main()
      _testmain.go:59 +0x100
==================
==================
WARNING: DATA RACE
Read at 0x00c000284090 by goroutine 25:
  runtime.mapaccess1_faststr()
      internal/runtime/maps/runtime_faststr_swiss.go:103 +0x28c
  sigs.k8s.io/structured-merge-diff/v6/schema.(*Map).FindField()
      sigs.k8s.io/structured-merge-diff/v6/schema/elements.go:159 +0xb8
  sigs.k8s.io/structured-merge-diff/v6/schema.TestMapCopyInto.func2()
      sigs.k8s.io/structured-merge-diff/v6/schema/elements_test.go:236 +0x58

Previous write at 0x00c000284090 by goroutine 24:
  runtime.mapaccess2_faststr()
      internal/runtime/maps/runtime_faststr_swiss.go:162 +0x29c
  sigs.k8s.io/structured-merge-diff/v6/schema.(*Map).FindField.func1()
      sigs.k8s.io/structured-merge-diff/v6/schema/elements.go:156 +0x15c
  sync.(*Once).doSlow()
      sync/once.go:78 +0x94
  sync.(*Once).Do()
      sync/once.go:69 +0x40
  sigs.k8s.io/structured-merge-diff/v6/schema.(*Map).FindField()
      sigs.k8s.io/structured-merge-diff/v6/schema/elements.go:153 +0x7c
  sigs.k8s.io/structured-merge-diff/v6/schema.TestMapCopyInto()
      sigs.k8s.io/structured-merge-diff/v6/schema/elements_test.go:238 +0x2f0
  testing.tRunner()
      testing/testing.go:1934 +0x164
  testing.(*T).Run.gowrap1()
      testing/testing.go:1997 +0x3c

Goroutine 25 (running) created at:
  sigs.k8s.io/structured-merge-diff/v6/schema.TestMapCopyInto()
      sigs.k8s.io/structured-merge-diff/v6/schema/elements_test.go:234 +0x2dc
  testing.tRunner()
      testing/testing.go:1934 +0x164
  testing.(*T).Run.gowrap1()
      testing/testing.go:1997 +0x3c

Goroutine 24 (finished) created at:
  testing.(*T).Run()
      testing/testing.go:1997 +0x6e0
  testing.runTests.func1()
      testing/testing.go:2477 +0x74
  testing.tRunner()
      testing/testing.go:1934 +0x164
  testing.runTests()
      testing/testing.go:2475 +0x734
  testing.(*M).Run()
      testing/testing.go:2337 +0xaf4
  main.main()
      _testmain.go:59 +0x100
==================
==================
WARNING: DATA RACE
Read at 0x00c0002a6020 by goroutine 25:
  sigs.k8s.io/structured-merge-diff/v6/schema.(*Map).FindField()
      sigs.k8s.io/structured-merge-diff/v6/schema/elements.go:159 +0xc8
  sigs.k8s.io/structured-merge-diff/v6/schema.TestMapCopyInto.func2()
      sigs.k8s.io/structured-merge-diff/v6/schema/elements_test.go:236 +0x58

Previous write at 0x00c0002a6020 by goroutine 24:
  sigs.k8s.io/structured-merge-diff/v6/schema.(*Map).FindField.func1()
      sigs.k8s.io/structured-merge-diff/v6/schema/elements.go:156 +0x16c
  sync.(*Once).doSlow()
      sync/once.go:78 +0x94
  sync.(*Once).Do()
      sync/once.go:69 +0x40
  sigs.k8s.io/structured-merge-diff/v6/schema.(*Map).FindField()
      sigs.k8s.io/structured-merge-diff/v6/schema/elements.go:153 +0x7c
  sigs.k8s.io/structured-merge-diff/v6/schema.TestMapCopyInto()
      sigs.k8s.io/structured-merge-diff/v6/schema/elements_test.go:238 +0x2f0
  testing.tRunner()
      testing/testing.go:1934 +0x164
  testing.(*T).Run.gowrap1()
      testing/testing.go:1997 +0x3c

Goroutine 25 (running) created at:
  sigs.k8s.io/structured-merge-diff/v6/schema.TestMapCopyInto()
      sigs.k8s.io/structured-merge-diff/v6/schema/elements_test.go:234 +0x2dc
  testing.tRunner()
      testing/testing.go:1934 +0x164
  testing.(*T).Run.gowrap1()
      testing/testing.go:1997 +0x3c

Goroutine 24 (finished) created at:
  testing.(*T).Run()
      testing/testing.go:1997 +0x6e0
  testing.runTests.func1()
      testing/testing.go:2477 +0x74
  testing.tRunner()
      testing/testing.go:1934 +0x164
  testing.runTests()
      testing/testing.go:2475 +0x734
  testing.(*M).Run()
      testing/testing.go:2337 +0xaf4
  main.main()
      _testmain.go:59 +0x100
==================
FAIL
FAIL    sigs.k8s.io/structured-merge-diff/v6/schema     0.549s
FAIL

This is discovered in https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/ci-kubernetes-integration-race-master/1979871352273768448

The second commit fixed this. And the third commit optimized the memory comsumption a little by the way. And here is the benchmark result from my laptop:

goos: darwin
goarch: arm64
pkg: sigs.k8s.io/structured-merge-diff/v6/schema
cpu: Apple M2 Pro
                   │  base.txt   │            fix-race.txt             │             pointer.txt             │
                   │   sec/op    │   sec/op     vs base                │   sec/op     vs base                │
FindFieldCached-12   8.785n ± 2%   6.630n ± 1%  -24.53% (p=0.000 n=10)   6.630n ± 0%  -24.53% (p=0.000 n=10)
FindFieldNew-12      245.8n ± 2%   245.9n ± 2%        ~ (p=0.362 n=10)   194.0n ± 1%  -21.07% (p=0.000 n=10)
MakeMap-12           93.01n ± 1%   94.45n ± 1%   +1.55% (p=0.014 n=10)   92.64n ± 2%        ~ (p=0.382 n=10)
geomean              58.56n        53.60n        -8.48%                  49.21n       -15.97%

                   │   base.txt    │             fix-race.txt             │             pointer.txt              │
                   │     B/op      │    B/op      vs base                 │    B/op     vs base                  │
FindFieldCached-12    0.000 ± 0%      0.000 ± 0%       ~ (p=1.000 n=10) ¹   0.000 ± 0%        ~ (p=1.000 n=10) ¹
FindFieldNew-12      1216.0 ± 0%     1208.0 ± 0%  -0.66% (p=0.000 n=10)     648.0 ± 0%  -46.71% (p=0.000 n=10)
MakeMap-12            400.0 ± 0%      384.0 ± 0%  -4.00% (p=0.000 n=10)     384.0 ± 0%   -4.00% (p=0.000 n=10)
geomean                          ²                -1.57%                ²               -20.02%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                   │   base.txt   │             fix-race.txt             │             pointer.txt              │
                   │  allocs/op   │ allocs/op   vs base                  │ allocs/op   vs base                  │
FindFieldCached-12   0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹   0.000 ± 0%        ~ (p=1.000 n=10) ¹
FindFieldNew-12      7.000 ± 0%     8.000 ± 0%  +14.29% (p=0.000 n=10)     8.000 ± 0%  +14.29% (p=0.000 n=10)
MakeMap-12           5.000 ± 0%     5.000 ± 0%        ~ (p=1.000 n=10) ¹   5.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                         ²                +4.55%                ²                +4.55%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

We got both significant speed and allocate bytes improvements. Only downside is one extra allotation op for new map allocation (presumed due to the pointer to map used in atomic.Pointer). But this is not at the hot path, so I think it should be fine.

The test reveals the race condition and prepares for refactor.
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 24, 2025
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 24, 2025
}

go func() {
s.CopyInto(&theCopy)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CopyInto documents the function is not thread safe ... do we really call it in multi-threaded contexts normally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CopyInto documents the function is not thread safe

Didn't see that. Will also remove the documents in this PR

do we really call it in multi-threaded contexts normally

I think yes. From the original logs of integration test: APIServer will call sigs.k8s.io/structured-merge-diff/v6/typed.ParseableType.FromStructured() for each request in different goroutine. Please see the link in the PR description for detail.


once sync.Once
m map[string]TypeDef
m atomic.Pointer[map[string]*TypeDef]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we end up trying to make CopyInto thread-safe, let's keep this change as minimal as possible ... keep the once behavior and semantics, and just switch m to being an atomic.Pointer to the original type which we can Load()/Store() safely lock-free. Let's not change the value type to a pointer in this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will remove the last commit from this PR then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think my original proposal is better. But as you wish, now I'm keeping the once.

goos: darwin
goarch: arm64
pkg: sigs.k8s.io/structured-merge-diff/v6/schema
cpu: Apple M2 Pro
                   │  base.txt   │            fix+once.txt            │
                   │   sec/op    │   sec/op     vs base               │
FindFieldCached-12   8.542n ± 1%   8.698n ± 2%  +1.82% (p=0.000 n=10)
FindFieldNew-12      246.8n ± 5%   259.8n ± 2%  +5.27% (p=0.011 n=10)
MakeMap-12           92.38n ± 1%   93.34n ± 1%       ~ (p=0.190 n=10)
geomean              57.97n        59.53n       +2.69%

                   │    base.txt    │             fix+once.txt              │
                   │      B/op      │     B/op      vs base                 │
FindFieldCached-12     0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
FindFieldNew-12      1.188Ki ± 0%     1.195Ki ± 0%  +0.66% (p=0.000 n=10)
MakeMap-12             400.0 ± 0%       400.0 ± 0%       ~ (p=1.000 n=10) ¹
geomean                           ²                 +0.22%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                   │   base.txt   │             fix+once.txt             │
                   │  allocs/op   │ allocs/op   vs base                  │
FindFieldCached-12   0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
FindFieldNew-12      7.000 ± 0%     8.000 ± 0%  +14.29% (p=0.000 n=10)
MakeMap-12           5.000 ± 0%     5.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                         ²                +4.55%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

This change now have no significant performance impact.

@huww98 huww98 force-pushed the fix-race branch 2 times, most recently from c1ae363 to 9cd5e06 Compare October 28, 2025 15:54
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 28, 2025
@huww98 huww98 requested a review from liggitt October 28, 2025 16:04
// CopyInto this instance of Map into the other
// If other is nil this method does nothing.
// If other is already initialized, overwrites it with this instance
// Warning: Not thread safe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't think this is thread-safe with respect to dst

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, if you mean it is not safe to read dst when we are copying, then yes. Updated doc to clarify this.

Use an additional atomic.Pointer
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 30, 2025
@liggitt
Copy link
Contributor

liggitt commented Oct 30, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 30, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: huww98, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 30, 2025
@k8s-ci-robot k8s-ci-robot merged commit 3392408 into kubernetes-sigs:master Oct 30, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants