-
Notifications
You must be signed in to change notification settings - Fork 140
CBG-4712 /_sgcollect_info endpoint to work with non default port #7601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the /_sgcollect_info
endpoint to support non‐default ports and random port assignment during Go integration tests by centralizing URL resolution, passing the admin URL to the Python script, always honoring a user‐specified executable, cleaning up leaked goroutines, and improving test isolation.
- Refactor Python
sgcollect.py
to:- Introduce
get_sg_url
andcan_connect_to_sg_url
for robust URL selection - Always use
--sync-gateway-executable
first, remove brokenget_absolute_path
- Close stdout/stderr pipes to avoid goroutine leaks
- Introduce
- Enhance Go
rest
package to:- Replace global singleton with per‐ServerContext
sgCollect
instance - Add
sgCollectOutputStream
to manage and close pipes cleanly - Pass the resolved admin URL into
sgcollect_info
and wire through options
- Replace global singleton with per‐ServerContext
- Improve tests to expect
SystemExit(0)
, scope instances to contexts, and cover new code paths
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
tools/sgcollect.py | Added URL‐selection helpers, removed path helper, tightened auth |
tools-tests/upload_test.py | Updated tests to expect SystemExit(0) |
tools-tests/sgcollect_info_test.py | Expanded coverage for new functions and edge cases |
rest/sgcollect.go | Introduced sgCollectOutputStream , per‐context instance, cleanup |
rest/sgcollect_test.go | Added long‐running integration test, imported missing packages |
rest/admin_api.go | Wired through adminURL , switched to per‐context sgcollect |
rest/server_context.go | Initialized sgcollect in ServerContext |
Comments suppressed due to low confidence (2)
tools/sgcollect.py:786
- The docstring says this function returns
None
on failure, but the code actually returns an empty string. Update the documentation to reflect that it returns an empty string when no path is found.
Return the path to the sync gateway binary, returns None if the path is not found.
rest/sgcollect.go:365
- [nitpick] This comment is unclear and contains a fragment (
where is it ,
). Consider rewriting to clearly explain thatsgCollectPaths
returns the SG binary, a slice of strings for the script path, and an error if not found.
// The sgcollect_info return value is allowed to be a list of strings for testing, where is it , or an error if not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One suggested change to log output from a /_sgcollect_info
REST request, otherwise looks good and tested fine
Co-authored-by: Ben Brooks <ben.brooks@couchbase.com>
rest/sgcollect_test.go
Outdated
@@ -222,3 +226,62 @@ func TestSgcollectOptionsArgs(t *testing.T) { | |||
}) | |||
} | |||
} | |||
|
|||
func TestSGCollectIntegration(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've probably already seen - but something is a bit screwed up on Windows in this test. It eventually failed after 420 seconds
2025-06-24T14:11:44.0631070Z 2025-06-24T14:10:15.092Z [INF] HTTP: c:#5360 GET /_sgcollect_info (as ADMIN)
2025-06-24T14:11:44.0631219Z 2025-06-24T14:10:17.092Z [INF] HTTP: c:#5361 GET /_sgcollect_info (as ADMIN)
2025-06-24T14:11:44.0631335Z sgcollect_test.go:272:
2025-06-24T14:11:44.0631645Z Error Trace: D:/a/sync_gateway/sync_gateway/rest/sgcollect_test.go:276
2025-06-24T14:11:44.0632218Z C:/hostedtoolcache/windows/go/1.24.4/x64/src/runtime/asm_amd64.s:1700
2025-06-24T14:11:44.0632339Z Error: Not equal:
2025-06-24T14:11:44.0632530Z expected: "stopped"
2025-06-24T14:11:44.0632720Z actual : "running"
2025-06-24T14:11:44.0632832Z
2025-06-24T14:11:44.0632999Z Diff:
2025-06-24T14:11:44.0633170Z --- Expected
2025-06-24T14:11:44.0633314Z +++ Actual
2025-06-24T14:11:44.0633471Z @@ -1 +1 @@
2025-06-24T14:11:44.0633613Z -stopped
2025-06-24T14:11:44.0633746Z +running
2025-06-24T14:11:44.0633860Z sgcollect_test.go:272:
2025-06-24T14:11:44.0634159Z Error Trace: D:/a/sync_gateway/sync_gateway/rest/sgcollect_test.go:272
2025-06-24T14:11:44.0634329Z Error: Condition never satisfied
2025-06-24T14:11:44.0634514Z Test: TestSGCollectIntegration
2025-06-24T14:11:44.0634736Z Messages: sgcollect_info did not stop running in time
2025-06-24T14:11:44.0634907Z 2025-06-24T14:10:19.091Z [INF] HTTP: c:#5362 GET /_sgcollect_info (as ADMIN)
2025-06-24T14:11:44.0635072Z 2025-06-24T14:10:19.092Z [INF] HTTP: c:#5363 DELETE /_sgcollect_info (as ADMIN)
2025-06-24T14:11:44.0635318Z 2025-06-24T14:10:19.092Z [INF] HTTP: t:TestSGCollectIntegration Closing HTTP Server: 127.0.0.1:54226
2025-06-24T14:11:44.0635555Z 2025-06-24T14:10:19.093Z [INF] HTTP: t:TestSGCollectIntegration Closing HTTP Server: 127.0.0.1:54227
2025-06-24T14:11:44.0635792Z 2025-06-24T14:10:19.093Z [INF] HTTP: t:TestSGCollectIntegration Closing HTTP Server: 127.0.0.1:54228
2025-06-24T14:11:44.0636042Z 2025-06-24T14:10:19.093Z [INF] HTTP: t:TestSGCollectIntegration Closing HTTP Server: 127.0.0.1:54225
2025-06-24T14:11:44.0636144Z --- FAIL: TestSGCollectIntegration (420.12s)
tools/tasks.py
Outdated
@@ -178,7 +178,7 @@ def execute(self, fp): | |||
p = subprocess.Popen( | |||
self.command, | |||
bufsize=-1, | |||
stdin=subprocess.PIPE, | |||
stdin=subprocess.DEVNULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was surprised to see this is not a change already made in cbcollect_info if it's actually required, and I am also not sure quite why this change?
Surely sgcollect_info was not completely broken on Windows up until this point?
The commit message isn't really indicative of the change being made. There's nothing Windows or CI specific about this change, and we're not closing stdin.
Will this change impact interactive password prompts in future? I guess maybe not since that's not stdin, and it's instead just a prompt from the program.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We close stdin right after launching this, but this makes sure it is already closed and is more correct. I wondered if there was a race condition. sgcollect isn't designed to collect any data from stdin for any diagnostic programs it runs.
Shows wmic
running at:
2025-06-24T14:03:48.830Z [INF] c:SGCollect-5382 sgcollect_info: Computer system (wmic computersystem) - OK
And then the test runs until:
2025-06-24T14:10:39.220Z [INF] HTTP: c:#5592 GET /_sgcollect_info (as ADMIN)
which is 7 minutes. I guessed on two reasons for this:
- wmic is slow - it would have been running
WindowsTask("Installed software", "wmic product get name, version"),
for 7 minutes, which seems weird, because that seems like it would normally be fast. That's why I thought it could be asking for something from stdin. - wmic is also known to be slow, when getting the event log. That's the most useful behavior it can do, and also the slowest. So this is expensive for the last 30 days, but on a github actions machine it doesn't stay running for very long. This is the equivalent of the systemd/dmesg logs for windows.
wmic is in fact deprecated and going to be removed - https://learn.microsoft.com/en-us/windows/win32/wmisdk/wmic
That said, I'm going to have another commit for moving the tests to a separate package so it doesn't bang up against the 10 minute timeout for the rest package. When running the tests via github, this will also increase parallelism since packages are run simultaneously.
CBG-4712 /_sgcollect_info endpoint to work with non default port
This fix was driven by the desire to have integration testing from go, and doing integration testing in go requires listening on ports that are not :4985 - it uses :0 and picks a port randomly.
Fixes made:
--sync-gateway-url
from/_sgcollect_info
--sync-gateway-url
,http://127.0.0.1:4985
,https://127.0.0.1:4985
)--sync-gateway-executable
first if present, rather than picking up default locationsget_absolute_path
function since this would always throw an exception since upgrade to python3 (found with parsing log output for errors, then using python typing). This function mixes up bytes and str.Testing only improvements:
sgCollect
instance to theServerContext
. In practice, this is the same for a sync gateway executable but for testing it means not worrying about cleaning up globals since we can modify the ServerContext and it is torn down between tests. This also allowssgCollectPaths
to use a ctx with logging information on it, for filtering.sgCollectOutputStream
struct to make handling output a bit easier. This needs to be closed if the command doesn't start (e.g. path not found). For testing, we need to wait for the output to be closed before reading it. Because it is buffered output, it lasts a bit longer thancmd.Wait()
and the race detector is tripped.sgCollect.stderr
/sgCollect.stdout
to monitor errors from sgcollect output.sgCollectBinary
as a []string{} so it can be triggered viapython sgcollect_info
. This is only needed for testing, in normal production this will be a single executable that self extracts a python installation.I think I'd like to set a build flag or environment variable so this test doesn't run as a part of standard dev runs, it is just too slow.
Pre-review checklist
fmt.Print
,log.Print
, ...)base.UD(docID)
,base.MD(dbName)
)docs/api
Integration Tests
GSI=true,xattrs=true
https://jenkins.sgwdev.com/job/SyncGateway-Integration/3179/