Skip to content

CBG-4687 have sgcollect_info use a token #7596

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

CBG-4687 have sgcollect_info use a token #7596

wants to merge 1 commit into from

Conversation

torcolvin
Copy link
Collaborator

CBG-4687 have sgcollect_info use a token

  • Created a field on handlers that marks if is sgcollect that indicates whether sgcollect will be called.
  • Create a specific token for performing these operations
  • Switch sgcollect code pass auth headers rather than explicit username/password

Risk mitigation/testing strategy:

  • Modifying the python code comes with some inherent risk due to type safety. I set up mypy to be able to run type checking and uv to freeze the environment. I can pull these changes out of this PR.
  • Created two types of integration testing:
    • test endpoints with specific tokens
    • create an integration test that runs with rosmar/couchbase server on all platforms.
  • Integration test for /_sgcollect_info comes at some risk and complexity, but it was useful to find bugs:
  • Test seams:
    • modify sgcollectPath and sgcollectPathErr for testing. I don't know why these are globals in the first place. To run the tests on windows, you need to run python tools/sgcollect_info instead of tools/sgcollect_info since executing the file works on mac/linux by reading the shebang line. This lead to the weird modification of sgcollectPath to be a []string{} to fit in for testing. In production sgcollect_info is a self extracting binary that runs python so this is not used in a non testing path.
    • I wanted to test to see if all the endpoints were hit and return 200. I don't know a way to test this using stats on the admin API, so I created sgCollect.stdout for testing only! I think there's some value in looking for errors, I found an exception we should handle in python get_paths_from_expvars if there was no file named *.json in the command line.
  • Actually running /_sgcollect_info is really useful, but it's really slow. It's about 1 minute locally (the bulk of the time is collecting the pprofs) and 6 minutes in github actions. I think this is is also subject to failure if there isn't python on the OS, or there's odd configurations. We run sgcollect_info in prod on python 3.9 (but it runs on 3.9 -> newest). I do not like having a slow test here. I considered running it only in jenkins, but I think this is actually some code where platform matter since there are "real paths" here. I do not think this is generally useful to run for the amount of time it takes.

Possible future refactors:

  • Removing sgcollectPath globals, this is really weird and if you are executing a subprocess doing a path manipulation is so fast. If we really did want to optimize it, you could add a sync.Once to ServerContext.
  • Rewriting sgcollect in go. This sounds really big, but this is a real mess to test and we could potentially share code in a library where /_sgcollect_info endpoint and an independent binary work together.

For the reviewer:

  • Input on the testing strategy, and whether we should drop some of this from CI
  • Manual testing with real SG and /_sgcollect_info endpoint and sgcollect_info executable

TODO

  • fix the -race condition that occurs because I try to buffer stdout to a channel in the test instance
  • Related to above, I think setting cmd.Stdout to io.Pipe might cause dangling go routines

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Integration Tests

Comment on lines +1676 to +1685
addr, err := h.server.getServerAddr(adminServer)
if err != nil {
return base.HTTPErrorf(http.StatusInternalServerError, "Error getting admin server address: %v", err)
}
if h.server.Config.API.HTTPS.TLSCertPath != "" {
addr = "https://" + addr
} else {
params.adminURL = "http://" + addr
}
params.adminURL = addr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whilst I agree this might be useful to do to avoid the sgcollect_info script iterating to find a working localhost URL as it does today, this is not related to the token feature and we should evaluate and implement each separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try this URL and fall back to localhost

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved this into #7601 to be able to write the tests

@@ -2277,3 +2280,31 @@ func (sc *ServerContext) getClusterUUID(ctx context.Context) (string, error) {
}
return base.ParseClusterUUID(output)
}

type sgcollectCookieManager struct {
tokens map[string]struct{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we ever expect to support more than 1 valid token at any one time?

AFAIK since /_sgcollect_info is a singleton and has sync mechanisms in place to avoid multiple runs, this isn't and won't be required.

I think this is adding complexity by maintaining a map, since it's more likely that we can unintentionally build up a set of valid/usable tokens.

It might be better with a single token string that has a timestamp to also allow expiry after a period of time. This covers the case where the sgcollect info script hangs indefinitely and keeps the token alive, or any situations where we may be unintentionally not deleting the token from the map.

uv run -- pytest
- name: Run mypy
run: |
uv run -- mypy tools/sgcollect_info tools
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not super keen on changing dependency management for the sake of it but if we do want to do it - again we should do it independent of feature PRs.

return args
}

// sgCollectPaths attempts to return the absolute paths to Sync Gateway and to sgcollect_info binaries.
func sgCollectPaths() (sgBinary, sgCollectBinary string, err error) {
func sgCollectPaths() (sgBinary string, sgCollectBinary []string, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what circumstances does sgCollectBinary need to be a slice? I'm confused...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is ["python", "tools/sgcollect_info"] when called in tests (only!) but always just sgcollect_info which is a self extracting binary in prod.

@@ -28,6 +28,10 @@ import (
"github.com/pkg/errors"
)

const (
adminAPITokenHeader = "SG_BEARER_TOKEN"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Header->EnvVar

return {"Authorization": f"Bearer {session_token}"}
elif username:
base64string = base64.b64encode(
(f"{username}: {password}").encode("utf-8")
Copy link
Member

@bbrks bbrks Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leading space is breaking password here

session_token_name = "SG_BEARER_TOKEN"
session_token = os.environ.get(session_token_name)
if session_token:
return {"Authorization": f"Bearer {session_token}"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace Bearer with SGCollectToken

"""
Return the headers for authentication.
"""
session_token_name = "SG_BEARER_TOKEN"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG_BEARER_TOKEN -> SG_COLLECT_TOKEN

@bbrks
Copy link
Member

bbrks commented Jun 23, 2025

@torcolvin ready for a rebase now the other changes have been split out and merged (#7597)

@torcolvin
Copy link
Collaborator Author

Split into:

#7597
#7601
#7604

@torcolvin torcolvin closed this Jun 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants