Skip to content

bktec upload: upload test results to Test Engine#278

Open
pda wants to merge 9 commits intomainfrom
bktec-upload
Open

bktec upload: upload test results to Test Engine#278
pda wants to merge 9 commits intomainfrom
bktec-upload

Conversation

@pda
Copy link
Copy Markdown
Member

@pda pda commented Mar 14, 2025

Teach bktec to upload test results to Test Engine.

$ bktec upload junit.xml
+++ Buildkite Test Engine Client: bktec dev


______ ______ _____
___  /____  /___  /____________
__  __ \_  //_/  __/  _ \  ___/
_  /_/ /  ,<  / /_ /  __/ /__
/_.___//_/|_| \__/ \___/\___/

2025/03/14 17:47:47 INFO Uploading key=01959384-2d0d-7318-8af7-4833b6eaff32 format=junit filename=test.xml
2025/03/14 17:47:48 INFO Upload successful upload_id=01959384-2d8a-7dad-9b97-37c1b4a0e67e upload_url=https://buildkite.com/organizations/example/analytics/suites/example/uploads/01959384-2d8a-7dad-9b97-37c1b4a0e67e

@pda pda requested a review from a team March 14, 2025 07:23
Comment thread main.go Outdated
Comment thread internal/upload/upload.go Outdated
Comment thread internal/upload/upload.go Outdated
Comment thread internal/upload/upload.go Outdated
Copy link
Copy Markdown
Contributor

@nprizal nprizal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rule them all

Base automatically changed from refactor-env to main March 25, 2025 07:15
@pda
Copy link
Copy Markdown
Member Author

pda commented Apr 29, 2026

🤖 Rebased bktec-upload onto latest origin/main and force-pushed.

The rebase had non-trivial conflicts because main has moved on substantially:

  1. main.go / cli.gomain now uses urfave/cli/v3 with proper subcommands (run, plan, tools backfill-commit-metadata). The original PR added an ad-hoc if flag.Arg(0) == "upload" branch in main. I replaced that with a real upload subcommand wired through the cli/v3 framework: a new uploadAction in main.go and an upload entry in cliCommand.Commands in cli.go.

  2. internal/env package was removed on main (commit 30534b0 "Remove env package"; env vars are now read via cli.EnvVars(...) flags). The upload package depended on env.Env / env.Map. I inlined a minimal local equivalent inside internal/upload/upload.go:

    type Env interface { Get(string) string; Lookup(string) (string, bool) }
    type OS struct{}
    type Map map[string]string

    This keeps the existing tests almost identical (just env.Map{...}Map{...}).

  3. internal/upload/upload.goUploadCLI(*flag.FlagSet, env.Env) (which used the now-incompatible flag package wiring) was replaced with UploadFile(ctx, env, filename). Same logic, but called from uploadAction in main.go.

  4. go.mod / go.sum — took main's deps + added github.com/google/uuid v1.6.0 as a direct require; ran go mod tidy.

Verification:

  • go build ./...
  • go test ./internal/upload/... ✅ all 11 tests pass
  • bktec --help shows the new upload subcommand
  • bktec upload --help displays correctly
  • Pre-existing test failures on this machine (internal/api missing pact_ffi, internal/runner jest/cucumber/cypress env issues) reproduce on main and are unrelated to this branch.

@socket-security
Copy link
Copy Markdown

socket-security Bot commented Apr 29, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Addedgolang/​github.com/​google/​uuid@​v1.6.0100100100100100

View full report

pda and others added 8 commits April 29, 2026 19:35
Replaces the local Env/OS/Map types and ConfigFromEnv() with cli/v3
flags scoped to the upload subcommand:

  --token       BUILDKITE_ANALYTICS_TOKEN -> uploadConfig.SuiteToken
  --upload-url  BUILDKITE_TEST_ENGINE_UPLOAD_URL -> uploadConfig.UploadUrl

This brings upload into the same idiom as run, plan, and tools
backfill-commit-metadata, where build/runtime configuration is bound
through cli.EnvVars(...) rather than a custom env abstraction.

RunEnvFromEnv now takes an EnvLookup function (matching the signature
of os.LookupEnv directly), simplifying both the production call site
(os.LookupEnv) and tests (a small mapLookup helper).

Amp-Thread-ID: https://ampcode.com/threads/T-019dd7a1-d422-717e-aac1-a48f10021895
Co-authored-by: Amp <amp@ampcode.com>
http.DefaultClient has no Timeout, so a stuck connection or slow
response would block the upload indefinitely. Switch to a package
http.Client with a 5 minute Timeout, which is generous for typical
test result files but bounded enough to fail clearly.

Amp-Thread-ID: https://ampcode.com/threads/T-019dd7a1-d422-717e-aac1-a48f10021895
Co-authored-by: Amp <amp@ampcode.com>
Filename-extension inference is brittle: results files can have
unconventional names (no extension, .junit, .results, .out, etc).

  --format junit
  --format json

When --format is empty, behaviour is unchanged: format is inferred
from .xml -> junit and .json -> json. Invalid formats produce a clear
error rather than a server-side rejection.

Amp-Thread-ID: https://ampcode.com/threads/T-019dd7a1-d422-717e-aac1-a48f10021895
Co-authored-by: Amp <amp@ampcode.com>
Brings upload requests in line with internal/api:

  User-Agent: Buildkite Test Engine Client/<version> (<os>/<arch>)

and adds an exponential-backoff retry loop (5 attempts, ~500ms base
delay, jitter) for network errors, 429, and 5xx responses. The upload
API has been seen returning transient 5xx, and previously the client
would fail the whole job on the first hiccup.

Body bytes are snapshotted up front so each retry attempt rebuilds a
fresh request with a re-readable body. Per-attempt timeouts are still
bounded by the package httpClient.Timeout (5 minutes).

Amp-Thread-ID: https://ampcode.com/threads/T-019dd7a1-d422-717e-aac1-a48f10021895
Co-authored-by: Amp <amp@ampcode.com>
Adds:

- TestUpload_RetriesOn5xxThenSucceeds: confirms transient 5xx
  responses are retried and the eventual 202 is returned cleanly.
- TestUpload_DoesNotRetryOn4xx: confirms client errors short-circuit
  the retry loop (no wasted retries on bad token / bad request).
- TestUpload_SetsUserAgent: confirms the bktec User-Agent header is
  attached to upload requests.
- TestUploadFile_EndToEnd: covers the integration of token validation,
  format inference, env-derived run metadata, and the HTTP send.
- TestUploadFile_MissingToken: confirms missing SuiteToken errors out
  with the expected message.
- TestUploadFile_FormatOverride: confirms an explicit --format value
  is used even when the filename has no usable extension.
- TestUploadFile_FormatInferenceFailsWithoutExtension: confirms the
  helpful error when neither --format nor a recognised extension is
  available.

Also fixes createTestXML to produce filenames ending in .xml (the
previous os.CreateTemp("", "test.xml") put the random suffix
*after* the extension, hiding format inference).

Amp-Thread-ID: https://ampcode.com/threads/T-019dd7a1-d422-717e-aac1-a48f10021895
Co-authored-by: Amp <amp@ampcode.com>
The previous "file does not exist: %s" message obscured permission
errors, EACCES on parent dirs, EIO, etc. Wrapping with %w preserves
errors.Is(err, fs.ErrNotExist) for callers that want to distinguish
missing files from other failure modes.

Amp-Thread-ID: https://ampcode.com/threads/T-019dd7a1-d422-717e-aac1-a48f10021895
Co-authored-by: Amp <amp@ampcode.com>
Replaces the EnvLookup function-typed parameter with a typed BuildEnv
struct populated from cli flags, sharing the existing buildIDFlag,
branchFlag, and jobIDFlag with bktec run / plan, and adding
commitFlag, messageFlag, buildNumberFlag, buildUrlFlag bound to new
config.Config fields (Commit, Message, BuildNumber, BuildUrl).

The upload package no longer touches os.Environ at all: env access is
fully consolidated in cli.go where every other command reads its
inputs. RunEnvFromEnv is replaced with RunEnvFromBuildEnv that takes
the typed struct directly, which also makes the Buildkite-vs-generic
detection more obvious (build.BuildId != "").

Amp-Thread-ID: https://ampcode.com/threads/T-019dd7a1-d422-717e-aac1-a48f10021895
Co-authored-by: Amp <amp@ampcode.com>
@pda
Copy link
Copy Markdown
Member Author

pda commented Apr 29, 2026

🤖 Implemented the eight review suggestions from a local amp/Opus review as separate commits, all pushed to bktec-upload:

  1. upload: wire token/url via cli flags, drop Env interface — Replaces the local Env/OS/Map types and ConfigFromEnv() with cli/v3 flags scoped to the upload subcommand: --token (BUILDKITE_ANALYTICS_TOKEN) and --upload-url (BUILDKITE_TEST_ENGINE_UPLOAD_URL). Brings upload into the same idiom as run, plan, and tools backfill-commit-metadata. RunEnvFromEnv now takes an EnvLookup function (later removed in commit 8).

  2. upload: cap HTTP requests with a 5 minute timeouthttp.DefaultClient has no Timeout; switched to a package http.Client with a 5 minute timeout to bound stuck connections.

  3. upload: add --format flag to override extension-based detection — New --format junit|json flag (env: BUILDKITE_TEST_ENGINE_UPLOAD_FORMAT). When unset, behaviour is unchanged (.xml → junit, .json → json). Invalid formats fail fast client-side.

  4. upload: add User-Agent header and retry on transient failures — Sets User-Agent: Buildkite Test Engine Client/<version> (<os>/<arch>) matching internal/api. Adds an exponential-backoff retry loop (5 attempts, 500ms base, jitter via roko) for network errors, 429, and 5xx responses. Body bytes are snapshotted so retries replay cleanly.

  5. CHANGELOG: add entry for bktec upload — Adds an Unreleased section to CHANGELOG.md.

  6. upload: add tests for UploadFile, retries, 4xx, User-Agent, --format — Seven new tests covering UploadFile end-to-end, retry-on-5xx-then-success, no-retry-on-4xx, User-Agent header, missing token, format override, format inference failure. Also fixes createTestXML to produce filenames ending in .xml (the previous os.CreateTemp("", "test.xml") put the random suffix after the extension).

  7. upload: wrap os.Stat error to expose underlying cause"file does not exist: %s""cannot stat %s: %w", preserving errors.Is(err, fs.ErrNotExist) for callers and surfacing permission errors etc.

  8. upload: read build environment from cfg via cli flags — Replaces the EnvLookup parameter with a typed BuildEnv struct populated from cli flags. Reuses the existing buildIDFlag, branchFlag, jobIDFlag and adds commitFlag, messageFlag, buildNumberFlag, buildUrlFlag bound to new config.Config fields (Commit, Message, BuildNumber, BuildUrl). The upload package no longer touches os.Environ at all.

Verification: go build ./... ✅, go test ./internal/upload/ ./internal/config/ ✅. Pre-existing failures on this machine (internal/api missing pact_ffi, internal/runner jest/cucumber/cypress env issues) reproduce on main and are unrelated.

@pda
Copy link
Copy Markdown
Member Author

pda commented Apr 29, 2026

🤖 Cumulative review of bktec-upload (after rebase + 9 refinement commits)

The feature is in good shape — go vet is clean, go test ./internal/upload/... is green, retries/UA/timeout look right. Below are the things I'd resolve before merge, ordered roughly by impact.

Higher impact

1. Upload-only fields on config.Config will leak into run/plan API payloads

internal/config/config.go gained Commit, Message, BuildNumber, BuildUrl, all with json:"BUILDKITE_*" tags. *config.Config is embedded as the env field in two outbound Test Engine payloads:

  • internal/api/filter_tests.go:14Env *config.Config \json:"env"``
  • internal/api/post_test_plan_metadata.go:19Env *config.Config \json:"env"``

Because the commit/message/build-number/build-url flags are only registered on the upload subcommand, run and plan never populate them — but the JSON tags mean they will now show up as empty strings in filter-tests and post-test-plan-metadata requests. That's a subtle, cross-cutting behaviour change in unrelated requests.

Suggested fix (cheapest): give the upload subcommand its own local struct (mirror of BuildEnv) that the flag Destinations point at, and drop the new fields from config.Config. Or, if you'd rather keep them on cfg for symmetry, mark them json:"-".

2. Multipart filename leaks the local path

part, err := b.writer.CreateFormFile("data", file.Name())

file.Name() is whatever was passed to os.Open — frequently absolute or workspace-rooted. The sibling UploadToS3 already does the right thing: filepath.Base(filePath). This is both a small privacy/info-disclosure issue and an inconsistency with the existing uploader in the same package.

Easy fix: b.writer.CreateFormFile("data", filepath.Base(file.Name())), or thread the original filename through and call filepath.Base once.

3. internal/upload now mixes two unrelated upload responsibilities

The package previously housed UploadToS3 (presigned-POST, no auth header, used by backfill). It now also houses Test Engine results upload (bearer-style suite token, retries, run_env multipart). Different destinations, different auth, different retry policy, different payload shape — the only commonality is "we POST a multipart body".

I'd split: move the new code to internal/testresults (or internal/resultsupload) and leave S3 backfill alone. If you'd prefer to defer the package split, at least narrow the surface area (see #6) so the package isn't double-exporting Upload*/MultipartBody symbols with overlapping names.

Medium impact

4. CLI/logging conventions differ from the rest of the codebase

uploadAction calls debug.SetDebug(...) but, unlike plan/backfillCommitMetadata, doesn't set debug.SetOutput(os.Stderr). Since bktec upload is interactive-ish and may have its stdout captured/piped, debug output going to stdout will pollute the upload's final messages.

Also, internal/upload/upload.go uses slog.Info/slog.Warn for user-facing progress — nothing else in this repo uses log/slog. The convention is fmt.Fprintf(os.Stderr, ...) for user-visible messages and internal/debug for diagnostics. Suggest dropping slog and matching the style used by BackfillCommitMetadata.

A short startup banner (matching plan/backfill's one-liner — not the run ASCII art) would also help users confirm which version of bktec ran.

5. Retry policy comment overstates parity with internal/api

The comment on Upload says it matches internal/api's behaviour, but they're materially different:

internal/api new upload
attempt cap TryForever() WithMaxAttempts(5)
total budget 130s context wrapping the loop none — bounded only by http.Client.Timeout × 5
per-attempt deadline 15s context.WithTimeout none (5min client timeout)

Worst-case the upload retry loop can sit for ~25 minutes on a hung server with a context.Background() parent, vs. ~2 minutes for internal/api. The new policy probably is the right policy for uploads (15s is too short for big bodies), but I'd either:

  • soften/remove the "matches the behaviour of the internal/api client" wording, or
  • factor an explicit shared retry-classification helper and document the deliberate differences.

6. Excess exported surface in internal/upload

Only the CLI wiring needs UploadFile, BuildEnv, and Config. These can be unexported:

  • Upload (covered by UploadFile; tests live in the same package)
  • RunEnvFromBuildEnv
  • RunEnvMap
  • MultipartBody and all four methods
  • NewMultipartBody

Tests are in package upload so unexporting won't break them. Smaller surface = clearer contract = easier to split the package later (see #3).

7. --token is ambiguous next to existing --access-token

bktec upload --token … is a suite token (analytics ingest), while every other Test Engine call uses --access-token (Buildkite API token). Consider --suite-token or --analytics-token (env stays BUILDKITE_ANALYTICS_TOKEN). The current error string is also a bit terse:

return fmt.Errorf("BUILDKITE_ANALYTICS_TOKEN missing")

Compare with the existing pattern: "--suite-token / BUILDKITE_ANALYTICS_TOKEN must not be blank".

8. Response key inconsistency between code and tests

  • TestUpload server returns {"id":"theuuid","url":"..."}
  • TestUpload_RetriesOn5xxThenSucceeds returns {"upload_url":"..."}
  • Production code logs respData["upload_url"]

Pick one and match the real API. A typed response struct (type uploadResponse struct { ID, URL string } or whatever the API actually returns) is preferable to the current map[string]string dance, and would have caught this drift.

Lower impact / cleanup

9. Validation for upload is split across CLI arg checks and lower-level code

Consider a small cfg.ValidateForUpload() (or local equivalent on the new upload-local struct) covering: token present, exactly one positional arg, format valid if set, upload-url parseable. Keeps the validation pattern consistent with ValidateForRun/ValidateForPlan/ValidateForBackfillCommitMetadata.

10. MultipartBody.writer stored by value

type MultipartBody struct {
    writer multipart.Writer       // not *multipart.Writer
    buf    *bytes.Buffer
}
…
writer: *multipart.NewWriter(buf),

Currently safe (multipart.Writer is constructed once and not copied again), but *multipart.Writer is the documented type and removes any "did this get copied?" questions. Trivial change.

11. Whole body buffered in memory, then snapshot for retries

Fine for typical JUnit/JSON sizes, and the snapshot via body.buf.Bytes() is safe because nothing mutates the buffer after Close. Worth noting in a comment so a future change to MultipartBody doesn't accidentally break retry safety. If very large result files become a real use case, switch to a temp-file-backed body with req.GetBody.

12. Cosmetic

  • userAgent is computed at package init — fine, but internal/api builds it per-request inside RoundTrip. Either is OK; minor inconsistency.
  • --upload-url being Hidden: true is consistent with --base-url in this repo. ✓
  • cfg.Output, cfg.UploadFile, cfg.Days, etc. all have json:"-" to keep them out of the env payload — same treatment for Copy over existing project from Buildkite #1 if you go that route.
  • No docs/upload.md — every other runner/feature has a docs page; worth adding for parity (separate PR is fine).

What looks good

  • Token never logged; only key/format/filename/url go into log lines.
  • 5-min http.Client.Timeout is a reasonable default and covers the "DefaultClient waits forever" footgun.
  • Retry classification is correct: 4xx breaks (r.Break() + return non-nil err short-circuits the loop), 429/5xx/network err retry, body bytes are re-readable on each attempt.
  • errors.Is(err, fs.ErrNotExist) test for the wrapped os.Stat error is a nice touch.
  • TestUpload_DoesNotRetryOn4xx and TestUpload_SetsUserAgent are exactly the right shape.
  • BuildEnv field-by-field copy in main.go is slightly stuttery but a deliberate firewall between internal/upload and internal/config — happy to keep that boundary.

Minimum changes I'd ask for before merge

  1. get the upload-only fields off config.Config (or json:"-" them) so run/plan payloads don't change shape.
  2. filepath.Base for the multipart filename.
  3. debug.SetOutput(os.Stderr) in uploadAction, and drop slog.

Everything else is cleanup or follow-up — feel free to defer.

@pda
Copy link
Copy Markdown
Member Author

pda commented Apr 29, 2026

*config.Config is embedded as the env field in two outbound Test Engine payloads

This is something we should fix outside this PR, I think.

It's unusual to embed a local config struct into an API payload, unless it's specifically a usage telemetry request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants