@darccio darccio commented Aug 22, 2025

What does this PR do?

Caches Docker images, starting with datadog-agent.

It also improves the DRYness of our workflows by using CUE.

Motivation

We run 12 jobs, one per contrib set and supported Go version, and each job pulls up to 18 images over and over. This causes:

  • Slow CI: each services initialization step averages around 2 minutes and 30 seconds.
  • Rate limiting: the pull request tests of a single PR alone hit Docker Hub 216 times (12 jobs × 18 images). Any increase in the number of PRs running CI will trigger rate limiting and fail our pipelines.

Reviewer's Checklist

  • Changed code has unit tests for its functionality at or near 100% coverage.
  • System-Tests covering this feature have been added and enabled with the va.b.c-dev version tag.
  • There is a benchmark for any new code, or changes to existing code.
  • If this interacts with the agent in a new way, a system test has been added.
  • New code is free of linting errors. You can check this by running ./scripts/lint.sh locally.
  • Add an appropriate team label so this PR gets put in the right place for the release notes.
  • Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild.

Unsure? Have a question? Request a review!

datadog-datadog-prod-us1 bot commented Aug 22, 2025

⚠️ Tests

⚠️ Warnings

🧪 1 Test failed

TestTracesAgentIntegration from github.com/DataDog/dd-trace-go/v2/ddtrace/tracer (Datadog)
Failed

=== RUN   TestTracesAgentIntegration
    transport_test.go:92: 
        	Error Trace:	/home/runner/work/dd-trace-go/dd-trace-go/ddtrace/tracer/transport_test.go:92
        	Error:      	Received unexpected error:
        	            	Post "http://localhost:8126/v0.4/traces": dial tcp [::1]:8126: connect: connection refused
        	Test:       	TestTracesAgentIntegration
--- FAIL: TestTracesAgentIntegration (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
...

ℹ️ Info

❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 93f203b

pr-commenter bot commented Aug 22, 2025

Benchmarks

Benchmark execution time: 2025-08-26 11:43:00

Comparing candidate commit 93f203b in PR branch dario.castane/ktlo/download-agent-once-run-multiple-times with baseline commit 0441ec4 in branch dario.castane/ktlo/disable-main-branch-ci.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 24 metrics, 0 unstable metrics.

@darccio darccio force-pushed the dario.castane/ktlo/download-agent-once-run-multiple-times branch 6 times, most recently from 13741c9 to 030f1c4 on August 22, 2025 15:58
@darccio darccio force-pushed the dario.castane/ktlo/download-agent-once-run-multiple-times branch from 030f1c4 to b8b863e on August 22, 2025 16:03
# We need to specify a custom health-check. By default, this container will remain "unhealthy" since
# we don't fully configure it with a valid API key (and possibly other reasons)
# This command just checks for our ability to connect to port 8126
flags: --name datadog-agent -e DD_HOSTNAME=github-actions-worker -e DD_APM_ENABLED=true -e DD_BIND_HOST=0.0.0.0 -e DD_API_KEY=invalid_key_but_this_is_fine -e DD_TEST_AGENT_HOST=localhost -e DD_TEST_AGENT_PORT=9126 --health-cmd "bash -c '</dev/tcp/127.0.0.1/8126'" -p 8125:8125/udp -p 8126:8126
Member Author

I want to find a way to generate these flags as a single string, because multiline strings don't work well when they are manipulated in bash and later passed to a command.

Also, I want to reuse the YAML definitions somehow to make this happen. Not sure how yet.
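
As an illustration of the idea in this comment (not the approach taken in this PR, which uses CUE), here is a minimal Python sketch that renders a structured service definition into a single-line flags string. The dictionary layout and the `docker_flags` helper are hypothetical; only the flag values are copied from the line under review.

```python
import shlex

# Hypothetical structured definition mirroring the flags line under review.
# The field names (name, env, health_cmd, ports) are assumptions for this sketch.
AGENT = {
    "name": "datadog-agent",
    "env": {
        "DD_HOSTNAME": "github-actions-worker",
        "DD_APM_ENABLED": "true",
        "DD_BIND_HOST": "0.0.0.0",
        "DD_API_KEY": "invalid_key_but_this_is_fine",
        "DD_TEST_AGENT_HOST": "localhost",
        "DD_TEST_AGENT_PORT": "9126",
    },
    "health_cmd": "bash -c '</dev/tcp/127.0.0.1/8126'",
    "ports": ["8125:8125/udp", "8126:8126"],
}


def docker_flags(service: dict) -> str:
    """Render a service definition as one line of shell-quoted docker flags."""
    args = ["--name", service["name"]]
    for key, value in service["env"].items():
        args += ["-e", f"{key}={value}"]
    if service.get("health_cmd"):
        args += ["--health-cmd", service["health_cmd"]]
    for port in service.get("ports", []):
        args += ["-p", port]
    # shlex.join quotes every argument, so the result contains no newlines
    # and round-trips through shlex.split back to the same argument list.
    return shlex.join(args)


if __name__ == "__main__":
    print(docker_flags(AGENT))
```

Building the argument list first and quoting it once at the end keeps the health command, which itself contains quotes, intact as a single argument.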

Member Author

CUE is the way. Working on this.
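
Separately, the custom health check in the flags line above only tests whether a TCP connection to port 8126 can be opened. A rough Python equivalent of that probe is sketched below; the `port_is_open` helper and its timeout are assumptions for illustration, not part of this PR.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection performs the TCP handshake; the context manager
        # closes the socket again as soon as the connection is established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8126 and 127.0.0.1 are the values used by the health command under review.
    print(port_is_open("127.0.0.1", 8126))
```

Like the bash `/dev/tcp` check, this says nothing about whether the agent is fully configured; it only confirms that something accepts connections on the port.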

@darccio darccio marked this pull request as ready for review August 22, 2025 16:12
@darccio darccio requested a review from a team as a code owner August 22, 2025 16:12
@darccio darccio force-pushed the dario.castane/ktlo/download-agent-once-run-multiple-times branch from 257db9e to 9d77f94 on August 25, 2025 15:47
Comment on lines +9 to +14
// Increase time WAF time budget to reduce CI flakiness
// Users may build our library with GOTOOLCHAIN=local. If they do, and our
// go.mod file specifies a newer Go version than their local toolchain, their
// build will break. Run our tests with GOTOOLCHAIN=local to ensure that
// our library builds with all of the Go versions we claim to support,
// without having to download a newer one.
Member Author

Comments now live in the CUE file, not in the YAML.

@darccio darccio force-pushed the dario.castane/ktlo/download-agent-once-run-multiple-times branch from b4696b9 to fbda660 on August 25, 2025 16:14
@kakkoyun kakkoyun left a comment

I'm not sure we gain much by migrating everything to CUE here.

It is still pretty verbose. Maybe there are more ways to avoid repeating ourselves in CUE? Then again, I'm not sure whether being terse and implicit is a good thing.

It is definitely better for the services workflow. But for the pull-request and unit-integration workflows, I'm not sure.

Member Author

darccio commented Aug 26, 2025

> It is definitely better for the services workflow. But for the pull-request and unit-integration workflows, I'm not sure.

I agree, but I didn't want to take on a full refactor yet. My focus was on services and on avoiding duplicated versions in several places.

I definitely see benefits in using CUE, although it's a bit complex. See what I had to do to convert #Service to #Image so the service definition can be reused. Once it's set up, it just works.
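
To make the #Service to #Image idea concrete, here is a small Python sketch of the same kind of derivation: the list of images to cache is computed from the service definitions, so each version is written once. The `Service` and `Image` shapes, the `to_image` and `unique_images` helpers, and the example image names are all hypothetical; the real CUE definitions in this PR may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    repository: str
    tag: str

    @property
    def ref(self) -> str:
        return f"{self.repository}:{self.tag}"


@dataclass(frozen=True)
class Service:
    image: str  # full reference, e.g. "repository/name:tag"
    ports: tuple = ()


def to_image(service: Service) -> Image:
    """Derive the image to cache from a service definition."""
    # Only look for the tag separator after the last slash, so a registry
    # port such as "localhost:5000/app" is not mistaken for a tag.
    # Digest references (name@sha256:...) are not handled in this sketch.
    name_start = service.image.rfind("/") + 1
    colon = service.image.find(":", name_start)
    if colon == -1:
        return Image(service.image, "latest")
    return Image(service.image[:colon], service.image[colon + 1:])


def unique_images(services: dict) -> list:
    """Return the sorted, de-duplicated image references used by all services."""
    return sorted({to_image(s).ref for s in services.values()})


# Placeholder services; names and versions are invented for illustration.
SERVICES = {
    "agent": Service("example/agent:1.2.3", ports=("8126:8126",)),
    "cache-a": Service("example/cache:7"),
    "cache-b": Service("example/cache:7"),
}

if __name__ == "__main__":
    for ref in unique_images(SERVICES):
        print(ref)
```

With this shape, bumping a version in one service definition changes both the service and the cached image list.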

@darccio darccio force-pushed the dario.castane/ktlo/disable-main-branch-ci branch from 0441ec4 to d11299f on August 27, 2025 11:08
@darccio darccio requested review from a team as code owners August 27, 2025 11:08
@darccio darccio force-pushed the dario.castane/ktlo/disable-main-branch-ci branch 2 times, most recently from f9521df to bd8a68a on August 28, 2025 08:49
@darccio darccio force-pushed the dario.castane/ktlo/disable-main-branch-ci branch from ba0972d to 957c7cf on September 3, 2025 14:27
Base automatically changed from dario.castane/ktlo/disable-main-branch-ci to main September 4, 2025 06:50