Skip to content

Conversation

mahf708
Copy link
Contributor

@mahf708 mahf708 commented Oct 31, 2024

came up in #3068

@mahf708 mahf708 changed the title dot production along a rank-1 field dot product along a rank-1 field Oct 31, 2024
@E3SM-Bot
Copy link
Collaborator

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6266
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS
PULLREQUESTNUM 3084
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA 4cca117
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA e140df9
TEST_REPO_ALIAS SCREAM

Using Repos:

Repo: SCREAM (E3SM-Project/scream)
  • Branch: mahf708/field-utils/dot
  • SHA: 4cca117
  • Mode: TEST_REPO

Pull Request Author: mahf708

@E3SM-Bot
Copy link
Collaborator

Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED

Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.

Pull Request Auto Testing has FAILED (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6266
  • Status: FAILED

Jenkins Parameters

Parameter Name Value
PR_LABELS
PULLREQUESTNUM 3084
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA 4cca117
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA e140df9
TEST_REPO_ALIAS SCREAM
SCREAM_PullRequest_Autotester_Weaver # 6266 FAILED (click to see last 100 lines of console output)

using GIT_SSH to set credentials jgfouca github key
Verifying host key using manually-configured host key entries
 > /home/projects/ppc64le/git/2.10.1/bin/git submodule update --init --recursive components/elm/src/external_models/sbetr # timeout=10
using GIT_SSH to set credentials jgfouca github key
Verifying host key using manually-configured host key entries
 > /home/projects/ppc64le/git/2.10.1/bin/git submodule update --init --recursive components/eam/src/physics/rrtmgp/external # timeout=10
hudson.plugins.git.GitException: Command "/home/projects/ppc64le/git/2.10.1/bin/git submodule update --init --recursive components/eam/src/physics/rrtmgp/external" returned status code 1:
stdout: 
stderr: Cloning into '/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6266/scream/components/eam/src/physics/rrtmgp/external'...
ssh: Could not resolve hostname github.com: Name or service not known
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git@github.com:E3SM-Project/rte-rrtmgp.git' into submodule path '/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6266/scream/components/eam/src/physics/rrtmgp/external' failed
Failed to clone 'components/eam/src/physics/rrtmgp/external'. Retry scheduled
Cloning into '/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6266/scream/components/eam/src/physics/rrtmgp/external'...
ssh: Could not resolve hostname github.com: Name or service not known
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git@github.com:E3SM-Project/rte-rrtmgp.git' into submodule path '/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6266/scream/components/eam/src/physics/rrtmgp/external' failed
Failed to clone 'components/eam/src/physics/rrtmgp/external' a second time, aborting

at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2846)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:2185)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.CliGitAPIImpl$7.lambda$execute$0(CliGitAPIImpl.java:1573)
at Jenkins v2.462.2//com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
at Jenkins v2.462.2//com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
at Jenkins v2.462.2//com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at Jenkins v2.462.2//com.google.common.util.concurrent.DirectExecutorService.execute(DirectExecutorService.java:51)
at java.base/java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:184)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.cgit.GitCommandsExecutor.submitRemainingCommand(GitCommandsExecutor.java:77)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.cgit.GitCommandsExecutor.invokeAll(GitCommandsExecutor.java:70)

Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to weaver
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1826)
at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:356)
at hudson.remoting.Channel.call(Channel.java:1042)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:153)
at jdk.internal.reflect.GeneratedMethodAccessor155.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:138)
at PluginClassLoader for git-client/jdk.proxy28/jdk.proxy28.$Proxy156.execute(Unknown Source)
at PluginClassLoader for git//hudson.plugins.git.extensions.impl.SubmoduleOption.onCheckoutCompleted(SubmoduleOption.java:196)
at PluginClassLoader for git//hudson.plugins.git.GitSCM._checkout(GitSCM.java:1395)
at PluginClassLoader for git//hudson.plugins.git.GitSCM.checkout(GitSCM.java:1277)
at hudson.scm.SCM.checkout(SCM.java:540)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1247)
at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:649)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:85)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:521)
at hudson.model.Run.execute(Run.java:1894)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
at hudson.model.ResourceController.execute(ResourceController.java:101)
at hudson.model.Executor.run(Executor.java:446)
Caused: hudson.plugins.git.GitException
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.cgit.GitCommandsExecutor.checkResult(GitCommandsExecutor.java:89)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.cgit.GitCommandsExecutor.invokeAll(GitCommandsExecutor.java:69)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.cgit.GitCommandsExecutor.invokeAll(GitCommandsExecutor.java:47)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.CliGitAPIImpl$7.execute(CliGitAPIImpl.java:1576)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$GitCommandMasterToSlaveCallable.call(RemoteGitImpl.java:170)
at PluginClassLoader for git-client//org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$GitCommandMasterToSlaveCallable.call(RemoteGitImpl.java:161)
at hudson.remoting.UserRequest.perform(UserRequest.java:211)
at hudson.remoting.UserRequest.perform(UserRequest.java:54)
at hudson.remoting.Request$2.run(Request.java:377)
at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused: java.io.IOException: Could not perform submodule update
at PluginClassLoader for git//hudson.plugins.git.extensions.impl.SubmoduleOption.onCheckoutCompleted(SubmoduleOption.java:201)
at PluginClassLoader for git//hudson.plugins.git.GitSCM._checkout(GitSCM.java:1395)
at PluginClassLoader for git//hudson.plugins.git.GitSCM.checkout(GitSCM.java:1277)
at hudson.scm.SCM.checkout(SCM.java:540)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1247)
at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:649)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:85)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:521)
at hudson.model.Run.execute(Run.java:1894)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
at hudson.model.ResourceController.execute(ResourceController.java:101)
at hudson.model.Executor.run(Executor.java:446)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash -le

cd $WORKSPACE/${BUILD_ID}/

./scream/components/eamxx/scripts/jenkins/jenkins_cleanup.sh
[SCREAM_PullRequest_Autotester_Weaver] $ /bin/bash -le /tmp/jenkins8979976397285766803.sh
POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 0
Sending e-mails to: lbertag@sandia.gov
Finished: FAILURE

template <typename ST>
Field dot_along_rank1_dim(const int &pd, const Field &f1, const Field &f2,
const ekat::Comm *co = nullptr) {
return do_dot_along_rank1_dim<ST>(pd, f1, f2, co);
Copy link
Contributor

@tcclevenger tcclevenger Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say add the implementation to the impl namespace like is done with the other utils, and just give the same name as this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered namespacing impl, impl_dot and nothing, and chose nothing for no particular reason, but I will revisit. Let's see what Luca suggests. If he doesn't have a preference, I will go with impl as you suggested

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also vote for using the impl namespace and keep the same signature.

<< l2.dim(pd)
<< " \n"
"along the provided dimension pd "
<< pd << " .\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add the checks above to the dot_along_rank1_dim() function like with the other utils.

#include "share/field/field.hpp"

namespace scream {

Copy link
Contributor

@tcclevenger tcclevenger Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's where I would add namespace impl {. See field_utils_impl.hpp.

const auto &u2 = f2.get_header().get_identifier().get_units();
const auto &g2 = f2.get_header().get_identifier().get_grid_name();

EKAT_REQUIRE_MSG(pd <= 5 && pd >= 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should make this super generic. The impl below is quite long mostly b/c of handling different values of pd. What if we only supported pd=0 (for contraction along horiz dim) and pd=rank-1 (for contraction along vertical dim)? We could even make two separate methods, to make the code more self-explanatory:

Field horiz_contracttion (const Field& field, const Field& weight);
Field vert_contracttion (const Field& field, const Field& weight);

or something like that. We'd of course check that the input fields have the correct tag (COL for horiz and LEV/ILEV for vert), that it is at the expected, and that the extent match. We could also make it so that the weight field has JUST the horiz (resp. vert) dimension, so that it is indeed a weight. Maybe we should make weight optional too, so

Field horiz_contraction (const Field& field, const Field* weight = nullptr);

I think this would cover most of our potential use cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two comments:

  • are we always leading with COL and ending with LEV in EAMxx? I didn't assume that in the impl
  • I see this potentially useful for one additional CMP dim (say swband)

If the answer to the first is YES, then I can simplify this impl into the following:

  • only allow pg, and only allow up to 3 dims a permutation of COL, LEV, CMP
  • contract_field_along_col (essentially as above, but pd = 0)
  • contract_field_along_lev (essentially as above, but pd = rank-1 = 2)
  • contract_field_along_cmp (essentially as above, but pd = 1)

But this is essentially just the same as saying, cut the condition on pd from [1,5] to [1,3], which is fine with me. We won't gain much simplicity, right?

Thoughts?

Copy link
Contributor

@bartgol bartgol Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We baked the assumptions that "COL comes first and LEV/ILEV come last" in a lot of places, so I think it's safe to assume that will remain the case.

Copy link
Contributor

@bartgol bartgol Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contract_field_along_lev (essentially as above, but pd = rank-1 = 2)

Well, LEV could be at pd=0 (for a (LEV) field), 1 (for a (COL,LEV) field) ,or 2 (for a (COL,CMP,LEV) field), so we cannot assume it's 2. But we can assume it's rank-1. OTOH, for CMP we can prob assume it is at position 1, for now.

Copy link
Contributor

@bartgol bartgol Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reduction along CMP, assuming that the extent is "small", I would consider doing something like

Field out = f.get_component(0).clone(); // get output layout
out.deep_copy(0);
for (int icmp=0; icmp<num_cmp; ++icmp) {
  out.update(f.get_component(icmp),1,1);
}

Requires num_cmp loops, but they are ALL ||for, no ||reduce.

Edit: obvious mods to add also a weight field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, gimme your final advice: You don't like the complex generic impl along an arbitrary axis and I should try to produce a simple one that does narrow usecases in isolation?

Copy link
Contributor

@bartgol bartgol Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. I doubt we will ever need to reduce along a generic dimension. Our only two/three cases are COL, LEV/ILEV, and CMP. The latter is prob easier to impl via for loop and calls to update; the COL case can assume it's the 1st dim; and the LEV/ILEV case can assume it's the last dim. And we can assume we contract a single field, possibly weighted by a 1d field (of length equal to the extent along which we are contracting).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you said, these use cases will cover our most likely usage: horizontal sum/avg, vertical sum/avg, and reduction across bands (or other CMP extents).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

@bartgol bartgol force-pushed the master branch 2 times, most recently from 7114e81 to 579850c Compare November 8, 2024 23:44
@mahf708
Copy link
Contributor Author

mahf708 commented Nov 9, 2024

to be revisited in a few weeks.

@mahf708 mahf708 closed this Nov 9, 2024
Copy link
Contributor

mergify bot commented Nov 9, 2024

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Enforce checks passing

This rule is failing.

Make sure that checks are not failing on the PR, and reviewers approved

  • #approved-reviews-by >= 1
  • any of:
    • check-skipped="gcc-openmp / .*"
    • check-success="gcc-openmp / .*"
  • any of:
    • check-skipped="gcc-cuda / .*"
    • check-success="gcc-cuda / .*"
  • any of:
    • check-skipped="cpu-gcc / .*"
    • check-success="cpu-gcc / .*"
  • any of:
    • check-skipped=cpu-gcc
    • check-success=cpu-gcc
  • #changes-requested-reviews-by == 0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants