Add AMD/HIP Autotesting #476
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@           Coverage Diff           @@
##             main     #476   +/-   ##
=======================================
  Coverage   93.42%   93.42%
=======================================
  Files         303      303
  Lines       25171    25171
  Branches     2766     2766
=======================================
  Hits        23517    23517
  Misses       1654     1654
0b01e95 to 73416a7
To update: I'm debugging a little weirdness with the AMD runners, but this is otherwise good to go.
Looks good! Thanks, Mike!
a4696d1 to 24f926c
@singhbalwinder @odiazib @overfelt @jaelynlitz For anyone not closely following this PR, it is rebased onto the branch for PR #477 that brings in the However... it does produce a lot more failing tests in this PR, many of which are duplicated in #477, and I will add a similar list of failing tests to that PR. What do we want to do about the newly-failing tests, with the change in the compare script, and potentially some newly-exposed fails on AMD MI200-series GPUs? My thoughts are:
CUDA Fails - Release/Double
48 - validate_mer07_veh02_nuc_mosaic_1box (Failed)
60 - validate_calcsize_compute_dry_volume (Failed)
62 - validate_stand_modal_aero_calcsize_sub (Failed)
74 - validate_ma_precpprod (Failed)
90 - validate_compute_massflux_small (Failed)
134 - validate_pcarbon_aging_1subarea (Failed)
202 - validate_calc_1_impact_rate_ts_0 (Failed)
204 - validate_modal_aero_bcscavcoef_get_ts_355 (Failed)
212 - validate_baseline_aero_model_wetdep_ts_379 (Failed)
214 - wetdep_compare_clddiag_output (Failed)
222 - wetdep_compare_wetdep_prevap_130_output (Failed)
224 - wetdep_compare_wetdep_prevap_230_output (Failed)
234 - wetdep_compare_wetdep_scavenging_true_output (Failed)
236 - wetdep_compare_wetdep_scavenging_false_output (Failed)
240 - wetdep_compare_rain_mix_ratio_output (Failed)
246 - wetdep_compare_wetdep_resusp_130_output (Failed)
248 - wetdep_compare_wetdep_resusp_230_output (Failed)
364 - validate_linmat_ts_355 (Failed)
366 - validate_nlnmat_ts_355 (Failed)
368 - validate_imp_prod_loss_ts_355 (Failed)
370 - validate_newton_raphson_iter_ts_355 (Failed)
408 - validate_maxsattype1_merged (Failed)
410 - validate_maxsattype2_merged (Failed)
498 - validate_lin_strat_chem_solve_ts_1415 (Failed)
500 - validate_lin_strat_sfcsink_ts_1415_multicol (Failed)
502 - validate_lin_strat_sfcsinkmulticol_merged (Failed)
512 - validate_chm_diags_ts_355 (Failed)
538 - validate_calc_het_rates_merged (Failed)
540 - validate_calc_precip_rescale_merged (Failed)
546 - validate_sethet_merged (Failed)
554 - validate_calc_sox_aqueous_ts_355_merged (Failed)
566 - validate_calc_diag_spec_ts_355 (Failed)
580 - validate_modal_aero_lw_ts_355 (Failed)
584 - validate_update_aod_spec_ts_355 (Failed)
586 - validate_aer_rad_props_lw_ts_355 (Failed)
588 - validate_aer_rad_props_sw_ts_355 (Failed)
590 - validate_volcanic_cmip_sw_ts_355 (Failed)
616 - validate_mam_soaexch_1subarea_ts_379 (Failed)
618 - validate_gas_aer_uptkrates_1box1gas_ts_379 (Failed)
620 - validate_mam_gasaerexch_1subarea_ts_379 (Failed)
622 - validate_vert_interp_ts_300 (Failed)
624 - validate_vert_interp_col_ts_300 (Failed)
HIP Fails - Release/Double
48 - validate_mer07_veh02_nuc_mosaic_1box (Failed)
60 - validate_calcsize_compute_dry_volume (Failed)
62 - validate_stand_modal_aero_calcsize_sub (Failed)
74 - validate_ma_precpprod (Failed)
90 - validate_compute_massflux_small (Failed)
134 - validate_pcarbon_aging_1subarea (Failed)
202 - validate_calc_1_impact_rate_ts_0 (Failed)
204 - validate_modal_aero_bcscavcoef_get_ts_355 (Failed)
212 - validate_baseline_aero_model_wetdep_ts_379 (Failed)
214 - wetdep_compare_clddiag_output (Failed)
222 - wetdep_compare_wetdep_prevap_130_output (Failed)
224 - wetdep_compare_wetdep_prevap_230_output (Failed)
234 - wetdep_compare_wetdep_scavenging_true_output (Failed)
236 - wetdep_compare_wetdep_scavenging_false_output (Failed)
240 - wetdep_compare_rain_mix_ratio_output (Failed)
246 - wetdep_compare_wetdep_resusp_130_output (Failed)
248 - wetdep_compare_wetdep_resusp_230_output (Failed)
364 - validate_linmat_ts_355 (Failed)
366 - validate_nlnmat_ts_355 (Failed)
368 - validate_imp_prod_loss_ts_355 (Failed)
370 - validate_newton_raphson_iter_ts_355 (Failed)
408 - validate_maxsattype1_merged (Failed)
410 - validate_maxsattype2_merged (Failed)
498 - validate_lin_strat_chem_solve_ts_1415 (Failed)
500 - validate_lin_strat_sfcsink_ts_1415_multicol (Failed)
502 - validate_lin_strat_sfcsinkmulticol_merged (Failed)
512 - validate_chm_diags_ts_355 (Failed)
538 - validate_calc_het_rates_merged (Failed)
540 - validate_calc_precip_rescale_merged (Failed)
546 - validate_sethet_merged (Failed)
554 - validate_calc_sox_aqueous_ts_355_merged (Failed)
566 - validate_calc_diag_spec_ts_355 (Failed)
580 - validate_modal_aero_lw_ts_355 (Failed)
584 - validate_update_aod_spec_ts_355 (Failed)
586 - validate_aer_rad_props_lw_ts_355 (Failed)
588 - validate_aer_rad_props_sw_ts_355 (Failed)
590 - validate_volcanic_cmip_sw_ts_355 (Failed)
616 - validate_mam_soaexch_1subarea_ts_379 (Failed)
618 - validate_gas_aer_uptkrates_1box1gas_ts_379 (Failed)
620 - validate_mam_gasaerexch_1subarea_ts_379 (Failed)
622 - validate_vert_interp_ts_300 (Failed)
624 - validate_vert_interp_col_ts_300 (Failed)
The following are unclear because the
421 - run_stand_dropmixnuc_ts_1407 (Subprocess aborted)
422 - validate_stand_dropmixnuc_ts_1407 (Failed)
423 - run_dropmixnuc_ts_1400 (Subprocess aborted)
424 - validate_dropmixnuc_ts_1400 (Failed)
425 - run_dropmixnuc_ts_1401 (Subprocess aborted)
426 - validate_dropmixnuc_ts_1401 (Failed)
427 - run_dropmixnuc_ts_1402 (Subprocess aborted)
428 - validate_dropmixnuc_ts_1402 (Failed)
429 - run_dropmixnuc_ts_1403 (Subprocess aborted)
430 - validate_dropmixnuc_ts_1403 (Failed)
431 - run_dropmixnuc_ts_1404 (Subprocess aborted)
432 - validate_dropmixnuc_ts_1404 (Failed)
433 - run_dropmixnuc_ts_1405 (Subprocess aborted)
434 - validate_dropmixnuc_ts_1405 (Failed)
435 - run_dropmixnuc_ts_1406 (Subprocess aborted)
436 - validate_dropmixnuc_ts_1406 (Failed)
437 - run_dropmixnuc_ts_1407 (Subprocess aborted)
438 - validate_dropmixnuc_ts_1407 (Failed)
439 - run_dropmixnuc_ts_1408 (Subprocess aborted)
440 - validate_dropmixnuc_ts_1408 (Failed)
441 - run_dropmixnuc_ts_1409 (Subprocess aborted)
442 - validate_dropmixnuc_ts_1409 (Failed)
443 - run_dropmixnuc_ts_1410 (Subprocess aborted)
444 - validate_dropmixnuc_ts_1410 (Failed)
445 - run_dropmixnuc_ts_1411 (Subprocess aborted)
446 - validate_dropmixnuc_ts_1411 (Failed)
447 - run_dropmixnuc_ts_1412 (Subprocess aborted)
448 - validate_dropmixnuc_ts_1412 (Failed)
449 - run_dropmixnuc_ts_1413 (Subprocess aborted)
450 - validate_dropmixnuc_ts_1413 (Failed)
451 - run_dropmixnuc_ts_1414 (Subprocess aborted)
452 - validate_dropmixnuc_ts_1414 (Failed)
453 - run_dropmixnuc_ts_1415 (Subprocess aborted)
454 - validate_dropmixnuc_ts_1415 (Failed)
455 - run_dropmixnuc_ts_1416 (Subprocess aborted)
456 - validate_dropmixnuc_ts_1416 (Failed)
457 - run_dropmixnuc_ts_1417 (Subprocess aborted)
458 - validate_dropmixnuc_ts_1417 (Failed)
CPU Fails - Release/Double
48 - validate_mer07_veh02_nuc_mosaic_1box (Failed) nuc_tests_new
60 - validate_calcsize_compute_dry_volume (Failed)
62 - validate_stand_modal_aero_calcsize_sub (Failed)
74 - validate_ma_precpprod (Failed)
90 - validate_compute_massflux_small (Failed)
134 - validate_pcarbon_aging_1subarea (Failed)
202 - validate_calc_1_impact_rate_ts_0 (Failed)
204 - validate_modal_aero_bcscavcoef_get_ts_355 (Failed)
212 - validate_baseline_aero_model_wetdep_ts_379 (Failed)
214 - wetdep_compare_clddiag_output (Failed)
222 - wetdep_compare_wetdep_prevap_130_output (Failed)
224 - wetdep_compare_wetdep_prevap_230_output (Failed)
234 - wetdep_compare_wetdep_scavenging_true_output (Failed)
236 - wetdep_compare_wetdep_scavenging_false_output (Failed)
240 - wetdep_compare_rain_mix_ratio_output (Failed)
246 - wetdep_compare_wetdep_resusp_130_output (Failed)
248 - wetdep_compare_wetdep_resusp_230_output (Failed)
364 - validate_linmat_ts_355 (Failed)
366 - validate_nlnmat_ts_355 (Failed)
368 - validate_imp_prod_loss_ts_355 (Failed)
370 - validate_newton_raphson_iter_ts_355 (Failed)
408 - validate_maxsattype1_merged (Failed)
410 - validate_maxsattype2_merged (Failed)
498 - validate_lin_strat_chem_solve_ts_1415 (Failed)
500 - validate_lin_strat_sfcsink_ts_1415_multicol (Failed)
502 - validate_lin_strat_sfcsinkmulticol_merged (Failed)
512 - validate_chm_diags_ts_355 (Failed)
538 - validate_calc_het_rates_merged (Failed)
540 - validate_calc_precip_rescale_merged (Failed)
546 - validate_sethet_merged (Failed)
554 - validate_calc_sox_aqueous_ts_355_merged (Failed)
566 - validate_calc_diag_spec_ts_355 (Failed)
580 - validate_modal_aero_lw_ts_355 (Failed)
584 - validate_update_aod_spec_ts_355 (Failed)
586 - validate_aer_rad_props_lw_ts_355 (Failed)
588 - validate_aer_rad_props_sw_ts_355 (Failed)
590 - validate_volcanic_cmip_sw_ts_355 (Failed)
616 - validate_mam_soaexch_1subarea_ts_379 (Failed)
618 - validate_gas_aer_uptkrates_1box1gas_ts_379 (Failed)
620 - validate_mam_gasaerexch_1subarea_ts_379 (Failed)
622 - validate_vert_interp_ts_300 (Failed)
624 - validate_vert_interp_col_ts_300 (Failed)
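Since the three lists above overlap heavily, a small script may make it easier to see which failures are shared across builds and which are specific to one. The following is only a minimal sketch, not part of this PR: the function names are hypothetical, it assumes the ctest summary lines have the `NN - test_name (Status)` shape shown above, and it assumes each `validate_X` test depends on the `run_X` test of the same name, which is inferred from the test names and not confirmed here.

```python
import re

# Matches ctest summary lines such as "48 - validate_foo (Failed)".
CTEST_LINE = re.compile(r"^\s*(\d+)\s*-\s*(\S+)\s*\(([^)]+)\)")


def parse_failures(text):
    """Return {test_name: status} for every ctest summary line in text."""
    failures = {}
    for line in text.splitlines():
        match = CTEST_LINE.match(line)
        if match:
            _, name, status = match.groups()
            failures[name] = status
    return failures


def common_failures(per_build):
    """Return the test names that fail in every build.

    per_build maps a build label (e.g. "CUDA") to a parse_failures() result.
    """
    name_sets = [set(failures) for failures in per_build.values()]
    return set.intersection(*name_sets) if name_sets else set()


def validate_fails_after_aborted_run(failures):
    """Return validate_X tests whose run_X counterpart was aborted.

    Assumption: run_X and validate_X are paired by the shared suffix X.
    """
    aborted = {
        name[len("run_"):]
        for name, status in failures.items()
        if name.startswith("run_") and status == "Subprocess aborted"
    }
    return sorted(
        name
        for name in failures
        if name.startswith("validate_") and name[len("validate_"):] in aborted
    )


# Tiny excerpt of the lists above, used only to illustrate the calls.
cuda_text = """48 - validate_mer07_veh02_nuc_mosaic_1box (Failed)
60 - validate_calcsize_compute_dry_volume (Failed)"""
hip_text = cuda_text + """
421 - run_stand_dropmixnuc_ts_1407 (Subprocess aborted)
422 - validate_stand_dropmixnuc_ts_1407 (Failed)"""

builds = {"CUDA": parse_failures(cuda_text), "HIP": parse_failures(hip_text)}
shared = common_failures(builds)
hip_only = set(builds["HIP"]) - shared

print("Shared failures:", sorted(shared))
print("HIP-only failures:", sorted(hip_only))
print("Validate fails after aborted run:",
      validate_fails_after_aborted_run(builds["HIP"]))
```

Separating the `validate_` failures that follow an aborted `run_` test from the rest could help decide which fails are likely tied to the compare script change and which need a closer look on the HIP build.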
This appears to be working correctly now.
One thing to note is that I currently have it configured to run on AMD MI250 OR MI210 GPUs. This is to get jobs picked up faster, since we have different nodes containing one or the other, and both belong to the same AMD_GFX90A architecture. However, if anyone would like to have better control over which card is used for a run, I can look into changing that.
The caveat is that a handful of dropmixnuc-related tests are failing for the HIP build. However, all tests did run, and the failures are below (the workflow output is enormous, which will be fixed by the changes to the mam_x_validation compare script 🎉).
dropmixnuc failures
HIP Autotest Results - Release/Double