Skip to content

Conversation

atmyers
Copy link
Member

@atmyers atmyers commented Aug 18, 2025

Turn on OMP threading in addParticles.

The proposed changes:

  • fix a bug or incorrect behavior in AMReX
  • add new capabilities to AMReX
  • changes answers in the test suite to more than roundoff level
  • are likely to significantly affect the results of downstream AMReX users
  • include documentation in the code and/or rst files, if appropriate

@WeiqunZhang
Copy link
Member

I think The omp parallel region could include the for loop over levels.

@AlexanderSinn
Copy link
Member

Won't this have threading issues in DefineAndReturnParticleTile(lev, mfi.index(), mfi.LocalTileIndex()); if the std::map entry for the index does not exist yet?

@atmyers
Copy link
Member Author

atmyers commented Aug 18, 2025

Won't this have threading issues in DefineAndReturnParticleTile(lev, mfi.index(), mfi.LocalTileIndex()); if the std::map entry for the index does not exist yet?

Good point - we still need to do that part in serial.

for (int lev = 0; lev < other.numLevels(); ++lev)
{
const auto& plevel_other = other.GetParticles(lev);
for(MFIter mfi = other.MakeMFIter(lev); mfi.isValid(); ++mfi)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not related to OpeMP. But we could use this version MFIter MakeMFIter (int lev, const MFItInfo& info) const to disable gpu device sync here and above. If we disable it here, we need to add Gpu::streamSynchronizeAll() after the loop over levels.

Do you want to do this in a follow-up or this PR?

This comment was marked as off-topic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try this in a follow-on PR.

@ax3l
Copy link
Member

ax3l commented Aug 20, 2025

I did a quick test in ImpactX with 10M particles and looking at init, e.g., with examples/fodo/run_fodo.py (all using 1 MPI rank and thus 1 AMReX box).

I see a 2x speedup for ParticleContainer::addParticles with 4 OMP threads now on my laptop over AMReX development 🎉

---

ParticleContainer::addParticles                              6     0.3628     0.3628     0.3628   1.76%


Old
---

ParticleContainer::addParticles                              6     0.7238     0.7238     0.7238   3.89%

@atmyers atmyers merged commit e056972 into AMReX-Codes:development Aug 20, 2025
75 checks passed
@ax3l
Copy link
Member

ax3l commented Aug 20, 2025

Tested scalability a bit more for a 10M particle test, I think this looks reasonable (tapers off quickly, no benefit from threads in the old one):

New
---

1 OMP thread
ParticleContainer::addParticles                              1     0.2056     0.2056     0.2056   6.29%

2 OMP threads
ParticleContainer::addParticles                              1     0.1549     0.1549     0.1549   6.62%

4 OMP threads
ParticleContainer::addParticles                              1     0.1423     0.1423     0.1423   8.55%

6 OMP threads
ParticleContainer::addParticles                              1     0.1392     0.1392     0.1392   9.99%

8 OMP threads
ParticleContainer::addParticles                              1     0.1415     0.1415     0.1415  10.98%



Old
---

1 OMP thread
ParticleContainer::addParticles                              1     0.2191     0.2191     0.2191   6.46%

2 OMP threads
ParticleContainer::addParticles                              1     0.2214     0.2214     0.2214   8.72%

4 OMP threads
ParticleContainer::addParticles                              1     0.2282     0.2282     0.2282  12.65%

6 OMP threads
ParticleContainer::addParticles                              1     0.2249     0.2249     0.2249  13.78%

8 OMP threads
ParticleContainer::addParticles                              1     0.2243     0.2243     0.2243  15.34%

@ax3l
Copy link
Member

ax3l commented Aug 20, 2025

Correctness is fine, too: ImpactX tests all pass locally.

cmake --fresh -S . -B build -DImpactX_PYTHON=ON -DpyAMReX_IPO=OFF -DImpactX_PYTHON_IPO=OFF -DImpactX_FFT=ON -DImpactX_amrex_src=$HOME/src/amrex

cmake --build build -j 6

ctest --test-dir build --output-on-failure -j 6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants