Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove managed memory from read_workerflow #82

Closed
wants to merge 2 commits into from

Conversation

atmyers
Copy link
Member

@atmyers atmyers commented Sep 13, 2024

No description provided.

@stevenhofmeyr
Copy link
Collaborator

These modifications result in a significant increases in the infection rates seen in GPU runs, but there is no change for CPU runs. In versions before this, the CPU and GPU results are almost identical for the same seed, but now they differ significantly.

@@ -736,16 +766,17 @@ void CensusData::assignTeachersAndWorkgroup (AgentContainer& pc /*!< Agent
auto Ncommunity = demo.Ncommunity;

auto np = soa.numParticles();
for (int ip = 0; ip < np; ++ip) {

amrex::ParallelForRNG( np,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is what causes the difference between the GPU and CPU results. Maybe there are updates in the loop that require atomics for the GPU

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right - I think that instead of running this the Gpu, for now we need to explicitly copy the data off the device, run this loop in serial, then copy the data back. Since this only happens at initialization I think this will be okay.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See update.

Copy link
Member Author

@atmyers atmyers Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note - with this change, AND with constructing the bins on the GPU, the code runs without crashing on development with amrex.the_arena_is_managed=0.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I run unmanaged with random initial cases, I still get a segfault here, on line 961 in CensusData.cpp:
auto inds = bins.permutationPtr();

Copy link
Collaborator

@stevenhofmeyr stevenhofmeyr Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to go away if we use the GPU binning policy rather than Serial.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Binning with BinPolicy::Serial still relies on managed memory, but if you change that to BinPolicy::GPU, it works for me - is that not what you're seeing?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's what i'm seeing. With GPU policy, it works. So we should keep the default managed memory for reproducibility. But I can now check the other branch to see if there is anywhere managed memory is being implicitly used.

@atmyers atmyers mentioned this pull request Sep 27, 2024
@atmyers
Copy link
Member Author

atmyers commented Sep 27, 2024

Superseded by #84

@atmyers atmyers closed this Sep 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants