
run_sbc (and run_tarp) run time #1329

@humnaawan


Hello, is there any documentation on how to use num_workers and use_batched_sampling effectively? I am running into very long run times with run_sbc and I am not sure what's going wrong. Here's how I'm calling the function:

    ranks, dap_samples = run_sbc(
        thetas=thetas,
        xs=xs,
        posterior=posterior,
        num_posterior_samples=nsamples,
        show_progress_bar=True,
        num_workers=ncpus,
    )

I have 1000 simulations and I set nsamples to 1000. When I toggle between use_batched_sampling=False and use_batched_sampling=True (the default) in the call above, the former at least gives me progress updates, although it still doesn't finish.
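As a rough mental model of why one setting reports progress and the other appears frozen, here is a toy sketch. It is not sbi's implementation. It contrasts a loop that samples one observation at a time (and so can report after each x) with a single batched step that can only report once it returns. The names toy_sampler, sample_one_by_one and sample_all_at_once are hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a posterior sampler; NOT an sbi function.
def toy_sampler(x: float, num_samples: int) -> List[float]:
    return [x] * num_samples

def sample_one_by_one(xs: Sequence[float], num_samples: int,
                      sampler: Callable[[float, int], List[float]],
                      report: Callable[[int, int], None]) -> List[List[float]]:
    """One sampler call per observation, so progress is reported after each x."""
    results = []
    for i, x in enumerate(xs, start=1):
        results.append(sampler(x, num_samples))
        report(i, len(xs))
    return results

def sample_all_at_once(xs: Sequence[float], num_samples: int,
                       sampler: Callable[[float, int], List[float]],
                       report: Callable[[int, int], None]) -> List[List[float]]:
    """Models one batched step: progress is only reported after it completes."""
    results = [sampler(x, num_samples) for x in xs]
    report(len(xs), len(xs))
    return results
```

With three observations, the looped version reports 1/3, 2/3 and 3/3, while the batched version reports only 3/3 at the very end. If that single step is slow, the progress display stays stuck at its initial value.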

Looking through the code, I think the bottleneck might be max_sampling_batch_size, which is set to 10,000. The parameter is not exposed, though (at least not when you build a posterior via inference.build_posterior). I did set simulation_batch_size in simulate_for_sbi (to int(nsims/ncpus)), but I don't think that gets passed on to the DirectPosterior object.
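For scale, 1000 observations with 1000 posterior samples each is 1,000,000 draws in total. This sketch assumes, without confirmation from the sbi source, that a cap like max_sampling_batch_size=10,000 bounds how many draws are made per chunk. The hypothetical helper chunk_sizes shows how such a cap would split that total.

```python
from typing import List

def chunk_sizes(total_draws: int, max_batch: int) -> List[int]:
    """Split total_draws into chunks of at most max_batch (hypothetical helper)."""
    if total_draws < 0 or max_batch <= 0:
        raise ValueError("total_draws must be >= 0 and max_batch must be > 0")
    full, rest = divmod(total_draws, max_batch)
    # Full-size chunks first, then one smaller chunk for any remainder.
    return [max_batch] * full + ([rest] if rest else [])

num_observations = 1000  # simulations used for SBC (from the post)
num_samples = 1000       # nsamples (from the post)
chunks = chunk_sizes(num_observations * num_samples, 10_000)
print(len(chunks), sum(chunks))  # 100 1000000
```

Under this assumption, the cap only changes how the work is split (100 chunks of 10,000), not the total number of draws.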

I run into the same issue with run_tarp, which doesn't expose use_batched_sampling (although #1321 should enable that once it's merged).

I use cpus-per-task=35 in my sbatch script and have confirmed that 35 CPUs are indeed available. With the default use_batched_sampling, the run_sbc call seems to be stuck at 1/1000 even after 5 hours. With use_batched_sampling=False, it barely passes 100/1000 after 12 hours, even though the progress bar's time estimates suggest it should be much faster.
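Assuming the rate stays roughly constant, the non-batched numbers above can be extrapolated linearly to the full run. The helper name is hypothetical.

```python
def extrapolate_total_hours(done: int, total: int, elapsed_hours: float) -> float:
    """Linear extrapolation of total run time from the observed completion rate."""
    if done <= 0:
        raise ValueError("need at least one completed item to extrapolate")
    return elapsed_hours * total / done

# Numbers from the post: about 100 of 1000 rounds done after 12 hours.
print(extrapolate_total_hours(100, 1000, 12.0))  # 120.0
```

That is roughly 120 hours, or about 5 days, for all 1000 observations. The batched run (1/1000 after 5 hours) gives no usable rate, because its counter has not advanced.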

I'd really appreciate some help. I have started unpacking run_sbc since I can't think of anything else, but I thought I'd ask here in case I'm missing something. My understanding is that my call never gets past get_posterior_samples_on_batch (which calls posterior.sample_batched).
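One way to confirm where the time goes, before committing to all 1000 observations, is to time just the sampling step on a small slice of xs and scale up. The generic helper below is a sketch. Wiring fn to the posterior (for example, a lambda that calls posterior.sample_batched on the slice) is an assumption about how one might use it, not something taken from run_sbc.

```python
import time
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")

def time_on_subset(fn: Callable[[Sequence], T], items: Sequence, n: int) -> Tuple[T, float]:
    """Call fn on the first n items and return (result, elapsed wall-clock seconds)."""
    subset = items[:n]
    start = time.perf_counter()
    result = fn(subset)
    return result, time.perf_counter() - start

# Toy usage: time summing the first three items (stand-in for a sampling call).
result, seconds = time_on_subset(lambda xs: sum(xs), [1, 2, 3, 4, 5], n=3)
```

If a handful of observations already takes minutes, the sampling step itself is the bottleneck, independent of num_workers.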

Thank you!

Label: question (Further information is requested)