The Asynchronous Successive Halving Algorithm (ASHA) is, as far as I understand, a tweaked version of the Hyperband-style multi-fidelity intensifier (a.k.a. [ early stopping | pruning | scheduler ] algorithm) that makes better use of the available computing power when many CPU cores are available (it improves CPU occupancy).
Here are a bunch of references:
- a nice explanation in a book that I found helpful for understanding how it works
- the authors' 2018 blog post introducing the method
- the 2017 paper (rejected by ICLR 2018, accepted by MLSys 2020); arXiv preprint here
- another docs entry offering a pretty nice explanation
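To make the "tweak" concrete: instead of waiting for a whole rung of trials to finish before halving (as synchronous successive halving does), ASHA promotes a trial as soon as it is in the top 1/η of its rung, so a free worker never blocks. Below is a minimal, illustrative sketch of that promotion rule under my understanding of the paper; all names (`ASHA`, `report`, `get_job`) are hypothetical and not any library's API.

```python
class ASHA:
    """Minimal sketch of ASHA's asynchronous promotion rule.

    Illustrative only; class/method names are hypothetical and do not
    mirror ray[tune], Optuna, or Oríon. Higher score = better.
    """

    def __init__(self, reduction_factor=4, num_rungs=4):
        self.eta = reduction_factor
        # rung index -> {trial_id: score reported at that rung's budget}
        self.rungs = [{} for _ in range(num_rungs)]
        # trials already promoted out of each rung (promote each only once)
        self.promoted = [set() for _ in range(num_rungs)]
        self.next_id = 0

    def report(self, trial_id, rung, score):
        # A worker reports the trial's score at this rung's budget.
        self.rungs[rung][trial_id] = score

    def get_job(self):
        # Called whenever a worker frees up; never blocks waiting for a
        # rung to fill, which is the key difference from synchronous SHA.
        for k in reversed(range(len(self.rungs) - 1)):
            scores = self.rungs[k]
            top_k = len(scores) // self.eta  # how many may leave rung k
            if top_k == 0:
                continue
            best = sorted(scores, key=scores.get, reverse=True)[:top_k]
            for tid in best:
                if tid not in self.promoted[k]:
                    self.promoted[k].add(tid)
                    return ("promote", tid, k + 1)
        # No promotable trial anywhere: start a fresh config at rung 0.
        tid, self.next_id = self.next_id, self.next_id + 1
        return ("new", tid, 0)
```

The point of the sketch is that `get_job` always returns immediately, so idle workers either promote a current top-1/η trial or start a new configuration, which is what keeps CPU occupancy high.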
I think this is the original repo where the authors published the reference implementation, but a number of other optimizers implement it nowadays, too (e.g., Oríon, Optuna). I've personally used ray[tune]'s implementation before (docs overview, API reference), and at some point I ended up stepping through the code; their implementation looked surprisingly simple (yes, I know... famous last words, but after reading a research paper there's sometimes this shock when you read the code and see that the method could have been explained in much simpler words).
Motivation: I was watching an optimization session running on 64 cores and noticed the CPU load drop below 50% after 12 hours. I'm not sure yet exactly how many trials it had gotten through at that point (after 22 hours it had finished 12600 trials, with CPU load under 40%).
Also, I'm not sure this is really the bottleneck throttling CPU usage. It could be something else, e.g., many processes waiting for the surrogate model to retrain before a new trial can start (that's why I also mentioned the number of finished trials above), or some other cause entirely. It's hard to debug, so I thought it would be useful to be able to eliminate some potential causes, and this is the one I thought of first.
However, if you have some suggestions on improving parallelism, please let me know.