The Asynchronous Successive Halving Algorithm (ASHA) is, as far as I understand, a tweaked version of the Hyperband-style multi-fidelity intensifier (a.k.a. [ early stopping | pruning | scheduler ] algorithm) that makes better use of the available computing power when many CPU cores are available (it improves CPU occupancy).
Here are a bunch of references:
- a nice explanation in a book that I found helpful for understanding how it works
- the authors' 2018 blog post introducing the method
- the 2017 paper (rejected by ICLR 2018, accepted by MLSys 2020); arXiv preprint here
- another docs entry offering a pretty nice explanation
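To make the "tweak" concrete: instead of waiting for a whole rung of trials to finish before halving (as synchronous successive halving does), ASHA promotes a trial as soon as it is in the top 1/η of its rung, so a free worker never blocks. Below is a minimal, illustrative sketch of that promotion rule under my understanding of the paper; all names (`ASHA`, `report`, `get_job`) are hypothetical and not any library's API.

```python
class ASHA:
    """Minimal sketch of ASHA's asynchronous promotion rule.

    Illustrative only; class/method names are hypothetical and do not
    mirror ray[tune], Optuna, or Oríon. Higher score = better.
    """

    def __init__(self, reduction_factor=4, num_rungs=4):
        self.eta = reduction_factor
        # rung index -> {trial_id: score reported at that rung's budget}
        self.rungs = [{} for _ in range(num_rungs)]
        # trials already promoted out of each rung (promote each only once)
        self.promoted = [set() for _ in range(num_rungs)]
        self.next_id = 0

    def report(self, trial_id, rung, score):
        # A worker reports the trial's score at this rung's budget.
        self.rungs[rung][trial_id] = score

    def get_job(self):
        # Called whenever a worker frees up; never blocks waiting for a
        # rung to fill, which is the key difference from synchronous SHA.
        for k in reversed(range(len(self.rungs) - 1)):
            scores = self.rungs[k]
            top_k = len(scores) // self.eta  # how many may leave rung k
            if top_k == 0:
                continue
            best = sorted(scores, key=scores.get, reverse=True)[:top_k]
            for tid in best:
                if tid not in self.promoted[k]:
                    self.promoted[k].add(tid)
                    return ("promote", tid, k + 1)
        # No promotable trial anywhere: start a fresh config at rung 0.
        tid, self.next_id = self.next_id, self.next_id + 1
        return ("new", tid, 0)
```

The point of the sketch is that `get_job` always returns immediately, so idle workers either promote a current top-1/η trial or start a new configuration, which is what keeps CPU occupancy high.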
I think this is the original repo where the authors published the reference implementation, but a number of other optimizers implement it nowadays, too (e.g., Oríon, Optuna). I've personally used ray[tune]'s implementation before (docs overview, API reference), and at some point I ended up stepping through the code; their implementation looked surprisingly simple (yes, I know... famous last words, but after reading a research paper there's sometimes this shock when you read the code and see that the method could have been explained in much simpler words).
Motivation: I was watching an optimization session running on 64 cores and noticed the CPU load drop below 50% after 12 hours. I'm not sure yet exactly how many trials it had gotten through at that point (after 22 hours it had finished 12600 trials, with CPU load under 40%).
Also, I'm not sure this is really the bottleneck throttling CPU usage. It could be something else, e.g., many processes waiting for the surrogate model to retrain before a new trial can start (that's why I also mentioned the number of finished trials above), or some other cause entirely. It's hard to debug, so I thought it would be useful to be able to eliminate some potential causes, and this is the one I thought of first.
However, if you have some suggestions on improving parallelism, please let me know.