Description
Right now there's nothing that indicates if a book is actively being bound.
We tried basic lock files in the past, but that ended up requiring a lot of human intervention to remove stale lock files when Prefect flows were cancelled or the computer was restarted. Those issues were pretty much solved by implementing a workflow with a concurrency limit, so that only one instance of make_book can be running per platform at a time.
The place this shows up now is when make_book is still running / scheduling, there's a whole bunch of failed books, and we're running so-data-package platform autofix. A failed book is set to rebind and the rebind starts, but then make_book picks it up. make_book first sees that there are files on disk, so it adds the book to the failed_list, then it deletes all the files on disk to try the failed books again.
This could be fixed with a job-db that has a ~4 hour timeout set per book. The current "fix" is to just turn off make_book while running many fixes.
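As a rough illustration of the job-db idea, here is a minimal sketch of a per-book claim that expires on its own, so a cancelled flow or a reboot never leaves a lock that a human has to clean up. Everything in it is an assumption for illustration: the SQLite backend, the `book_jobs` table, and the `init_db`, `try_claim`, and `release` helpers are hypothetical names, not existing code in this repo.

```python
import sqlite3
import time

# Assumption: the "~4 hour timeout" from the description, in seconds.
LEASE_SECONDS = 4 * 60 * 60


def init_db(conn):
    """Create the hypothetical job table: one row per book being bound."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS book_jobs ("
        "book_id TEXT PRIMARY KEY, "
        "owner TEXT NOT NULL, "
        "expires_at REAL NOT NULL)"
    )


def try_claim(conn, book_id, owner, now=None, lease=LEASE_SECONDS):
    """Return True if `owner` now holds the claim on `book_id`.

    An expired claim is dropped first, so a crashed or cancelled job
    only blocks the book until its lease runs out.
    """
    now = time.time() if now is None else now
    with conn:  # one transaction: drop the stale claim, then try to insert
        conn.execute(
            "DELETE FROM book_jobs WHERE book_id = ? AND expires_at <= ?",
            (book_id, now),
        )
        cur = conn.execute(
            "INSERT OR IGNORE INTO book_jobs (book_id, owner, expires_at) "
            "VALUES (?, ?, ?)",
            (book_id, owner, now + lease),
        )
    # rowcount is 1 when the row was inserted, 0 when a live claim exists
    return cur.rowcount == 1


def release(conn, book_id, owner):
    """Drop the claim when binding finishes; only the owner can release."""
    with conn:
        conn.execute(
            "DELETE FROM book_jobs WHERE book_id = ? AND owner = ?",
            (book_id, owner),
        )
```

With something like this, both the autofix rebind and make_book would call `try_claim` before touching a book's files, and make_book would skip any book it fails to claim instead of adding it to the failed_list and deleting its files.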