Gracefully handling non-handler failures in Custom Resources

# TL;DR:

> Failures that occur outside of the handler of a Custom Resource result in long periods of inactivity when invoking CDK commands, and they're never raised as actual failures. This is a PITA.

So recently whilst working on HLS I was making some 🎸 _**Custom Resources**_ 🤘

I'd wrapped my logic in beautiful `try/except` blocks and I'd handled the [CFN Callbacks](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-lambda-function-code-cfnresponsemodule.html) so that my Custom Resource called back to the mothership 👽 🛸 to tell CFN what was happening. Custom Resources use these callbacks to tell CloudFormation (and CDK) whether a resource's `creation/update/delete` has been successful or not.
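For anyone who hasn't looked inside it, the callback is an HTTP PUT of a small JSON document to the pre-signed `ResponseURL` that CloudFormation puts in the event. Here's a minimal sketch of roughly what `cfnresponse` does for you, assuming the standard Custom Resource request fields (`build_response` and `send_response` are names made up for illustration, not part of any library):

```python
import json
import urllib.request


def build_response(event, status, reason="", physical_id=None, data=None):
    """Build the JSON body CloudFormation expects back from a Custom Resource."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason,  # CloudFormation requires a reason when Status is "FAILED"
        "PhysicalResourceId": physical_id
        or event.get("PhysicalResourceId")
        or event["LogicalResourceId"],
        # These three are echoed back unchanged from the request
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def send_response(event, body):
    """PUT the body to the pre-signed URL. Until this happens, CloudFormation just waits."""
    request = urllib.request.Request(
        event["ResponseURL"],
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        # Empty content type, so urllib doesn't add a default one
        headers={"Content-Type": ""},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

The important bit is that nothing gets sent unless your code actually reaches `send_response` (or the `cfnresponse` equivalent).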

**BUT**

When I ran `cdk deploy`, my deployment was seemingly stuck on creating the Custom Resource _forever_. Upon further inspection, I could see in the logs of the handler that it was erroring out as soon as it was invoked - strange, this should be caught and CloudFormation should be informed of the failure and begin the rollback logic.

So, I've got a Stack stuck deploying, my first thought? Delete the thing. So I deleted the stack... and it got stuck deleting the Custom Resource _forever_ 🙃 .

# u wot 🤨 

So, this was confusing at first but then I took a look at the error messages in CloudWatch. Let's say I had an `index.py` like:

```python
import cfnresponse
import my_cool_module

def handler(event, context):
    try:
        my_cool_module.do_something()
        # Tell CloudFormation the operation succeeded
        cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
    except my_cool_module.a_not_so_cool_exception as ex:
        print(ex)
        # Tell CloudFormation the operation failed so it can roll back
        cfnresponse.send(event, context, cfnresponse.FAILED, {})
```

My importing of `my_cool_module` was erroring, not anything in my `handler` function. Because of this, I was never reaching any of my callback code, which meant that as far as CDK/CloudFormation were concerned, my Custom Resource was doing its thing and it'd hear from it eventually.

Because CloudFormation waits for these callbacks on every create, update and delete, a missing callback results in seemingly infinite (1+ hours) `deploys/updates/destroys`, which really wastes time.

You might ask, did you not test your code locally @ciaranevans?! - Well, I did. It worked beautifully because of how it was interpreting the import statement... not so correct when actually on its own in a Lambda 😭 
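One cheap way to catch this class of problem before deploying is to import the handler module in a fresh interpreter that can only see the asset directory. This is a rough sketch, assuming the cause is a module that only resolves because of the local working directory or `PYTHONPATH`; `imports_cleanly` and the `asset_dir` layout are hypothetical, and locally installed packages are still visible, so it is not a full Lambda emulation:

```python
import subprocess
import sys


def imports_cleanly(asset_dir, module="index"):
    """Import `module` in a fresh interpreter that only has `asset_dir` added to its path."""
    code = (
        "import sys, importlib; "
        "sys.path.insert(0, sys.argv[1]); "
        "importlib.import_module(sys.argv[2])"
    )
    # -I (isolated mode) ignores PYTHONPATH and the current directory,
    # so stray local modules can't paper over a broken import.
    result = subprocess.run(
        [sys.executable, "-I", "-c", code, asset_dir, module],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print(result.stderr)  # show the traceback from the child interpreter
    return result.returncode == 0
```

Calling `imports_cleanly("path/to/lambda/asset")` from a unit test would fail fast if `index.py` can't be imported on its own.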

# So what should we do?

I suppose the easiest and _grossest_ way could be:

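```python
import cfnresponse

try:
    import my_cool_module
    import_error = None
except Exception as ex:
    # There's no event at import time, so keep the error and report it from the handler
    import_error = ex

def handler(event, context):
    if import_error is not None:
        print(import_error)
        cfnresponse.send(event, context, cfnresponse.FAILED, {})
        return
    try:
        my_cool_module.do_something()
        cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
    except my_cool_module.a_not_so_cool_exception as ex:
        print(ex)
        cfnresponse.send(event, context, cfnresponse.FAILED, {})
```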

![my eyes!](https://media.giphy.com/media/TEdBirpbIVMRpfqON1/giphy.gif)

I don't like wrapping imports in `try/except` or making them conditional. So if someone has a better idea or knows of how we could gracefully handle this kind of issue, I'm all ears!
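One possible alternative, offered as a sketch rather than a recommendation: move the risky import inside the handler and wrap the handler in a decorator that reports _any_ uncaught exception as a failure. `report_failures` is a hypothetical helper, and `send` is assumed to have the same shape as `cfnresponse.send(event, context, status, response_data)`:

```python
import functools
import traceback


def report_failures(send):
    """Wrap a handler so that any uncaught exception is reported through `send`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(event, context):
            try:
                result = func(event, context)
            except Exception:
                traceback.print_exc()  # goes to stderr, which Lambda ships to CloudWatch
                send(event, context, "FAILED", {})
                return None
            send(event, context, "SUCCESS", {})
            return result
        return wrapper
    return decorator


# Hypothetical usage inside index.py:
#
#   import cfnresponse
#
#   @report_failures(cfnresponse.send)
#   def handler(event, context):
#       import my_cool_module  # lazy import, so a failure here hits the wrapper
#       my_cool_module.do_something()
```

This still can't help if `index.py` itself fails to load (a syntax error, or `cfnresponse` being missing) or if the function times out, because in those cases no Python code of ours gets to run.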

I imagine most languages can end up in this kind of situation - is there a way for CDK to treat _any_ failure that's not explicitly handled as a failure for CloudFormation? 🤷

- Mind dump over.

cc. @developmentseed/earthdata-infrastructure 
