Hi all,
As the title suggests, I am getting the infamous "Task stuck on RUNNABLE" line when I try to run this simple flow:
from metaflow import FlowSpec, step
import os

global_value = 5

class ProcessDemoFlow(FlowSpec):
    @step
    def start(self):
        global global_value
        global_value = 9
        print('process ID is', os.getpid())
        print('global_value is', global_value)
        self.next(self.end)

    @step
    def end(self):
        print('process ID is', os.getpid())
        print('global_value is', global_value)

if __name__ == '__main__':
    ProcessDemoFlow()
To provision the infrastructure, I used the minimal Terraform AWS template from the README. However, I had to make a few adjustments to get rid of some errors (I can't be the only one who hit these...):
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.5.3"
  ...
In the vpc module I had to change the version number to the latest version, 5.5.3.
module "metaflow" {
  source  = "outerbounds/metaflow/aws"
  version = "0.12.0"

  resource_prefix = local.resource_prefix
  resource_suffix = local.resource_suffix

  enable_step_functions = false
  subnet1_id            = module.vpc.public_subnets[0]
  subnet2_id            = module.vpc.public_subnets[1]
  vpc_cidr_blocks       = [module.vpc.vpc_cidr_block]
  vpc_id                = module.vpc.vpc_id
  with_public_ip        = true
  db_engine_version     = 16
  db_instance_type      = "db.t3.small"

  tags = {
    "managedBy" = "terraform"
  }
}
In the metaflow module I had to change module.vpc.vpc_cidr_blocks to [module.vpc.vpc_cidr_block], because I was getting an error saying that module.vpc.vpc_cidr_blocks didn't exist. I confirmed that the vpc module has no such output (no idea why this is in the template...). I also had to update the version number to 0.12.0. Finally, I got an error stating that the combination of Postgres, a db_engine_version of "11", and a db_instance_type of "db.t2.small" (the default values) is not allowed by AWS, so I updated db_engine_version to 16 and db_instance_type to "db.t3.small".
I have done basic checks in Batch, ECS, and EC2, and everything appears to be connected, valid, etc. What could be going wrong? I keep seeing suggestions that my compute environment might be too limited for the task, but given that this is a very basic task I don't think that is the issue. Is one of the modifications I made to the minimal template wrong?
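For anyone wanting to repeat the "basic checks" on Batch in a reproducible way, here is a minimal sketch in Python. It assumes boto3 is installed and AWS credentials are configured. The helper names (summarize_compute_environment, find_problems, fetch_summaries) are hypothetical and made up for illustration. The dictionary keys are the ones returned by the Batch DescribeComputeEnvironments call. It only surfaces a few fields that are commonly worth looking at when a job sits in RUNNABLE, and it is not a definitive diagnosis of this particular setup.

```python
"""Sketch: summarize AWS Batch compute environment state for a RUNNABLE job."""


def summarize_compute_environment(ce):
    """Pick out a few fields from one DescribeComputeEnvironments entry."""
    resources = ce.get("computeResources", {})
    return {
        "name": ce.get("computeEnvironmentName"),
        "state": ce.get("state"),            # ENABLED or DISABLED
        "status": ce.get("status"),          # e.g. VALID or INVALID
        "status_reason": ce.get("statusReason"),
        "type": resources.get("type"),       # e.g. EC2, SPOT, FARGATE
        "max_vcpus": resources.get("maxvCpus"),
        "subnets": resources.get("subnets", []),
        "security_groups": resources.get("securityGroupIds", []),
    }


def find_problems(summary):
    """Return human-readable notes about fields that look suspicious."""
    problems = []
    if summary["state"] != "ENABLED":
        problems.append("state is not ENABLED")
    if summary["status"] != "VALID":
        problems.append("status is not VALID")
    if not summary["max_vcpus"]:
        problems.append("maxvCpus is 0 or missing")
    if not summary["subnets"]:
        problems.append("no subnets listed")
    return problems


def fetch_summaries():
    """Query Batch and return (summary, problems) for each compute environment.

    boto3 is imported here so the helpers above can be used without it.
    """
    import boto3

    batch = boto3.client("batch")
    response = batch.describe_compute_environments()
    results = []
    for ce in response["computeEnvironments"]:
        summary = summarize_compute_environment(ce)
        results.append((summary, find_problems(summary)))
    return results
```

Calling fetch_summaries() and printing each pair gives a quick view of whether the compute environment is enabled and valid, how many vCPUs it may scale to, and which subnets and security groups it uses. An empty problem list does not rule out a networking issue, since instances that launch but cannot reach the ECS endpoints would still leave the job in RUNNABLE.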