AWS credentials as environment variables not working as expected #191

@rabernat

I'm trying to load private data from S3 in a Fused UDF, and I want to make sure I'm doing it the "right" way.

I'm trying to follow these instructions: https://docs.fused.io/basics/utilities/#environment-variables
In one UDF, I've got this:

import fused

# Credentials in .env format (values truncated here)
env_vars = """
AWS_ACCESS_KEY_ID=AK...
AWS_SECRET_ACCESS_KEY=Gt...
"""

# Path to your .env file
env_file_path = '/mnt/cache/.env'

@fused.udf
def udf(bbox=None, n=10):
    # Write the environment variables to the .env file
    with open(env_file_path, 'w') as file:
        file.write(env_vars)

In the second UDF, I've got this:

import fused

@fused.udf
def udf():
    import os

    import boto3
    from dotenv import load_dotenv

    # Load the environment variables from the .env file written by the first UDF
    env_file_path = '/mnt/cache/.env'
    load_dotenv(env_file_path, override=True)
    
    # these are being set correctly
    assert os.environ['AWS_ACCESS_KEY_ID'] == 'AK...'
    assert os.environ['AWS_SECRET_ACCESS_KEY'] == 'Gt...'

    # doesn't work
    # botocore.exceptions.ClientError: An error occurred (InvalidToken) when calling the GetObject operation: The provided token is malformed or otherwise invalid.
    # s3 credentials not detected correctly from environment
    # s3 = boto3.client('s3')

    # does work if I explicitly pass the credentials
    s3 = boto3.client(
        's3',
        aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
        aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY']
    )

    bucket = "arraylake-earthmover-production"
    key = "6462e90c27af040cabc066e8/chunks/0081af97634c03fc1c3fcd16b1f3c196558c15c096674f5a0052bf25479d0e8b.00000000000000000000000000000000"
    obj = s3.get_object(Bucket=bucket, Key=key)
    print(obj)

In most Python environments, boto3 picks up the credentials from the environment variables automatically, without them being passed explicitly (see https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#environment-variables). In the Fused UDF this does not work: if I don't pass the credentials explicitly, I get the "The provided token is malformed or otherwise invalid" error.
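One thing that might be worth checking (this is only a hypothesis, nothing in this issue confirms it) is whether the UDF runtime already has an AWS_SESSION_TOKEN set. boto3 reads that variable from the environment together with the key pair, so a token left over from other credentials would be sent along with the keys loaded from the .env file, which could plausibly produce an invalid-token error. A minimal sketch of such a check, using only the standard library; the helper names `summarize_aws_env` and `has_session_token` are made up for illustration:

```python
import os

# Environment variables that boto3 reads for credentials, per its credentials
# guide. A session token is only valid with the temporary keys it was issued for.
CREDENTIAL_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def summarize_aws_env(environ=None):
    """Return {name: masked value or None} for each credential variable."""
    environ = os.environ if environ is None else environ
    summary = {}
    for name in CREDENTIAL_VARS:
        value = environ.get(name)
        # Keep only the first four characters so secrets never reach the logs.
        summary[name] = None if not value else value[:4] + "..."
    return summary


def has_session_token(environ=None):
    """True if a non-empty AWS_SESSION_TOKEN is present in the environment."""
    environ = os.environ if environ is None else environ
    return bool(environ.get("AWS_SESSION_TOKEN"))


if __name__ == "__main__":
    # Inside the UDF, call these right after load_dotenv(...).
    print(summarize_aws_env())
    print("session token present:", has_session_token())
```

If a session token shows up here even though the .env file only defines the two keys, that would be consistent with the hypothesis; if it does not, the cause lies elsewhere.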

This is obviously not a huge problem, since the workaround (explicitly passing the credentials) is easy enough. But I thought I would open this issue to try to understand better what is going on here.
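For reference, the workaround can be wrapped in a small helper so that a missing variable fails with a clear message instead of a bare KeyError deep inside the UDF. This is just a sketch; `explicit_s3_credentials` is a hypothetical name, and the only boto3 parameters it relies on are the `aws_access_key_id` and `aws_secret_access_key` keyword arguments already used above:

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def explicit_s3_credentials(environ=None):
    """Build keyword arguments for boto3.client() from the two static keys."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    # Only the key pair is passed on; nothing else is copied from the environment.
    return {
        "aws_access_key_id": environ["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": environ["AWS_SECRET_ACCESS_KEY"],
    }


# Usage inside the UDF, after load_dotenv(...):
#   s3 = boto3.client("s3", **explicit_s3_credentials())
```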
