Add Google Drive and AWS S3 as a Remote Storage option #503

Open
wants to merge 65 commits into
base: main
65 commits
83ac1f4
google drive setup via python api first draft
Mar 25, 2025
e37861e
enable google drive config setup via tui
Mar 25, 2025
01503f7
minor compatibility and ui changes
Mar 26, 2025
7380bac
protectedclient secret input box
Mar 26, 2025
965e016
google drive connection setup via TUI
Mar 26, 2025
6698cb0
add aws as remote storage via python api first draft
Mar 28, 2025
60b77aa
add: logging and connection check for aws s3
Mar 29, 2025
d9755da
update: type checking for aws regions
Mar 29, 2025
f3947a7
add: save aws configs via TUI
Mar 29, 2025
b328af9
add: setup aws connection via TUI
Mar 30, 2025
d2a4d83
feat: setup google drive on machines with no browser
Mar 31, 2025
d9ae864
fix: minor bug
Mar 31, 2025
965bb17
fix: logical error
Mar 31, 2025
377cea7
add: logging for google drive connections
Apr 1, 2025
409d448
refactor: move google drive client secret to be entered at runtime wh…
May 30, 2025
0b33b86
refactor: aws_regions.py; provide aws secret access key at runtime
May 31, 2025
0733f51
add: docstrings to gdrive.py
May 31, 2025
023ada3
add: root_folder_id config to google drive; some refactor
Jun 1, 2025
e869986
refactor: radiobuttons switch in configs.py
Jun 1, 2025
9edbc8f
edit: minor changes to SetupAwsScreen for setting up aws connection
Jun 2, 2025
150f2ea
refactor: SetupGdriveScreen and handle errors
Jun 2, 2025
70659cd
add: some tooltips for google drive configs
Jun 3, 2025
985e921
fix: vanishing central path, radio button order, minor refactor
Jun 4, 2025
d7f13d4
fix: minor bug
Jun 4, 2025
f7807d1
refactor: single button for setup connection
Jun 4, 2025
be8f6b1
add: backwards compatibility to configs while load from config file
Jun 5, 2025
0b7483b
edit: raise error on bucket not present
Jun 5, 2025
2579827
rename: aws region config key
Jun 9, 2025
0a1ca87
rename: connection method from aws_s3 to aws
Jun 9, 2025
8bb7c28
add: utility function to remove duplicate code
Jun 9, 2025
8beaa42
add: docstrings to setup gdrive dialog
Jun 9, 2025
e53984b
update: config dict inplace change for backward compatibility, use ex…
Jun 19, 2025
772c3c1
add: docstrings to setup connection functions; remove: aws region class
Jun 20, 2025
91f2454
add: docstrings to setup widgets function; use backwards compatibility
Jun 20, 2025
ee88875
add: docstrings to rclone function, change arugment order
Jun 20, 2025
c0a7eca
minor changes
Jun 20, 2025
eb3f098
refactor: centralize the get secret function
Jun 20, 2025
ad8e9b1
extend centralized function for sensitive information input to ssh co…
Jun 20, 2025
d46fd01
convert stage from float to int
Jun 20, 2025
730c2c5
add: function for getting aws bucket name
Jun 20, 2025
3c0e9cd
refactor: connection methods list
Jun 20, 2025
8e175b5
move: widgets not match saved configs
Jun 20, 2025
93acc67
Merge remote-tracking branch 'upstream/main' into add_gdrive_aws_remote
JoeZiminski Jun 23, 2025
2c4c941
Fix linting.
JoeZiminski Jun 23, 2025
3ae5ea6
fix: overly long container in configs tab
Jun 26, 2025
bc073e6
add: first version of service account file setup method
Jun 27, 2025
5b6282f
remove: utility functions and docstrings in python api for config tok…
Jun 27, 2025
da421c8
fix: css
Jun 28, 2025
c1cf1ab
update: setup gdrive screen to use service account file and refactor …
Jun 28, 2025
7ccc6de
rename: existing functions from config token -> service account
Jun 28, 2025
6c91bb2
edit: docstrings in accordance with new connection setup
Jun 28, 2025
31bad6d
update: function for checking successful rclone connection
Jun 28, 2025
6f7a53e
edit: setup aws screen css
Jun 28, 2025
42d2ef0
Fix assert in CreateFolders.
JoeZiminski Jul 2, 2025
d0f948c
Allow no central_path for aws or gdrive and refactor configs_content.
JoeZiminski Jul 2, 2025
af60167
Fix tests.
JoeZiminski Jul 3, 2025
887587a
Merge remote-tracking branch 'upstream/main' into add_gdrive_aws_remote
JoeZiminski Jul 3, 2025
05e9fe8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 3, 2025
590872b
Fix linting.
JoeZiminski Jul 3, 2025
24f93a8
Merge branch 'add_gdrive_aws_remote' of https://github.com/cs7-shrey/…
JoeZiminski Jul 3, 2025
8bfa77d
Add tests for new base path behaviour.
JoeZiminski Jul 3, 2025
9fb0601
add: call rclone with popen; refactor: google drive connection to use…
Jul 4, 2025
ac54a8f
fix: failing typehint on python 3.9
Jul 4, 2025
6276c0e
fix: another failing typehint
Jul 4, 2025
ed5cfd2
fix: failing monkeypatch
Jul 5, 2025
39 changes: 39 additions & 0 deletions datashuttle/configs/aws_regions.py
@@ -0,0 +1,39 @@
from typing import List, Literal, get_args

# -----------------------------------------------------------------------------
# AWS regions
# -----------------------------------------------------------------------------

AwsRegion = Literal[
    "us-east-1",
    "us-east-2",
    "us-west-1",
    "us-west-2",
    "ca-central-1",
    "eu-west-1",
    "eu-west-2",
    "eu-west-3",
    "eu-north-1",
    "eu-south-1",
    "eu-central-1",
    "ap-southeast-1",
    "ap-southeast-2",
    "ap-northeast-1",
    "ap-northeast-2",
    "ap-northeast-3",
    "ap-south-1",
    "ap-east-1",
    "sa-east-1",
    "il-central-1",
    "me-south-1",
    "af-south-1",
    "cn-north-1",
    "cn-northwest-1",
    "us-gov-east-1",
    "us-gov-west-1",
]


def get_aws_regions_list() -> List[str]:
    """Return AWS S3 bucket regions as a list."""
    return list(get_args(AwsRegion))
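
The pattern in `aws_regions.py` lets one `Literal` type serve both static type checking and runtime lookups through `typing.get_args`. The following is a minimal sketch of that idea, not code from this PR: `Region`, `get_regions_list` and `is_valid_region` are hypothetical names, and the region list is shortened for illustration.

```python
from typing import List, Literal, get_args

# Hypothetical, shortened stand-in for the AwsRegion Literal in the diff.
Region = Literal["us-east-1", "eu-west-2", "ap-south-1"]


def get_regions_list() -> List[str]:
    """Return the permitted values of the Literal as a plain list."""
    return list(get_args(Region))


def is_valid_region(value: str) -> bool:
    """Check a user-supplied string against the Literal's values at runtime."""
    return value in get_args(Region)


print(get_regions_list())
print(is_valid_region("eu-west-2"), is_valid_region("mars-1"))
```

Keeping the allowed values in one place means a TUI dropdown and a type checker can both be driven from the same definition.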
41 changes: 39 additions & 2 deletions datashuttle/configs/canonical_configs.py
@@ -17,6 +17,7 @@
    Literal,
    Optional,
    Union,
    get_args,
)

if TYPE_CHECKING:
@@ -25,18 +26,30 @@

import typeguard

from datashuttle.configs.aws_regions import AwsRegion
from datashuttle.utils import folders, utils
from datashuttle.utils.custom_exceptions import ConfigError

connection_methods = Literal["ssh", "local_filesystem", "gdrive", "aws"]


def get_connection_methods_list() -> List[str]:
    """Return the canonical connection methods."""
    return list(get_args(connection_methods))


def get_canonical_configs() -> dict:
    """Return the only permitted types for DataShuttle config values."""
    canonical_configs = {
        "local_path": Union[str, Path],
        "central_path": Optional[Union[str, Path]],
        "connection_method": Optional[Literal["ssh", "local_filesystem"]],
        "connection_method": Optional[connection_methods],
        "central_host_id": Optional[str],
        "central_host_username": Optional[str],
        "gdrive_client_id": Optional[str],
        "gdrive_root_folder_id": Optional[str],
        "aws_access_key_id": Optional[str],
        "aws_region": Optional[AwsRegion],
    }

    return canonical_configs
@@ -101,7 +114,8 @@ def check_dict_values_raise_on_fail(config_dict: Configs) -> None:

    check_config_types(config_dict)

    raise_on_bad_local_only_project_configs(config_dict)
    if config_dict["connection_method"] not in ["aws", "gdrive"]:
        raise_on_bad_local_only_project_configs(config_dict)

    if list(config_dict.keys()) != list(canonical_dict.keys()):
        utils.log_and_raise_error(
@@ -130,6 +144,29 @@ def check_dict_values_raise_on_fail(config_dict: Configs) -> None:
            ConfigError,
        )

    # Check gdrive settings
    elif config_dict["connection_method"] == "gdrive":
        if not config_dict["gdrive_root_folder_id"]:
            utils.log_and_raise_error(
                "'gdrive_root_folder_id' is required if 'connection_method' "
                "is 'gdrive'.",
                ConfigError,
            )

        if not config_dict["gdrive_client_id"]:
            utils.log_and_message(
                "`gdrive_client_id` not found in config. Default rclone client will be used (slower)."
            )

    # Check AWS settings
    elif config_dict["connection_method"] == "aws" and (
        not config_dict["aws_access_key_id"] or not config_dict["aws_region"]
    ):
        utils.log_and_raise_error(
            "Both aws_access_key_id and aws_region must be present for AWS connection.",
            ConfigError,
        )

    # Initialise the local project folder
    utils.print_message_to_user(
        f"Making project folder at: {config_dict['local_path']}"
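
The validation rules added in this hunk can be read in isolation: Google Drive requires a root folder ID and only warns about a missing client ID, while AWS requires both an access key ID and a region. Below is a self-contained sketch of those rules under stated assumptions: `check_remote_settings` is a hypothetical helper, `ConfigError` is redefined locally as a stand-in, and warnings are returned as a list rather than logged through datashuttle's `utils`.

```python
from typing import Dict, List, Optional


class ConfigError(Exception):
    """Local stand-in for datashuttle's ConfigError (assumption for this sketch)."""


def check_remote_settings(config: Dict[str, Optional[str]]) -> List[str]:
    """Raise on missing required settings; return non-fatal warnings."""
    warnings: List[str] = []
    method = config.get("connection_method")

    if method == "gdrive":
        # The root folder ID is mandatory for Google Drive.
        if not config.get("gdrive_root_folder_id"):
            raise ConfigError(
                "'gdrive_root_folder_id' is required if "
                "'connection_method' is 'gdrive'."
            )
        # A missing client ID is allowed, but worth flagging.
        if not config.get("gdrive_client_id"):
            warnings.append(
                "`gdrive_client_id` not found in config. "
                "Default rclone client will be used (slower)."
            )

    elif method == "aws":
        # Both the access key ID and the region are mandatory for AWS.
        if not config.get("aws_access_key_id") or not config.get("aws_region"):
            raise ConfigError(
                "Both aws_access_key_id and aws_region must be present "
                "for AWS connection."
            )

    return warnings
```

For example, a Google Drive config with a root folder ID but no client ID passes validation and yields a single warning, whereas an AWS config without a region raises `ConfigError`.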
53 changes: 45 additions & 8 deletions datashuttle/configs/config_class.py
@@ -120,16 +120,48 @@ def dump_to_file(self) -> None:
    def load_from_file(self) -> None:
        """Load a config dict saved at .yaml file.

        Note this will not automatically check the configs are valid,
        this requires calling self.check_dict_values_raise_on_fail().
        This will do a minimal backwards compatibility check and
        add config keys to ensure backwards compatibility with new connection
        methods added to Datashuttle.

        However, this will not automatically check the configs are valid, this
        requires calling self.check_dict_values_raise_on_fail()
        """
        with open(self.file_path) as config_file:
            config_dict = yaml.full_load(config_file)

        load_configs.convert_str_and_pathlib_paths(config_dict, "str_to_path")

        self.update_config_for_backward_compatability_if_required(config_dict)

        self.data = config_dict

    def update_config_for_backward_compatability_if_required(
        self, config_dict: Dict
    ):
        """Add keys introduced in later Datashuttle versions if they are missing."""
        canonical_config_keys_to_add = [
            "gdrive_client_id",
            "gdrive_root_folder_id",
            "aws_access_key_id",
            "aws_region",
        ]

        # All keys shall be missing for a backwards compatibility update
        if not (
            all(
                key in config_dict.keys()
                for key in canonical_config_keys_to_add
            )
        ):
            assert not any(
                key in config_dict.keys()
                for key in canonical_config_keys_to_add
            )

            for key in canonical_config_keys_to_add:
                config_dict[key] = None

    # -------------------------------------------------------------------------
    # Utils
    # -------------------------------------------------------------------------
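
The backward-compatibility step above follows an all-or-nothing rule: a config written by an older version has none of the new keys, so all four are added as `None`; a config with only some of them is treated as invalid. A standalone sketch of that rule follows. `add_missing_remote_keys` is a hypothetical name, and it raises `ValueError` where the diff uses `assert`, which is a substitution made for this example only.

```python
from typing import Any, Dict

# Keys named in the diff as introduced alongside the new connection methods.
NEW_KEYS = (
    "gdrive_client_id",
    "gdrive_root_folder_id",
    "aws_access_key_id",
    "aws_region",
)


def add_missing_remote_keys(config: Dict[str, Any]) -> None:
    """Update an old-style config dict in place with the new keys set to None."""
    present = [key in config for key in NEW_KEYS]

    if all(present):
        return  # Already a new-style config, nothing to do.

    if any(present):
        # A partially updated config is unexpected; the diff asserts here.
        raise ValueError("Config contains only some of the new keys.")

    for key in NEW_KEYS:
        config[key] = None


old_config = {"local_path": "/data/project", "connection_method": "ssh"}
add_missing_remote_keys(old_config)
print(sorted(old_config))
```

Mutating the dict in place mirrors the diff, where `load_from_file` passes `config_dict` to the update method and then stores the same object in `self.data`.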
@@ -186,6 +218,10 @@ def get_base_folder(
    ) -> Path:
        """Return the full base path for the given top-level folder.

        If the connection method is `aws` or `gdrive`, the base path
        might be `None` (e.g. if the Google Drive is the project folder).
        In this case, the base path is ignored.

        Parameters
        ----------
        base
@@ -202,7 +238,12 @@ def get_base_folder(
        if base == "local":
            base_folder = self["local_path"] / top_level_folder
        elif base == "central":
            base_folder = self["central_path"] / top_level_folder
            if self["central_path"] is None:
                # This path should never be triggered for local-only
                assert self["connection_method"] in ["aws", "gdrive"]
                base_folder = Path(top_level_folder)
            else:
                base_folder = self["central_path"] / top_level_folder

        return base_folder
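
The fallback in `get_base_folder` means that for `aws` and `gdrive` the central base can be just the top-level folder name, relative to the bucket or Drive root. A minimal sketch of that branch, assuming a free function in place of the method: `get_central_base_folder` and its parameters are hypothetical, and it raises `ValueError` where the diff asserts.

```python
from pathlib import Path
from typing import Optional


def get_central_base_folder(
    central_path: Optional[Path],
    connection_method: Optional[str],
    top_level_folder: str,
) -> Path:
    """Return the central base folder, allowing no central_path for aws/gdrive."""
    if central_path is None:
        # Only remote-storage methods may omit central_path.
        if connection_method not in ("aws", "gdrive"):
            raise ValueError(
                "central_path may only be None for 'aws' or 'gdrive'."
            )
        return Path(top_level_folder)

    return central_path / top_level_folder


print(get_central_base_folder(None, "gdrive", "rawdata"))
```

With a `central_path` set, the result is the usual joined path, so existing `ssh` and `local_filesystem` projects behave as before.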

@@ -299,8 +340,4 @@ def is_local_project(self):
        A project is 'local-only' if it has no `central_path` and `connection_method`.
        It can be used to make folders and validate, but not for transfer.
        """
        canonical_configs.raise_on_bad_local_only_project_configs(self)

        params_are_none = canonical_configs.local_only_configs_are_none(self)

        return all(params_are_none)
        return self["connection_method"] is None