[Prototyping] Using rclone lsjson for all searches#530

Closed
JoeZiminski wants to merge 57 commits into main from playing_search_function

Conversation

@JoeZiminski JoeZiminski commented Jun 21, 2025

superseded by #551

This PR is for prototyping the new way of searching files / folders introduced in #407. The main commit here is this one; all the others are from PR #208, which this was branched from for extended testing.

Generally this way is better because we can have one function for all use cases. I tried for a long time to play with rclone's `--include` or `--filter` arguments, but I could not get reliable behaviour across folders and files.

For example, if at a search path we have:

sub-001/
  some_files.txt
sub-002/
other_folder/
sub-001.txt

then `--include` with any search string (e.g. `sub-*`) would include all folders no matter what. The only way to avoid this was to suffix the search with a `/`, i.e. `sub-*/`. So the search string would be different between files and folders. I may have missed something here, but I think it reflects that rclone is built more for handling files directly, and the search functions etc. behave more naturally when transferring files than folders.

The solution here is to just grab everything with `lsjson` and then parse it in Python. A benefit is that it is more flexible and interpretable. A downside is that it might be slower. However, as we are only searching a single folder level (i.e. non-recursive), it should never be too bad, as there are unlikely to be tens of thousands of files / folders in a single directory.
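As a rough illustration of the "list everything, then filter in Python" idea, here is a minimal sketch. It is not the code from this PR: the function names `split_lsjson_output` and `list_one_level` are hypothetical, and matching with `fnmatch` is an assumption about how the search string could be interpreted. It relies only on `rclone lsjson` printing a JSON array whose entries carry `Name` and `IsDir` fields, and on `lsjson` being non-recursive unless `-R` is passed.

```python
import fnmatch
import json
import subprocess


def split_lsjson_output(lsjson_text, search_pattern):
    """Split `rclone lsjson` output into matching folder names and filenames.

    Hypothetical helper: the same pattern is applied to files and folders,
    and the `IsDir` field decides which list a name ends up in.
    """
    folder_names, filenames = [], []
    for entry in json.loads(lsjson_text):
        name = entry["Name"]
        if not fnmatch.fnmatchcase(name, search_pattern):
            continue
        if entry["IsDir"]:
            folder_names.append(name)
        else:
            filenames.append(name)
    return sorted(folder_names), sorted(filenames)


def list_one_level(search_path, config_name=None):
    """Run a non-recursive `rclone lsjson` and return its raw JSON text.

    Hypothetical helper: `config_name` is assumed to be an rclone remote
    name; when it is None the path is treated as a local path.
    """
    target = f"{config_name}:{search_path}" if config_name else str(search_path)
    result = subprocess.run(
        ["rclone", "lsjson", target],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Stand-in for the listing of the example search path above, so the
# filtering step can be tried without rclone installed.
example_listing = json.dumps(
    [
        {"Name": "sub-001", "IsDir": True},
        {"Name": "sub-002", "IsDir": True},
        {"Name": "other_folder", "IsDir": True},
        {"Name": "sub-001.txt", "IsDir": False},
    ]
)

folders, files = split_lsjson_output(example_listing, "sub-*")
```

With the example listing, `folders` holds `sub-001` and `sub-002` and `files` holds `sub-001.txt`, so one search string serves both cases without a trailing `/`.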

Currently the search_for_folders() is set up for testing, but essentially it could be something as simple as:

    config_name = (
        cfg.get_rclone_config_name(cfg["connection_method"])
        if local_or_central == "central"
        else None
    )

    # this function would be renamed
    all_folder_names, all_filenames = search_gdrive_or_aws_for_folders(
        search_path,
        search_prefix,
        config_name,
        return_full_path,
    )

@cs7-shrey I think I can make a PR to switch SSH and local filesystem to this method (after a bit more work on it). Then you can use it directly from your AWS / Google Drive PR?

@JoeZiminski JoeZiminski changed the title Prototyping using rclone lsjson for all searches [Prototyping] Using rclone lsjson for all searches Jun 21, 2025
@JoeZiminski JoeZiminski deleted the playing_search_function branch November 18, 2025 22:03
