-
Notifications
You must be signed in to change notification settings - Fork 4
Implement remote table registration calls #28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement remote table registration calls #28
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the README, we should decide on whether or not we're capitalizing TileDB
README.md
Outdated
validation process to ensure that the table seen by the server is the same one that | ||
the user has requested the API to register. This is achieved by writing a "SecretToken" | ||
key to the tiledb array metadata. The tiledb file seen by the server must have a key | ||
matching the one provided in the registration call managed by omero2pandas. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of the SecretToken is not really validation - it's to ensure the client attempting to register a file has read access to that file.
# path.as_uri() exists but mangles any spaces in the path! | ||
write_path = str(write_path) | ||
output_path = str(output_path) | ||
# Use a default chunk size if not set | ||
chunk_size = chunk_size or 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the default chunk_size
in the function definition sufficient here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functions above this one permit explicitly supplying chunk_size=None
which is interpreted as "determine automatically". For most modes this process involves inspecting the input table to figure out how many rows the API can process at once, but this is largely irrelevant for local TileDB conversion.
It'd be possible to resolve this in the calls to create_tiledb
by not passing chunk_size
at all if it was None
, but having to unpack this created more mess than just having a fallback here.
content = _check_response(result) | ||
ann_id = content["data"]["file_annotation"] | ||
LOGGER.info(f"Registered table successfully as FileAnnotation {ann_id}") | ||
return ann_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we return the annotation ID instead of the Original File ID? In most places we use Original File ID for table identification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use the annotation ID in omero table register
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also do this elsewhere in the library. This was chosen on the basis that getting an OriginalFile given the FileAnnotation is easier than figuring out the FileAnnotation starting from the OriginalFile.
Co-authored-by: Kevin Kozlowski <kevin@glencoesoftware.com>
Testing: First I attempted to run the following: client = omero.client('localhost', 4064)
df = pd.read_csv(csv_path)
res = omero2pandas.upload_table(
df, 'o2p Test Table',
local_path='/home/kevin/testdata/tables/o2ptest.tiledb',
omero_connector=client,
username='kevin', password='mypass')
print(res) And got
Seems supplying your own I then attempted the following: And got the following error:
This is almost certainly because my local install is not configured for https. Do we want to have an option to support this or are we comfortable only supporting https-enabled servers? I proceeded with my testing by replacing However, I then got the following error:
Not sure what's happening here - I'll look into it a bit more. There was also some unexpected naming behavior I encountered during testing. |
It looks like the |
This may be an issue with the csv being formatted for import with |
Confirmed this was the issue. I don't think most users would expect this tool to work with the |
Should be fixed now, handling the different client objects we may get is tricky so I'm glad I put a check in there 😄
Path parsing in Python without adding dependencies is more clunky than I'd like, so for now I'd hard-coded in https as a requirement. If we want to support http we could stick in an extra argument or something? Happy to hear ideas, especially those that don't involve urllib.
Nice catch, I missed that TileDBs are seen as directories by the os. For convenience I'd wanted to let the user supply folder names instead of full paths to dump the TileDB based on the table name instead. Pushed a tweak that should catch the TileDB nesting issue, but if that's not sufficient we can require absolute paths to .tiledb files as the inputs. Support for an
Fixed
👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error handling and functionality working as expected.
I'm getting that same df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
omero2pandas.upload_table(df, "Table", parent_type='Image', parent_id=36445,
local_path="table.tiledb",
remote_path="/data/marc/tables/table1.tiledb")
I've tried adding Auth here is through |
@mabruce It looks like you're trying to save to |
Yes, my assumption was that this would work like with |
Yes, you'd have to perform the steps individually to achieve that workflow. The wrapper is primarily intended for scenarios where storage is mounted on both the client and server machines. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Successfully registered a table "remotely" via symlinks to destination folder on server.
This PR adds the second stage of remote table registration intended for use with the omero-plus API.
Changes as follows:
omero2pandas.remote.create_tiledb
(wasregister_table
)omero-py
already requiresrequests
so no new dependencies were needed