Commit ef09991

Merge pull request #28 from DavidStirling/remote-registration
Implement remote table registration calls
2 parents 0f8bf7a + 7bfb9dd commit ef09991

4 files changed: +224 -37 lines changed

README.md

Lines changed: 83 additions & 10 deletions
@@ -271,23 +271,22 @@ These should match the relevant column type. Mapped variables are substituted in
 A `variables` map usually isn't needed for simple queries. The basic condition string should automatically get converted to a meaningful type, but when this fails
 replacing tricky elements with a variable may help.
 
-### Remote registration [Experimental]
+### Remote registration
 
 For **OMERO Plus** installations which support TileDB as the OMERO.tables backend
 it is possible to register tables in-place in a similar manner to in-place image
 imports (otherwise table data is stored in the ManagedRepository).
 
+This is a two-step process:
+1) Convert the dataframe into a TileDB file
+2) Register the remote converted table with OMERO
+
 If you don't know what table backend your OMERO Plus server is using, you
 probably don't have this feature available. If you have access to the server
 machine you can check by running `omero config get omero.tables.module`,
 if the response is `omero_plus.run_tables_pytables_or_tiledb` then tiledb is
 available.
 
-This feature is currently in active development. The current version of
-omero2pandas can export tables locally in TileDB format to be registered with
-OMERO using external tooling.
-
-
 For this mode to be available extra dependencies must also be installed as follows
 
 ```bash
@@ -305,8 +304,82 @@ db_path = omero2pandas.upload_table("/path/to/my_data.csv", "Name for table",
 ```
 
 Similar to regular table uploads, the input can be a dataframe in memory or a
-csv file on disk.
+csv file on disk. The input will be copied into a new TileDB database and
+registered to OMERO in-place.
+
+To perform this kind of registration you need to provide the `local_path` argument
+to the standard `omero2pandas.upload_table` function (alongside required params for
+a "normal" upload e.g. server connection details). The local path is the file path
+where the tiledb file will be written to and registered to OMERO from.
+If you provide a directory instead the tiledb file will be named based on the `table_name` argument.
+
+Naturally, the OMERO server will need to be able to access the resulting tiledb file
+in order to be registered. If the `local_path` is also visible from the server machine
+(e.g. you're running the upload on the server itself) then that's sufficient. Otherwise
+a `remote_path` argument is also available to tell the server where it should
+find the table. This is typically needed if the tiledb file ends up mounted at a
+different location between the local machine and the OMERO server.
+
+For example, if registering from a Windows machine with a network drive to an OMERO server on Linux:
+```python
+omero2pandas.upload_table(
+    df, "My Custom Table",
+    local_path="J:\\data\\tables\\my_omero_table.tiledb",
+    remote_path="/network_data/tables/my_omero_table.tiledb"
+)
+```
+
+Effectively, `local_path` is where the current machine should write the data to, `remote_path`
+is where that file will be from the OMERO server's point of view. No remote path
+implies that both machines will see the file at the local path.
+
+Note that when a table is registered remotely it is not part of the Managed Repository
+used to store OMERO data. This means that it becomes the user's responsibility to
+update the table object on the OMERO server if the file is moved/deleted.
+
+#### How it works
+
+Remote registration is a two-step process: conversion to TileDB format followed
+by registration using an HTTP API.
+
+The TileDB conversion is handled automatically by omero2pandas. This largely involves
+creating a TileDB database from your dataframe and adding a few details to
+the converted table array metadata. Most native pandas column types are supported.
+
+The actual registration involves telling the server that we'd like to register a
+remote table and providing it with the TileDB location. There is then a security
+check to ensure that the user is able to read the file that they've asked the API
+to register. This is achieved by asking the user to provide a "SecretToken"
+which must also be present in the TileDB array metadata. omero2pandas will
+manage the creation of this token automatically. When using omero2pandas this
+process also implicitly confirms that the table seen by the server is the same
+one written by this library.
+
+While it is possible to manually create and register tables without a `SecretToken`,
+this is strongly discouraged as other users could potentially register and access
+the same table without permission. With that in mind the implementation within
+omero2pandas could be considered as an example of "best practice" for handling
+remote table registration.
+
+If the registration succeeds the tables API will create all the necessary OMERO
+objects and return a FileAnnotation ID just as if we'd uploaded the table normally.
+
+#### Converting to TileDB format without registration
+
+While the processes of tiledb conversion and remote registration are intended to
+be used together, it is possible to only convert a table to an OMERO Plus-compatible
+TileDB file. This can be achieved as follows:
+
+```python
+import pandas as pd
+from omero2pandas.remote import create_tiledb
+df = pd.read_csv("/path/to/table.csv")
+secret_token = create_tiledb(df, "/path/to/output.tiledb")
+```
+
+This will convert an input dataframe or csv file path into a TileDB file with
+appropriate metadata for remote registration.
 
-A `remote_path` argument is also available. In future versions this will be
-used if the remote table path is different from the server's point of view (e.g.
-network drives are mapped at another location).
+For convenience the creation function will return the SecretToken needed to perform
+remote registration securely. That token could also be retrieved from the TileDB
+file metadata if necessary.
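
The token check described in the README changes above can be illustrated with a minimal, standard-library-only sketch. It is not the omero2pandas or OMERO Plus implementation: the plain `dict` standing in for the TileDB array metadata, the `"SecretToken"` key spelling, and the helper names `new_secret_token` and `token_matches` are all assumptions made for illustration.

```python
import hmac
import secrets


def new_secret_token(nbytes=32):
    """Generate a random, URL-safe token (hypothetical helper)."""
    return secrets.token_urlsafe(nbytes)


def token_matches(metadata, presented):
    """Check a presented token against the one stored in the metadata.

    `metadata` is a plain dict standing in for the array metadata; the
    "SecretToken" key name is an assumption based on the README wording.
    """
    stored = metadata.get("SecretToken")
    if not stored or not presented:
        # A missing token on either side never counts as a match.
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(stored.encode(), presented.encode())


# Writer side: create the token and store it alongside the table.
array_metadata = {}
token = new_secret_token()
array_metadata["SecretToken"] = token

# Server side: only a caller who could read the file knows the token.
print(token_matches(array_metadata, token))          # matching token
print(token_matches(array_metadata, "wrong-token"))  # mismatched token
```

The idea is that being able to present the token demonstrates read access to the file, which is the property the README's security check relies on.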

omero2pandas/__init__.py

Lines changed: 32 additions & 15 deletions
@@ -20,9 +20,9 @@
 from omero2pandas.connect import OMEROConnection
 from omero2pandas.upload import create_table
 if find_spec("tiledb"):
-    from omero2pandas.remote import register_table
+    from omero2pandas.remote import create_remote_table
 else:
-    register_table = None
+    create_remote_table = None
 
 LOGGER = logging.getLogger(__name__)
 
@@ -48,7 +48,7 @@ def get_table_size(file_id=None, annotation_id=None, omero_connector=None,
     object_id, object_type = _validate_requested_object(
         file_id=file_id, annotation_id=annotation_id)
 
-    with OMEROConnection(server=server, username=username, password=password,
+    with _get_connection(server=server, username=username, password=password,
                          port=port, client=omero_connector) as connector:
         conn = connector.get_gateway()
         data_table = _get_table(conn, object_type, object_id)
@@ -78,7 +78,7 @@ def get_table_columns(file_id=None, annotation_id=None,
     object_id, object_type = _validate_requested_object(
         file_id=file_id, annotation_id=annotation_id)
 
-    with OMEROConnection(server=server, username=username, password=password,
+    with _get_connection(server=server, username=username, password=password,
                          port=port, client=omero_connector) as connector:
         conn = connector.get_gateway()
 
@@ -124,7 +124,7 @@ def read_table(file_id=None, annotation_id=None, column_names=(), rows=None,
     object_id, object_type = _validate_requested_object(
         file_id=file_id, annotation_id=annotation_id)
 
-    with OMEROConnection(server=server, username=username, password=password,
+    with _get_connection(server=server, username=username, password=password,
                          port=port, client=omero_connector) as connector:
         conn = connector.get_gateway()
 
@@ -186,7 +186,7 @@ def read_table(file_id=None, annotation_id=None, column_names=(), rows=None,
 def upload_table(source, table_name, parent_id=None, parent_type='Image',
                  links=None, chunk_size=None, omero_connector=None,
                  server=None, port=4064, username=None, password=None,
-                 local_path=None, remote_path=None):
+                 local_path=None, remote_path=None, prefix=""):
     """
     Upload a pandas dataframe to a new OMERO table.
     For the connection, supply either an active client object or server
@@ -210,9 +210,16 @@ def upload_table(source, table_name, parent_id=None, parent_type='Image',
                        register remotely
     :param remote_path: [TileDB only], mapping for local_path on the server
                         (if different from local system)
+    :param prefix: [TileDB only], API prefix for your OMERO server,
+                   relative to server URL. Use this if your OMERO server
+                   is not at the top-level URL of the server.
+                   e.g. for my.omero.server/custom_omero
+                   supply prefix="custom_omero"
     :param password: Password for server login
     :return: File Annotation ID of the new table
     """
+    if not table_name or not isinstance(table_name, str):
+        raise ValueError(f"Invalid table name: '{table_name}'")
     # Coerce inputs to the links list input format
     links = links or []
     if (len(links) == 2 and
@@ -225,22 +232,25 @@ def upload_table(source, table_name, parent_id=None, parent_type='Image',
     if parent_id is not None:
         if (parent_type, parent_id) not in links:
             links.append((parent_type, parent_id))
-    if not links and not local_path:
+    if not links:
         raise ValueError("No OMERO objects to link the table to")
     elif not isinstance(links, Iterable):
         raise ValueError(f"Links should be an iterable list of "
                          f"type/id pairs, not {type(links)}")
-    with OMEROConnection(server=server, username=username, password=password,
+    with _get_connection(server=server, username=username, password=password,
                          port=port, client=omero_connector) as connector:
-        conn = connector.get_gateway()
-        conn.SERVICE_OPTS.setOmeroGroup('-1')
         if local_path or remote_path:
-            if not register_table:
+            if not create_remote_table:
                 raise ValueError("Remote table support is not installed")
-            ann_id = register_table(source, local_path,
-                                    remote_path=remote_path,
-                                    chunk_size=chunk_size)
+            ann_id = create_remote_table(source, table_name, local_path,
+                                         remote_path=remote_path,
+                                         links=links,
+                                         chunk_size=chunk_size,
+                                         connector=connector,
+                                         prefix=prefix)
         else:
+            conn = connector.get_gateway()
+            conn.SERVICE_OPTS.setOmeroGroup('-1')
             ann_id = create_table(source, table_name, links, conn, chunk_size)
         if ann_id is None:
             LOGGER.warning("Failed to create OMERO table")
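
The optional-dependency guard used in this file (import `create_remote_table` only when `find_spec("tiledb")` succeeds, otherwise bind the name to `None` and raise at call time) can be sketched in isolation. The helpers `optional_attr` and `require` below are hypothetical names, and `json` plus a deliberately bogus module name stand in for `tiledb` so the example runs without extra packages.

```python
import importlib
from importlib.util import find_spec


def optional_attr(module_name, attr_name):
    """Return module_name.attr_name if the module is installed, else None."""
    if find_spec(module_name) is None:
        return None
    return getattr(importlib.import_module(module_name), attr_name)


def require(feature, description):
    """Raise a clear error at call time if an optional feature is missing."""
    if feature is None:
        raise ValueError(f"{description} is not installed")
    return feature


# Stand-ins: "json" is always available; the second name is deliberately bogus.
dumps = optional_attr("json", "dumps")
missing = optional_attr("surely_not_an_installed_module_xyz", "anything")

print(require(dumps, "JSON support")({"a": 1}))
try:
    require(missing, "Remote table support")
except ValueError as err:
    print(err)
```

Deferring the failure to call time keeps the package importable without the extra dependencies, and callers see the error only when they request the remote path.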
@@ -288,7 +298,7 @@ def download_table(target_path, file_id=None, annotation_id=None,
     assert not os.path.exists(target_path), \
         f"Target file {target_path} already exists"
 
-    with OMEROConnection(server=server, username=username, password=password,
+    with _get_connection(server=server, username=username, password=password,
                          port=port, client=omero_connector) as connector:
         conn = connector.get_gateway()
 
@@ -433,3 +443,10 @@ def connect_to_omero(client=None, server=None, port=4064,
                                 allow_token=allow_token)
     connector.connect(interactive=interactive, keep_alive=keep_alive)
     return connector
+
+
+def _get_connection(client=None, **kwargs):
+    """Create an OMEROConnection instance or use existing if supplied"""
+    if client is not None and isinstance(client, OMEROConnection):
+        return client
+    return OMEROConnection(client=client, **kwargs)
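
The reuse-or-create behaviour of `_get_connection` can be demonstrated with a stand-in class. `FakeConnection` and `get_connection` below are hypothetical and only mirror the branching shown in the diff; they do not reproduce `OMEROConnection` itself.

```python
class FakeConnection:
    """Stand-in for OMEROConnection; only records its constructor arguments."""

    def __init__(self, client=None, **kwargs):
        self.client = client
        self.kwargs = kwargs

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False


def get_connection(client=None, **kwargs):
    """Reuse an existing connection wrapper, otherwise build a new one."""
    if isinstance(client, FakeConnection):
        # Already a wrapper: hand it back and ignore the other parameters.
        return client
    # None or a raw client object: wrap it in a new connection.
    return FakeConnection(client=client, **kwargs)


existing = FakeConnection(server="omero.example.org")
reused = get_connection(client=existing, server="ignored.example.org")
fresh = get_connection(client=None, server="omero.example.org", port=4064)

with get_connection(client=existing) as connector:
    print(connector is existing)
```

When an existing wrapper is reused inside a `with` block, its own `__exit__` runs on leaving the block. What that does to a caller-supplied connection depends on `OMEROConnection.__exit__`, which this commit does not show.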

omero2pandas/connect.py

Lines changed: 14 additions & 2 deletions
@@ -42,8 +42,20 @@ def __init__(self, client=None, server=None, port=4064, username=None,
         self.session = None
         self.gateway = None
         self.temp_session = False
-        self.server = server
-        self.port = port
+        if client is not None:
+            # Infer details from client, fallback to params
+            self.server = client.getProperty("omero.host")
+            if server and self.server != server:
+                LOGGER.warning(f"Host already set to '{self.server}' in "
+                               f"provided client, param will be ignored")
+            elif server and not self.server:
+                self.server = server
+            self.port = client.getProperty("omero.port") or port
+            if not self.server:
+                LOGGER.error("Unknown host for provided client")
+        else:
+            self.server = server
+            self.port = port
         self.username = username
         self.password = password
         self.session_key = session_key
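
In the branch order shown in this hunk, the `elif server and not self.server` fallback appears unreachable: an empty client host already differs from a non-empty `server`, so the warning branch is taken first. The sketch below is one reading of the intended behaviour ("infer details from client, fallback to params") written as a pure function. `resolve_host` is a hypothetical helper, not part of omero2pandas, and it returns a message instead of logging so it can run standalone.

```python
def resolve_host(client_host, server_param):
    """Return (host, message); message is None when nothing needs reporting."""
    if not client_host:
        # Client has no host configured: fall back to the parameter if given.
        if server_param:
            return server_param, None
        return None, "Unknown host for provided client"
    if server_param and client_host != server_param:
        # Client already knows its host; the conflicting parameter loses.
        return client_host, (f"Host already set to '{client_host}' in "
                             "provided client, param will be ignored")
    return client_host, None


print(resolve_host("omero.example.org", None))
print(resolve_host("", "fallback.example.org"))
print(resolve_host("a.example.org", "b.example.org"))
print(resolve_host(None, None))
```

Checking for the empty-host case first is the only ordering change. The warning and error conditions otherwise match the ones in the diff.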
