Add 'workflow' type mapping and several other fixes #45

rsirvent · 2025-09-25T11:11:59Z

Fixes issues: #2 #3 #34 #36

Also fixed: if the RO-Crate has several publishers, only the first one is mapped (Zenodo accepts only one publisher).
Unit tests added.

A COMPSs example with all new features tested directly at Zenodo can be found here: https://zenodo.org/records/17259350


elichad · 2025-09-30T09:32:41Z

Thanks @rsirvent! I was on leave last week and I am at a conference this week, so I'm a bit slow to respond. I'll get to this when I can (probably next week).

rsirvent · 2025-10-01T14:33:30Z

Thanks @elichad, let me know if you need any clarification.

Once we solve this PR, do you think you could do a new release of the library? It will help me a lot to be able to recommend the installation of a specific version of the package to publish COMPSs experiments as Zenodo records.

rsirvent · 2025-10-06T13:41:48Z

Sorry for the last-minute updates; I wanted to add support for a mixed list of authors described as strings or dicts. I'm done now and won't add any more commits to this branch.
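As a rough illustration of what a "mixed list" means here, the sketch below normalises authors given as plain strings or as dicts into a single shape. `normalize_people` is a hypothetical helper written for this example, not the package's `merge_authors_and_creators`, and the ORCID shown is a placeholder.

```python
from typing import Any, Dict, List


def normalize_people(people: Any) -> List[Dict[str, Any]]:
    """Return a list of dicts from a string, a dict, or a mixed list of both.

    Hypothetical helper for illustration only.
    """
    if people is None:
        return []
    if not isinstance(people, list):
        people = [people]
    normalized = []
    for person in people:
        if isinstance(person, str):
            # Assumption: a bare string is the person's display name.
            normalized.append({"name": person})
        elif isinstance(person, dict):
            # Copy so the caller's entity is not modified.
            normalized.append(dict(person))
    return normalized


mixed = [
    "Jane Doe",
    {"@id": "https://orcid.org/0000-0000-0000-0000", "name": "John Roe"},
]
print(normalize_people(mixed))
```

Once every entry is a dict, later steps such as name splitting only have to handle one shape.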

elichad

Thanks @rsirvent, sorry I took a while to review again. I've added a few comments.

I'll be happy to make a new release once this PR is done :)

elichad · 2025-10-15T09:09:59Z

docs/all-mappings.md

-  Therefore, we assume the type to be `dataset`
+- RO-Crate does not have a field that describes the type of the entire directory
+- Therefore, we assume the type to be `dataset` by default
+- Only if the 'mainEntity' includes the type 'ComputationalWorkflow', DataCite type is set to 'workflow'
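
A minimal sketch of the rule described in this hunk, assuming the `mainEntity` has already been resolved to its entity dict. `resource_type` is a hypothetical name for illustration; the package expresses this rule through its mapping structure rather than a standalone function.

```python
from typing import Any, Dict, Optional


def resource_type(main_entity: Optional[Dict[str, Any]]) -> str:
    """Pick the DataCite resource type from the crate's mainEntity (sketch)."""
    if not main_entity:
        # No mainEntity: fall back to the default type.
        return "dataset"
    types = main_entity.get("@type", [])
    # "@type" may be a single string or a list of strings.
    if isinstance(types, str):
        types = [types]
    return "workflow" if "ComputationalWorkflow" in types else "dataset"


print(resource_type({"@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"]}))
print(resource_type({"@type": "Dataset"}))
print(resource_type(None))
```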


A complication: a Workflow Run Crate might also have this mainEntity, but it's less obvious whether that should be a "workflow" or a "dataset" type in DataCite (I would lean toward "dataset" in that case, since the focus of a WRROC might be more on the outputs).

Maybe it's better to check whether the crate conforms to Workflow RO-Crate only? But that is harder to do with the existing mapping structure.
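
One way such a conformance check could look, as a sketch only. It assumes the `conformsTo` value has already been read from wherever the crate declares its profiles, and that the profile IRIs start with the two prefixes below; the function names are hypothetical and not part of the package.

```python
# Assumed profile IRI prefixes (versions vary, so only the prefix is compared).
WORKFLOW_CRATE_PREFIX = "https://w3id.org/workflowhub/workflow-ro-crate/"
WORKFLOW_RUN_PREFIX = "https://w3id.org/ro/wfrun/"


def profile_ids(conforms_to) -> list:
    """Flatten a conformsTo value (string, dict, or list of either) to IRIs."""
    if conforms_to is None:
        return []
    if isinstance(conforms_to, (str, dict)):
        conforms_to = [conforms_to]
    ids = []
    for item in conforms_to:
        ids.append(item.get("@id", "") if isinstance(item, dict) else str(item))
    return ids


def is_workflow_crate_only(conforms_to) -> bool:
    """True for a Workflow RO-Crate that does not also declare a run profile."""
    ids = profile_ids(conforms_to)
    is_workflow = any(i.startswith(WORKFLOW_CRATE_PREFIX) for i in ids)
    is_run = any(i.startswith(WORKFLOW_RUN_PREFIX) for i in ids)
    return is_workflow and not is_run
```

A Workflow Run Crate usually lists the Workflow RO-Crate profile as well, which is why the run profiles are excluded explicitly rather than relying on the workflow profile alone.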

I don't understand the issue. Could you maybe provide an example?

WRROC's Workflow Run Crate inherits requirements from Workflow RO-Crate, which states:

> Main Workflow: The Crate MUST contain a data entity of type ["File", "SoftwareSourceCode", "ComputationalWorkflow"] as the Main Workflow. The Crate MUST refer to the Main Workflow via mainEntity.

So, both must have a 'ComputationalWorkflow' as 'mainEntity'.

For example, https://doi.org/10.5281/zenodo.12987289 is a WRROC, but the focus (for humans) is on the data outputs - I don't think "workflow" would be an appropriate type for this record.

To me, if the mainEntity of the RDE is a ComputationalWorkflow, it semantically means that the package is mainly a workflow. So even in the case you show (where it is both a Dataset and a ComputationalWorkflow), I would select the workflow type, to create awareness that the option exists.

Anyway, the mainEntity can certainly have many different types, and in the end it is the user's subjective choice what to select. But I think it is a good choice to select the Workflow type for the record by default, and then let the user correct it manually if that is not appropriate.

Do you have something else in mind? Maybe adding more restrictions for selecting the Workflow type for the record? I'm afraid that if we impose many restrictions, the Workflow type will rarely be selected.

src/rocrate_inveniordm/mapping/crate_utils.py

elichad · 2025-10-15T11:52:44Z

src/rocrate_inveniordm/mapping/processing_functions.py

+    new_name["family_name"] = parts[-1] if len(parts) > 0 else ""
+    new_name["given_name"] = " ".join(parts[:-1]) if len(parts) > 1 else ""


I'm not a fan of this approach to setting the names (see ResearchObject/ro-crate#496 which I have been thinking about for a while because of this issue)

e.g. in this case "José María Fernández González" (family name "Fernández González") would not be split correctly. And "Chadwick, Eli" wouldn't follow the right format.
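
To make the two failure cases concrete, here is the splitting logic from the diff wrapped in a small function. It assumes `parts` comes from a plain whitespace split of the name, which the hunk above does not show; `split_name` is a hypothetical wrapper.

```python
def split_name(full_name: str) -> dict:
    """Split a full name the way the two diff lines above do (sketch)."""
    parts = full_name.split()  # assumption: whitespace split
    new_name = {}
    new_name["family_name"] = parts[-1] if len(parts) > 0 else ""
    new_name["given_name"] = " ".join(parts[:-1]) if len(parts) > 1 else ""
    return new_name


print(split_name("José María Fernández González"))
# {'family_name': 'González', 'given_name': 'José María Fernández'}
print(split_name("Chadwick, Eli"))
# {'family_name': 'Eli', 'given_name': 'Chadwick,'}
```

The first case loses half of a two-part family name, and the second swaps the parts and keeps the comma.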

My (unsatisfying) suggestion would be to put the whole name into givenName and have an empty familyName, if possible (or vice versa).

Basically, I don't think this is a challenge that we can solve in this package alone, I think we need to change how we approach names in RO-Crate generally.

I did have the idea at some point to do an initial check through the authors and inform the user if givenName/familyName weren't set, so they would have the option to set them manually before continuing.

Well, the previous mapping did the opposite of what you propose: since familyName is mandatory in Zenodo, the whole string went to familyName, and it was ugly anyway. As a Spaniard with two last names, I'm aware that this may fail in some cases, but generally speaking I think it is a better approach than the previous one (putting everything in the family name). Besides, the record can always be fixed by hand later.

In any case, as you may have seen, if the RO-Crate already includes 'familyName' and 'givenName' they are respected and not replaced with this nameProcessing function result.

Ok, on reflection I agree with you. I'm happy with this code then, and I've made issue #47 for making it clear to users that this process is happening. (not expecting you to address that in this PR, but you can if you like)

> if the RO-Crate already includes 'familyName' and 'givenName' they are respected and not replaced with this nameProcessing function result.

Unfortunately I don't think many real-world crates include those terms at the moment - so I've proposed in ResearchObject/ro-crate#496 that we make this a more explicit best practice.

For now, maybe I could improve the explanation in the all-mappings.md description? In the future we can work on issue #47. Printing a warning is very easy, but making the process more interactive will need more work.
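
A sketch of what the "easy" warning could look like, assuming authors are available as a list of strings or entity dicts. Both function names are hypothetical, and the sketch does not distinguish persons from organisations.

```python
import warnings


def authors_missing_name_parts(authors: list) -> list:
    """Return labels of authors without explicit givenName and familyName."""
    missing = []
    for author in authors:
        if isinstance(author, str):
            # A bare string never carries separate name parts.
            missing.append(author)
        elif isinstance(author, dict):
            if not (author.get("givenName") and author.get("familyName")):
                missing.append(author.get("name") or author.get("@id", "<unnamed>"))
    return missing


def warn_about_guessed_names(authors: list) -> list:
    """Emit one warning per author whose name will be split automatically."""
    missing = authors_missing_name_parts(authors)
    for label in missing:
        warnings.warn(
            f"No givenName/familyName for {label!r}; "
            "the name will be split automatically."
        )
    return missing
```

Returning the list as well as warning would let a later interactive step reuse the same check.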

src/rocrate_inveniordm/mapping/converter.py

elichad · 2025-10-15T12:31:11Z

src/rocrate_inveniordm/mapping/converter.py

-    for path in paths:
-        print(f"PATH: {path}")
+    for i, path in enumerate(paths):
+        if mapping_key.startswith("publisher_mapping") and i > 0:


could you add a test for this case?

Would you prefer that I add a new integration test, or just modify one of the existing ones to have several publishers?

I mean a unit test, similar to these other ones for the publisher mapping: https://github.com/ResearchObject/ro-crate-inveniordm/blob/main/test/unit/test_publisher.py

Hmmm, I don't think I can check this with this kind of unit test, since they check that the application of a mapping is correct. What really happens in that code is that we avoid applying the rule if it has already been applied for a previous 'path'.

I'm still open to adding two publishers to an integration test and checking that only one is mapped to the DataCite output.
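
To show the behaviour under discussion in isolation, the sketch below reproduces the loop condition from the diff in a simplified stand-in and checks that only one publisher comes out. `apply_mapping`, its parameters, and the path strings are hypothetical; skipping with `continue` is an assumption, since the hunk does not show the body of the `if`.

```python
from typing import Callable, List


def apply_mapping(
    mapping_key: str, paths: List[str], apply_rule: Callable[[str], str]
) -> List[str]:
    """Apply a rule per path, but only to the first path for publishers."""
    results = []
    for i, path in enumerate(paths):
        if mapping_key.startswith("publisher_mapping") and i > 0:
            # Assumption: later publisher paths are skipped entirely.
            continue
        results.append(apply_rule(path))
    return results


# Two publishers in the crate, addressed by two hypothetical paths.
publishers = {"publisher[0]": "Org A", "publisher[1]": "Org B"}
mapped = apply_mapping("publisher_mapping_1", list(publishers), publishers.get)
assert mapped == ["Org A"]
```

An integration test would make the same assertion on the real DataCite output rather than on a stand-in function.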

rsirvent and others added 11 commits May 22, 2025 17:53

Fixes in Zenodo record 'type' (supports 'workflow' now), 'author', 'creator' and 'contributor'

2413b69

Added 'family_name' to the DataCite json, since it is mandatory in Zenodo

786994f

JSON fix, black broke it

639b4ea

Merged 'author' and 'creator' at the RO-Crate to do a single mapping. Other fixes in name assignations

2d0ede0

Fixed license Title specification

3445658

RO-Crate can have a list of publishers, but DataCite only accepts one

b52d119

Fixes when an 'author' is an entity but has no 'name' specified

e728c39

Merge branch 'ResearchObject:main' into workflow_type

4626f25

Tested licenses at Zenodo

d35c005

Fixed spdx compatibility with Zenodo to detect licenses

d6d16cb

Fixed ROR mapping, and added unit tests

3f887e4

rsirvent requested a review from elichad September 25, 2025 11:11

rsirvent self-assigned this Sep 25, 2025

rsirvent added 8 commits September 25, 2025 13:20

Style fixes and test_publisher.py fixes

88b3dc3

Line limit fix

bef9b68

Misplaced ifNonePresent. Fixes when type is not a list

bb130b9

Fixes in integration test result, including ror and avoiding to add authors as contributors

f88d7bf

Fixes related to real-world-example test: fixed 'contributor', fixed DataCite out considering contributors. Solved bug at format_value when arrays where used combined with @@this

0fcbf1c

Clean up test results

2cd44f2

flake8 fixes

2be4dfe

Fixes on utf-8-csv-crate test. Provide real orcid and ror so the Zenodo upload does not fail. Avoid the editor mapping

96a33b5

Allow mapping arrays in RO-Crate's 'contentLocation'

8526384

rsirvent added 6 commits October 6, 2025 08:51

Extra feature: allow mixed list of authors and creators that are strings or dicts. Still missing to fix merge_authors_and_creators

d1c3a87

Added ifNonePresent rule for the resource_type property

48aebfe

Fixed merge_authors_and_creators to support strings and dicts

878aba4

Style fixes

f7af1ee

More fixes

105ccdb

Flak38 fix

068f8ac

elichad requested changes Oct 15, 2025

rsirvent and others added 6 commits October 20, 2025 16:55

Update src/rocrate_inveniordm/mapping/converter.py

0ce6aa9

Remove debugging message. Co-authored-by: Eli Chadwick <eli.chadwick@manchester.ac.uk>

Fixed conflict

94c2bca

Fixes in rightsProcessing unit test. Added unit tests for merge_authors_and_creators

972030d

Black fixes

8b6b934

Fixes in rightsProcessing, only id or title is assigned, but not both. Fixed unit tests to detect this

a3e564d

Black fixes

17cc496
