Skip to content

Conversation

fefossa
Copy link

@fefossa fefossa commented Jun 26, 2023

Description

I'm proposing these changes to Collate and SingleCells for those to accept different numbers of cell compartments or even just one compartment.
Related to issue #272

This is still a work in progress.

So far, (1) I've added the option on collate.py to accept 3 flags, for no-cells, no-cytoplasm, or no-nuclei; (2) the checking on assert_linking_cols_complete only happens when there's more than one compartment.

Proposed changes/discussion

From some discussion with @bethac07, we saw two options:

  1. The first option is just to add more documentation to SingleCells (which I did), and expect the user to build their dictionary and provide it as compartment_linking_cols. If that's how you'd like to do it, I think the changes I already did would be enough to merge.

  2. OR build the dictionary for compartment_linking_cols based on the compartments given:

Now: the compartment_linking_cols is defined as the default_linking_cols if no dictionary is specified by the user. Also, for SingleCells to work right now with only one compartment, you must give a dictionary linking the compartment to itself, for example: { "cells": {"cells": "ObjectNumber"}}, which works, but I don't know if that's the right way to do it.

So, to build the dictionary:

  • Have some sort of template dictionary like {"compA": {"parent":{"compB":"object_Parent_compB"}, "child":{"compA":"ObjectNumber"}}}, where it will take the user compartments and create this dictionary. I just don't know which is the best way to handle this, because how do we know which is the child-parent from objects other than nuclei, cell, and cytoplasm, if the user is not giving that info?
  • To deal only with nuclei, cytoplasm, and cells objects combinations (if the user wants only one or two of those compartments), have a dictionary like par_child_dict = {"nuclei": {"cells", "cytoplasm"}, "cells":{"cytoplasm"}} from where the relationship can be inferred and the compartment_linking_cols is built based on par_child_dict.
  • On merge_single_cells, something needs to change on where the dataframe sc_df is being merged to work with only one compartment without the need to provide a dictionary that links the compartment to itself.

What is the nature of your change?

  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).
  • This change requires a documentation update.

Checklist

  • I have read the CONTRIBUTING.md guidelines.
  • My code follows the style guidelines of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • New and existing unit tests pass locally with my changes.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have deleted all non-relevant text in this pull request template.

@d33bs
Copy link
Member

d33bs commented Aug 18, 2023

Hello, this is a courtesy notice that Pcytominer's master branch has been renamed to main as part of changes merged in #303. While changes here should be seamless, GitHub is unable to change your local git environment and offers the following guidance when it comes to this. Thank you for your understanding and please don't hesitate to reach out with any questions or concerns.

image

@d33bs
Copy link
Member

d33bs commented May 5, 2025

Hi @fefossa 👋 — just checking in to see if you’re still planning to continue work on this PR. It's a great contribution and we’d love to help move it forward if you're still interested! If you need any help or context, feel free to ask.

If you’ve moved on or think someone else should pick it up, that’s totally fine too — just let us know what you’d prefer so we can figure out the next steps. Thanks again for your work on this! 🙌

@fefossa
Copy link
Author

fefossa commented May 28, 2025

Hi @d33bs, sorry for the 2-year delay on this! 😅

I'm actually revisiting this now for some new analysis I'm doing, which only involves one compartment. I got stuck on a part of it, but I’ve just updated my fork and will be working on it again. If you all can wait a bit longer, I’ll likely have some questions soon!

@d33bs
Copy link
Member

d33bs commented May 28, 2025

Thanks @fefossa and no worries! Please let us know how we may assist and if you have any questions along the way.

@fefossa
Copy link
Author

fefossa commented Jul 30, 2025

Hi @d33bs,

I ran into a few bumps but things are now in a state where you can take a look.
I have some projects where I work only with cells objects (due to live cell imaging), and I tested those datasets, looks good so far!

Here's the logic I followed:

  • When default_linking_cols is used, the code now calls this new function that checks if only one compartment is present, in which case the linking cols are left empty. If two compartments are given, it uses this alternative dictionary logic to link them pairwise;
  • Assert_linking_cols_complete only runs when there is more than one compartment.
  • Merge_single_cells only merges left/right when more than one compartment is available.
  • User still have to provide of course the list of compartments to SingleCells.

So far, I've focused mainly on merge_single_cells and haven't checked all the other functions in cells.py for edge cases with single-compartment workflows.
If you notice anything that could break, just let me know!

Thanks!

Copy link
Member

@d33bs d33bs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job @fefossa ! I took a look at the work so far to provide some input / feedback based on what I saw. I recommend making changes so the tests may pass (either to the new production code or the tests themselves, depending on what you feel should take place). After tests are passing I'd love to take another look.

Please don't hesitate to let me know if you have any questions or would like more specific feedback in any areas. Thanks again for your efforts!

right_on=[*self.merge_cols, right_link_col],
suffixes=merge_suffix,
)
if len(self.compartments) == 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This seems like it'd be great for efficiency in these scenarios.

Comment on lines +37 to +39
no_nuclei=True,
no_cells=True,
no_cytoplasm=True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a type hint for each of these to enable gradual type specification.

Comment on lines +37 to +39
no_nuclei=True,
no_cells=True,
no_cytoplasm=True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider setting a more flexible parameter here which can help specify the compartments we'll work with below, e.g. compartments: List[str] = ["Nuclei", "Cells", "Cytoplasm"].

Comment on lines +85 to +87
filter_compartments = [no_nuclei, no_cells, no_cytoplasm]
to_filter = ["Nuclei", "Cells", "Cytoplasm"]
compartments = [to_filter[i] for i in [j for j in range(len(filter_compartments)) if filter_compartments[j]]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider simplifying this to use a compartments argument set from the function parameter. This opens up the door to additional compartments and follows similar patterns in the project (e.g. as in SingleCells).


include_list = []
for eachcompartment in compartments:
include = "--include */" + eachcompartment + ".csv"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to lowercase the compartments? I noticed that we do that below for SingleCells and wondered if we should follow a similar convention (if possible).

return linking_cols


def get_alternative_linking_cols():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding to the tests for these new functions to provide code coverage.

...
}
"""
alternative_linking_cols = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using a global variable instead of a function for this (keeping in mind globals are typically capitalized to help signify their scope).

return alternative_linking_cols


def get_linking_cols_from_compartments(compartments):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a type hint for the compartments parameter.

Suggested change
def get_linking_cols_from_compartments(compartments):
def get_linking_cols_from_compartments(compartments: List[str]):

if subdict:
linking_cols[comp] = subdict
return linking_cols
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it has the opportunity to capture a state of len(compartments) == 0 and len(compartments) > 2. Double checking, do we need additional logic to help branch differently here?

Comment on lines +31 to +35
Returns a dictionary defining alternative linking columns between compartments
for single cell morphological profiles.
The returned dictionary specifies, for each compartment, the linking columns
used to connect to other compartments (typically as used in CellProfiler output).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we should distinguish how this is different from get_default_linking_cols using explicit use cases. Should we think about replacing the existing default instead of introducing a new arrangement?

@gwaybio
Copy link
Member

gwaybio commented Aug 5, 2025

thanks for this important contribution @fefossa ! I'm excited for Dave and you to connect to get this functionality merged in :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants