-
Notifications
You must be signed in to change notification settings - Fork 40
Collate and SingleCells to accept less/different compartments #301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Hello, this is a courtesy notice that Pcytominer's |
Hi @fefossa 👋 — just checking in to see if you’re still planning to continue work on this PR. It's a great contribution and we’d love to help move it forward if you're still interested! If you need any help or context, feel free to ask. If you’ve moved on or think someone else should pick it up, that’s totally fine too — just let us know what you’d prefer so we can figure out the next steps. Thanks again for your work on this! 🙌 |
Hi @d33bs, sorry for the 2-year delay on this! 😅 I'm actually revisiting this now for some new analysis I'm doing, which only involves one compartment. I got stuck on a part of it, but I’ve just updated my fork and will be working on it again. If you all can wait a bit longer, I’ll likely have some questions soon! |
Thanks @fefossa and no worries! Please let us know how we may assist and if you have any questions along the way. |
Hi @d33bs, I ran into a few bumps but things are now in a state where you can take a look. Here's the logic I followed:
So far, I've focused mainly on merge_single_cells and haven't checked all the other functions in cells.py for edge cases with single-compartment workflows. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job @fefossa ! I took a look at the work so far to provide some input / feedback based on what I saw. I recommend making changes so the tests may pass (either to the new production code or the tests themselves, depending on what you feel should take place). After tests are passing I'd love to take another look.
Please don't hesitate to let me know if you have any questions or would like more specific feedback in any areas. Thanks again for your efforts!
right_on=[*self.merge_cols, right_link_col], | ||
suffixes=merge_suffix, | ||
) | ||
if len(self.compartments) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This seems like it'd be great for efficiency in these scenarios.
no_nuclei=True, | ||
no_cells=True, | ||
no_cytoplasm=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a type hint for each of these to enable gradual type specification.
no_nuclei=True, | ||
no_cells=True, | ||
no_cytoplasm=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider setting a more flexible parameter here which can help specify the compartments we'll work with below, e.g. compartments: List[str] = ["Nuclei", "Cells", "Cytoplasm"]
.
filter_compartments = [no_nuclei, no_cells, no_cytoplasm] | ||
to_filter = ["Nuclei", "Cells", "Cytoplasm"] | ||
compartments = [to_filter[i] for i in [j for j in range(len(filter_compartments)) if filter_compartments[j]]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider simplifying this to use a compartments
argument set from the function parameter. This opens up the door to additional compartments and follows similar patterns in the project (e.g. as in SingleCells
).
|
||
include_list = [] | ||
for eachcompartment in compartments: | ||
include = "--include */" + eachcompartment + ".csv" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to lowercase the compartments? I noticed that we do that below for SingleCells
and wondered if we should follow a similar convention (if possible).
return linking_cols | ||
|
||
|
||
def get_alternative_linking_cols(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding to the tests for these new functions to provide code coverage.
... | ||
} | ||
""" | ||
alternative_linking_cols = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using a global variable instead of a function for this (keeping in mind globals are typically capitalized to help signify their scope).
return alternative_linking_cols | ||
|
||
|
||
def get_linking_cols_from_compartments(compartments): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a type hint for the compartments
parameter.
def get_linking_cols_from_compartments(compartments): | |
def get_linking_cols_from_compartments(compartments: List[str]): |
if subdict: | ||
linking_cols[comp] = subdict | ||
return linking_cols | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it has the opportunity to capture a state of len(compartments) == 0
and len(compartments) > 2
. Double checking, do we need additional logic to help branch differently here?
Returns a dictionary defining alternative linking columns between compartments | ||
for single cell morphological profiles. | ||
The returned dictionary specifies, for each compartment, the linking columns | ||
used to connect to other compartments (typically as used in CellProfiler output). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we should distinguish how this is different from get_default_linking_cols
using explicit use cases. Should we think about replacing the existing default instead of introducing a new arrangement?
thanks for this important contribution @fefossa ! I'm excited for Dave and you to connect to get this functionality merged in :) |
Description
I'm proposing these changes to Collate and SingleCells for those to accept different numbers of cell compartments or even just one compartment.
Related to issue #272
This is still a work in progress.
So far, (1) I've added the option on collate.py to accept 3 flags, for no-cells, no-cytoplasm, or no-nuclei; (2) the checking on
assert_linking_cols_complete
only happens when there's more than one compartment.Proposed changes/discussion
From some discussion with @bethac07, we saw two options:
The first option is just to add more documentation to SingleCells (which I did), and expect the user to build their dictionary and provide it as
compartment_linking_cols
. If that's how you'd like to do it, I think the changes I already did would be enough to merge.OR build the dictionary for
compartment_linking_cols
based on the compartments given:Now: the
compartment_linking_cols
is defined as thedefault_linking_cols
if no dictionary is specified by the user. Also, for SingleCells to work right now with only one compartment, you must give a dictionary linking the compartment to itself, for example:{ "cells": {"cells": "ObjectNumber"}}
, which works, but I don't know if that's the right way to do it.So, to build the dictionary:
{"compA": {"parent":{"compB":"object_Parent_compB"}, "child":{"compA":"ObjectNumber"}}}
, where it will take the user compartments and create this dictionary. I just don't know which is the best way to handle this, because how do we know which is the child-parent from objects other than nuclei, cell, and cytoplasm, if the user is not giving that info?par_child_dict = {"nuclei": {"cells", "cytoplasm"}, "cells":{"cytoplasm"}}
from where the relationship can be inferred and thecompartment_linking_cols
is built based on par_child_dict.sc_df
is being merged to work with only one compartment without the need to provide a dictionary that links the compartment to itself.What is the nature of your change?
Checklist