Conversation

Collaborator

@anwai98 anwai98 commented Apr 7, 2025

WIP!

@anwai98 anwai98 changed the base branch from master to dev April 7, 2025 11:37
@anwai98 anwai98 marked this pull request as ready for review April 7, 2025 12:51
Collaborator Author

anwai98 commented Apr 7, 2025

Hi @constantinpape,

This should be good for a first review from my side!

The intended idea here is simple: 1) compute the loss over the logit masks (to make this possible, I convert the labels to match the logit dimensions), and 2) work with iterative prompts on logits.
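The label-conversion step in 1) could be sketched roughly as follows (a hypothetical sketch, not the actual implementation in this PR; the function name and shapes are illustrative, assuming SAM-style 256x256 low-resolution logits):

```python
import torch
import torch.nn.functional as F

def loss_over_logits(logit_masks, labels):
    # Downsample the ground-truth labels to the logit resolution with
    # nearest-neighbor interpolation, so binary values stay binary,
    # then compute the loss directly at low resolution.
    labels_lowres = F.interpolate(
        labels.float(), size=logit_masks.shape[-2:], mode="nearest"
    )
    return F.binary_cross_entropy_with_logits(logit_masks, labels_lowres)

# Example with SAM-like shapes: 256x256 logits, 1024x1024 labels.
logits = torch.randn(2, 1, 256, 256)
labels = (torch.rand(2, 1, 1024, 1024) > 0.5).float()
loss = loss_over_logits(logits, labels)
```

Nearest-neighbor interpolation is chosen here so the downsampled labels remain strictly binary; bilinear interpolation would introduce fractional values at object boundaries.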

Let me know how it looks!

EDIT: I am gonna run a training for this and see how the results look for LIVECell!

     self.log_image_interval = trainer.log_image_interval

-    def add_image(self, x, y, samples, name, step):
+    def add_image(self, x, y, name, step):
Contributor
Was this never used / not used anymore?

Collaborator Author

It was used previously; I removed it now because the samples are logits. Would you recommend keeping those predictions by upsampling them?

Contributor

What exactly were the samples? Examples of mask predictions? It would maybe be good to keep them; we can discuss later.

Collaborator Author

Yes, samples were mask predictions!
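Keeping the sample predictions by upsampling, as suggested above, could look roughly like this (a hypothetical sketch; `logits_to_masks` is an illustrative name, not part of the codebase):

```python
import torch
import torch.nn.functional as F

def logits_to_masks(logit_masks, image_shape):
    # Upsample the low-resolution logits to the image shape, then threshold
    # at 0 (i.e. probability 0.5 after sigmoid) to recover binary masks
    # suitable for image logging.
    upsampled = F.interpolate(
        logit_masks, size=image_shape, mode="bilinear", align_corners=False
    )
    return (upsampled > 0).to(torch.uint8)

# Example: upsample 256x256 logits back to a 1024x1024 image.
masks = logits_to_masks(torch.randn(1, 1, 256, 256), (1024, 1024))
```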

@anwai98 anwai98 marked this pull request as draft April 8, 2025 12:45

anwai98 commented Apr 8, 2025

Here's a detailed description of the PR (we will see when we can come back to this in the future):

Core Idea: Computing the loss over predicted logit masks and downsampled ground-truth masks.

The current implementation works; however, it does not bring us significant memory advantages (below are quick try-outs on LIVECell):

# for default settings
loss over logits: 47.89 GB
loss over masks:  49.42 GB

# for resource-efficient setting (i.e. n_objects=5 for 'vit_b')
loss over logits: 24.01 GB
loss over masks:  24.38 GB
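For reference, peak-memory numbers like the above can be read out from PyTorch's CUDA allocator statistics (a minimal sketch, assuming a CUDA build of PyTorch; the helper name is illustrative and not how the numbers above were necessarily obtained):

```python
import torch

def peak_gpu_memory_gb():
    # Peak GPU memory allocated by PyTorch since the last call to
    # torch.cuda.reset_peak_memory_stats(), in GB.
    # Returns None on a CPU-only machine.
    if not torch.cuda.is_available():
        return None
    return torch.cuda.max_memory_allocated() / 1024 ** 3

mem = peak_gpu_memory_gb()
```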

Base automatically changed from dev to master August 12, 2025 17:03