What is the ideal query initializer? #80

Dear Sir,

I am working on transformer models for multi-label image classification, and your paper titled "ML-Decoder: Scalable and Versatile Classification Head" attracted my attention.
However, there is one point in the paper and code that I could not understand: in your code, the query input is initialized with random values and set as non-learnable. What is the logic behind this? In the work I have seen, queries are usually derived from some form of image or text input rather than from random values. Is it reasonable to model the relationship between image embeddings and random vectors with cross-attention? Could you explain the fundamental idea behind this? I would be grateful for any help in understanding this issue.
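To make the question concrete, here is a minimal sketch of the setup as I understand it. It is not taken from the ML-Decoder code: it uses plain NumPy, a single attention head, and hypothetical names (`fixed_queries`, `w_k`, `w_v`, `cross_attention`) and sizes chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, tokens, w_k, w_v):
    """Single-head cross-attention: each query attends over the image tokens."""
    keys = tokens @ w_k                      # (num_tokens, dim)
    values = tokens @ w_v                    # (num_tokens, dim)
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = softmax(scores, axis=-1)       # (num_queries, num_tokens)
    return weights @ values, weights


rng = np.random.default_rng(0)
num_queries, num_tokens, dim = 5, 49, 16     # illustrative sizes (assumption)

# Random queries drawn once and never updated: the "non-learnable" part.
fixed_queries = rng.standard_normal((num_queries, dim))

# Stand-ins for weights that would be trained in a real model (assumption).
w_k = rng.standard_normal((dim, dim)) / np.sqrt(dim)
w_v = rng.standard_normal((dim, dim)) / np.sqrt(dim)

# Stand-in for the image embeddings produced by a backbone.
image_tokens = rng.standard_normal((num_tokens, dim))

out, weights = cross_attention(fixed_queries, image_tokens, w_k, w_v)
print(out.shape, weights.shape)
```

In this sketch only the queries are frozen, while the key and value projections stand in for trainable weights. Whether this matches the intent of your implementation is exactly what I would like to confirm.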

Yours sincerely.
