Description
Dear Kye Gomez and the ScreenAI Team,
I am currently working on a project using the ScreenAI model implementation from your GitHub repository. I have successfully run the model and obtained output tensors, but I am having difficulty decoding these outputs into the structured, human-readable UI annotations shown in the examples from the ScreenAI paper and blog.
Specifically, I am looking for guidance on:
1. The correct way to map the model's output tensor (token indices or probability distributions) to structured UI elements (such as TEXT, BUTTON, LIST_ITEM, PICTOGRAM, etc.) with their associated text and bounding box coordinates.
2. Whether there is a vocabulary file, label schema, or decoding utility available that was used during model training to convert token indices into meaningful UI annotations.
3. Any example code or best practices for post-processing model outputs into the structured format demonstrated in your published examples.
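For concreteness, here is a minimal sketch of the kind of decoding step I am hoping exists. Everything here is my own guess: the vocabulary, the greedy argmax decoding, and the padding token are all hypothetical stand-ins, not the actual ScreenAI schema.

```python
# Hypothetical post-processing sketch. The vocabulary and output format
# below are illustrative guesses, NOT the real ScreenAI label schema.
import torch

# Fabricated tiny vocabulary mapping token indices to UI labels.
VOCAB = {0: "<pad>", 1: "TEXT", 2: "BUTTON", 3: "LIST_ITEM", 4: "PICTOGRAM"}

def decode_tokens(logits: torch.Tensor) -> list:
    """Greedy-decode logits of shape (seq_len, vocab_size) into label strings."""
    token_ids = logits.argmax(dim=-1)            # most likely token per step
    labels = [VOCAB[int(i)] for i in token_ids]  # map indices to strings
    return [t for t in labels if t != "<pad>"]   # drop padding tokens

# Fake logits standing in for real model output: 3 steps, 5-token vocab.
fake_logits = torch.zeros(3, 5)
fake_logits[0, 1] = 1.0  # step 0 -> TEXT
fake_logits[1, 2] = 1.0  # step 1 -> BUTTON
fake_logits[2, 0] = 1.0  # step 2 -> <pad>
print(decode_tokens(fake_logits))  # ['TEXT', 'BUTTON']
```

If the actual decoding instead interleaves labels with text spans and quantized bounding-box coordinates (as the paper's examples suggest), a pointer to the real vocabulary file and parsing logic would let me replace this guesswork.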
I have reviewed the documentation and codebase, but could not find an example or utility for this decoding step. If there are internal tools, scripts, or additional documentation that could help with this, I would greatly appreciate your guidance or a pointer to relevant resources.
Thank you for your time and for making this valuable resource available to the community. I look forward to your response.