Description
Dear Kye Gomez and the ScreenAI Team,
I am currently working on a project using the ScreenAI model implementation from your GitHub repository. I have successfully run the model and obtained output tensors, but I am having difficulty decoding these outputs into the structured, human-readable UI annotations shown in the examples from the ScreenAI paper and blog.
Specifically, I am looking for guidance on:
1. The correct way to map the model's output tensor (token indices or probability distributions) to structured UI elements (such as TEXT, BUTTON, LIST_ITEM, PICTOGRAM, etc.) with their associated text and bounding box coordinates.
2. Whether there is a vocabulary file, label schema, or decoding utility available that was used during model training to convert token indices into meaningful UI annotations.
3. Any example code or best practices for post-processing model outputs into the structured format demonstrated in your published examples.
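For concreteness, here is a minimal sketch of the kind of decoding step I am hoping exists. Everything here is my own guess: the vocabulary, the greedy argmax decoding, and the padding token are all hypothetical stand-ins, not the actual ScreenAI schema.

```python
# Hypothetical post-processing sketch. The vocabulary and output format
# below are illustrative guesses, NOT the real ScreenAI label schema.
import torch

# Fabricated tiny vocabulary mapping token indices to UI labels.
VOCAB = {0: "<pad>", 1: "TEXT", 2: "BUTTON", 3: "LIST_ITEM", 4: "PICTOGRAM"}

def decode_tokens(logits: torch.Tensor) -> list:
    """Greedy-decode logits of shape (seq_len, vocab_size) into label strings."""
    token_ids = logits.argmax(dim=-1)            # most likely token per step
    labels = [VOCAB[int(i)] for i in token_ids]  # map indices to strings
    return [t for t in labels if t != "<pad>"]   # drop padding tokens

# Fake logits standing in for real model output: 3 steps, 5-token vocab.
fake_logits = torch.zeros(3, 5)
fake_logits[0, 1] = 1.0  # step 0 -> TEXT
fake_logits[1, 2] = 1.0  # step 1 -> BUTTON
fake_logits[2, 0] = 1.0  # step 2 -> <pad>
print(decode_tokens(fake_logits))  # ['TEXT', 'BUTTON']
```

If the actual decoding instead interleaves labels with text spans and quantized bounding-box coordinates (as the paper's examples suggest), a pointer to the real vocabulary file and parsing logic would let me replace this guesswork.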
I have reviewed the documentation and codebase, but could not find an example or utility for this decoding step. If there are internal tools, scripts, or additional documentation that could help with this, I would greatly appreciate your guidance or a pointer to relevant resources.
Thank you for your time and for making this valuable resource available to the community. I look forward to your response.